C++ Week 5

Procedures and Functions II

Declaring Functions

Suppose you had the following:

void funcA()
{	....
	funcB();
	....
}

void funcB()
{	....
}

As it stands, this would not compile. When the compiler encounters the call to funcB(), it doesn't yet know that funcB() exists. To overcome this problem you have to declare funcB() before the call, so that the compiler knows what to do when it encounters it.

The declaration of a function is known as a function prototype. It brings the function into scope, so that it can be called from that point on. A prototype should contain:

The function's return type
The function's name
The function's parameter list, containing the type of each parameter (you can include the identifiers, but it is not necessary)

In essence the prototype looks the same as the first line of the function's definition, but it ends in a semicolon. For example, either of the following is a valid prototype for is_div_by:

bool is_div_by(int, int);

bool is_div_by(int dividend, int divisor);
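As a minimal sketch, assuming is_div_by is meant to report whether dividend is an exact multiple of divisor, the prototype lets a hypothetical caller, count_multiples, use is_div_by before its definition appears:

```cpp
// Prototype: return type, name and parameter types.
bool is_div_by(int dividend, int divisor);

// Hypothetical caller: counts the multiples of divisor in 1..limit.
// The compiler accepts the call to is_div_by because it has seen the prototype.
int count_multiples(int limit, int divisor)
{  int count = 0;
   for (int n = 1; n <= limit; n++)
      if (is_div_by(n, divisor))
         count++;
   return count;
}

// Definition: must match the prototype. Assumes divisor is not zero.
bool is_div_by(int dividend, int divisor)
{  return dividend % divisor == 0;
}
```

For example, count_multiples(10, 3) counts 3, 6 and 9, and so returns 3.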

A prototype can be global. For example,

void funcB();	// global prototype of funcB.
		// funcB is now in scope till the end of the program, so can be called from anywhere after here

void funcA()
{	...
	funcB();	// Compiler can cope with this call, since it now knows about funcB
	...
}

Or a prototype can be local:

void funcA()
{	void funcB();	// local prototype makes funcB available within the definition of funcA, but not outside
	...
	funcB();
	...
}
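The local form can be sketched the same way. twice and twice_plus_one are hypothetical names; the prototype inside twice_plus_one makes twice callable there, while any other code above the definition of twice would still need its own prototype:

```cpp
int twice_plus_one(int n)
{  int twice(int);        // local prototype: twice is in scope only inside this function
   return twice(n) + 1;
}

int twice(int n)          // the definition can come later in the file
{  return 2 * n;
}
```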

Parameter passing by value and by reference

Given a procedure header such as void display(int a, int b) and a call such as display(x, y), the arguments remain unchanged - whatever display does, it will have no effect on x and y. Only the values of x and y are passed to the procedure - the arguments are passed by value. If you wish the procedure to modify the arguments, you can pass the arguments by reference. To specify a pass by reference, place an ampersand & after the typename in the function's parameter list. For example:

void swap (int& a, int& b)

Given the call swap(x, y), the aim of this procedure is to exchange the values of variables x and y, so the procedure needs to have access to the variables x and y - getting just their values would be no help - hence the use of passing by reference.
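A minimal sketch of both versions follows; swap_by_value is a hypothetical contrast added for illustration. Only the reference version changes the caller's variables:

```cpp
// By value: a and b are copies, so the caller's variables are untouched.
void swap_by_value(int a, int b)
{  int temp = a;
   a = b;
   b = temp;      // only the local copies are exchanged
}

// By reference: a and b are alternative names for the caller's variables.
void swap(int& a, int& b)
{  int temp = a;
   a = b;
   b = temp;      // the caller's variables are exchanged
}
```

With int x = 1, y = 2; the call swap_by_value(x, y) leaves x at 1 and y at 2, whereas swap(x, y) leaves x at 2 and y at 1.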

When parameters are passed by value, the arguments can be any expressions, as in display(20, 30); or display(x, y); or display(x + y, 2 * x); but the arguments for reference parameters must be variables. To summarize:

By Value	By Reference
Arguments are expressions	Arguments are variables
Parameters are like local variables in the function	Parameters are like alternative names for the arguments
Changing the parameter inside the function does not change the argument	Changing the parameter inside the function does change the argument

A value parameter occupies as much memory space in the function as is necessary to accommodate its type of value, and the value gets copied into this space every time the function is called. If the value being passed occupies a lot of space (such as a large object taking up thousands of bytes), the space and time overhead of pass-by-value might be significant. You might then use pass by reference for efficiency. If you are worried about the possibility that you might modify the argument by mistake, you can use the const keyword to protect the argument:

int func (const Bigclass& b)

This is known as a "const ref" parameter.
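As a small sketch, std::vector<int> stands in here for the text's Bigclass, and total is a hypothetical function. The vector is not copied on the call, and the const means the function cannot change it:

```cpp
#include <vector>

// const ref parameter: no copy is made, and v cannot be modified here.
int total(const std::vector<int>& v)
{  int sum = 0;
   for (int value : v)
      sum += value;
   // v.push_back(0);   // would not compile: v is a const reference
   return sum;
}
```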

Constants

Numbers often appear scattered throughout code, but they are a disaster from a maintenance point of view. The same number may mean different things in different places, and it is not always clear what a number is supposed to represent or how it is related to another number. Consider the following example, in which the relationship between the two numbers (199 is 200 - 1) could easily be overlooked:

if (x < 200)
{  ...

   ...
   for (i = 199; i >= 0; i--)

For this reason, you should avoid these "magic numbers" and use constants instead. Constants are defined using the const keyword, and constant identifiers are often written in upper case to distinguish them from variables. For example:

int const SAMPLE = 200;

if (x < SAMPLE)
{  ...

   ...
   for (i = SAMPLE - 1; i >= 0; i--)
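A sketch of the same idea in compilable form. in_sample and sum_backwards are hypothetical helpers; because both derive their limits from SAMPLE, changing the sample size means editing one line, and the loop bound cannot drift out of step with the test:

```cpp
const int SAMPLE = 200;   // the one place where the sample size is written down

// True if x is a valid index into a sample of SAMPLE values.
bool in_sample(int x)
{  return x >= 0 && x < SAMPLE;
}

// Adds up data[SAMPLE - 1] down to data[0].
int sum_backwards(const int data[SAMPLE])
{  int sum = 0;
   for (int i = SAMPLE - 1; i >= 0; i--)
      sum += data[i];
   return sum;
}
```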

Example - Box procedure, and default arguments

In this example we will write a box() procedure that draws a box around a string. It does this by calling the line() procedure to draw the top of the box, then it prints the string with a box character before and after it, and finally it draws the bottom of the box. The call and its output should look like this:

box ("Hello!", "*");

********
*Hello!*
********

First we must define the line() procedure, which prints the string s n times and then ends the line:

void line(string s, int n)
{  for (int i = 0; i < n; i++)
      cout << s;
   cout << endl;
}

Now we can define the procedure box() that will actually print the box around the string argument. The character used to create the box is also passed to the procedure:

void box(string message, string ch)
{  line(ch, message.length() + 2);
   cout << ch << message << ch << endl;   // endl, so the bottom edge starts on a new line
   line(ch, message.length() + 2);
}

In order for box() to call line(), either line() has to be defined before box(), or a global prototype of line() has to appear before the definition of box(), or box() has to contain a local prototype of line().

It would be nice if we could specify a default character to be used as the box outline, but which we could override if desired. This can be done by assigning a value to the parameter in the definition as follows:

void box(string s, string ch = "*")
{...
}

In calls to this procedure, the second argument is optional (because a default argument is provided in the procedure definition), and both of the following calls are valid:

box("abc", "+");
box("abc");

A limitation of this feature is that mandatory parameters must all come before optional parameters in the parameter list. Also, arguments can be omitted only from the end of the argument list: the compiler will not accept calls like proc(a,,c);
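With more than one optional parameter, only trailing arguments can be left out. In this sketch volume is a hypothetical function; a caller who wants to give a height must also give a width, since volume(2, , 4) is not valid C++:

```cpp
// length is mandatory; width and height are optional and default to 1.
// Optional parameters must follow all the mandatory ones, so
//    int volume(int length = 1, int width, int height);   // would not compile
int volume(int length, int width = 1, int height = 1)
{  return length * width * height;
}
```

So volume(2) uses both defaults, volume(2, 3) uses only the default height, and volume(2, 3, 4) overrides both.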

Side-effects

A function that does something else apart from just returning its value, such as taking some input or generating some output or changing the values of variables (other than its own local ones), is said to have a side-effect. Side-effects are generally best avoided; they make programs hard to debug. Consider the following:

int silly(int& a)		// Because this is a reference parameter,
{	a *= a;		// this line changes the value of the argument.
	return a * 2;
}

int main()
{	int x = 0, y = -2;
	x = silly(y);	// This looks as though it changes x, which it does (x becomes 8), but it also changes y (y becomes 4)!
	return 0;
}
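The function can be compared with a side-effect-free alternative in the sketch below; not_silly is a hypothetical name. Both return the same value, but only silly changes its argument. With y starting at -2, silly(y) returns 8 and leaves y equal to 4:

```cpp
// Squares the caller's variable through a reference: a side-effect.
int silly(int& a)
{  a *= a;
   return a * 2;
}

// Same result, computed from a copy: the caller's variable is untouched.
int not_silly(int a)
{  return a * a * 2;
}
```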

In general, the functions you write should not have side-effects. Side-effects can occasionally have their uses, however, and some of the standard functions are actually functions that have side-effects. For example, getline not only reads in a line of input, but also returns something. Though we think of the reading in as the main thing that it does, it is technically a side-effect of the evaluation of the function. What it returns is a reference to the input stream. If the call is getline(cin, s); it returns a reference to cin. So you could, if you wanted, use the returned reference as the first argument to another getline, as in

getline(getline(cin,s1), s2);

which would read in two lines, the first into s1, the second into s2.

If the read fails, getline still returns a reference to the stream, but the stream is now in a failed state. A stream in a good state converts to true and a stream in a failed state converts to false, which is why we can use it in expressions such as while (getline(cin,s))

The same is true of the standard I/O operators (>> and <<). When cin >> x is executed, the taking of input is actually a side-effect; the expression also returns a reference to cin. This is why you can chain several of them in one statement: cin >> x >> y >> z. When cin >> x is evaluated, it pulls some input into x and returns a reference to cin. This now forms the left-hand argument for the second >>, and so on.

Global variables

Variables defined within a block are local to that block. For example,

{	int x;
	for (int y = 0; y < 5; y++)
	{	int z;
		// x, y and z accessible here
	}
	// x accessible here, but neither y nor z
}
// neither x, nor y, nor z accessible here

Variables defined outside any block are global. They are accessible from the point of definition to the end of the program.

By default variables in a procedure or function only have local scope. If a local variable is declared using the same identifier as a global variable, the local variable is the one used. For example:

int x;          // global

void procedure1()
{int x;              // local to procedure1; its existence masks the global one
}

void procedure2()
{int x;              // local to procedure2
}
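A runnable variant of the example: the body of procedure1 and the helper read_global are assumptions added so that the masking can be observed. Inside procedure1 every use of x means the local one, so the global x keeps its value:

```cpp
int x = 1;         // global

int procedure1()
{  int x = 99;     // local to procedure1; masks the global x
   x++;            // changes the local x only
   return x;       // the local x, now 100
}

int read_global()
{  return x;       // no local x here, so this is the global x
}
```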

Globals should be used only sparingly, if at all. They lead all too easily to mistakes like the following:

int count;

while (...)
{	dosomething();
	count++;	// we think we're counting something
}

...

void dosomething()
{	...
	count++;	// but this procedure is using the same variable!
}

It's easy to imagine how mistakes of this kind in a large program (thousands of lines) could be very difficult to track down.
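To see the bug concretely, here is a sketch that fills in the loop above with an assumed loop running n times; count_iterations is a hypothetical wrapper. Because dosomething also increments the global count, the result is twice the number of iterations:

```cpp
int count = 0;   // global: shared by every function that names it

void dosomething()
{  // ... the real work would go here ...
   count++;      // but this procedure is using the same variable!
}

int count_iterations(int n)
{  count = 0;
   for (int i = 0; i < n; i++)
   {  dosomething();
      count++;   // we think we're counting iterations
   }
   return count; // 2 * n, not n
}
```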

The extern keyword

In general you want to grant access to a variable to just those parts of the program that need it and to deny it to the rest. For example, suppose that main calls procedures a and b and that a and b both use the same variable x (I don't mean that each has its own local variable called x, but that they need to access one and the same variable x). We can make x local to main and pass it to a and to b, but then main has access to it, which we don't want. Or, as another example, suppose that main shares access to some variable z with procedure d, but that d is called from within procedure c: main calls c and c calls d. If z is local to main, main has to pass it to c to pass on to d. So then c has access to z, which we don't want.

We could make our variables global, defined early in the program, but then the whole program has access to them, which we certainly don't want.

One solution to this problem is to make x global but to put its definition at the end of the program. We now make it accessible to those functions that need it by means of extern declarations, thus:

void a( )
{  extern int x; //procedure a can access the (global) variable x but the definition of x is elsewhere
   ...
}

...              // x not accessible here

void b( )
{  extern int x; //procedure b can also access x
   ...
}

...              // x not accessible here

int x;    // global definition of x on the last line of the program
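Assembled into one file, the pattern looks like the sketch below. The bodies of a and b (one adds 10 to x, the other returns it) are assumptions made so that the shared access is visible; code placed above the final definition cannot name x unless it has its own extern declaration:

```cpp
void a()
{  extern int x;   // declaration only: x is defined at the end of the file
   x += 10;
}

int b()
{  extern int x;   // b refers to the same x
   return x;
}

// ... code here cannot name x ...

int x = 5;         // the one definition of x, placed last
```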

Lifetimes

Look at the following code:

bool three_in_a_row (bool win)
{  int nwins = 0;       // nwins is local to the function
   if (win) nwins++;
   else nwins = 0;
   return nwins == 3;
}


{  if (user_wins_game)
   {  win = true;
      prize();
   }
   else win = false;
   if (three_in_a_row(win)) jackpot();
}

The three_in_a_row function won't work because a new nwins variable is created (and set to zero) every time the function is called and destroyed when the function returns, so nwins can never get past 1. Local variables are auto by default, meaning that they exist only so long as the function is executing. The solution is to define nwins as static, as follows:

bool three_in_a_row (bool win)
{   static int nwins;   // zero-initialized once and kept between calls; rest of the function as before

A static variable comes into existence at the start of the program and exists until the program ends. It keeps its value from one call to the next - it provides a function with a memory.

Note that lifetimes are different from scope. A static local variable is still a local variable, and therefore only accessible from within the function - being static does not affect its scope. But it exists (albeit inaccessibly) from one call to the next.

In addition, static variables are initialized, by default, to zero (the initial value of an auto variable is undefined, as you know). If a static variable is explicitly initialized, the initialization takes place only once - when the execution of the function first reaches the initialization. After that, it retains its value from the previous call.
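Completing the static version gives the following sketch; the static local nwins now counts consecutive wins across calls:

```cpp
bool three_in_a_row(bool win)
{  static int nwins = 0;   // initialized once; keeps its value from one call to the next
   if (win) nwins++;
   else nwins = 0;
   return nwins == 3;
}
```

The third consecutive win reports true. A fourth win, or a loss, reports false again, since nwins is then 4 or 0.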

All global variables are static - they exist from the start of the program to the end and they are initialized by default to zero.

As another illustration of the static keyword, let's look at the following code:

#include <iostream>

using namespace std;

int glob;                  // global, therefore static, therefore initialized to zero

void proc( )
{  static int statloc = glob;  
   int autoloc = 3; 
   cout << autoloc << " " << statloc << endl;
   statloc++; autoloc++;   // incrementing of autoloc is pointless; it is about to disappear
}

int main( )
{  cout << glob << endl;   // glob is zero
   glob = 5;   
   cout << glob << endl;   // glob is 5 
   proc();                 // statloc is 5, autoloc is 3, statloc increased to 6 
   glob = 8;
   cout << glob << endl;   // glob is 8
   proc();                 // statloc still 6 - it is not reinitialized 
}

The output of this program is:

0
5
3 5
8
3 6

extern declarations and static variables provide a partial solution to the problem of protecting data (by restricting access to it) within the context of procedural programming, such as programming in C. Object-oriented programming, supported by C++, provides a much more satisfactory solution, with data and the functions that act on it being packaged up inside objects, as we will see later in the course.

Function calls and the system stack

When a program is running, the computer reserves a portion of the memory for the system stack. When a function (or procedure) is called, an activation record for that function is created and placed on the top of the stack. This activation record will contain the function's parameters and local variables and also the return address, so that control returns to the right place in the (executable) program when the function has finished. The activation record stays on the stack until the function has finished, at which point the record is effectively destroyed.

Consider the structure of the following program:

int func1(int x)
{	....		// C
}
int func2(int x)
{	int z;
	....		// E
}
void proc(int x)
{	int y;
	....		// B
	y = func1(x);
	....		// D
	y = func2(x);
	....		// F
}
int main()
{	int m;
	....		// A
	proc(m);
	....		// G
}

Just after main has begun (at point A), the stack will contain the activation record for main (in these pictures the top of the stack is shown first):

	main's local variable m
	main's return address (where control returns to when the program terminates)

Then, after main has made a call to proc (so execution is at point B), the stack will contain this:

	proc's parameter x
	proc's local variable y
	proc's return address
	main's local variable m
	main's return address

Then, after proc has called func1 (we're now at point C), the stack will contain this:

	func1's parameter x
	func1's return address
	proc's parameter x
	proc's local variable y
	proc's return address
	main's local variable m
	main's return address

After func1 has returned and we are at point D, the stack will contain this:

	proc's parameter x
	proc's local variable y
	proc's return address
	main's local variable m
	main's return address

After we've called func2, so we are now at E, we have this:

	func2's parameter x
	func2's local variable z
	func2's return address
	proc's parameter x
	proc's local variable y
	proc's return address
	main's local variable m
	main's return address

At F we are back to this:

	proc's parameter x
	proc's local variable y
	proc's return address
	main's local variable m
	main's return address

And at G we are down to this:

	main's local variable m
	main's return address

So you can see why it is that local, auto variables come into being and disappear in the way that they do. Globals and statics are not stored on the stack. They are held in a separate part of memory and stay throughout the program.

Notes on R. Mitton's lectures by S.P. Connolly, edited by R. Mitton, 2000