C++ Week 8

Classes II, and integer representation

More on arrays

A one-dimensional array is like a one-dimensional vector; it's a row of items of the same type in which you can refer to individual items with the square-bracket notation. You could define an array of, say, seven integers as follows:

int harry[7];
The compiler would reserve enough storage for seven contiguous integers, and you could refer to these as harry[0], harry[1] and so on up to harry[6] (the first element is always element zero). Note that, when you are defining an array, as above, harry[7] does not mean "element seven of the array harry", but rather, "I want an array of seven elements, called harry."

Given the above definition, the individual integers in harry would have undefined values. You could initialize the values – and this is a feature of arrays that vectors don't have – as follows:

int harry[7] = {42, -5, 68, 91, 3, 1002, 12};
Or you might want to initialize just the first few elements:
int harry[7] = {42, -5, 68, 91};
in which case the remaining elements would be initialized to zero.

If you don't do any initialization, you get undefined values, but, if you initialize any of the elements, even if it's only the first one, then all the rest get initialized to zero. So you could initialize the whole lot to zero with just:

int harry[7] = {0};

If you initialize all the values explicitly, you don't have to specify the length of the array. You could define harry as follows:

int harry[ ] = {42, -5, 68, 91, 3, 1002, 12};
In this case, the compiler calculates the length of array on the basis of the initialization.

Arrays are not objects in the C++ sense – there is no array class (as there is a vector class and a string class). So they don't have any functionality. We cannot say things like harry.size() or harry.push_back(77) The length of an array is fixed at compile time; you cannot grow it or shrink it during the run of the program, as you can with vectors.

A simple Stack class

A linear list is, conceptually, just a series of items one next to another. A stack is a linear list where you are limited to putting an item on the stack (known as "pushing") or taking an item off the stack (known as "popping") and where both these operations take place at the same end of the list. It's sometimes called a LIFO list (Last In First Out). It is one of the fundamental data structures widely used in programming.

The following is one possible implementation of a stack, which uses an array to store the items (here integers, but, in principle, items of any type):

const int MAX = 10;

class Stack
{ public:
	Stack();
	void push(int);
	int pop();
 private:
	int	a[MAX+1];  // we won't be using a[0]
	int	top;       // "stack pointer"
};

Stack::Stack()
{	top = 0;
}

void Stack::push(int x)
{	if (top >= MAX)
	{	cerr << "Stack overflow!" << endl;
		return;
	}
	top++;
	a[top] = x;
}

int Stack::pop()
{	if (top == 0)
	{	cerr << "Stack underflow!" << endl;
		return 0;
	}
	int	returnval = a[top];
	top--;
	return returnval;
}

A simple Queue class

The queue, another fundamental data structure, is also a linear list with restricted access, but, this time, items are added to the queue at one end but removed at the other – a FIFO (First In First Out) list.

class Queue
{ public:
	Queue();
	void add(int);
	int remove();
 private:
	int	a[MAX+1];  // we won't be using a[0]
	int	front, rear, n;  // n = number of items in the queue
};

Queue::Queue()
{	n = 0;
	front = rear = 1;  // any value >= 1 and <= MAX
}

void Queue::add(int x)
{	if (n == MAX)
	{	cerr << "Queue overflow!" << endl;
		return;
	}
	n++;
	rear = (rear == MAX ? 1 : rear + 1);
	a[rear] = x;
}

int Queue::remove()
{	if (n == 0)
	{	cerr << "Queue underflow!" << endl;
		return 0;
	}
	n--;
	front = (front == MAX ? 1 : front + 1);
	return a[front];
}

This implementation of the queue is perhaps less intuitively obvious than that of the stack. When people are in a queue (for tickets, say) and someone at the front gets their ticket and leaves, the whole queue shuffles forwards. In this array implementation, however, there is no shuffling forwards, so the whole queue, in a sense, moves backwards through the array, as items are added to the rear and removed from the front. When the rear has reached a[MAX], the next addition is at a[1] (assuming it has been vacated by items leaving the queue). The effect is that the queue, so long as it does not overflow, moves round and round the array.

static const int

In the above, I defined the const int as a global. Although this will work, it seems unsatisfactory since the length of the array is a feature of the data structure used by the objects of type Stack or Queue. In other words, this const int ought to be private to these classes, just as the array definitions are. We can make it private to our Stack definition as follows (and we could do the same for the Queue):

 private:
	static const int MAX = 10;
	int	a[MAX+1];  // we won't be using a[0]
	int	top;       // "stack pointer"
Defining it as "static" means that it belongs to the class as a whole rather than to individual objects of that class. If we had several objects of type Stack in our program, then each of them would have its own array and its own top variable, but they would all share a single const int.

You can have static variables, as well as static constants. For example, a class that dealt with currency exchange might have an exchange_rate variable. This would be a variable – it would change from time to time – but you would not want different objects of the class to have different values of it. You would want them all to share the same, current value, and you could ensure this by making it static. With variables, you can decide whether to make them static or not, but with constants you have no choice – they have to be static. (It is actually only constants of integral type that you can define within the class definition, as in the example above; you can have private, static constants of other types but defining them is more long-winded, as we'll see in a later lecture.)

Representation of integers

When you compile programs containing expressions such as i < s.length() or i < v.size(), the (Borland) compiler gives you a warning about comparing signed and unsigned values. What is it warning you about?

The unsigned integers are all the integers >= 0; there are no negative ones, so we don't need to worry about a minus sign, hence the name "unsigned". The internal representation of unsigned integers is very straightforward; the computer simply stores the integer in binary. (If you are unclear about binary numbers, have a look at "A lesson on octal, hex, and binary".) Supposing we had six bits to represent an integer (32 bits would be more realistic), then the range would go from 000000, representing 0, to 111111, representing 63. Storing signed integers, which include negatives as well as positives (signed integers are denoted by int in C++), is not so easy.

One way, known as sign and magnitude, reserves one bit for the sign, usually using 0 to mean "positive" and 1 for "negative", and the remaining bits for the magnitude. Given our six bits, then 9 would be represented as 001001 (0 for the sign, then 01001), while -9 would be 101001. Though conceptually simple, this arrangement does not lend itself to easy implementation in hardware, and it has the peculiar feature of having two representations for zero: 000000 and 100000, the latter being a mysterious "negative zero".

More common is a system called two's complement. Continuing with our six bit example, the numbers 0 to 31 would be represented exactly like their unsigned versions: 000000 to 011111. All the patterns beginning with 1, however, represent negative numbers, from -32 (100000) to -1 (111111). (If you forget everything else about two's complement, remember at least that all ones is minus one, and you can probably work out the rest from that.) To get the two's complement representation of a negative number, say -9, you first write down the positive version (001001) then flip all the ones to zeros and all the zeros to ones (110110) and finally add 1 (110111). (To get minus one, you start with positive one (000001), then flip (111110), then add 1, giving 111111.)

A great advantage of this system is that simple addition of two numbers, regardless of whether they are positive or negative, gives the right answer, provided that the answer is within our range (i.e. from -32 to +31). For example:

   -9  110111
+ -20  101100
=      100011  which is -29
(Actually, adding 110111 and 101100 gives a seven-bit number 1100011, but the "carry" bit falls off the left-hand end of our six-bit range and we just ignore it.)
   -9  110111
+  20  010100
=      001011  which is 11

Subtraction is easy. To calculate x - y, we just calculate x + -y. For example

    9  001001
-   4  000100

converts to

    9  001001
+  -4  111100
=      000101  which is 5

But what happens if the answer lies outside our range? Let's try 20 + 15, which gives 35, which is higher than our maximum of 31:

   20  010100
+  15  001111
=      100011  which is -29
If the answer lies outside our range, our calculations will simply yield a wrong answer. This is known as "integer overflow".

It would be possible for the run-time system to detect this and to terminate the program, as it does with division by zero, or at least output a message. If adding two positive numbers yields a negative result, or adding two negative numbers yields a positive one, then we have overflow. (We are never going to have overflow from adding a positive and a negative.)

In fact C++ does not do this. A C++ program will cheerfully generate integer overflow and produce nonsense results without a hint of warning. For example, the largest signed integer that can be represented in 32 bits is 2,147,483,647. If you set an int variable equal to this value, then added, say, 2,100,000,000 to it, this would obviously cause overflow since the correct result of 4,247,483,647 is too large to be represented. But the program would give you no warning about this and, if you then output the result, you would get -47483649.

However, C++ does at least enable the programmer to prevent it. If you include the climits library, you can use the values INT_MAX and INT_MIN, being the largest and smallest signed integer values that can be represented in your implementation. (It can vary between implementations depending on how many bytes are allocated to the storage of an integer.)

Suppose you have two int variables big and x, and you are about to add x to big, but you think that big might be close to INT_MAX. You might deal with this as follows:

if (INT_MAX - big >= x)
	big += x;
else	cerr << "Integer overflow adding " << x << " to " << big << endl;

Note that it is no use to write, as students are sometimes tempted to:

if (big + x > INT_MAX)
Consider what would happen in our six-bit world, where INT_MAX is 31, if big was 20 and x was 15. big + x would be -29, not 35 (see earlier example). In fact signed_int_value > INT_MAX can never be true.

Mixing signed and unsigned values

To return to the compiler warning about comparing signed and unsigned values, the first thing we need to know is that s.length() and v.size() return unsigned integers. This is reasonable since the length of a string, or the size of a vector, can never be negative.

The second thing we need to know is that, if you do arithmetic using a signed and an unsigned integer, the result is unsigned. This is probably not what you would expect. Suppose you had an expression such as s.length() - 1 and suppose that the string was empty. Let's do it with our six-bit integers:

s.length()                  0  000000
- 1, which converts to   + -1  111111
=                              111111  which is -1 as a signed int, but 63 as an unsigned int
So, if this was part of a for loop, such as for (int i = 0; i < s.length() - 1; i++) and the string s was empty, the condition being evaluated would not be 0 < -1 (therefore false, which is what you'd expect), but 0 < 63 (therefore true).

The same applies when you are simply comparing a signed value with an unsigned one; the signed value gets converted to unsigned before the comparison takes place. Suppose you had the following:

cout << (-1 < v.size() ? "Yes" : "No") << endl;
You might expect this to output "Yes" whatever the size of v, since -1 is obviously less than zero or any positive integer, but the reverse is the case. Because v.size() returns an unsigned int, the value -1 is converted to unsigned for the comparison. Since -1 is all ones in two's complement, this converts to a very large unsigned int.

So, the compiler warning is telling you to be careful. If all parts of the expression are always going to evaluate to zero or more, as in, for example for (int i = 0; i < s.length(); i++), you will not have a problem. But, if there is any chance that any part of the expression will yield a negative value (as a signed int), which will therefore be interpreted by the computer as a very large unsigned int, you might be in trouble.

The solution to this problem is to cast the unsigned value into a signed one, with static_cast<int>(unsigned value), as in this example:

for (int i = 0; i < static_cast<int>(s.length()) - 1; i++)
The effect of this is to type-convert the zero returned by s.length() from an unsigned int to a signed int, so that the evaluation of 0 - 1 yields -1, which is what you want.

There is an old-style form of cast, inherited from C, which looks like this:

for (int i = 0; i < (int)s.length() - 1; i++)
However, this is deprecated. That is to say, it is retained in the language in order to preserve compatibility with previous versions, but it is considered an undesirable feature and programmers are urged not to use it. The reasons for its undesirability are that ill-considered type casts are a cause of subtle bugs and they don't stand out clearly to someone reading the program. Typing out the static_cast rigmarole, by contrast, forces a moment of reflection, and it stands out on the page like a sore thumb.

One final point. Some students take to writing for (unsigned int i = 0; i < s.length() - 1; i++) and, because the compiler stops giving them a warning, they think they have dealt with the problem. But of course they haven't. The compiler stops giving its warning because now both i and s.length() - 1 are unsigned int, so they are not mixing the types. But, if s was empty, s.length() - 1 would still be a very large number. It's the type of s.length() that needs to be modified, not the type of i.


Notes on R. Mitton's lectures by S.P. Connolly, edited by R. Mitton, 2000/2008