C++ Week 9

Dynamic Data Structures I

So far in the course we have only been looking at variables and objects defined at the start of the program. Now we are going to be considering objects that are explicitly created (and destroyed) during the running of the program. When we declare an int we reserve a portion of memory (a few bytes) that will be used to store an item of integer data. This piece of memory has an address. We can declare an integer pointer, int* p, which can contain the address of an integer - where it is found in memory.

A pointer in C++ is always a pointer to a particular type of thing - a pointer to an int (int*), a pointer to a double (double*), a pointer to a string (string*), or whatever. The type of a pointer depends on the type of thing that it points at.

When you declare a pointer its value is initially undefined, that is to say, it holds some address but you don't know what address (you don't know where it's pointing), so at some stage you should assign an address to it. You do this by assigning it an already existing address or by using the keyword new as follows:

int z = 25;    // declare an integer and initialise it
int* p; // declare an integer pointer p = &z; // make p point to z by assigning it the address of z (&z)

Note that the int value is held in memory in the space assigned to z. p does not contain an integer data value, but the address of the integer z.

int* p;         // creates an integer pointer 
p = new int;   // makes it point at an uninitialised piece of memory that can hold integer data
Or we can do these two things in one line:
int* p = new int; // creates an integer pointer and makes it point at an uninitialised
                  // piece of memory that can hold integer data

This code:

The value stored in this portion of memory is as yet undefined. The only way to get at it is through the pointer, and if you lose the pointer, you can't get the memory back. More about this later.

So how do we put a value in the portion of memory that p points to? We apply the dereferencing operator * on p and then we can assign a value to it:

*p = 25;

If p is a pointer, *p is the thing it is pointing at.

We can treat *p in exactly the same way as we would any other integer variable, because that is what it is!

A note about declaring more than one pointer

Normally declarations of pointers place the asterisk next to the typename, as in

int* p;

Here we are declaring p as a pointer of type int*. However, when you declare more than one pointer, you must use a * operator on each subsequent variable in the declaration. If you don't you will declare the wrong type of variable:

int* p, *q; // p and q are both integer pointers
int* p, q;  // p is an integer pointer, but q is a plain old integer 

If you find yourself forgetting this, you can always declare pointers one per line.

Using pointers

Consider the situation created by the following code:

int* p = new int;    // declare an integer pointer p,
                     // create an integer in memory and make p point to it
*p = 25;             // assign the value 25 to the integer p points to
cout << *p << endl;  // print the value stored in the segment that p points to (= 25)
int* q;              // declare another integer pointer q 
q = p;               // make it point to the same place as p points to 
cout << *q;          // print the value stored in the segment that q points to (= 25)

Note that on its own, p = new int; declares an integer storage space in memory and makes pointer p point at it. It doesn't create p. You do that in the ordinary way by declaring a variable of type int*. The line int* q; only declares a pointer, assigning memory space for the address of an integer, but it doesn't point anywhere (it's uninitialised) and the statement doesn't assign any memory space for the integer data. It is also worth noting that q = p; means that q is pointing to the same place p is pointing at - it does not mean that q is pointing at p. (If we wanted q to be a pointer to an integer pointer we would have to declare it as type int**.)

We can write a slightly condensed version of the above code fragment as follows:

int* p = new int(25);  // initialize the new int when creating it
cout << *p << endl;
int* q = p;            // initialize the pointer q when defining it
cout << *q;

What is now the effect of assigning a new value to *q ?

*q = 34;

Since p and q are currently pointing at the same thing, any changes to *q will also affect *p.

But p and q do not have to point at the same thing for evermore. If there was another int, we could make q point at it, or we can create a new int for q to point at:

q = new int(56); // creates a new integer in memory,
                 // assigns the value 56 to it and then makes q point at it

Finally, we can create another integer = 78 in memory and make p point at it:

p = new int(78);

Unfortunately by moving p from 34 to 78 we have no way of getting back hold of 34. We have in effect 'let go of the balloon' which is now floating in memory, taking up space which we can't re-use. This effect is known as memory leakage, and the piece of memory containing the 34 is known as garbage. The runtime systems of some languages have garbage collection built in, but C++ doesn't and you have to be careful. If you leak too much memory, the system will run out of RAM and crash!

So what do you do when you want to make a pointer point to something else? Before you let go of the object you are pointing at, you have to delete it. e.g.

delete p;        // Release the space that p is pointing to so it can be re-used
                 // by the operating system 
p = new int(78); // make p point to a new variable

If you delete the object that is being pointed at, then, unless you immediately give the pointer a new value, it is a good idea also to set the pointer to NULL - undefined pointers are a common source of program bugs.

q = NULL;

Objects that contain pointers to other objects

If you create classes that contain pointers you can set their pointers to point to other objects that in turn point to other objects, creating long chains called data structures. For example, consider this simple Node class that contains a data field that can point to another Node object:

class Node
{  public:
      int n;
      Node* link;
};
This is a simple form of class, containing only data items, and all public. It could have all the things you usually find in a class, such as a private section and a constructor, but in this case we are choosing not to use those features. Objects of this class are simply aggregations of data, without any functionality. Most programming languages have the facility to define such data types, often called something like a record or, in C, a struct. Having defined a Node class, we can define a variable of type Node*:
int main( )
{   Node* p = new Node;      // declare a Node pointer, p, and assign space for a new 
                            // Node and make p point at it
}

This creates the following structure in memory:

Now, how can we initialise the data members of the Node *p? Code like

*p.n = 5;

looks a good bet, but unfortunately the rules of precedence mean that this is interpreted as:

*(p.n) = 5;

which makes no sense as we do not have an object called p with a pointer member called n. So we have to use brackets to force the order of evaluation:

(*p).n = 5;

In C++ this is a very common operation so there is an operator that performs the same action, ->:

p->n = 5;        // sets the integer member of *p to 5 
p->link = NULL;  // sets the pointer member of *p to NULL

ptr->mem picks out the member mem of the object pointed at by ptr.

Now let's create a new Node q and make p point at it:

Node*q = new Node;
q->n = 3;
q->link = NULL;
p->link = q;        // *p's link field now points to *q

This is the situation in memory:

By stringing many of these Node objects together we can create a structure called a singly-linked list. Click here to see the code for such a list. Singly-linked lists are chains of nodes, each containing a pointer to the next item in the list. Assuming the input A B C D to the above program, in memory we will have the following structure after the input, and the output will be D C B A, leaving q pointing at node D and p set to NULL (i.e. not pointing anywhere).

If we now perform the action cout << q->link->ch; on the above structure, we get the output C, the ch member of the node pointed at by the link member of the node that q is pointing at.

A circular linked list

If we add the following procedure to the end of the previous code, the output is ADCBADCB...in an unending loop. The pointer p moves to q (i.e. points to node D) and thence scrolls along to node A. Then it sets the link of node A to point at node D (making the list circular). Finally, it prints A, followed by the next node in the list (D) followed by C and so on...

p = q;
while (p->link != NULL)
   p = p->link;
p->link = q;
do
{  cout << p->ch;
   p = p->link;
}  while (p != NULL);

Click here for a full program listing.

The heap

We have in previous weeks seen three areas of memory that a C++ program makes use of. The first, obviously, is the area where the program's executable code resides. Then there is the part of memory reserved for static (as opposed to auto) variables; these are globals and local statics that remain throughout the life of the program. Then there is the system stack that holds parameters and local auto variables - the ones that are local to functions and which come into existence when a function begins and disappear when it finishes.

Dynamic data structures are supported by yet another area of memory, the heap. This is a large area of currently unused memory that the system can allocate to your program when you call the new function. The word "heap" is chosen to indicate that there is no particular relationship between one piece of allocated memory and another - they might be next to each other in memory or they might be miles apart. It's as though there is a big heap of memory and you just get given a chunk. When you do a delete, it just gets thrown back on the heap, and could be reallocated in response to another new.

The system keeps track of which chunks have been allocated and which are still on the heap, but, if you lose all your pointers to a chunk that has been allocated to your program, it does not return to the heap. Hence the problem of memory leak.

(There is also a data structure called a heap, notably used in a sorting algorithm called heapsort. This is a quite different use of the word "heap".)


Notes on R. Mitton's lectures by S.P. Connolly, edited by R. Mitton, 2000