C++ Week 14

Classes II

Enhancing the Date class

A private array of const Month

In our first version of the Date class, we had a couple of global arrays, one to hold the number of days in each month, the other to hold the names of the months. There are two unsatisfactory aspects of this arrangement. One is that we have parallel arrays, and the other is that our Date class depends on global structures defined outside it.

Arrays ax and bx are said to be parallel when there is a one-to-one relationship between their elements – ax[0] to bx[0], ax[1] to bx[1], ax[2] to bx[2] and so on. This often arises, as here, because the arrays hold different pieces of information about the same items. It is an unsatisfactory arrangement because there is nothing in the program code to tie them together; the connection is too flimsy. Someone modifying the program without appreciating how the two arrays went together might insert a new element into the middle of ax without inserting a corresponding one into bx.

The solution is to create an array of objects that hold the mdays and mnames data for each month. Let us define a Month class to hold the data:

class Month 
{  public:
      char name[4];   // every month name has three letters, plus one for the null byte
      int days;
};

Note that the class is entirely public and has no methods. It serves only as a structure in which to store data of different types. It's not really a fully-fledged C++ class at all but rather the sort of simple aggregation of data that you would find in C (in C it would be called a struct). Let's call it a C-style class. We have chosen to use an array of char rather than a string to make the initialization easier.

It is possible to initialize an object of a C-style class using the same curly bracket system that you can use to initialize an array. So we can initialize a Month as follows:

Month  m = {"Dec", 31};

Note that the order of the values is the same as the order of the data fields in the class definition. Any data fields left uninitialized in the curly brackets are initialized to zero, or an empty string in the case of strings or C-strings.
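The zero-filling rule can be checked with a minimal sketch. The helper functions partial_month, empty_month and has_empty_name are hypothetical names introduced only for this illustration; they are not part of the lecture's code.

```cpp
#include <cstring>   // std::strlen

// The Month class from the notes: entirely public, no methods.
class Month
{  public:
      char name[4];   // three letters plus one for the null byte
      int days;
};

// Hypothetical helper: only the first field is given a value.
Month partial_month()
{  Month m = {"Dec"};   // days is not mentioned, so it is set to 0
   return m;
}

// Hypothetical helper: no values are given at all.
Month empty_month()
{  Month m = {};        // name becomes an empty C-string, days becomes 0
   return m;
}

// Hypothetical helper: true if the name field holds an empty C-string.
bool has_empty_name(const Month& m)
{  return std::strlen(m.name) == 0;
}
```

Calling partial_month() should give a Month whose name is "Dec" and whose days field is 0; empty_month() should give an empty name and 0 days.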

We now declare an array months of Month objects. Since we don't want the data to be changed, we declare it as an array of const Month:

const Month months[ ] = {{"", 0}, {"Jan", 31}, {"Feb", 28}, .... {"Dec", 31}};

We can address the second problem – of the arrays being global – by making the array of const Month private to the Date class. We saw in an earlier lecture that a const data item that is to be given its value once for the whole class, rather than separately for each object in a constructor, has to be declared as static. This means that every instance of Date shares the same months array, rather than each Date object having its own copy; the array belongs to the class rather than to objects of the class.

It is only for static const data members of integral type (int, char, bool and so on) that you can put the initial value inside the class definition. For static const data members of other types, such as our array, you have to include a declaration inside the class and then provide the full definition outside the class. This is what it finally looks like:

class Date
{  public:
	....
   private:
	....
	static const Month months[ ]; // declare an array of Month objects in the private part of the Date class
	....
};
....
// Then put the definition of Date::months outside the class, using C-style initialization for a C-style array of C-style objects
const Month Date::months[ ] = {{"", 0}, {"Jan", 31}, {"Feb", 28}, 
  {"Mar", 31}, {"Apr", 30}, {"May", 31}, {"Jun", 30}, {"Jul", 31}, {"Aug", 31}, 
  {"Sep", 30}, {"Oct", 31}, {"Nov", 30}, {"Dec", 31}};

Overloading the minus operator for Dates

Suppose we want a - (minus) operator, which will return the difference (in days) between two dates. If, say, date1 had the value 5 Feb and date2 had the value 27 Jan, date1 - date2 would be 9. (date2 - date1 would be -9 – we're assuming that the two dates are in the same year.)

It would help a lot with this if we had a method that returned the day-number of a date (1 Jan = 1, 1 Feb = 32, 31 Dec = 365).

int daynum( ) const;  // prototype in the class definition
...
int Date::daynum( ) const
{  int num = 0;
   for (int i = 1; i < month; i++)
      num += months[i].days;
   num += day;
   return num;
}
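Putting the pieces so far together, a minimal compilable sketch of the class with its private static array and daynum might look like the following. The two-argument constructor is an assumption made for the sake of the example (it does no validation), and daynum is placed in the public part here only so that it can be called from outside.

```cpp
// The Month class from the notes.
class Month
{  public:
      char name[4];
      int days;
};

class Date
{  public:
      Date(int d, int m) : day(d), month(m) { }   // assumed constructor, no validation
      int daynum() const;
   private:
      int day;
      int month;
      static const Month months[];   // declaration only; the definition follows the class
};

// months[0] is a dummy entry so that months[1] is January.
const Month Date::months[] = {{"", 0}, {"Jan", 31}, {"Feb", 28},
  {"Mar", 31}, {"Apr", 30}, {"May", 31}, {"Jun", 30}, {"Jul", 31}, {"Aug", 31},
  {"Sep", 30}, {"Oct", 31}, {"Nov", 30}, {"Dec", 31}};

// Add up the lengths of all the months before this one, then add the day.
int Date::daynum() const
{  int num = 0;
   for (int i = 1; i < month; i++)
      num += months[i].days;
   num += day;
   return num;
}
```

With this sketch, Date(1, 1).daynum() is 1, Date(1, 2).daynum() is 32 and Date(31, 12).daynum() is 365, matching the figures given above.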

Now it's trivial to overload the - (minus) operator to return the difference between two dates:

int operator- (const Date&) const;
...
int Date::operator- (const Date& dx) const
{  return daynum() - dx.daynum();    // or return this->daynum() - dx.daynum();
}

This operator, obviously, has to be public, but does the daynum() function also have to be public? Not necessarily. If you wanted to include it in the interface, you could, but you might decide to make it private, in which case other functions of the Date class could call it (as here) but it would not be accessible from outside.
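As a sketch of the second option, here is a version in which daynum is private and only the minus operator is public. As before, the two-argument constructor is an assumption made for illustration and is not taken from the lecture's Date class.

```cpp
class Month
{  public:
      char name[4];
      int days;
};

class Date
{  public:
      Date(int d, int m) : day(d), month(m) { }   // assumed constructor, no validation
      int operator- (const Date&) const;          // part of the interface
   private:
      int daynum() const;                         // callable only by Date's own functions
      int day;
      int month;
      static const Month months[];
};

const Month Date::months[] = {{"", 0}, {"Jan", 31}, {"Feb", 28},
  {"Mar", 31}, {"Apr", 30}, {"May", 31}, {"Jun", 30}, {"Jul", 31}, {"Aug", 31},
  {"Sep", 30}, {"Oct", 31}, {"Nov", 30}, {"Dec", 31}};

int Date::daynum() const
{  int num = 0;
   for (int i = 1; i < month; i++)
      num += months[i].days;
   return num + day;
}

// A member function may call the private daynum on itself and on dx,
// because dx is also a Date.
int Date::operator- (const Date& dx) const
{  return daynum() - dx.daynum();
}
```

Date(5, 2) - Date(27, 1) gives 9 and Date(27, 1) - Date(5, 2) gives -9, while a call such as Date(5, 2).daynum() from outside the class would be rejected by the compiler.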

Overloading the output operator

It would be useful to be able to write the contents of a Date to an output stream. One way of doing this is to call display(), but what about overloading the << operator so we can use code like cout << d1; ?

There are several difficulties in overloading the << operator. One is that the date is on the wrong side of the operator. Remember that, if we have overloaded some operator OP so that we can write things like x OP y, we are really calling a function called operatorOP and the call looks like x.operatorOP(y). So the overloaded function clearly has to belong to the thing on the left of the operator; the thing on the right is just an argument in the function call. So the overloaded << operator cannot be a member function of the Date class.

It is possible to define an overloaded operator function that is not a member function of any class, just a free-standing function. The free-standing version of a function that will be called as x OP y will begin with a line of this form:

return_type operatorOP(type_of_x  x, type_of_y   y)

The argument to the left of the operator in the call x OP y corresponds to the first of the parameters in the definition. So we can define a non-member function, that will take an output stream and a Date as its parameters.

Another problem is that the output operator takes as its arguments an output stream of type ostream (e.g. cout) and a data item, and it returns the modified output stream. (This is why you can use code of the form cout << x << y; – the second << is sending y to the output stream returned by the first <<.) So it has to have the output stream both as a reference parameter and as a reference return type. Our prototype for the overloaded << operator is going to look like this:

ostream& operator<< (ostream&, const Date&);   // objects generally passed as const ref parameters rather than value parameters

Note that, since this overloaded operator function is not a member of any class, the first parameter corresponds to the argument on the left of the operator, and the second corresponds to the argument on the right.

The next hurdle to overcome is that, because the operator isn't a member function of Date, the operator has no access to Date's private data. We have to define accessor functions to expose day and month:

int get_day( ) const; // public
string get_month( ) const; // public
...
int Date::get_day( ) const 
{  return day;
}


string Date::get_month ( ) const
{  return months[month].name;
}

Now at last we can define the << operator:

ostream& operator<< (ostream& os, const Date& d)
{  os << d.get_day() << " " << d.get_month();
   return os;
}

This works, but you might take the view that we have been forced into creating accessor functions that we perhaps never intended should be part of the interface. There is an alternative solution. By declaring the << operator as a friend of the Date class we grant it access to the private section. It is still a non-member function but it now has the same access to the private section as a member function does. (We encountered friend in an earlier lecture where we made one class a friend of another, e.g. Stack a friend of the Node class, so that any function of the Stack class had privileged access to the private items of Node. Here we are using it to grant this privileged access just to one function.) The operator<< function is not a member of Date, so it is neither public nor private.

// In the class declaration
friend ostream& operator<< (ostream&, const Date&);

We can now do without the accessor functions, so that the operator definition becomes:

ostream& operator<< (ostream& os, const Date& d)
{  os << d.day << " " << d.months[d.month].name;   // or, if you prefer, Date::months[d.month].name
   return os;
}
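The friend version can be tried out with a sketch along these lines. The constructor and the helper date_to_text are assumptions added for illustration; date_to_text simply sends a Date to a string stream so that the result can be inspected.

```cpp
#include <iostream>
#include <sstream>
#include <string>
using std::ostream;
using std::string;

class Month
{  public:
      char name[4];
      int days;
};

class Date
{  public:
      Date(int d, int m) : day(d), month(m) { }   // assumed constructor, no validation
      friend ostream& operator<< (ostream&, const Date&);   // grants access, does not make it a member
   private:
      int day;
      int month;
      static const Month months[];
};

const Month Date::months[] = {{"", 0}, {"Jan", 31}, {"Feb", 28},
  {"Mar", 31}, {"Apr", 30}, {"May", 31}, {"Jun", 30}, {"Jul", 31}, {"Aug", 31},
  {"Sep", 30}, {"Oct", 31}, {"Nov", 30}, {"Dec", 31}};

// Non-member function, but as a friend it may read the private data.
ostream& operator<< (ostream& os, const Date& d)
{  os << d.day << " " << Date::months[d.month].name;
   return os;
}

// Hypothetical helper: writes a Date to a string stream and returns the text.
string date_to_text(const Date& d)
{  std::ostringstream out;
   out << d;
   return out.str();
}
```

date_to_text(Date(5, 2)) should return the text "5 Feb"; writing cout << Date(5, 2); would print the same thing.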

As I noted when I introduced friend, some people feel strongly that the friend feature of C++ should never be used, on the grounds that one of the main reasons for using an object-oriented language is to gain the benefits of encapsulation and that the purpose of friend is precisely to circumvent the encapsulation.

Another variation, which avoids the need either for extra accessor functions or a friend function, is, first, to change the display procedure of Date so that it takes an ostream and returns it:

ostream& Date::display(ostream& os) const
{	os << day << " " << months[month].name;
	return os;
}

and then to write a trivial non-member function for the overloaded output operator whose sole purpose is to call the display procedure:

ostream& operator<<(ostream& os, const Date& d)
{	return d.display(os);
}

This is better because the functionality remains within the member functions. This non-member function does not require access to the private section of Date and so does not need to be a friend.
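A sketch of this display-based arrangement, under the same assumed constructor as before, shows that the non-member operator needs nothing beyond Date's public interface and that chaining still works. The helper range_text is a hypothetical name used only to demonstrate the chaining.

```cpp
#include <iostream>
#include <sstream>
#include <string>
using std::ostream;
using std::string;

class Month
{  public:
      char name[4];
      int days;
};

class Date
{  public:
      Date(int d, int m) : day(d), month(m) { }   // assumed constructor, no validation
      ostream& display(ostream& os) const;        // does the real work
   private:
      int day;
      int month;
      static const Month months[];
};

const Month Date::months[] = {{"", 0}, {"Jan", 31}, {"Feb", 28},
  {"Mar", 31}, {"Apr", 30}, {"May", 31}, {"Jun", 30}, {"Jul", 31}, {"Aug", 31},
  {"Sep", 30}, {"Oct", 31}, {"Nov", 30}, {"Dec", 31}};

ostream& Date::display(ostream& os) const
{  os << day << " " << months[month].name;
   return os;
}

// Not a friend: it only calls the public display function.
ostream& operator<< (ostream& os, const Date& d)
{  return d.display(os);
}

// Hypothetical helper: chains two Dates and a literal in one output expression.
string range_text(const Date& from, const Date& to)
{  std::ostringstream out;
   out << from << " to " << to;   // each << returns the stream for the next one
   return out.str();
}
```

range_text(Date(27, 1), Date(5, 2)) should return "27 Jan to 5 Feb".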

If you want, you can rename the display function to operator<< , just to underline the point that these two functions - the member and non-member versions - work closely together. The compiler will not be confused by them having the same name; they have different parameter lists, and one is a member function of the Date class, the other is not a member of any class.

ostream& Date::operator<<(ostream& os) const
{	os << day << " " << months[month].name;
	return os;
}

ostream& operator<<(ostream& os, const Date& d)  // This one is not a const function as it is not a member function
{	return d.operator<<(os);
}

One parameter or two?

Suppose we had an ordinary free-standing function (i.e. not a member of the Date class) called print_date which output a date. Its prototype might look like this:

ostream& print_date(ostream&, const Date&);

Assuming we had a Date variable called d1, we would call it like this:

print_date(cout, d1);

(You are familiar with calls like this, as in, say, getline(cin, s);)

Now, if we choose to give this function the special name operator<<, we can call it either in the same way as print_date:

operator<<(cout, d1);

or like an operator:

cout << d1;

So we provide two arguments in the call - an output stream and a Date - and they match up with the two parameters in the function definition.

But if we think back to one of the overloaded operators that we defined earlier, say the overloaded minus, the call has much the same form as cout << d1:

d1 - d2

but, if we look at the prototype, we see that it has only one parameter:

int operator-(const Date&) const;

Since the minus operator requires two Dates, shouldn't we have two parameters? We can see why we have only one if we put the call into its alternative form:

d1.operator-(d2)

This reminds us that, in contrast to the overloaded output operator, the overloaded minus is a member function of the Date class. When we call this function, we will already be providing one argument - the Date on the left of the dot (sometimes called the implicit parameter); there will only be one argument in the actual argument list (inside the brackets), so we only need one explicit parameter.
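The two calling forms can be compared directly. In this sketch the constructor and the two helper functions minus_forms_agree and output_forms_agree are assumptions introduced for illustration; each helper makes the same call twice, once in operator form and once in function-call form.

```cpp
#include <iostream>
#include <sstream>
using std::ostream;

class Month
{  public:
      char name[4];
      int days;
};

class Date
{  public:
      Date(int d, int m) : day(d), month(m) { }   // assumed constructor, no validation
      int operator- (const Date&) const;          // member: one explicit parameter
      ostream& display(ostream& os) const;
   private:
      int daynum() const;
      int day;
      int month;
      static const Month months[];
};

const Month Date::months[] = {{"", 0}, {"Jan", 31}, {"Feb", 28},
  {"Mar", 31}, {"Apr", 30}, {"May", 31}, {"Jun", 30}, {"Jul", 31}, {"Aug", 31},
  {"Sep", 30}, {"Oct", 31}, {"Nov", 30}, {"Dec", 31}};

int Date::daynum() const
{  int num = 0;
   for (int i = 1; i < month; i++)
      num += months[i].days;
   return num + day;
}

int Date::operator- (const Date& dx) const
{  return daynum() - dx.daynum();
}

ostream& Date::display(ostream& os) const
{  os << day << " " << months[month].name;
   return os;
}

// Non-member: two explicit parameters.
ostream& operator<< (ostream& os, const Date& d)
{  return d.display(os);
}

// Hypothetical helper: member operator, d1 is the implicit parameter.
bool minus_forms_agree(const Date& d1, const Date& d2)
{  return (d1 - d2) == d1.operator-(d2);
}

// Hypothetical helper: non-member operator, both arguments inside the brackets.
bool output_forms_agree(const Date& d1)
{  std::ostringstream a, b;
   a << d1;              // operator form
   operator<<(b, d1);    // function-call form
   return a.str() == b.str();
}
```

Both helpers should return true for any valid pair of dates, since each pair of calls reaches the same function.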

Code of the Date class

A Safevec class

As an illustration of some aspects of classes that we haven't dealt with yet, we build a Safevec class. A Safevec is very like a vector of integers – its only data member is a vector<int> – but with a few variations. Its pop_back returns a value, and terminates the program if the Safevec is empty. Its size function returns an int, not the unsigned integer type that a vector's size function returns. Its [ ] operator does array-bounds checking and terminates the program if the subscript is invalid.

class Safevec
{ private:
	vector<int> v;
  public:
	Safevec() { }
	Safevec(int len, int init = 0) : v(len, init) { }
	void push_back(int x) { v.push_back(x); }
	int pop_back();
	int size() const;
	int& operator[ ](int);
};

Note that the default constructor is empty. This is because the only data item is a vector, which is itself an object that has its own default constructor. The default constructor for the vector will be called automatically, which will provide a correctly initialized empty vector. We don't need to do anything.

There are various ways you could write the second constructor. Either of these would work:

Safevec::Safevec(int len, int init)	// the default value for init is given once only, in the prototype inside the class
{	vector<int> temp(len, init);
	v = temp;
}

Safevec::Safevec(int len, int init)	// the default value for init is given once only, in the prototype inside the class
{	v = vector<int> (len, init);
}

The version I've given in the class definition uses a field initializer list (also known as a member initializer list). After the parameter list, you put a colon and then you put the data items in a comma-separated list and provide the values for initializing them in parentheses. Since all the initializing has been done by the field initializer list, there is nothing to do in the body of the constructor, so it is empty. This is slightly more efficient than the other methods but mainly, at present, it is just a stylistic alternative. We will see a situation in a later lecture where this method is obligatory.

A version that looks plausible but is in fact incorrect is the following:

Safevec::Safevec(int len, int init)
{	vector<int> v(len, init);		// No!!!
}

A constructor is a function, and, like other functions, it can have local variables. In this version, we have defined a local variable which happens to have the same name as the vector that is a data member of the class. But the Safevec's vector and this local vector are different objects. This local one will be created and then, being a local (therefore auto) variable, will promptly disappear when the function ends. The Safevec's v will be the same as it was before, i.e. empty.
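The effect is easy to observe by comparing the sizes that result. The class names BadSafevec and GoodSafevec below are hypothetical, cut-down stand-ins for Safevec, written only to contrast the incorrect constructor with the initializer-list version.

```cpp
#include <vector>
using std::vector;

// Hypothetical cut-down class with the mistaken constructor.
class BadSafevec
{ private:
	vector<int> v;
  public:
	BadSafevec(int len, int init = 0)
	{	vector<int> v(len, init);   // a local v that hides the member; destroyed when the constructor ends
	}
	int size() const { return v.size(); }   // reports the size of the member v
};

// Hypothetical cut-down class with the constructor from the class definition.
class GoodSafevec
{ private:
	vector<int> v;
  public:
	GoodSafevec(int len, int init = 0) : v(len, init) { }   // initializes the member v
	int size() const { return v.size(); }
};
```

BadSafevec(10, 7).size() gives 0, because the member was default-constructed and never touched, whereas GoodSafevec(10, 7).size() gives 10.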

Using a default argument for the init parameter allows the constructor to be called with either one argument or two, as for vectors. If we preferred, we could use a default argument for the first parameter also – Safevec(int len = 0, int init = 0) .... We could then dispense with the default constructor. (In fact we would have to dispense with it since, if we left it in, a call such as Safevec sv; could be a call to either the first or the second constructor; the compiler would complain of an ambiguous call.)

int Safevec::pop_back()
{	if (v.size() == 0)
	{	cerr << "Cannot pop empty Safevec" << endl;
		exit(1);
	}
	int	x = v[v.size() - 1];
	v.pop_back();
	return x;
}

int Safevec::size() const
{	return v.size();
}

v.size() returns a value of an unsigned integer type (vector<int>::size_type), but, since Safevec::size() is an int function (it returns a signed int), the value returned by v.size() will be type-converted when Safevec's size() function returns it. The rules for type-conversion in such cases are the same as for assignment. (If, for example, you had return 5.432 in a function that returned an int, the value returned would be 5.)
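Both conversions can be seen in a small sketch. The functions size_as_int and whole_part are hypothetical names used only here; each relies on the implicit conversion that takes place in a return statement.

```cpp
#include <vector>
using std::vector;

// Hypothetical function: the unsigned result of size() is converted to int on return.
int size_as_int(const vector<int>& v)
{	return v.size();
}

// Hypothetical function: the double is converted to int on return,
// which discards the fractional part.
int whole_part(double x)
{	return x;
}
```

For a vector of three elements size_as_int returns 3, and whole_part(5.432) returns 5. Some compilers can be asked to warn about implicit conversions like these.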

int& Safevec::operator[ ](int sub)	// if the call was sv[5], sub would have the value 5
{	if (sub < 0 || sub >= v.size())
	{	cerr << "Subscript " << sub << " out of range" << endl;
		exit(1);
	}
	return v[sub];
}

Why are we returning an int&? Why not the more usual int? Suppose we had defined a Safevec thus: Safevec sv(10); If expressions such as sv[4] were only ever used in places where we simply wanted the value of sv[4], such as on the right-hand side of assignment statements, as in x = sv[4]; then a function that returned int would be adequate. But sv[4] is also going to be used in contexts where we need to know where sv[4] is (in order to put something in it), such as on the left-hand side of assignment statements, as in sv[4] = x;. For those uses, we need a reference, not just a value, hence the int&. A reference return type is very much like a reference parameter, except that we are passing something back instead of passing something in.
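A cut-down sketch of Safevec shows the reference return type at work. Only the constructor, size and the [ ] operator are included, and the helper store_and_fetch is a hypothetical function that uses sv[4] first on the left and then on the right of an assignment.

```cpp
#include <cstdlib>    // exit
#include <iostream>   // cerr
#include <vector>
using std::cerr;
using std::endl;
using std::vector;

class Safevec
{ private:
	vector<int> v;
  public:
	Safevec(int len, int init = 0) : v(len, init) { }
	int size() const { return v.size(); }
	int& operator[](int);
};

int& Safevec::operator[](int sub)
{	if (sub < 0 || sub >= size())
	{	cerr << "Subscript " << sub << " out of range" << endl;
		std::exit(1);
	}
	return v[sub];   // a reference to the element itself, not a copy of its value
}

// Hypothetical helper: stores x in element 4, then reads it back.
int store_and_fetch(Safevec& sv, int x)
{	sv[4] = x;       // needs to know where sv[4] is
	int y = sv[4];   // needs only the value of sv[4]
	return y;
}
```

After Safevec sv(10); a call store_and_fetch(sv, 42) returns 42 and leaves 42 in sv[4]. Had the operator returned a plain int, the assignment sv[4] = x; would not compile.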

These two aspects of a variable are sometimes called its r-value - the value that it has on the right-hand side of an assignment, often called simply "the value" - and its l-value - the value that it has on the left-hand side of an assignment, i.e. its position or address. Given a variable's l-value, the computer can always find its r-value (it can go to that address and see what's there) but not vice-versa – a variable's r-value gives no clue to where it is located.

Note that it would be dangerous to return a local (auto) variable by reference. Local variables go out of existence when the function ends. The part of the program that received the returned reference would have a reference to something that no longer existed, or, to be more precise, to something whose continued existence could not be relied upon.

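This overloaded [ ] function cannot be declared as a const function. Even if you don't actually use it to change the contents of the Safevec, you could if you chose to, so it cannot be guaranteed to leave the object unaltered. (The compiler enforces this: inside a const member function the data member v is treated as const, so v[sub] would yield a const int&, which cannot be returned as an int&.)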

Notes on R. Mitton's lectures by S.P. Connolly, edited by R. Mitton, 2000-8