
Section 3.3
Pointers
cplusplus.com

We have already seen how variables are memory cells that we can access by name. These variables are stored at specific places in the computer's memory. To our programs, the computer memory is simply a succession of cells, each 1 byte in size (the minimum size for a datum) and each with a unique address.

A good simile for computer memory is a street in a city. All the houses on a street are numbered consecutively, each with a unique identifier, so if we talk about 27 Sesame Street we can find that place without getting lost: there can be only one house with that number and, in addition, we know that it will be between houses 26 and 28 (or between houses 25 and 29, depending on the city).

In the same way that the houses of a street are numbered, the operating system organizes memory with unique and consecutive numbers, so that if we talk about location 1776 in memory, we know that there is only one location with that address and also that it lies between addresses 1775 and 1777.

Address-of operator (&)

When we declare a variable, it must be stored at a specific location in this succession of cells (the memory). We generally do not decide where the variable is placed; fortunately, that is done automatically by the compiler and the operating system. Once an address has been assigned, however, we may be interested in knowing what it is, so that we can work with the variable through its address instead of its name.

This can be done by writing an ampersand (&), which can be read as "address of", just before the variable identifier. For example:

ted = &andy;
would assign to ted the address of the variable andy: by preceding the name andy with the ampersand (&) character we are no longer talking about the content of the variable, but about its address in memory.

Let us suppose that andy has been placed at memory address 1776 and that we write the following:

andy = 25;
fred = andy;
ted = &andy;
The result can be pictured as follows:

andy (at address 1776): 25
fred: 25
ted: 1776
We have assigned to fred the content of the variable andy, as we have done on many other occasions in previous sections of this tutorial, but to ted we have assigned the memory address at which the value of andy is stored, which we have imagined to be 1776 (it could be any address; this one is simply invented). The reason is that in the assignment to ted we preceded andy with an ampersand (&) character.

A variable that stores the address of another variable (like ted in the previous example) is what we call a pointer. In C++ pointers have many uses and appear very often. Later in this section we will see how this kind of variable is declared.
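
The three assignments above can be tried out with a small sketch like the one below. The function name ted_points_to_andy is made up for this illustration, and the pointer declaration syntax (int* ted) is the one explained later in this section. The real address will not be 1776; it is whatever the system happens to choose.

```cpp
#include <cassert>

// Mirrors the andy / fred / ted example from the text.
// Returns true when ted really holds the address of andy.
bool ted_points_to_andy()
{
  int andy = 25;
  int fred = andy;     // fred receives a copy of the value 25
  int* ted = &andy;    // ted receives the address of andy

  andy = 30;           // changing andy afterwards...
  assert(fred == 25);  // ...does not affect the copy held in fred
  return ted == &andy; // ted still holds the address of andy
}
```

Calling ted_points_to_andy() from main should return true: fred only ever held a copy of the value, while ted keeps track of where andy lives.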

Dereference (indirection) operator (*)

Using a pointer we can directly access the value stored in the variable it points to, just by preceding the pointer identifier with the dereference operator, an asterisk (*), which can be read as "value pointed by". Thus, continuing with the values of the previous example, if we write:
beth = *ted;
(which we could read as "beth equal to value pointed by ted") beth would take the value 25, since ted is 1776, and the value stored at address 1776 is 25.
You must clearly differentiate that ted stores 1776, but *ted (with an asterisk * before it) refers to the value stored at address 1776, which is 25. Notice the difference between including and omitting the asterisk (each line has a comment explaining how the expression could be read):

beth = ted;   // beth equal to ted ( 1776 )
beth = *ted;  // beth equal to value pointed by ted ( 25 )

Address-of operator (&)
It is used as a variable prefix and can be read as "address of". Thus &variable1 can be read as "address of variable1".

Dereference or indirection operator (*)
It indicates that what has to be evaluated is the content found at the address given by the expression that follows it. It can be read as "value pointed by".
*mypointer can be read as "value pointed by mypointer".

At this point, continuing with the same example:

andy = 25;
ted = &andy;
you should clearly see that all of the following expressions are true:
andy == 25
&andy == 1776
ted == 1776
*ted == 25
The first expression is clear, considering that the assignment was andy = 25;. The second uses the address-of operator (&), which returns the address of the variable andy, which we supposed to be 1776. The third follows from the second, since the assignment to ted was ted = &andy;. The fourth uses the dereference operator (*) which, as we have just seen, gives the value contained at the address pointed to by ted, that is, 25.

So, after all that, you may also infer that as long as the address held by ted does not change, the following expression will also be true:

*ted == andy
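
As a hedged illustration of the relations listed above, the sketch below checks them with assert and also writes through the pointer. The function name write_through_pointer is invented for this example, and the declaration syntax int* ted is covered in the next part of this section.

```cpp
#include <cassert>

// Writes a new value into andy through the pointer ted and
// returns what andy holds afterwards.
int write_through_pointer(int new_value)
{
  int andy = 25;
  int* ted = &andy;

  assert(*ted == 25);    // reading through the pointer gives 25
  *ted = new_value;      // writing through the pointer changes andy
  assert(*ted == andy);  // still true: ted has not been re-pointed
  return andy;
}
```

A call such as write_through_pointer(42) should return 42, because *ted and andy name the same memory cell.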

Declaring variables of pointer types

Because a pointer can directly access the value it points to, it is necessary to specify, when declaring a pointer, which data type it points to: pointing to a char is not the same as pointing to an int or a float.

Thus, the declaration of a pointer follows this form:

type * pointer_name;
where type is the type of the data pointed to, not the type of the pointer itself. For example:
int * number;
char * character;
float * greatnumber;
These are three pointer declarations. Each one points to a different data type, but all three are pointers and, on typical platforms, all three occupy the same amount of space in memory (the size of a pointer depends on the platform). The data they point to, however, neither occupy the same amount of space nor have the same type: one is an int, another a char and the other a float.

I emphasize that the asterisk (*) we write when declaring a pointer only indicates that it is a pointer, and should not be confused with the dereference operator seen a bit earlier, which is also written with an asterisk (*). They are simply two different things represented with the same sign.

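// my first pointer
#include <iostream>

int main ()
{
  int value1 = 5, value2 = 15;
  int * mypointer;

  mypointer = &value1;   // mypointer = address of value1
  *mypointer = 10;       // value pointed by mypointer = 10
  mypointer = &value2;   // mypointer = address of value2
  *mypointer = 20;       // value pointed by mypointer = 20
  std::cout << "value1==" << value1 << " / value2==" << value2;
  return 0;
}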
value1==10 / value2==20
Notice how the values of value1 and value2 have changed indirectly. First we assigned to mypointer the address of value1 using the address-of operator (&), and then we assigned 10 to the value pointed by mypointer. Since mypointer was pointing to value1 at that moment, we modified value1 indirectly.

To show that a pointer may take several different values during the same program, we repeated the process with value2 and the same pointer.

Here is a slightly more complicated example:
// more pointers
#include <iostream>

int main ()
{
  int value1 = 5, value2 = 15;
  int *p1, *p2;

  p1 = &value1;     // p1 = address of value1
  p2 = &value2;     // p2 = address of value2
  *p1 = 10;         // value pointed by p1 = 10
  *p2 = *p1;        // value pointed by p2 = value pointed by p1
  p1 = p2;          // p1 = p2 (pointer assignment)
  *p1 = 20;         // value pointed by p1 = 20

  std::cout << "value1==" << value1 << " / value2==" << value2;
  return 0;
}
value1==10 / value2==20

I have included as comments on each line how the code can be read: ampersand (&) as "address of" and asterisk (*) as "value pointed by". Notice that there are expressions with the pointers p1 and p2 both with and without an asterisk. The meaning is very different: an asterisk (*) followed by the pointer name refers to the place the pointer points to, whereas the pointer name without an asterisk refers to the value of the pointer itself, that is, the address it points to.

Another thing that may catch your attention is the line:

int *p1, *p2;
which declares the two pointers of the previous example, putting an asterisk (*) before each one. The reason is that the type for the whole declaration is int (not int*), and each asterisk belongs only to the name that follows it. Had we written int *p1, p2; then p1 would be a pointer to int but p2 would be a plain int. Operator precedence is discussed in section 1.3: Operators, but here it is enough to know that you have to put an asterisk (*) before each pointer you declare.
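
The effect of leaving out the second asterisk can be checked at compile time. The following is only a sketch: it assumes a compiler that supports C++11 (for decltype and static_assert), and the function name second_name_is_plain_int is invented for the illustration.

```cpp
#include <cassert>
#include <type_traits>

// Declares two names on one line with a single asterisk and
// reports whether the second one turned out to be a plain int.
bool second_name_is_plain_int()
{
  int *p1, p2;   // only p1 is a pointer; p2 is an int

  static_assert(std::is_same<decltype(p1), int*>::value, "p1 is int*");

  p2 = 7;
  p1 = &p2;          // legal precisely because p2 is an int
  assert(*p1 == 7);
  return std::is_same<decltype(p2), int>::value;
}
```

The function should return true, which confirms that the asterisk binds to p1 alone and not to the type int.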

Pointers and arrays

The concept of an array is closely bound to that of a pointer. In fact, in most expressions the name of an array is converted to the address of its first element, just as a pointer holds the address of the element it points to, so the two can often be used in the same way. For example, supposing these two declarations:
int numbers [20];
int * p;
the following assignment would be valid:
p = numbers;
At this point p and numbers refer to the same element, and both can be used with the same notation. The difference is that we can assign another value to the pointer p, whereas numbers will always refer to the first of the 20 elements of type int with which it was defined. So, unlike p, which is an ordinary pointer variable, numbers behaves like a constant pointer (strictly speaking it is an array, which is converted to a pointer to its first element when used in an expression like this one). Therefore, although the previous assignment was valid, the following one is not:
numbers = p;
because numbers is an array, and an array as a whole cannot be assigned to.

Because pointers are variables, all the expressions that include pointers in the following example are perfectly valid:
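
One observable difference between an array and a pointer to its first element is what sizeof reports for each. The sketch below relies on that difference; the function name element_count is made up for the example.

```cpp
#include <cassert>
#include <cstddef>

// Returns the number of elements in a local array of 20 ints,
// computed with sizeof, which works on the array but not on p.
std::size_t element_count()
{
  int numbers[20];
  int* p = numbers;          // the array converts to a pointer to numbers[0]
  assert(p == &numbers[0]);

  // sizeof(numbers) is the size of all 20 ints together,
  // whereas sizeof(p) would only be the size of one pointer.
  return sizeof(numbers) / sizeof(numbers[0]);
}
```

element_count() should return 20. Replacing numbers with p in the sizeof expression would not give 20, since p carries no information about how many elements follow.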

// more pointers
#include <iostream>

int main ()
{
  int numbers[5];
  int * p;
  p = numbers;  *p = 10;            // first element
  p++;  *p = 20;                    // second element
  p = &numbers[2];  *p = 30;        // third element
  p = numbers + 3;  *p = 40;        // fourth element
  p = numbers;  *(p+4) = 50;        // fifth element
  for (int n=0; n<5; n++)
    std::cout << numbers[n] << ", ";
  return 0;
}
10, 20, 30, 40, 50,

In the chapter "Arrays" we used bracket signs [] several times in order to specify the index of the array element we wanted to refer to. The bracket operator [] is known as the offset operator, and it is equivalent to adding the number within the brackets to the address held by a pointer and dereferencing the result. For example, the following two expressions:

a[5] = 0;       // a [offset of 5] = 0
*(a+5) = 0;     // pointed by (a+5) = 0
are equivalent and valid whether a is a pointer or an array.
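
The equivalence of the two notations can be verified with a short sketch such as this one, where an array is filled through pointer-plus-offset expressions and read back with brackets. The function name fill_and_sum is invented for the illustration.

```cpp
#include <cassert>

// Fills a 5-element array using *(p + n) and sums it using a[n].
int fill_and_sum()
{
  int a[5];
  int* p = a;
  for (int n = 0; n < 5; n++)
    *(p + n) = (n + 1) * 10;   // pointer-plus-offset notation

  int sum = 0;
  for (int n = 0; n < 5; n++)
  {
    assert(&a[n] == a + n);    // same element, same address
    sum += a[n];               // bracket notation
  }
  return sum;                  // 10 + 20 + 30 + 40 + 50
}
```

fill_and_sum() should return 150, whichever notation is used for writing and reading.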

Pointer initialization

When declaring a pointer we may explicitly specify where we want it to point:
int number;
int *tommy = &number;
this is equivalent to:
int number;
int *tommy;
tommy = &number;
When a pointer is assigned, we are always assigning the address it points to, never the value pointed to. Keep in mind that in the declaration of a pointer the asterisk (*) only indicates that it is a pointer; it is never the dereference operator (*). Remember, they are two different things, although they are written with the same sign. Thus, we must take care not to confuse the previous code with:
int number;
int *tommy;
*tommy = &number;
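which would not make sense: *tommy is an int, so an address cannot be assigned to it, and tommy itself does not yet point anywhere.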

As with arrays, the compiler allows a pointer to be initialized, at the moment it is declared, with the address of a string constant:

const char * terry = "hello";
In this case static storage is reserved for "hello" (including its terminating null character), and a pointer to the first char of this memory block (the 'h') is assigned to terry. The const is needed because the characters of a string literal must not be modified; current C++ compilers reject or warn about the declaration without it. If we imagine that "hello" is stored at addresses 1702 and following, the declaration could be pictured like this:

address:  1702  1703  1704  1705  1706  1707
content:  'h'   'e'   'l'   'l'   'o'   '\0'
terry:    1702
It is important to note that terry contains the value 1702, not 'h' nor "hello", although 1702 is the address where these begin.

The pointer terry points to a string of characters and can be read using the same notation as an array. For example, both of these expressions access the fifth character, 'o':

terry[4]
*(terry+4)
Remember that writing terry[4] is exactly the same as writing *(terry+4), although the first form is the more usual one. Writing through terry, for example to replace the 'o' with a '!', is not allowed, because the literal is read-only. To obtain a modifiable string, declare an array instead (char terry[] = "hello";), after which terry[4] = '!'; and *(terry+4) = '!'; are both valid and leave the array containing "hell!".
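
A note of caution for current compilers: the characters of a string literal are read-only, so writing through a pointer to one has undefined behavior. The sketch below shows one safe way to obtain the "hell!" result, by writing into an array that holds its own copy of the text. The function name replace_last_letter and the array name copy are assumptions made for this example.

```cpp
#include <cassert>
#include <cstring>

// Returns true when the modifiable copy ends up as "hell!".
bool replace_last_letter()
{
  const char* terry = "hello";  // points at read-only storage
  char copy[] = "hello";        // array with its own copy: 5 letters plus '\0'

  assert(terry[4] == 'o');      // reading through the pointer is fine
  copy[4] = '!';                // writing into the array copy is fine
  *(copy + 4) = '!';            // the same write in pointer notation
  return std::strcmp(copy, "hell!") == 0;
}
```

replace_last_letter() should return true. The literal that terry points to is left untouched.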

Arithmetic of pointers

Performing arithmetic operations on pointers is a little different from performing them on integer data types. To begin with, only addition and subtraction are allowed; the other operations make no sense in the world of pointers. But both addition and subtraction behave differently with pointers depending on the size of the data type they point to.

When we looked at the different data types that exist, we saw that some occupy more space in memory than others. For example, among the integer types, let us assume that char occupies 1 byte, short occupies 2 bytes and long occupies 4 (the exact sizes depend on the compiler and platform).

Let's suppose that we have 3 pointers:

char *mychar;
short *myshort;
long *mylong;
and that we know that they point to memory locations 1000, 2000 and 3000 respectively.

So if we write:

mychar++;
myshort++;
mylong++;
mychar, as you may expect, would contain the value 1001. However, myshort would contain the value 2002, and mylong would contain 3004. The reason is that adding one to a pointer makes it point to the next element of the type it was declared with, and therefore the size in bytes of the pointed type is added to the address.
This applies both when adding and when subtracting any number to or from a pointer. Exactly the same would happen if we wrote:
mychar = mychar + 1;
myshort = myshort + 1;
mylong = mylong + 1;
It is important to warn you that the postfix increase (++) and decrease (--) operators have higher precedence than the dereference operator (*), so the following expressions may lead to confusion:
*p++;
*p++ = *q++;
The first is equivalent to *(p++): it increases p (the address it points to, not the value stored there), and the expression as a whole evaluates to the value p pointed to before the increase.
In the second, because both increase operators (++) are written after their operands and not before, the value of *q is first assigned to *p, and then both p and q are increased by one. It is equivalent to:
*p = *q;
p++;
q++;
As always, I recommend using parentheses () to avoid unexpected results.
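
Both points of this part can be tried out with a sketch like the following. The first function measures how many bytes a long pointer advances when one is added to it, without assuming that a long is 4 bytes; the second uses the *p++ = *q++ idiom to copy a run of elements. The names step_in_bytes and copy_ints are invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Number of bytes a long* moves forward when 1 is added to it.
std::size_t step_in_bytes()
{
  long values[2] = {0, 0};
  long* p = values;
  long* next = p + 1;
  // Compare the two addresses as byte (char) addresses.
  return static_cast<std::size_t>(
      reinterpret_cast<char*>(next) - reinterpret_cast<char*>(p));
}

// Copies count ints from q to p using the *p++ = *q++ idiom.
void copy_ints(int* p, const int* q, int count)
{
  assert(count >= 0);
  while (count-- > 0)
    *p++ = *q++;   // copy one element, then advance both pointers
}
```

step_in_bytes() should equal sizeof(long) on any platform, and after copy_ints(dst, src, 3) the first three elements of dst should match those of src.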

Pointers to pointers

C++ allows pointers that point to pointers, which in turn point to data (or to further pointers). To declare one, add an asterisk (*) for each level of indirection:
char a;
char * b;
char ** c;
a = 'z';
b = &a;
c = &b;
Supposing that a, b and c are stored at the arbitrarily chosen memory locations 7230, 8092 and 10502 respectively, this could be pictured as follows:

a (at address 7230): 'z'
b (at address 8092): 7230
c (at address 10502): 8092
The new thing in this example is the variable c, which can be used at three different levels of indirection, each of which corresponds to a different value:
c is an expression of type (char**) with a value of 8092
*c is an expression of type (char*) with a value of 7230
**c is an expression of type (char) with a value of 'z'
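
The three levels can be exercised with a sketch like this one, which changes a by going through c. The function name set_through_double_pointer is invented for the example, and the real addresses will of course differ from the ones imagined above.

```cpp
#include <cassert>

// Changes a through two levels of indirection and returns its new value.
char set_through_double_pointer(char new_value)
{
  char a = 'z';
  char* b = &a;
  char** c = &b;

  assert(c == &b);      // c holds the address of b
  assert(*c == &a);     // one level down: the address stored in b
  assert(**c == 'z');   // two levels down: the char stored in a
  **c = new_value;      // writes into a
  return a;
}
```

A call such as set_through_double_pointer('q') should return 'q'.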

void pointers

The void pointer type is a special type of pointer. void pointers can point to data of any type, from an integer value or a float to a string of characters. Their sole limitation is that the pointed data cannot be dereferenced directly, since its size and type are not known, and for that reason we always have to resort to type casting or assignment to turn the void pointer into a pointer to a concrete data type before we can use the data.

One of its uses is passing generic parameters to a function:

// integer increaser
#include <iostream>

// type carries the size in bytes of the object that data points to.
// The case labels must all differ, so this assumes that char, short
// and long have three different sizes on the target platform.
void increase (void* data, int type)
{
  switch (type)
  {
    case sizeof(char) : (*((char*)data))++; break;
    case sizeof(short): (*((short*)data))++; break;
    case sizeof(long) : (*((long*)data))++; break;
  }
}

int main ()
{
  char a = 5;
  short b = 9;
  long c = 12;
  increase (&a,sizeof(a));
  increase (&b,sizeof(b));
  increase (&c,sizeof(c));
  std::cout << (int) a << ", " << b << ", " << c;
  return 0;
}
6, 10, 13
sizeof is an operator built into the C++ language that yields a constant value with the size in bytes of its operand, so that, for example, sizeof(char) is 1, because the char type is 1 byte long.
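
Selecting the cast by size only works when every supported type has a different size. One possible alternative, shown here purely as a sketch, is to pass an explicit tag that names the real type. The enum Kind, its enumerators and the function increase_tagged are all hypothetical names introduced for this illustration, and nullptr assumes a C++11 compiler.

```cpp
#include <cassert>

// Hypothetical tag naming the real type behind the void pointer.
enum Kind { KIND_CHAR, KIND_SHORT, KIND_LONG };

// Increases the pointed value by one, after casting the void
// pointer back to the type that the tag says it really has.
void increase_tagged(void* data, Kind kind)
{
  assert(data != nullptr);
  switch (kind)
  {
    case KIND_CHAR:  ++*static_cast<char*>(data);  break;
    case KIND_SHORT: ++*static_cast<short*>(data); break;
    case KIND_LONG:  ++*static_cast<long*>(data);  break;
  }
}
```

With char a = 5; short b = 9; long c = 12; the calls increase_tagged(&a, KIND_CHAR), increase_tagged(&b, KIND_SHORT) and increase_tagged(&c, KIND_LONG) should leave the values 6, 10 and 13. The caller remains responsible for passing the tag that matches the real type.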

Pointers to functions

C++ allows operating with pointers to functions. Their greatest use is for passing a function as a parameter to another function, since a function itself cannot be passed by value. To declare a pointer to a function, we write it like the prototype of the function, but enclose the name in parentheses () and insert an asterisk (*) before the name.

// pointer to functions
#include <iostream>

int addition (int a, int b)
{ return (a+b); }

int subtraction (int a, int b)
{ return (a-b); }

// global pointer to a function taking two ints and returning int
int (*minus)(int,int) = subtraction;

int operation (int x, int y, int (*functocall)(int,int))
{
  int g;
  g = (*functocall)(x,y);   // call the function that was passed in
  return (g);
}

int main ()
{
  int m,n;
  m = operation (7, 5, &addition);   // m = 7 + 5
  n = operation (20, m, minus);      // n = 20 - 12
  std::cout << n;
  return 0;
}
8
In the example, minus is a global pointer to a function that has two parameters of type int. It is immediately initialized to point to the function subtraction, all in a single line:
int (* minus)(int,int) = subtraction;
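
As a further sketch of passing functions as parameters, the function below applies an operation repeatedly over an array. The typedef name BinaryOp and the function fold are assumptions introduced for this illustration; addition and subtraction are repeated from the example above so that the block stands on its own, and nullptr assumes a C++11 compiler.

```cpp
#include <cassert>

int addition(int a, int b) { return a + b; }
int subtraction(int a, int b) { return a - b; }

// Hypothetical alias: pointer to a function taking two ints and returning int.
typedef int (*BinaryOp)(int, int);

// Combines start with each element of values in turn, using op.
int fold(const int* values, int count, int start, BinaryOp op)
{
  assert(op != nullptr);
  int result = start;
  for (int n = 0; n < count; n++)
    result = op(result, values[n]);  // (*op)(result, values[n]) also works
  return result;
}
```

With int v[3] = {1, 2, 3}; the call fold(v, 3, 0, addition) should return 6 and fold(v, 3, 10, &subtraction) should return 4. The function name can be passed with or without the ampersand.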

© The C++ Resources Network, 1999 - All rights reserved
