
v1.0

Unix And C/C++ Runtime Memory Management For Programmers

Table Of Contents
  1. Why Learn About Memory Management?
  2. Unix Memory Management
    1. Machine Architectures - Memory, Cache, Registers
    2. Virtual Memory
    3. Memory Protection
    4. Run-time Management Of Virtual Memory
    5. SIGSEGV This! SIGBUS That!
    6. Load On Demand
    7. Read-Only Memory
    8. Shared Memory
    9. Memory Management For Related Processes
  3. Memory Managers
    1. Memory Alignment
    2. A Memory Manager's Algorithm
      1. Linked List Of Memory Chunks
      2. Allocating A Chunk
      3. Freeing A Chunk
      4. Algorithm's Efficiency
      5. More Info On Memory Management Algorithms
    3. Interaction With Unix's Memory Management
    4. Specialized Memory Managers
    5. Memory Manager's Interaction With Program's Code
  4. C Runtime Memory Management
    1. Program Memory Segments
    2. The Stack And Local Variables
    3. The Essence Of Pointers
    4. Pointers And Assembly Language
    5. Heap Management With malloc() and free()
      1. Usage Of malloc()
      2. Usage Of free()
      3. Freeing The Same Chunk Twice
  5. C++ Runtime Memory Management
    1. Heap Management with new and delete
      1. Usage Of new
      2. Usage Of delete
      3. new Vs. new[]
      4. delete Vs. delete[]
    2. Inter-mixing C++ And C Memory Managers
    3. The Odds Of References
  6. Tools For C/C++ Memory Problems Resolving
    1. Free Libraries For Locating Memory Leaks
    2. Free Libraries For Identifying Memory Corrupting Code
    3. "Must-Have" Free Tools For Many Memory Problems Types - Linux/x86
    4. "Must-Have" Commercial Tools For Many Memory Problems Types

Why Learn About Memory Management?

Many people claim that they know C or C++, and they even program in those languages. Yet when they stumble on a bug, they often don't truly understand its nature, and find it hard to fix (sometimes they cannot fix it at all). C and C++ programming relies heavily on dynamic memory management: allocating memory (with malloc() or new), using it via pointers and references, and deallocating it (with free() or delete).

Many bugs reveal themselves in the general form of 'memory corruption', or, as programmers often put it: 'this structure contains garbage! how can this be?!?'. Understanding the underlying memory management mechanisms supported by the operating system and by the programming language's runtime environment will help you understand such bugs, and fix them faster and better.

As an added bonus, understanding how memory management works can help you write more efficient programs. This is especially important when writing programs that manipulate a lot of data, or that require every last bit of CPU power.


Unix Memory Management

Every process running on a Unix system gets memory management services from the operating system. This is true for most other operating systems as well. These services are given regardless of the language in which the program was written. Such services would include virtual memory management, paging services, memory protection and the like.


Machine Architectures - Memory, Cache, Registers

Basically, computers have two main locations for storing data - physical memory and registers. Memory is usually large, and is made of a linear set of cells, also called 'bytes' or 'words', depending on their size. Each memory cell is accessed using an address, and the memory does not have to be consecutive. On various architectures, certain parts of the memory are used to access devices (this is called memory-mapped I/O). Other parts of the memory might not even be mapped into any physical memory at all. In the old days, a programmer had to take all of this into account in their program. Modern operating systems, however, hide this information from the user, as we'll see later.

Cache is a small chunk of memory, stored either directly in the CPU (level 1 cache), or on the motherboard (level 2 cache). Its purpose is to store a copy of 'recently used' parts of the main memory in a location that can be accessed much faster, either because it's in the CPU, or because it's made of a faster (and more expensive) type of memory. Usually, the cache is hidden from our programs by the hardware - a programmer normally needs to worry about the cache only when writing very low-level memory management code inside the operating system's kernel.

Registers are storage cells located directly inside the CPU, with very fast access to its internals (the arithmetic-logic unit, for example). They can be accessed much faster than memory cells, and are often used to store data that is needed for a short calculation, such as the contents of local variables in a function, or intermediate results of arithmetic calculations. The keyword 'register', when used in the definition of a local variable in C, serves as a hint to the compiler to assign that variable to a register, rather than to a memory cell. Note that with current compilers and their optimizers, it's often better to let the compiler decide which variables are better kept in registers.
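
For example, here is a small, purely illustrative function that hints that its counters are best kept in registers:

int sum_array(const int* arr, int len)
{
    register int i;        /* a hint - the compiler is free to ignore it */
    register int sum = 0;  /* note: we may not take the address of
                              a 'register' variable */

    for (i = 0; i < len; i++)
        sum += arr[i];
    return sum;
}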


Virtual Memory

In order to ease the programmer's life, as well as to allow a program to access more memory than might be physically installed in the machine, most modern operating systems support a virtual memory system. Virtual memory works by having the operating system (in tight cooperation with the CPU) map the virtual addresses used by a process into physical addresses. When a process allocates a new block of memory, the operating system assigns this block to some section (called a 'page') of physical memory, using a virtual memory translation table. Each access to a memory location by the process is translated into an access to the matching physical memory page. In order to avoid slowing down the program, this translation is done directly by the CPU.


Memory Protection

As we have seen, the operating system uses a translation table to map virtual memory to physical memory. Taking this a step further, the system can use a different translation table for each process, thereby giving each process its own address space. This means that the same virtual memory address used by two different processes will be mapped into two different physical memory addresses. Thus one process cannot access the contents of the memory used by another process, and a process corrupting its own memory won't interfere with the memory of any other process in the system. This feature is known as "memory protection", and is used by most modern operating systems (including all Unix variants).


Run-time Management Of Virtual Memory

Some virtual memory sections might be mapped to no physical memory page. When a process tries to access a memory cell in such a section, the CPU generates a page fault, and an operating system routine is invoked to handle the fault. Most operating systems use this feature to keep part of the virtual memory sections on disk, rather than in RAM, thereby allowing the program to use an address space larger than the physical RAM of the machine. When a page fault occurs, the operating system loads the contents of the faulted virtual memory section from the disk, copies it into a free physical memory page, and updates the virtual memory translation table, so the CPU will know to map the virtual memory section to the new physical RAM page.

During the search for a free page, it might be that no free page is found. In such a case, the operating system takes the contents of some busy physical memory page, copies it to the hard disk, and uses that memory page to load the data of the desired virtual memory section. The virtual memory section that was previously mapped to this physical memory page is now marked as paged out.

As you can see from this, the algorithm used to decide which memory page to evict when there are no free pages has to be chosen carefully. With a bad algorithm, a process that accesses two virtual memory sections alternately might cause the system to keep paging these sections in and out. Since access to the disk is much, much slower than access to RAM, this would slow down the system tremendously. A common algorithm used to find candidates for paging out is called LRU - Least Recently Used. In this algorithm, the memory page that was used least recently is the one whose contents get paged out. This algorithm works well due to the locality principle - if a process accessed a given memory page, it is likely to access the same page, or one near it, on the next instruction.


SIGSEGV This! SIGBUS That!

We saw that some virtual memory sections aren't mapped to physical memory. While some of them were simply paged out, others were never allocated by the process. When a process starts running, its virtual memory table is small. As it allocates more memory pages, the table grows. However, if the process tries to access a virtual memory address in a section it hasn't allocated yet, the operating system has nowhere to bring this page from. The designers of the Unix system decided that this situation indicates a program bug, so instead of automatically allocating a memory page in such a case, they chose to send a signal to the process. This signal is the SEGV signal (or SIGSEGV), and its default signal handler prints out a "Segmentation violation - core dumped" message, and dumps the memory image of the process into a file named 'core' in the process's current directory.

Another way to cause a 'segmentation violation' is to try to access an illegal location in virtual memory. Because many invalid-pointer bugs involve very low pointer values, the operating system does not allow a process to map a memory page at the virtual memory section beginning with the virtual address '0'. This is what causes programs to receive a SEGV signal when dereferencing a NULL pointer (NULL on _most_ machine architectures is defined as '0').
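
A minimal program that demonstrates this (running it is expected to end with that very core dump):

int main()
{
    char* p = 0;  /* a NULL pointer - the page at address '0' is never
                     mapped for us */

    *p = 'a';     /* dereferencing NULL - the kernel sends us a SEGV signal */
    return 0;
}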

What about a BUS (or SIGBUS) signal? This signal is sent to a program that tries to access memory through a non-aligned pointer. For instance, on many machine architectures, access to 'long' (4 byte) numbers must be done using a memory address that is divisible by 4. Trying to access such an entity using an address that does not abide by this rule causes the CPU to emit a trap. The operating system's kernel catches this trap, and sends a BUS signal to the program. The default signal handler for this signal emits a "Bus error - core dumped" message, and dumps the memory contents to a 'core' file, much like the handler for the SEGV signal does.
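
Here is a sketch of code that may draw a BUS signal on an architecture that enforces alignment (e.g. SPARC) - note that on Intel x86 processors, which quietly tolerate unaligned access, it will usually run without complaint:

int main()
{
    char buf[8];
    long* p = (long*)(buf + 1);  /* an address that is (most likely)
                                    not divisible by 4 */

    *p = 42;  /* a misaligned access - a CPU trap, and hence a BUS
                 signal, on strict architectures */
    return 0;
}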


Load On Demand

As we have seen, the efficiency of virtual memory is based on the locality principle. The same principle can be used when loading programs into memory. There is no point in loading the entire executable file during application startup. Large parts of the code are likely never to be executed during the process's lifetime. Furthermore, loading all of the code at once would mean a longer wait before the process can start running.

Thus, what Unix systems do, is just allocate the page table entries, and mark the pages as "load on demand". When the process tries to access a virtual memory cell in a page that wasn't loaded yet, a page fault will occur, causing the operating system to load the contents of this specific page into memory. The code part of shared libraries is treated in a similar manner.

This feature, together with some properties of file management, could explain the following phenomenon: if you have a process executing program file 'foo', and you then compile a new version of 'foo' and copy it on top of the original 'foo' file - the process is likely to crash very quickly. The reason for this is that the process is likely to access a virtual memory cell of a page whose contents were not loaded yet. This will cause the system to load the page from the new binary file - which is most likely different from the original binary file. Thus, the program will be executing a random machine instruction, instead of the machine instruction it was supposed to perform, most likely resulting in a process crash.

In order to avoid the above scenario, there are several options. One is to kill all processes currently executing the program's binary before updating it. A better method is to rename the old binary file to a new name (e.g. foo.old) and then copy the new binary to 'foo'. This works because the name of a file is relevant only when a process tries to open it. After the file is opened, the process accesses it via its disk location (I-node number) - which is different for the old and the new files. Actually, you may even erase the old binary file without renaming it - the file will be removed from the directory, but not from the disk. It will be removed from the disk only when all processes keeping it open (e.g. processes executing this binary program file) close it (or exit). Note that this is NOT true if the binary file is accessed via NFS - only if it is accessed on a local file system. NFS file accesses always use the file name, whether for open operations or for read operations.
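
On a local file system, the 'rename, then replace' procedure can even be done from within a C program - a sketch (the file names here are hypothetical, and error checking is omitted):

#include <stdio.h>

int main()
{
    /* processes currently executing 'foo' keep using the old I-node... */
    rename("foo", "foo.old");

    /* ...while new invocations of 'foo' will open the new I-node. */
    rename("foo.new", "foo");
    return 0;
}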


Read-Only Memory

Some of the virtual memory space used by a process never needs to change during the process's life. This fact can be used to optimize the handling of such memory pages by the paging code of the operating system. For example, executable memory pages are read-only pages, and are marked so by the system. Trying to write into them will fail, generating a CPU trap, which is caught by the operating system - which, in turn, sends a SEGV signal to the process.

When the system needs to page out read-only code pages of this sort, it doesn't even have to page them out to disk - it already has a copy of them inside the binary executable (or the shared library) they were taken from. Thus, paging these pages out just requires marking them as free.

The process can mark other memory pages as read-only, too (see the man page of the mmap() system call, for example). This can be used to make sure no part of the code tries to modify the contents of that memory - handy for tracking down some bugs.
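
For example, here is a sketch, assuming a Linux system (MAP_ANONYMOUS is not fully portable), that allocates a page with mmap() and then uses mprotect() to make it read-only. Error checking is omitted for brevity:

#include <sys/mman.h>
#include <string.h>

int main()
{
    /* allocate one anonymous memory page (assuming 4096-byte pages). */
    char* p = mmap(0, 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    strcpy(p, "precious data");

    /* mark the page read-only. */
    mprotect(p, 4096, PROT_READ);

    /* p[0] = 'X';  - uncommenting this write would now draw a SEGV. */
    return 0;
}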


Shared Memory

Another feature supported by Unix systems is shared memory. Using this feature, virtual memory sections of different processes are marked as shared, and thus point to the same physical memory pages. When one process changes the contents of this memory, the other processes immediately see the new contents. Of course, this means that access to this shared memory should be synchronized among the processes sharing it, using some locking mechanism.

Writable memory pages can be shared by processes using various system APIs, such as the System-V shared memory API (see the man pages of shmget() and shmat() for details), or using the mmap() system call. Note that a page may be shared between more than two processes, and that some of these processes might only require read-only access to these shared memory pages.
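
For example, here is a minimal System-V shared memory sketch (the key value is arbitrary, and error checking is omitted for brevity):

#include <sys/ipc.h>
#include <sys/shm.h>

int main()
{
    /* create (or locate) a 4096-byte shared memory segment. */
    int shmid = shmget((key_t)0x1234, 4096, IPC_CREAT | 0600);

    /* attach the segment to our address space. another process doing
       the same shmget()+shmat() will see the very same bytes. */
    char* p = shmat(shmid, 0, 0);

    p[0] = 'x';  /* immediately visible to every attached process */

    shmdt(p);    /* detach - the segment itself lives on */
    return 0;
}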

Read-only shared memory is easier to manage. One place where sharing read-only memory pages comes in very handy is the sharing of executable code coming from program binary files and from shared libraries. The system takes care of this automatically - if a process tries to execute a program that is already loaded into memory by another process, the memory pages holding the binary code will simply be mapped into the virtual address space of the new process, and the overall memory usage of the system will not increase (other than the memory needed to hold the translation table of the new process - this table is not shared between processes). Note that dynamic data won't be shared by processes in this manner, since it is stored in writable memory pages. This is, of course, the expected behaviour - you wouldn't want one process changing a variable's contents to affect another process running the same program, would you?


Memory Management For Related Processes

A special case of shared memory is handled by the system when a process creates a new child process using the fork() system call. This call creates a new process that contains an almost identical copy of the parent process's memory. The operating system makes a copy of the virtual memory translation table for the child, but current Unix implementations do not make a copy of the pages themselves. Instead, they mark all the pages (now shared by the parent and child processes) as "copy-on-write". This means that as long as both processes only read from a given page, they access the same physical page. However, when one of these processes tries to write into such a shared memory page, the system notes the "copy-on-write" flag, creates a new copy of the page, and inserts that copy into the translation table of the writing process. The write is then actually performed on the new copy of the page.

This mechanism allows the fork() operation to work much faster than if all of the contents of the parent process's virtual memory had to be copied. One may ask why making this copy is useful in the first place. For the executable code pages this is obvious, as the child process starts out running the same code that the parent process used. Regarding the data pages, this allows the parent process to pass the child process data it will require during its run, such as file descriptors used for communication between the parent process and the child process. This also means that a call to fork() will be most efficient if done before the parent process has allocated large amounts of memory, i.e. as close as possible to the parent process's own initialization.
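
A classic illustration is a pipe created just before the fork() - a minimal sketch:

#include <stdio.h>
#include <unistd.h>

int main()
{
    int fd[2];

    pipe(fd);  /* created before fork() - so both processes share it */

    if (fork() == 0) {
        /* child - starts with a copy-on-write image of the parent,
           including the pipe's file descriptors. */
        char buf[6];

        read(fd[0], buf, 6);
        printf("child got: %s\n", buf);
        _exit(0);
    }

    /* parent */
    write(fd[1], "hello", 6);
    return 0;
}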

Note that some operating systems support a variant of this system call, named vfork(), which is not supposed to make a copy of the translation table at all. This system call is not very portable - see the relevant man page for more info.


Memory Managers

Memory managers are software libraries that perform memory allocation and de-allocation on behalf of the program. Every programming language comes with a runtime environment that includes at least one memory manager. Since the programming languages we deal with tend to be general purpose, the memory manager is also general purpose - it does not assume anything about the pattern of allocations, the sizes of the memory chunks allocated, and the like. Therefore, it is possible to find an allocation pattern that will cause a given memory manager to perform poorly. Thus, having an idea of how a memory manager works could allow us to write programs that make more efficient use of memory, both in terms of speed and in terms of overall memory usage.


Memory Alignment

One property of all memory managers is that they return memory chunks whose alignment suits the architecture of the machine on which they run. Alignment means that the memory chunks returned to the user begin at an address that is divisible by the word size of the CPU. This size is often 4 bytes (on 32 bit architectures) or 8 bytes (on 64 bit architectures). Furthermore, the size of the returned chunks is also kept to a multiple of the word size. This ensures that the returned memory blocks can be used to store the largest data type the machine supports (e.g. a long integer).

The price of this mechanism is that memory is often wasted. For example, suppose that I have a machine with a word size of 4 bytes, and that I want to store a string whose length is 5 bytes (including the terminating null character, if we're referring to a C language string). The memory manager will give me a memory chunk whose size is 8 bytes (2 words). I use 5 bytes, and 3 are wasted. Thus, general-purpose memory managers cause a lot of memory waste for programs that allocate many small memory chunks. We will discuss methods of overcoming this problem when we discuss specialized memory managers. We will also see how this affects issues with pointers and memory smearing (corruption of memory contents).
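
The rounding itself is simple arithmetic - a sketch, assuming a word size of 4 bytes:

#include <stddef.h>

#define WORD_SIZE 4  /* an assumption - 4 bytes on a 32 bit machine */

/* round 'size' up to the nearest multiple of the word size. */
size_t align_up(size_t size)
{
    return (size + WORD_SIZE - 1) & ~(size_t)(WORD_SIZE - 1);
}

/* align_up(5) == 8, align_up(8) == 8, align_up(9) == 12, and so on. */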


A Memory Manager's Algorithm

Various algorithms have been devised over the years for implementing efficient memory managers for general-purpose programming. We will present here a simple algorithm that is easy to follow, so you'll get a feel of how a memory manager works.


Linked List Of Memory Chunks

Our memory manager keeps a linked list of free memory chunks. Inside each chunk, the first few bytes are used for storing the chunk's size (including the space used by the memory manager to keep the chunk's size), and the address of the next chunk. This means that the size of a chunk must be at least large enough to store one number and one address (e.g. 8 bytes, on a 32-bit architecture). Here is how a single memory chunk in the list looks:

[Drawing of a single memory chunk]

And here is a linked list of memory chunks:

[Drawing of a linked list of memory chunks]
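
In C terms, such a chunk header might look like the following sketch (real memory managers pack this information more cleverly):

#include <stddef.h>

struct chunk {
    size_t        size;  /* the chunk's size, this header included */
    struct chunk* next;  /* the next free chunk on the list */
};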

The memory manager is normally implemented as a library, which means it is linked into our process's executable binary. This allows the memory manager to work in the address space of our process, so it can indeed allocate memory that our process will be able to access. Also, because it works in virtual memory, the memory manager sees a large and consecutive address space. However, the linked list does not have to be consecutive, for reasons we'll soon see.


Allocating A Chunk

When the user asks to allocate a memory chunk of a given size, the memory manager scans the list until it finds the first memory chunk that is at least as large as what the user asked for. This chunk is then removed from the linked list, and the manager logically splits it into three sections. In the first (small, fixed-size) section, it writes the size of the allocated chunk. The second section is the chunk whose address will be returned to the user. The third section contains whatever part of the chunk goes beyond what the user required - this third section is turned into a new (free) chunk, and is added back into the linked list in place of the original chunk. Note that if the original chunk fit the requested size exactly, no third chunk is left to be linked back to the list.

Thus, assuming an unsigned long is used to represent a chunk's size, a chunk allocated for a request of X bytes actually occupies X (rounded up) plus the size of an unsigned long (on a 32-bit architecture, an unsigned long occupies 4 bytes of memory).

Let's see how the above list looks after a user asks to allocate a chunk of memory with a size of 62 bytes. Let's assume the machine's word size is 4 bytes, so the manager will round this up to 64 bytes. In fact, since the manager needs to prepend the chunk's size in front of the chunk, it will actually allocate a 68 byte block - 4 bytes to store the size, and 64 bytes to return to the user.

[Drawing of the above list after one allocation]
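
To make this concrete, here is a first-fit allocation sketch, built on the 'struct chunk' and the align_up() helper sketched earlier (many details, such as a minimal split size and the initial acquisition of memory, are glossed over):

static struct chunk* free_list;  /* head of the list of free chunks */

void* my_malloc(size_t size)
{
    struct chunk** pp;

    /* room for the user's data, rounded up, plus the stored size. */
    size_t needed = align_up(size) + sizeof(size_t);

    /* scan for the first chunk that is large enough - 'first fit'. */
    for (pp = &free_list; *pp; pp = &(*pp)->next) {
        struct chunk* c = *pp;

        if (c->size < needed)
            continue;

        if (c->size >= needed + sizeof(struct chunk)) {
            /* split: carve 'needed' bytes off the front, and link the
               remainder into the list in place of the original chunk. */
            struct chunk* rest = (struct chunk*)((char*)c + needed);
            rest->size = c->size - needed;
            rest->next = c->next;
            *pp = rest;
            c->size = needed;
        } else {
            /* a (nearly) exact fit - unlink the whole chunk. */
            *pp = c->next;
        }

        /* the user's memory begins right after the stored size. */
        return (char*)c + sizeof(size_t);
    }
    return 0;  /* no fit - a real manager would call brk() here */
}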


Freeing A Chunk

Freeing a chunk is done by reversing the above process, with a few optimizations. The user gives the memory manager the address of the chunk it had previously got from the manager. The manager looks at the 4 bytes right before this address to find the size of the chunk. It then inserts the chunk back into its place on the linked list, checking if it can merge it with consecutive chunks already found on the list.

The reason for this merging is to avoid memory fragmentation - that is, to avoid turning one large memory chunk into several small ones after a series of allocation and free operations. Such fragmentation would leave allocation requests for large chunks unable to use the freed chunks, had the manager not noticed that those chunks were actually consecutive in memory.

This chunk merging operation also implies that the linked list is better kept sorted by address, so that the neighbours of a chunk are easy to find on the list.

In our specific case, if the user asks to free the memory chunk at address 0x2018, the memory manager will look at the 4 bytes before this chunk, and see that the chunk's size is 68 bytes. Then, a scan of the linked list will show that this chunk belongs after the 1st chunk on the list. Calculating the chunk's end address will show that it is adjacent to the 3rd chunk on the list, so the manager will merge these 2 chunks into a single chunk, and store this chunk on the linked list. The 'next chunk' pointer in the previous list item will, of course, be updated to point to this merged chunk, and the resulting list will be the same as before the allocation we've shown above.

Note that in actual programs, the order of memory allocation and free operations is arbitrary, so a freed chunk might be merged with a chunk it wasn't part of initially. It might also merge with two chunks - one on each side.
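
And here is the matching free operation for the sketch above, merging with the following chunk when the two are adjacent (merging with the preceding chunk is done the same way, and is omitted for brevity):

void my_free(void* ptr)
{
    /* recover the chunk header from the stored size. */
    struct chunk* c = (struct chunk*)((char*)ptr - sizeof(size_t));
    struct chunk** pp;

    /* find the chunk's place on the address-sorted free list. */
    for (pp = &free_list; *pp && *pp < c; pp = &(*pp)->next)
        ;

    if (*pp && (char*)c + c->size == (char*)*pp) {
        /* adjacent to the next free chunk - merge the two. */
        c->size += (*pp)->size;
        c->next = (*pp)->next;
    } else {
        c->next = *pp;
    }
    *pp = c;
}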


Algorithm's Efficiency

The usage of linked lists is not considered efficient in general. However, for short lists, this is far more efficient than using sorted binary trees or similar structures, due to the overhead of the balancing operations performed on those trees. When dealing with memory allocation, lists tend to be short. Initially, the list contains a single large memory chunk. The first allocation splits this chunk in two, returning the first part to the user, and leaving a single (smaller) chunk on the list. The same goes for the next allocations. When the user begins to free memory, chunks return to the pool, often causing some fragmentation. However, these same chunks are likely to be used again by the next allocations, reducing the size of the list again. Thus, for common patterns of memory allocation, a linked list turns out to be quite efficient, both in speed and in use of memory.

More Info On Memory Management Algorithms

The issue of memory management algorithms has been extensively researched over the years, and a lot of information about it can be found on the Web. For extensive coverage of memory management, see the 1995 PostScript article surveying dynamic storage allocation techniques, from the University of Texas, USA.


Interaction With Unix's Memory Management

We have claimed that the memory manager begins with one chunk on its list of free memory chunks. This chunk has to be taken from some place - and this is the role of the operating system's virtual memory manager. As you remember, the memory the process accesses is actually virtual memory, and most virtual memory sections are not mapped anywhere. This is done for efficiency - when a process's virtual address space spans several gigabytes, it would be wasteful to allocate a map covering this whole amount of virtual memory, especially since most processes use a much smaller amount of memory.

The way the operating system handles this situation is by not allocating any memory for the process unless it asks for it. Of course, the program's binary and shared library sizes can be known in advance, but memory used to store data is of an arbitrary size. When the process requires more memory to store data, it uses the brk() system call (or sbrk()) to ask the system to enlarge its allocated address space. The system then adds entries to the translation map for this process, and the pages themselves will be mapped into physical memory when they are accessed.

Actually, this system call interface is hardly ever used by programmers directly. The memory manager of the runtime environment of our programming language is the one that uses this interface. Thus, whenever it runs out of free memory chunks on its list, it calls the brk() system call to allocate more memory pages, and adds the new block to its chunks list. The amount of memory it asks the system to allocate can vary. Asking for too little at a time causes many calls to brk(), which is time consuming. Asking for too much at a time makes the system spend time updating the virtual translation map of the process.
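
The interface itself is tiny - here is a sketch of using it directly, something only the memory manager would normally do:

#include <stdio.h>
#include <unistd.h>

int main()
{
    void* old_brk = sbrk(0);  /* current end of the data segment */

    sbrk(4096);  /* grow the data segment by 4096 bytes */

    /* the fresh block starts where the old break used to be. */
    printf("new block at %p\n", old_brk);
    return 0;
}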

Note: there is usually no system call that can be used to return unused memory back to the operating system. When you free memory, the memory manager cannot return this memory to the operating system. If this freed memory is never used again, it will eventually be paged out to disk - but it won't be freed until the process exits. Thus, the amount of virtual memory used by a process always increases - never decreases. That is, unless...:

Note 2: the above rule has an exception - shared memory, and mmap()-ed memory. Shared memory is counted as part of the memory space used by the process. Thus, when the process attaches to a shared memory chunk, its 'accounted' virtual address space size increases by the size of the shared memory segment. When the process releases a shared memory chunk, its 'accounted' virtual address space size decreases accordingly. The same goes for memory mapped using the mmap() system call.


Specialized Memory Managers

Since general memory managers are... well - general, it makes sense to implement specialized memory managers in cases where the general memory manager causes too much overhead (in CPU usage, or in memory wasted due to fragmentation).

A good example is when a program needs to allocate many (hundreds of thousands, even millions of) memory chunks, all of the same size. This can be done by allocating a large block (whose size is a multiple of the chunk's size) and slicing it into a set of such chunks. This allows both faster allocation and less memory overhead - since we don't need to keep the size of each chunk when we allocate it - they are all of the same size, after all.
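
Here is a sketch of such a pool, assuming all chunks are 32 bytes - the first word of each free chunk is reused to hold the 'next free chunk' pointer, so no per-chunk size header is needed:

#include <stdlib.h>

#define CHUNK_SIZE  32      /* an assumed, fixed chunk size */
#define POOL_CHUNKS 100000

static char* pool;       /* one big block, sliced into equal chunks */
static void* free_head;  /* head of the list of free chunks */

void pool_init(void)
{
    int i;

    pool = malloc((size_t)CHUNK_SIZE * POOL_CHUNKS);

    /* thread a free list through the chunks. */
    for (i = 0; i < POOL_CHUNKS - 1; i++)
        *(void**)(pool + i * CHUNK_SIZE) = pool + (i + 1) * CHUNK_SIZE;
    *(void**)(pool + (POOL_CHUNKS - 1) * CHUNK_SIZE) = 0;
    free_head = pool;
}

void* pool_alloc(void)  /* constant time, no size header */
{
    void* p = free_head;

    if (p)
        free_head = *(void**)p;
    return p;
}

void pool_free(void* p)  /* constant time as well */
{
    *(void**)p = free_head;
    free_head = p;
}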


Memory Manager's Interaction With Program's Code

As we have seen, the memory manager keeps its own data structures in the address space of the process for which it works. This means that if the process writes memory out of bounds (e.g. referencing a location right before an array it has allocated using the manager) - the data structures of the memory manager itself may be corrupted. This would explain cases where a program we wrote crashes, and the stack trace shows it was in the middle of a call to one of the memory manager's functions (e.g. malloc() or free() in C, new or delete in C++).

Note that these kinds of bugs are not always easy to track down - the memory corruption might have occurred in a completely different part of the code, and just looking at the functions on the stack and the code they execute often won't help locate the cause of the problem.

The cause of such a problem could be that our program overwrote the location that stored the size of a chunk, and thus, when the chunk was freed, the memory manager reclaimed a larger chunk than had originally been allocated. The manager later allocated parts of this chunk for other purposes, and thus we get the same chunk of memory used by different parts of our code for different purposes. Or a bug in our code might corrupt the list of free memory chunks, causing one of its pointers to contain an invalid value (e.g. pointing outside the process's address space).


C Runtime Memory Management

The C programming language and its runtime environment serve as the basic language used on Unix systems (if we ignore assembly language for the time being). The runtime environment defines not only how memory is allocated and freed, but also how the process's different pieces of information are laid out in memory. Also, being a language that works at a low level, it allows the programmer a lot of control over how memory is used. This means a lot of power - together with a lot of responsibility.


Program Memory Segments

A C program is laid out in memory in several separate segments - a code segment, a stack segment and a data segment. These segments are laid out inside the virtual address space of the process.

The code segment holds the executable code of the program (both coming from the program's binary itself, and any shared libraries it loads). All memory pages of the code segment are marked as read-only, and are shared with any other process using the same program file and/or shared library files.

Note: not all pages are shared in this way, due to 'relocation' issues. These are cases where one part of the program needs to know the address of another part, but the exact address will only be known at run-time. This address might be different for different processes (since they load a different set of shared libraries, for example), and thus cannot be written in a shared page. The dynamic loader (which is responsible for handling such situations, among other things) uses "indirect jumps" - it keeps non-shared pages that contain these varied addresses, and the program's code is compiled in a way that makes it read the addresses from these non-shared pages. If this were not done, complete pages of executable code would need to be copied, just to change one or two locations holding such addresses. For a clearer and more complete explanation of relocations and how exactly they are handled, please refer to Ulrich Drepper's home page, and in particular to his How to write shared libraries paper.

The stack segment is used to hold the contents of the stack of the process. The stack's maximum size is defined at process startup, and cannot be increased while the process runs. For a multi-threaded process, several stacks will exist in the stack segment.

The data segment occupies the space between the stack segment and the last address allocated by the process using the brk() system call. This segment can grow as the process runs and needs more virtual memory.


The Stack And Local Variables

The stack is used by the process to store the chain of functions which are currently in the middle of execution. Each function call causes the process to push an execution frame onto the top of the stack. This execution frame contains the contents of the CPU registers as they were before the function call was made, the return address (i.e. where in the code this function was invoked, so we can return there when the function returns), the parameters passed to the function, and all local variables of the function. Thus, when we have function 'main' calling function 'a', which calls function 'b', we will have 3 frames on the stack.

The 'allocation' of a stack frame does not require performing real memory allocation - as the space for the stack was reserved during process startup. Instead, this is done by merely marking a part of the stack as containing the frame, and copying data to that frame. Note that local variables are "allocated" by simply advancing the stack pointer beyond their location on the stack, so allocating local variables takes a fixed amount of time, no matter their size. When a function returns, its stack frame is freed by simply modifying the stack pointer to point below its frame.

The local variables are not initialized - they just contain whatever values happened to be in the memory locations these variables occupy. Consider a situation in which a function is called, and its variables are used and given some values. Later the function returns, and its frame is released. Then another function is called. The stack frame of this new function is located in the same place in memory as the frame of the former function, so the new function's local variables will get the values that were left there by the local variables of the former function. This explains why the values of uninitialized local variables are often neither 0, nor look like total garbage.
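
The following sketch often demonstrates this reuse of stack memory - though by no means reliably, as reading an uninitialized variable is undefined behaviour, and an optimizing compiler may change the picture completely:

#include <stdio.h>

void first(void)
{
    int x = 42;  /* lives in first()'s stack frame */

    printf("first:  x - %d\n", x);
}

void second(void)
{
    int y;  /* uninitialized - may occupy the very same stack slot */

    printf("second: y - %d\n", y);  /* often prints 42 */
}

int main()
{
    first();
    second();
    return 0;
}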

Since all local variables are stored in consecutive memory on the stack, if we take a pointer to such a variable, and later on write right above or right below this pointer, we'll be modifying the contents of another local variable, or even that of the function's return address, if we're "lucky". For example, consider the following code:


#include <stdio.h>

void foo()
{
    int numbers[2];
    int j;
    
    j = 2;
    printf("j - %d\n", j);
    numbers[2] = 3;    /* out of range - 'numbers' only has cells 0 and 1 */
    printf("j - %d\n", j);
}
During execution of this function, we first assign 2 to 'j', and thus the first print command will show "j - 2". Then, we try to assign a value to 'numbers[2]'. However, the 'numbers' array only has 2 cells - 0 and 1. Writing into subscript '2' of this array causes us to write just beyond the array (which is fully located on the stack). The variable 'j' just happens to be stored at that location in memory, and thus the value '3' will actually be assigned to 'j'. Our second print command will thus show "j - 3". Note that this assumes that the variables are stored in memory in the same order as they were declared inside the function's code. With some compilers, this might not be the case, and the out-of-range assignment might overwrite a different variable, or a part of the stack that does not hold variables, leading to other unexpected results.

Note: local variables (as well as function parameters) might be stored in registers, rather than on the stack. This could be either because we used the 'register' keyword when declaring these variables, or because the compiler's optimizer chose to place a variable in a register. Of course, such variables cannot be over-written by stack overflows.


The Essence Of Pointers

So what really is a pointer? As we are taught in any C programming course, a pointer is an entity that holds the address of another entity - a variable, or some memory location. Normally, we can treat a pointer as a variable pointing to another variable. This is important - it implies that the pointer, by itself, is a piece of data stored in memory. The only thing differentiating a pointer from a simple number is how we treat its value. Thus, a pointer is a memory cell (or a register) that contains a number, which we happen to interpret as the address of another memory cell.

Let's view a small function that uses pointers, and see how its memory is laid out:


void
hello_pointer()
{
    char data[8] = "abcdefgh";
    char* p = data;
    int i;

    /* 1 */
    for (i=0; i<9; i++, p++) {
        *p = 'a';
    }
    /* 2 */
    *p = 'b';
}
Here is the layout of the stack during this function call, at the location marked with '/* 1 */':
[Drawing of stack layout in function, line marked as '/* 1 */']

As you can see, 'data' occupies 8 characters (memory cells). 'p' occupies 4 memory cells (assuming a pointer is 4 bytes long - which is true for 32 bit architectures, such as the Intel/AMD 80386 and compatible CPUs). We ignore 'i' for now. Note that 'p' contains the address of 'data[0]' (that is, the first byte of the array 'data') before the loop runs.

While we execute the loop above, we modify the contents of the memory cell that 'p' points to, storing the ASCII code of 'a', and then advance 'p' to point to the next byte. This is all nice and dandy up to the 8th iteration, in which 'i' equals '7', and 'p' contains the address of 'data[7]':
[Drawing of stack layout in function, during 8th iteration]

Then things begin to get ugly. On the next iteration, 'p' is advanced again, and points to 'data[8]'. But wait - the array only contains 8 bytes, indexed 0 through 7. Pointing to 'data[8]' means pointing one byte past the end of the 'data' array. And what do we find there? That's right - the first byte of 'p' itself. Thus, after the next iteration, we will have the following contents on the stack:
[Drawing of stack layout in function, during 9th iteration]

As you can see, 'p' ended up pointing at the memory location in which its own contents reside, and thus, by writing into '*p', we modified 'p' itself to contain some random address (0x61, in hex, is the ASCII code of 'a'). Read this line again, please, to be sure you understood it.

When we then try to execute the command after the loop (*p = 'b'), we will smear some random location in memory (possibly even unmapped memory), leading to various random effects - either smearing another memory cell, or, in case the address now in 'p' points to an unmapped memory page, causing the program to crash with a SEGV signal.

Now, what would have happened if the source code of the previous example looked like this:


void
hello_pointer()
{
    char data[10] = "abcdefghij";
    char* p = data;
    int i;

    /* 1 */
    for (i=0; i<11; i++, p++) {
        *p = 'a';
    }
    /* 2 */
    *p = 'b';
}
Let's see the memory layout of the stack in this case:
[Drawing of stack layout in function, line marked as '/* 2 */']

As you can see, the 'data' array here occupies 10 bytes, and since 'p' has to be aligned on a 4-byte boundary, there are 2 unused bytes between 'data[9]' and 'p' (the contents of these bytes is, of course, undefined, since there is no need to initialize them). This time, our 'off-by-one' error in the loop causes us to overwrite the contents of an otherwise unused memory cell. Thus, we will not even notice this kind of memory smear, until someday in the future we modify the size of the 'data' array, or add a 'short' (2 bytes) variable between 'data' and 'p'.


Pointers And Assembly Language

Note: This section will be readable if you know assembly. If not - skip to the heap management section.

As you know, assembly (and of course machine code) uses several addressing modes to access the operands of instructions. Register (direct) addressing assumes that an operand is found in a register. Register indirect addressing assumes that a register contains the address of the memory cell (or memory word) that contains the data. As you can see, indirect addressing treats the register as a pointer. Memory indirect addressing assumes that the instruction contains the address of a memory cell, whose contents are the address of another memory cell, which contains the operand. This kind of addressing can be used to handle pointers - after all, pointers are just variables, and variables are just addresses of memory cells.

To illustrate this point, let's see a small fragment of C source code, and how it translates into assembly code. We will compile this code using 'gcc -S <filename>' to get the assembly code. First, here is the C code:


int
main()
{
    char c;
    char* p = &c;

    *p = 'b';

    return 0;
}
And here is the resulting assembly code (as generated on Linux, in the AT&T assembly format). Just a few notes about this assembly variant: regarding the stack, the variables on it are stored 'backwards', and the stack grows towards lower memory. This means that the variables are stored in the following manner (note - '%esp' is where the stack pointer points _before_ allocating the variables on the stack):

bottom of stack                padding            %esp
    |                        .____|_____.           |
   \|/                       |    |     |          \|/
    v                       \|/  \|/   \|/          v
                             v    v     v      

   ----------------------------------------------- ----
  |  p  |  p  |  p  |  p  |     |     |     |  c  |    | -> towards higher memory
   ----------------------------------------------- ----

    .
   /|\
    |
-8(%esp)
Now let's see the code itself:

# first comes some assembly mumbo-jumbo - not interesting for demonstrating
# pointers.
        .file   "pointer.c"
        .version        "01.01"
gcc2_compiled.:
.text
        .align 4
.globl main
        .type    main,@function
# next comes the 'main' function.
main:
        pushl %ebp        # the following lines initialize register '%ebp'
        movl %esp,%ebp    # to point to the top of the stack frame, where
                          # variables 'c' and 'p' are stored.
        subl $8,%esp      # then we update '%esp' (the stack pointer) to
                          # the bottom of the stack.

	leal -1(%ebp),%edx # we want 'p' to contain the address of 'c'.
                           # since 'c' is a single char, we need to take an
                           # offset of '1' from the top of the stack frame.
                           # we store this address in register '%edx'.
                           # leal = Load Effective Address (Long).

	movl %edx,-8(%ebp) # then we move that address into the location
                           # on the stack where 'p' is stored.

        movl -8(%ebp),%eax # 'p' is stored on the stack, at location
                           # '%ebp' minus 8 bytes. we use indirect
                           # addressing, in order to fetch the contents
                           # of the memory word ('l') whose address is
                           # the contents of '%ebp', minus 8.
                           # so now we have the contents of 'p' (which is
                           # the address of 'c') in register '%eax'.

        movb $98,(%eax)    # and then we copy the value '98' (ascii code of
                           # 'b') into where '%eax' points.

        xorl %eax,%eax     # that's it. from here on, it's the code used to
                           # return '0' from the function.
        jmp .L1
        .p2align 4,,7
.L1:
        leave
        ret
# and some more assembly mumbo-jumbo.
.Lfe1:
        .size    main,.Lfe1-main
        .ident  "GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)"
If you don't get it all at first glance - stare at the code a while longer. It will become clear with enough staring    ;)


Heap Management With malloc() and free()

When programming in C, we normally use the standard memory allocator that comes with the C library to allocate memory from the heap (the data segment). This memory allocator supplies functions such as malloc() and free() for memory allocation handling.


Usage Of malloc()

When we want to allocate a chunk of memory from the heap, we use the malloc() function. This function accepts a numeric parameter denoting the size of the memory chunk we want, and returns a pointer to a memory chunk with the following properties:

  1. The chunk is at least as large as the size we asked for.
  2. The chunk is aligned on a word boundary (as discussed in the memory alignment section above), so it may be used to store any data type.
  3. The chunk's contents are not initialized - they should be treated as garbage until we store something meaningful in them.
  4. The chunk remains allocated until we explicitly free it, using the free() function.

The memory chunk returned by malloc() is assigned to a pointer variable, and cast to the type of object this pointer points to. Since this is just a general memory chunk, we may later cast it to other types without a problem - as long as the chunk is large enough to hold an object of that type. Here is an example:

  char* p_c;
  int* p_i;

  p_c = malloc(8);
  strcpy(p_c, "bla");
  printf("%s\n", p_c);
  p_i = (int*)p_c;
  (*p_i) = 4;
  printf("%d\n", *p_i);

  printf("p_i is '%p', p_c is '%p'\n", p_i, p_c);
We allocated memory for use as a char pointer (a C "string"), and used it that way. We then cast this memory into an int pointer, assigned a value to it, and printed its contents as if it were an integer. Finally, with the last printf() call, we printed the address that both p_i and p_c contain, to show that they point to the same memory chunk (note - the '%p' conversion in the format string of printf() tells printf() to print the address stored in the variable, rather than the value found at that address).


Usage Of free()

When we want to free a memory chunk previously allocated by malloc(), we use the free() function. This function accepts a pointer to a previously allocated memory chunk, and frees it - that is, adds it to the list of free memory chunks, so it may be re-allocated. Several notes about free():

  1. The pointer passed to free() must be one that was previously returned by malloc() - passing any other pointer leads to undefined behaviour.
  2. Freeing a NULL pointer is harmless - free() simply does nothing in that case.
  3. free() does not change the pointer variable itself - it still contains the (now invalid) address of the freed chunk.

As you can see, free() only marks the memory chunk as free - there is no enforcement of this freeing operation. Accessing memory chunks that were previously freed is the cause of many memory errors, for novices and experienced programmers alike. A good practice is to always nullify a pointer right after it is freed, as in:

  char* p = malloc(10);
  strcpy(p, "hello");
  printf("%s\n", p);
  free(p);
  p = NULL;
This setting of the pointer to NULL wastes a little time when the program runs, but can save a lot of time when debugging the code. In most non-time-critical applications, the time wasted during runtime is negligible relative to the time saved during application development and debugging.


Freeing The Same Chunk Twice

One might ask - what if I accidentally free a memory chunk twice? This kind of bug could occur, for example, if we free a data pointer but don't nullify it, and during cleanup later on, we see it's not NULL, and try to free it again.

To understand what this kind of error might cause, we should remember how the memory manager normally works. Often, it stores the size of the allocated chunk right before the chunk itself in memory. If we already freed the memory, this memory chunk might have since been allocated again by another malloc() request, and thus the double-free will actually free the wrong memory chunk - leaving us with a dangling pointer somewhere else in our application. Such bugs tend to show themselves much later than the place in the code where they occurred. Sometimes we don't see them at all, but they still lurk around, waiting for an opportunity to rear their ugly heads.

Another problem that might occur is that the double-free happens after the freed chunk was merged with neighbouring free chunks to form a larger free chunk, and the larger chunk was then re-allocated. In such a case, when we try to free() our chunk for the 2nd time, we'll actually free part of a memory chunk that the application is currently using. This will cause even more unexpected problems.
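
The bug itself usually looks deceptively innocent - a minimal sketch:

#include <stdlib.h>
#include <string.h>

int main()
{
    char* p = malloc(16);

    strcpy(p, "some data");
    free(p);      /* first free - perfectly fine */

    /* ... much later, in some generic cleanup code ... */
    if (p)        /* p still holds the old address - free() didn't change it */
        free(p);  /* second free - anything may happen from here on */
    return 0;
}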


C++ Runtime Memory Management

Note: This section will be readable if you know C++. If not - skip to the tools for C/C++ memory problems resolving section.

The C++ runtime environment contains its own memory manager, which is not necessarily the same one used by the C runtime. However, in many cases the same memory manager is used for both. For example, the GNU C++ compiler's new operator actually invokes the C runtime's malloc() function. However, the C++ new operator does more than just invoke malloc(), as we'll see now.


Heap Management with new and delete

Memory management in C++ is done using the new and delete operators, and their big cousins, new[] and delete[].


Usage Of new

In order to allocate memory in a C++ program, we use the new operator. This operator has two modes of operation: one for allocating memory for a native type (char, int, float...), and another for allocating memory for an object instance.

When allocating memory for a native type, new allocates a memory chunk suited for that type, and returns a pointer to it. Because new is an operator that knows the requested type, it returns a pointer of the proper type, rather than an untyped pointer, as the malloc() C function does. In addition, new may be instructed to initialize the allocated memory with a given value, suitable for the allocated type. Here is an example of using new with native types:


  // allocate an 'int', and store its address in 'p_i'.
  int* p_i = new int;

  // allocate an 'int' initialized to '3', and store its address in 'p_i2'.
  int* p_i2 = new int(3);
As you can see, new may accept either just a type, or a type and a value. If it gets just a type, it allocates the memory for that type and returns a pointer to it. If it gets both a type and a value, it allocates the memory for the type, initializes it to contain the given value, and returns a pointer to the allocated memory chunk.

When new is instructed to allocate memory for an object instance, it allocates the memory, and then invokes one of the constructors of the object's class, to initialize the instance. The exact constructor chosen is the one matching the parameters passed to new. Let's see an example:


  // let's define a class with two constructors.
  class A {
    public:
      A()   { i = 0; }
      A(int num)   { i = num; }
      ~A()  {}
    private:
      int i;
  };

  // and now let's initialize a few instances:

  // create an instance on the stack. this does NOT invoke new, but
  //    does invoke the default constructor - A() .
  A a1;

  // create an instance on the heap. This invokes new, which further
  //    invokes the default constructor - A() .
  A* p_a2 = new A;

  // create another instance on the heap. This invokes new, which further
  //    invokes the constructor that gets one int parameter - A(int num) .
  A* p_a3 = new A(5);
As you can see, there is a strong separation between allocating the memory and invoking the object's constructor. When allocating an instance on the stack (a local variable), we don't need to use new, since the memory on the stack is already allocated - we only need to call the object's constructor. When we allocate an instance on the heap, we call new to do that, and then the matching constructor is also invoked.


Usage Of delete

When we want to free memory (and object instances) that were allocated using the new operator, we use the delete operator. This operator has two modes of operation, similar to the new operator. For native types, delete just frees the memory previously allocated by new. For object instances, it first invokes the destructor of the object, and then frees the memory itself. Let's see a few examples:


  // lets first allocate an integer...
  int* p_i = new int;
  // ...and then immediately free it. Note that the type of the object
  // freed could determine the size of memory to free. However, the amount of
  // memory was more likely stored when allocating the object, as was the case
  // when allocating memory with malloc().
  delete p_i;

  // any attempt to access '*p_i' now, is a dereference of an invalid pointer.
  // lets hope no one does that ;)

  // now, let's define a class A.
  // This class contains a pointer to an int, which is allocated in the
  // constructor, and freed in the destructor.
  class A {
    public:
      A()   { p_i = new int(0); }
      ~A()  { delete p_i; }
    private:
      int* p_i;
  };

  // let's allocate an object of class A.
  A* p_a = new A;

  // Now let's free this object. The destructor is invoked first, which
  // frees the 'p_i' member, and only then is the memory of the object
  // itself freed.
  delete p_a;

Note: when delete is invoked on an object, the compiler needs to know which destructor to invoke (if that object is of a class that inherits from another class, for example). Choosing the destructor is done using the same method used to choose which member function to invoke on an object, when dealing with virtual vs. non-virtual functions. For example, let's suppose that we have two classes, Parent and Child, where Child inherits from Parent. Let's see two code chunks that allocate a Child instance, and then free it.


  // Define class Parent.
  class Parent {
    public:
      Parent() { p_parent = new int(0); }
      ~Parent() { delete p_parent; }
    private:
      int* p_parent;
  };

  // Define class Child.
  class Child : public Parent {
    public:
      Child() { p_child = new int(1); }
      ~Child() { delete p_child; }
    private:
      int* p_child;
  };

  // Let's allocate an object of class Child.
  Child* p_obj1 = new Child;

  // Let's delete the object now. Since the pointer is of type 'Child*',
  // the destructor invoked is '~Child()' - which is what we expect to
  // occur. Of course, after '~Child()' is invoked, the parent class's
  // destructor, '~Parent()', is also invoked - the compiler generates the
  // code to do that. Thus, both the int pointed to by 'p_child' and the
  // int pointed to by 'p_parent' will be freed, in this order.
  delete p_obj1;

  // Now, let's allocate another object of class Child. Due to polymorphism
  // rules, we may store its pointer in a 'Parent*' pointer:
  Parent* p_obj2 = new Child;

  // Let's delete the object now. Because we free it via a pointer of
  // type 'Parent*', the compiler cannot know that the object is in fact
  // a 'Child' object, and thus will invoke the '~Parent()' destructor,
  // instead of invoking the '~Child()' destructor. The '~Child()' destructor
  // won't be invoked, and thus the int pointed to by 'p_child' will not
  // be deleted, and this memory will leak.
  delete p_obj2;
Let's now see how to fix the Parent class definition, to avoid the memory leak of the second case:

  // Define class Parent. This time, the destructor is defined as 'virtual'.
  class Parent {
    public:
      Parent() { p_parent = new int(0); }
      virtual ~Parent() { delete p_parent; }
    private:
      int* p_parent;
  };

  // Define class Child. The destructor is made virtual automatically, because
  // Child inherits from a class with a virtual destructor. We could define
  // the destructor just as '~Child() {...}', but for clarity, we define it
  // as 'virtual ~Child() {...}'.
  class Child : public Parent {
    public:
      Child() { p_child = new int(1); }
      virtual ~Child() { delete p_child; }
    private:
      int* p_child;
  };

  // Let's allocate an object of class Child.
  Child* p_obj1 = new Child;

  // Lets delete the object now. Since the pointer is of class 'Child*',
  // the destructor invoked is '~Child()' - which is what we expect to
  // occur. Ofcourse, after '~Child()' is invoked, the parent class's
  // destrcutor, '~Parent()' is also invoked - the compiler handles the
  // code to do that. Thus, both int pointed to by 'p_child' and the int
  // pointer to by 'p_parent' will be freed, in this order.
  delete p_obj1;

  // Now, let's allocate another object of class Child. Due to polymorphism
  // rules, we may store its pointer in a 'Parent*' pointer:
  Parent* p_obj2 = new Child;

  // Let's delete the object now. We free the object via a pointer of type
  // 'Parent*', but the destructor for this class is defined as virtual,
  // so the run-time environment identifies the run-time type of the
  // object, which is 'Child', and invokes the destructor of class 'Child'.
  // Thus, both destructors, '~Child()' and '~Parent()', will be invoked,
  // avoiding the memory leak we saw previously.
  delete p_obj2;
As you can see, the matter of virtual destructors can be quite confusing, and yet it pertains a lot to memory leaks (and a host of other problems). The compiler will most likely not warn us about this kind of bug - we'll need to find it ourselves. Thus, it is good practice to always define the destructor as virtual, and avoid this problem altogether. This slows down the invocation of the destructor a little, but that is relevant only in extremely time-critical applications.


new Vs. new[]

When we wish to allocate an array of objects, we need to use the new[] operator, rather than the new operator. The new[] operator allocates an array of elements of a given type, invoking the default constructor for each of them (or performing no construction, if the array is of native data types, such as int or float). Here are a few examples of using new[]:


// allocate an array of 5 integers, indexed 0-4.
// store its address in 'p_a'.
int* p_a = new int[5];

// if we want to initialize the array's data, we could do something like:
for (int i=0; i < 5; i++) {
  p_a[i] = 0;
}

// let's create an array of objects of class A. The default constructor,
// A::A(), will be invoked for each object in this array.
A* a_vector = new A[10];


delete Vs. delete[]

When we want to delete an array allocated with new[], we need to use the delete[] operator. Here is how it's done with the previously allocated arrays:


// delete the array of integers.
delete[] p_a;

// delete the array of objects. this will first invoke the destructor
// for each element in the array, and only then de-allocate the memory
// occupied by the entire array.
delete[] a_vector;
Note that the compiler (or runtime library) cannot know by itself whether to use the delete or the delete[] operator - we need to tell it. This is because an 'int*' could point either to a single integer, allocated using 'new int', or to an array of integers, allocated using 'new int[7]' (for instance).

Imagine that we have used delete instead of delete[]:


// allocate an array of integers.
int* p_i = new int[5];

// delete the array. Oops - we used delete instead of delete[].
// with a proper memory manager, the underlying memory manager's
// data structures will know the size of the allocated array,
// and thus free all the array, not just a single int.
// however, there is no guarantee that this will work with
// a different memory manager.
delete p_i;

// now let's allocate an array of objects:
A* a_vector = new A[10];

// let's delete the array. Oops again - we used delete instead of delete[].
// this time, however, the problem is more apparent - only the destructor
// for the first element in the array is invoked. The destructors for
// the rest of the objects in the array are not invoked. If those objects
// contain pointers to memory they need to de-allocate - this memory
// will now leak and be lost.
delete a_vector;

A similar problem (but of a different nature) could occur if we allocate an object using new, and later de-allocate it using delete[]. This time, however, instead of a memory leak, we will get memory corruption - the runtime library might try to invoke a destructor for an entire array, while we only allocated a single element. This won't necessarily happen in practice, because the memory manager may know that the block of memory to be deleted has a size suitable for a single object, and thus will destruct only a single object. However, counting on that is poor practice.
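
To make this concrete, here is a minimal sketch of that mismatch, using the class A defined earlier (do not write code like this):


// allocate a single object of class A.
A* p_single = new A;

// Oops - we free it with delete[], as if it were an array. This is
// undefined behavior: the runtime library may look for array bookkeeping
// data that was never stored, and corrupt memory in the process.
delete[] p_single;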


Inter-mixing C++ And C Memory Managers

The C++ language is actually an extension of the C language. Programs written in C++ are often linked with libraries written in C. This implies that the implementations of the memory managers for C and C++ must be able to work together. On various systems, this is done by making new and delete actually use malloc() and free() for memory management - when we invoke new, it uses malloc() to allocate the memory, and then invokes the constructor for the allocated object. Because of this, accidental intermixing of new with free(), or of malloc() with delete, might go unnoticed - especially when performed on native data types.
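
On such systems, the core of the C++ memory manager could look roughly like the following sketch. This is a simplified illustration, not any particular runtime's actual implementation (a real one must also handle new-handlers and other details):


#include <cstdlib>   // for std::malloc() and std::free()
#include <new>       // for std::bad_alloc

// the global operator new only obtains raw memory - the compiler
// emits the constructor call separately, on the memory returned here.
void* operator new(std::size_t size)
{
    void* p = std::malloc(size);
    if (p == NULL)
        throw std::bad_alloc();
    return p;
}

// likewise, the compiler emits the destructor call before operator
// delete is reached - this function only releases the raw memory.
void operator delete(void* p)
{
    std::free(p);
}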

However, different implementations of the C++ memory manager might not work the same way. Thus, allocating memory with one memory manager, and freeing it with another, could cause memory corruption.

One may simply decide to use only the C++ memory manager when writing C++ programs. However, this solution does not help when using C libraries that were compiled with a C compiler, and use the C memory manager. For example, if we use the standard C library's strdup() function to copy a C language string (a char pointer, actually), we need to be careful to free it with free(), not with delete.
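
Here is a short sketch of the strdup() case (error checking omitted for brevity):


#include <string.h>   // for strdup()
#include <stdlib.h>   // for free()

// duplicate a string - the copy is allocated with the C memory
// manager (strdup() calls malloc() internally).
char* copy = strdup("some string");

// ... use 'copy' ...

// correct: free() matches the malloc() performed inside strdup().
free(copy);

// wrong: 'delete copy;' would hand a C-allocated chunk to the C++
// memory manager - it may appear to work on some systems, and
// corrupt memory on others.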


The Odds Of References

A reference (in C++) is a variable that contains the address of another variable, but is not treated like a pointer syntactically. A reference may be set to refer to some variable only when it is defined (either as a local variable, or as a parameter of a function) - and from then on, accessing the reference actually accesses the referenced variable. For example:


// let's define an int pointer, and initialize it.
int* num = new int(5);

// let's define a reference to the int that 'num' points to.
int& num_ref = *num;

// let's change the contents of the number that num_ref references.
num_ref = 7;

// the following command will now print '7'.
cout << *num << endl;

If a reference behaves like a pointer, except for the syntax, why is it used? In most situations, a reference is used to make the syntax simpler: when we have an object, instead of accessing its members or methods with '->', we access them with '.'. Another reason for using references is that they can't become 'invalid' like pointers - once a reference refers to a given object, it cannot be made to refer to another object, or to refer to NULL.
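
As a small illustration of the syntax difference, consider the following sketch (class B and its get() method are hypothetical, used here only for demonstration):


// a hypothetical class with a single method.
class B {
  public:
    int get() { return 5; }
};

// with a pointer parameter, members and methods are accessed using '->':
int read_via_pointer(B* p_b)
{
    return p_b->get();
}

// with a reference parameter, the syntax is that of a plain object:
int read_via_reference(B& b)
{
    return b.get();
}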

But is this really the case? Let's look at the following piece of code:


// this function accepts a reference to an integer, and modifies its value.
void f(int& num)
{
  num = 3;
}

// let's have a number, and pass it to the above function:
int num1 = 1;
// the function modifies the contents of 'num1' to '3',
// because it is passed by reference, not by value.
f(num1);

// let's have another number, this time a pointer.
int* p_num2;
// oops - we accidentally set this pointer to point to NULL.
p_num2 = NULL;
// let's modify the number p_num2 points to.
// oops....
f(*p_num2);
Once the last line of code in the above example is executed, the function receives a parameter that is a reference formed from a NULL pointer. The code inside the function can NOT check that it received a proper value - a reference cannot be tested the way an address can. Only the caller could perform this validity test. This kind of coding defeats safe programming practices (namely, testing pointers for NULL values before trying to dereference them).

One may wonder why the program only crashes inside the function, and not on the call to the function, where '*p_num2' clearly accesses a NULL pointer. The reason is that just typing '*p_num2' does not really dereference the pointer - it does not access the contents of what p_num2 points to. Looking at the assembly code reveals that this line merely loads the address (NULL) stored inside the pointer variable into a register, in order to pass it to the function. The only code that actually accesses the variable is found inside the function f(). In a real program, the assignment of NULL into a pointer that is later passed as a reference won't occur just before the function call (obviously), but rather much earlier, in a different part of the code.

The moral here is that one should use references only where they clearly cannot refer to the contents of a NULL pointer at runtime. If they might, use a pointer instead, and check it for validity before dereferencing.
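
In code, the safe pattern puts the test on the caller's side, before the reference is formed. A small sketch, re-using the function f() from above:


// the caller must validate the pointer before forming the reference:
if (p_num2 != NULL) {
    f(*p_num2);   // safe - the reference refers to a real int.
} else {
    // handle the error - there is no int to modify.
}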


Tools For C/C++ Memory Problems Resolving

As we saw, there are many ways in which usage of pointers, as well as of dynamically allocated memory, could go wrong. These problems can largely be divided into two categories - memory leaks and memory smearing. Since these problems are so common, and sometimes quite hard to track, various tools were built to automate the detection of such problems. We will mention a few of them here - you should feel encouraged to try some of them out; they are an invaluable asset for any C or C++ programmer.


Free Libraries For Locating Memory Leaks

Many free memory-leak locating libraries exist. They all work by replacing the malloc functions family (malloc, free, calloc and friends) with functions that perform the allocation while keeping some bookkeeping information. In order to detect memory leaks, you need to store a list of all memory chunks that were allocated, delete from this list any memory chunk that is freed, and when the program exits, look at all the chunks still found on the list - they weren't freed, and thus represent memory leaks. Of course, you need to do more than this - such as storing the stack trace during each memory chunk allocation, so the programmers will be able to find where in the code the leaked memory was allocated.
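
The following is a minimal sketch of this bookkeeping technique. The names (debug_malloc(), debug_free(), report_leaks()) are made up for illustration - real libraries such as 'dmalloc' have their own interfaces, and record more information (such as full stack traces):


#include <stdio.h>
#include <stdlib.h>

// one record per live allocation, kept in a simple linked list.
struct alloc_record {
    void* addr;
    size_t size;
    const char* file;
    int line;
    struct alloc_record* next;
};

static struct alloc_record* alloc_list = NULL;

void* debug_malloc(size_t size, const char* file, int line)
{
    void* p = malloc(size);
    if (p == NULL)
        return NULL;
    // remember this chunk, and where in the code it was allocated.
    struct alloc_record* rec =
        (struct alloc_record*)malloc(sizeof(struct alloc_record));
    rec->addr = p;
    rec->size = size;
    rec->file = file;
    rec->line = line;
    rec->next = alloc_list;
    alloc_list = rec;
    return p;
}

void debug_free(void* p)
{
    // remove the chunk from the list - if it is not found there, this
    // is a double-free, or a free of a never-allocated pointer.
    struct alloc_record** pp;
    for (pp = &alloc_list; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->addr == p) {
            struct alloc_record* dead = *pp;
            *pp = dead->next;
            free(dead);
            free(p);
            return;
        }
    }
    fprintf(stderr, "debug_free: freeing unknown pointer %p\n", p);
}

// call this when the program exits (e.g. register it with atexit()) -
// anything still on the list was never freed, i.e. it leaked.
void report_leaks()
{
    struct alloc_record* rec;
    for (rec = alloc_list; rec != NULL; rec = rec->next)
        fprintf(stderr, "leak: %lu bytes, allocated at %s:%d\n",
                (unsigned long)rec->size, rec->file, rec->line);
}

// user code would then be compiled with something like:
//   #define malloc(s) debug_malloc((s), __FILE__, __LINE__)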

Note that not all leaks are 'important' to deal with. For instance, if you allocate a chunk of memory during program startup, and allocate it only once, then freeing it isn't strictly necessary to avoid leaks. However, if you leave it unfreed, your malloc tracking log will be cluttered with messages about such 'one-time leaks', and you'll find it hard to identify the real leaks. Also, if you ever change your 'main' into a function that gets invoked several times (e.g. when turning a program into a library, for re-use), that 'one-time leak' will turn into a real memory leak.

'dmalloc' is an example of a library that identifies memory leaks. It keeps a full log of memory allocations, and prints information out when the program exits. If your program never exits (e.g. a daemon), modify it to run a given number of iterations and then exit, just for memory leak detection. 'dmalloc' may be found at http://dmalloc.com.


Free Libraries For Identifying Memory Corrupting Code

Memory corrupting code comes in various forms: reading the contents of uninitialized memory; writing past the end of an allocated memory chunk, or a little before it; writing into a freed memory chunk (or reading from one) that might have been re-allocated since. All sorts of odd behaviour could result from these types of bugs - you can't even predict what such bugs will look like. But you can look for them using various libraries.

Techniques for identifying such problems include placing markings before and after allocated memory chunks, and then verifying that they were not modified by the code; or initializing all memory chunks with a given data pattern, and then making sure that when they are first read from, their contents do not equal this pattern - if they do, it means they were read before being initialized. Sometimes, using read-only or no-access attributes for memory pages (using operating-system specific interfaces) can help identify invalid accesses.
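
Here is a minimal sketch of the guard-markings technique. checked_malloc() and checked_free() are hypothetical names; a real library would store the chunk's size in a hidden header, instead of requiring the caller to pass it to the free function:


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// a 4-byte pattern written right before and right after every chunk.
static const unsigned char MAGIC[4] = { 0xDE, 0xAD, 0xBE, 0xEF };

void* checked_malloc(size_t size)
{
    // allocate room for the user's data plus the two guard markings.
    unsigned char* p = (unsigned char*)malloc(size + 2 * sizeof(MAGIC));
    if (p == NULL)
        return NULL;
    memcpy(p, MAGIC, sizeof(MAGIC));                          // front guard
    memcpy(p + sizeof(MAGIC) + size, MAGIC, sizeof(MAGIC));   // rear guard
    return p + sizeof(MAGIC);    // the user's area lies between the guards
}

void checked_free(void* user_p, size_t size)
{
    unsigned char* p = (unsigned char*)user_p - sizeof(MAGIC);
    // if either guard was overwritten, the program wrote a little
    // before, or a little past, the chunk it allocated.
    if (memcmp(p, MAGIC, sizeof(MAGIC)) != 0)
        fprintf(stderr, "corruption right before chunk %p\n", user_p);
    if (memcmp(p + sizeof(MAGIC) + size, MAGIC, sizeof(MAGIC)) != 0)
        fprintf(stderr, "corruption right after chunk %p\n", user_p);
    free(p);
}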

Electric Fence is a library that uses protected page markings in this manner. It replaces malloc/free, and allocates each chunk of memory right after a protected memory page. Any attempt to write into this page (which means, writing a little before an allocated memory chunk) will cause a SIGSEGV - and the programmer will be able to identify the bug in this manner. The library uses a few other techniques, similar to this one. The library may be downloaded from ftp://ftp.perens.com/pub/ElectricFence/.
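
The core of the page-protection idea can be sketched with the mmap() and mprotect() system calls. This is a simplified illustration of the technique (assuming an mmap() that supports MAP_ANONYMOUS), not Electric Fence's actual code:


#include <sys/mman.h>
#include <unistd.h>

// allocate 'size' bytes placed right after an inaccessible guard page,
// so that writing a little before the chunk raises SIGSEGV immediately.
void* guarded_alloc(size_t size)
{
    long page_size = sysconf(_SC_PAGESIZE);
    // round the user's size up to whole pages, and add one guard page.
    size_t data_pages = (size + page_size - 1) / page_size;
    char* base = (char*)mmap(NULL, (data_pages + 1) * page_size,
                             PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    // make the first page inaccessible - it guards against underruns.
    mprotect(base, page_size, PROT_NONE);
    return base + page_size;   // user memory begins after the guard page
}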

A similar library is GNU Checker. Find it at http://www.gnu.org/software/checker/checker.html.


"Must-Have" Free Tools For Many Memory Problems Types - Linux/x86

One application that deals with most types of memory management problems (memory leaks, memory smearing, etc.) is called Valgrind. This application, which is free software (GPL), was developed to track down many such problems, without requiring any change to the tested application. What it does is replace not only the memory management functions (malloc(), free() and friends), but also instrument all of the program's code at run-time, placing checks for memory access correctness before and after each access to a memory area. This allows it to notice memory access bugs as soon as they occur, in real time.

Valgrind is currently available only for the Linux operating system, and for the x86 processor family. However, nothing in its design should prevent it from being ported, in the future, to other operating systems and other types of CPUs.

Finally, using Valgrind is relatively easy - you install it, and then run your application (assuming it is dynamically linked - which is the case for most applications) under valgrind, giving it a few flags to tell it what kinds of errors to look for, and how to tune its work. Valgrind will slow your application down (sometimes even considerably), but for debugging, it will work much faster than trying to find the bugs on your own.

For more information about valgrind, please visit valgrind's home page.


"Must-Have" Commercial Tools For Many Memory Problems Types

Because memory error bugs can be so hard to find, and since the existing free tools have limitations (or did not exist in the past), various companies have developed commercial tools that check for memory leaks, memory smearing, reading of uninitialized memory, and so on. These tools tend to handle a much broader range of bug types than the free tools, and come with a GUI to operate them. They are also more advanced when it comes to checking multi-threaded and multi-process applications.

One of these tools is called 'Purify', currently sold by Rational Software. This tool can identify all sorts of memory problems, and there is no need to re-start the application after each problem is encountered - problems are reported as they occur, and unless there is a NULL pointer dereference, the program keeps running. Thus, in a single run, one may identify a large set of problems. It costs a few thousand dollars for a 'floating' license, but it's worth every dime. For more info look at http://www.rational.com/products/pqc/index.jsp.

Another product similar to Purify is called 'Insure++', from ParaSoft. It has features similar to what Purify provides, though it is more cumbersome to operate. Its greatest advantage is that it has a Linux port (Rational Software has held back from making such a port of Purify for quite a while). For more info about Insure++ look at http://www.parasoft.com/products/insure/.



This document is copyright (c) 2001-2002 by guy keren.

The material in this document is provided AS IS, without any expressed or implied warranty, or claim of fitness for a particular purpose. Neither the author nor any contributors shall be liable for any damages incurred directly or indirectly by using the material contained in this document.

Permission to copy this document (electronically or on paper, for personal or organizational internal use) or to publish it on-line is hereby granted, provided that the document is copied as-is, this copyright notice is preserved, and a link to the original document is written in the document's body, or in the page linking to the copy of this document.

Permission to make translations of this document is also granted, under these terms - assuming the translation preserves the meaning of the text, the copyright notice is preserved as-is, and a link to the original document is written in the document's body, or in the page linking to the copy of this document.

For any questions about the document and its license, please contact the author.