Data Structures and Algorithms, Alfred V. Aho
Data Structures and Algorithms: CHAPTER 12: Memory Management
Memory Management
This chapter discusses the basic strategies whereby memory space can be reused, or shared by different objects that grow and shrink in arbitrary ways. For example, we shall discuss methods that maintain linked lists of available space, and "garbage collection" techniques, where we figure out what is available only when it seems we have run out of available space.
12.1 Issues in Memory Management
There are numerous situations in computer system operation when a limited memory resource is managed, that is, shared among several "competitors." A programmer who does not engage in the implementation of systems programs (compilers, operating systems, and so on) may be unaware of such activities, since they frequently are carried out "behind the scenes." As a case in point, Pascal programmers are aware that the procedure new(p) will make pointer p point to a new object of the correct type. But where does space for that object come from? The procedure new has access to a large region of memory, called the "heap," that the program variables do not use. From that region, an unused block of consecutive bytes sufficient to hold an object of the type that p points to is selected, and p is made to hold the address of the first byte of that block. But how does the procedure new know which bytes of the memory are "unused"? Section 12.4 suggests the answer.
Even more mysterious is what happens if the value of p is changed, either by an assignment or by another call to new(p). The block of memory p pointed to may now be inaccessible, in the sense that there is no way to get to it through the program's data structures, and we could reuse its space. On the other hand, before p was changed, the value of p may have been copied into some other variable. In that case, the memory block is still part of the program's data structures. How do we know whether a block in the memory region used by procedure new is no longer needed by the program?
The kind of memory management Pascal uses is only one of several. For example, in some situations, as in Pascal, objects of different sizes share the same memory space. In others, all objects sharing the space are of the same size. This distinction regarding object sizes is one way we can classify the kinds of memory management problems we face. Some more examples follow.
1. In the programming language Lisp, memory space is divided into cells, which are essentially records consisting of two fields; each field can hold either an atom (object of elementary type, e.g., an integer) or a pointer to a cell. Atoms and pointers are the same size, so all cells require the same number of bytes. All known data structures can be made out of these cells. For example, linked lists of atoms can use the first fields of cells to hold atoms and second fields to hold pointers to the next cells on the list. Binary trees can be represented by using the first field of each cell to point to the left child and the second field to point to the right child. As a Lisp program runs, the memory space used to hold a cell may find itself part of many different structures at different times, either because one cell is moved among structures, or because the cell becomes detached from all structures and its space is reused.
2. A file system generally divides secondary storage devices, like disks, into fixed-length blocks. For example, UNIX typically uses blocks of 512 bytes. Files are stored in a sequence of (not necessarily consecutive) blocks. As files are created and destroyed, blocks of secondary storage are made available to be reused.
3. A typical multiprogramming operating system allows several programs to share main memory at one time. Each program requires a certain amount of memory, which is known to the operating system because the requirement is part of the request for service issued when the program is to be run. Whereas in examples (1) and (2) the objects sharing memory (cells and blocks, respectively) were all of the same size, different programs require different amounts of memory. Thus, when a program using, say, 100K bytes terminates, it may be replaced by two programs using 50K each, or one using 20K and another 70K (with 10K left unused). Alternatively, the 100K bytes freed by the terminating program may be combined with an adjacent 50K that are unused, and a program needing up to 150K may be run. Another possibility is that no new program can fit in the space made available, and the 100K bytes are left free temporarily.
4. There are many programming languages, like Snobol, APL, or SETL, that allocate space to objects of arbitrary size. These objects, which are values assigned to variables, are allocated a block of space from a larger block of memory, which is often called the heap. When a variable changes value, the new value is allocated space in the heap, and a pointer for the variable is set to point to the new value. Possibly the old value of the variable is now unused, and its space can be reused. However, languages like Snobol or SETL implement assignments like A = B by making the pointer for A point to the same object that B's pointer points to. If only one of A and B is later reassigned, the old object is still referenced by the other variable, so it has not been abandoned and its space cannot be reclaimed.
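Item 1's claim that two-field cells suffice for familiar structures can be illustrated with a short Python sketch (the chapter's own code is Pascal). It rests on assumptions of our own: Python object references stand in for pointers, any value that is not a Cell is treated as an atom, and only the leaves of the tree carry atoms. The names Cell, make_list, list_atoms, and leaves are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Cell:
    first: Any = None   # an atom, a reference to another Cell, or None (nil)
    second: Any = None


def make_list(atoms):
    """Linked list: the first field holds an atom, the second points to the next cell."""
    head = None
    for atom in reversed(atoms):
        head = Cell(atom, head)
    return head


def list_atoms(cell):
    """Follow the second fields, collecting the atoms in the first fields."""
    out = []
    while cell is not None:
        out.append(cell.first)
        cell = cell.second
    return out


def leaves(node):
    """Binary tree: the first field points to the left child, the second to
    the right child. Here a field that is not a Cell is an atom leaf."""
    if not isinstance(node, Cell):
        return [node]
    return leaves(node.first) + leaves(node.second)


numbers = make_list([1, 2, 3])
tree = Cell(Cell("x", "y"), "z")   # ((x . y) . z) in Lisp dotted notation
```

Every structure here is built from one record shape, which is what lets a Lisp system manage all of its memory as equal-sized cells.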
Example 12.1. In Fig. 12.1(a) we see the heap that might be used by a Snobol program with three variables, A, B, and C. The value of any variable in Snobol is a character string, and in this case, the value of both A and B is 'OH HAPPY DAY' and the value of C is 'PASS THE SALT'.
We have chosen to represent character strings by pointers to blocks of memory in the heap. These blocks have their first 2 bytes (the number 2 is a typical value that could be changed) devoted to an integer giving the length of the string. For example, 'OH HAPPY DAY' has length 12, counting the blanks between words, so the value of A (and of B) occupies 14 bytes.
If the value of B were changed to 'OH HAPPY DAZE', we would find an empty block in the heap 15 bytes long to store the new value of B, including the 2 bytes for the length. The pointer for B is made to point to the new value, as shown in Fig. 12.1(b). The block holding integer 12 and 'OH HAPPY DAY' is still useful, since A points to it. If the value of A now changes, that block would become useless and could be reused. How one tells conveniently that there are no pointers to such a block is a major subject of this chapter.
Fig. 12.1. String variables in a heap.
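The length-prefixed representation of Example 12.1 can be sketched with a Python bytearray standing in for the heap. This is a minimal sketch under assumptions of our own: the 2-byte length is stored big-endian, strings are ASCII, and each new block is simply carved off the end of the heap rather than found among empty blocks, so space is never reused. The names StringHeap, store, load, length, and block_size are hypothetical.

```python
LENGTH_BYTES = 2   # size of the length field at the start of each block


class StringHeap:
    """A toy heap holding string values as length-prefixed blocks."""

    def __init__(self, size: int) -> None:
        self.mem = bytearray(size)
        self.top = 0   # first unused byte; this toy never reuses space

    def store(self, s: str) -> int:
        """Copy s into a new block and return the block's address."""
        data = s.encode("ascii")
        need = LENGTH_BYTES + len(data)
        if self.top + need > len(self.mem):
            raise MemoryError("heap exhausted")
        addr = self.top
        self.mem[addr:addr + LENGTH_BYTES] = len(data).to_bytes(LENGTH_BYTES, "big")
        self.mem[addr + LENGTH_BYTES:addr + need] = data
        self.top += need
        return addr

    def length(self, addr: int) -> int:
        """Read the string length from the block's first 2 bytes."""
        return int.from_bytes(self.mem[addr:addr + LENGTH_BYTES], "big")

    def load(self, addr: int) -> str:
        start = addr + LENGTH_BYTES
        return self.mem[start:start + self.length(addr)].decode("ascii")

    def block_size(self, addr: int) -> int:
        """Total bytes used by the block, including the length field."""
        return LENGTH_BYTES + self.length(addr)


heap = StringHeap(64)
variables = {}
variables["A"] = heap.store("OH HAPPY DAY")
variables["B"] = variables["A"]               # A and B share one block, as in Fig. 12.1(a)
variables["C"] = heap.store("PASS THE SALT")
variables["B"] = heap.store("OH HAPPY DAZE")  # B gets a new block, as in Fig. 12.1(b)
```

After the last assignment, the 14-byte block holding 'OH HAPPY DAY' is still reachable through A; only if A were reassigned as well would nothing point to it.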
In the four examples above, we can see differences along at least two orthogonal "dimensions." The first issue is whether or not the objects sharing storage are of equal length. In the first two examples, Lisp programs and file storage, the objects, which are Lisp cells in one case and blocks holding parts of files in the other, are of the same size. This fact allows certain simplifications of the memory management problem. For example, in a Lisp implementation, a region of memory is divided into spaces, each of which can hold exactly one cell. The management problem is to find empty spaces to hold newly created cells, and it is never necessary to store a cell in such a position that it overlaps two spaces. Similarly, in the second example, a disk is divided into equal-sized blocks, and each block is assigned to hold part of one file; we never use a block to store parts of two or more files, even if a file ends in the middle of a block.
In contrast, the third and fourth examples, covering memory allocation by a multiprogramming system and heap management for those languages that deal with variables whose values are "big" objects, speak of allocating space in blocks of different sizes. This requirement presents certain problems that do not appear in the fixed-length case. For example, we fear fragmentation, a situation in which there is much unused space, but it is in such small pieces that space for one large object cannot be found. We shall say more about heap management in Sections 12.4 and 12.5.
The second major issue is whether garbage collection, a charming term for the recovery of unused space, is done explicitly or implicitly, that is, by program command or only in response to a request for space that cannot be satisfied otherwise. In the case of file management, when a file is deleted, the blocks used to hold it are known to the file system. For example, the file system could record the address of one or more "master blocks" for each file in existence; the master blocks list the addresses of all the blocks used for the file. Thus, when a file is deleted, the file system can explicitly make available for reuse all the blocks used for that file.
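Explicit reclamation in a file system can be sketched in Python as follows. The sketch is hypothetical: a Python list of block numbers plays the role of each file's master blocks, the free blocks are kept in a simple list, and the names BlockFileSystem, create, and delete are our own. Because the file system knows exactly which blocks a file occupies, deletion returns them at once, with no search for garbage.

```python
class BlockFileSystem:
    """A toy disk of fixed-size blocks. Each file's master blocks are modeled
    as a list of the block numbers the file occupies."""

    def __init__(self, nblocks: int, block_size: int = 512) -> None:
        self.block_size = block_size
        self.free = list(range(nblocks))   # blocks available for reuse
        self.master = {}                   # file name -> its block numbers

    def create(self, name: str, nbytes: int) -> list:
        # Round up: a file that ends mid-block still occupies the whole block.
        needed = -(-nbytes // self.block_size)
        if needed > len(self.free):
            raise OSError("disk full")
        blocks = [self.free.pop() for _ in range(needed)]   # need not be consecutive
        self.master[name] = blocks
        return blocks

    def delete(self, name: str) -> None:
        # Explicit reclamation: the master blocks list exactly what to free.
        self.free.extend(self.master.pop(name))


fs = BlockFileSystem(nblocks=8)
fs.create("notes", 1000)   # 1000 bytes need 2 blocks of 512 bytes
fs.create("todo", 512)     # exactly 1 block
fs.delete("notes")         # its 2 blocks are available again immediately
```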
In contrast, Lisp cells, when they become detached from the data structures of the program, continue to occupy their memory space. Because of the possibility of multiple pointers to a cell, we cannot tell when a cell is completely detached; therefore we cannot explicitly collect cells as we do blocks of a deleted file. Eventually, all memory spaces will become allocated to useful or useless cells, and the next request for space for another cell implicitly will trigger a "garbage collection." At that time, the Lisp interpreter marks all the useful cells, by an algorithm such as the one we shall discuss in Section 12.3, and then links all the blocks holding useless cells into an available space list, so they can be reused.
Figure 12.2 illustrates the four kinds of memory management and gives an example of each. We have already discussed the fixed block size examples in Fig. 12.2. The management of main memory by a multiprogramming system is an example of explicit reclamation of variable length blocks. That is, when a program terminates, the operating system, knowing what area of memory was given to the program, and knowing no other program could be using that space, makes the space available immediately to another program.
The management of a heap in Snobol or many other languages is an example of variable-length blocks with garbage collection. As with Lisp, a typical Snobol interpreter does not try to reclaim blocks of memory until it runs out of space. At that time the interpreter performs a garbage collection as the Lisp interpreter does, but with the additional possibility that strings will be moved around the heap to reduce fragmentation, and that adjacent free blocks will be combined to make larger blocks. Notice that the latter two steps are pointless in the Lisp environment: since all cells are the same size, any free space can hold any new cell, so there is no fragmentation to reduce and no need for larger blocks.
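Fig. 12.2. Examples of the four memory management strategies.
                      Garbage collection    Explicit reclamation
Fixed block size      Lisp                  File system
Variable block size   Snobol                Multiprogramming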
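The step of combining adjacent free blocks that a Snobol-style interpreter performs can be sketched in Python as a single pass over the free blocks sorted by address. The sketch assumes free blocks are given as non-overlapping (start, length) pairs; the function name coalesce is hypothetical, and the sketch shows only merging, not the moving of strings to compact the heap.

```python
def coalesce(free_blocks):
    """Merge free blocks that are adjacent in memory.

    free_blocks is a list of non-overlapping (start, length) pairs in any
    order; the result is sorted by start address.
    """
    merged = []
    for start, length in sorted(free_blocks):
        if merged and merged[-1][0] + merged[-1][1] == start:
            prev_start, prev_length = merged[-1]
            merged[-1] = (prev_start, prev_length + length)   # extend the previous block
        else:
            merged.append((start, length))
    return merged


free = coalesce([(30, 10), (0, 14), (14, 6), (50, 5)])
# (0, 14) and (14, 6) touch, so they merge into (0, 20); the others stay apart.
largest = max(length for _, length in free)
```

Even after merging, 35 bytes are free but no request for more than 20 bytes can be met; that remaining fragmentation is what moving strings around the heap is meant to remove.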
12.2 Managing Equal-Sized Blocks
Let us imagine we have a program that manipulates cells each consisting of a pair of fields; each field can be a pointer to a cell or can hold an "atom." The situation, of course, is like that of a program written in Lisp, but the program may be written in almost any programming language, even Pascal, if we define cells to be of a variant record type. Empty cells available for incorporation into a data structure are kept on an available space list, and each program variable is represented by a pointer to a cell. The cell pointed to may be one cell of a large data structure.
Example 12.2. In Fig. 12.3 we see a possible structure. A, B, and C are variables; lower case letters represent atoms. Notice some interesting phenomena. The cell holding atom a is pointed to by the variable A and by another cell. The cell holding atom c is pointed to by two different cells. The cells holding g and h are unusual in that although each points to the other, they cannot be reached from any of the variables A, B, or C, nor are they on the available space list.
Let us assume that as the program runs, new cells may on occasion be seized from the available space list. For example, we might wish to have the null pointer in the cell with atom c in Fig. 12.3 replaced by a pointer to a new cell that holds atom i and a null pointer. This cell will be removed from the top of the available space list. It is also possible that from time to time, pointers will change in such a way that cells become detached from the program variables, as the cells holding g and h in Fig. 12.3 have been. For example, the cell holding c may, at one time, have pointed to the cell holding g. As another example, the value of variable B may at some time change, which would, if nothing else has changed, detach the cell now pointed to by B in Fig. 12.3 and also detach the cell holding d and e (but not the cell holding c, since it would still be reached from A). We call cells not reachable from any variable and not on the available space list inaccessible.
Fig. 12.3. A network of cells.
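An array of equal-sized cells with an available space list threaded through the cells can be sketched in Python as follows. The sketch assumes atoms are strings, pointers are integer indices into the array, and None is the null pointer; it links the available space list through the right fields of free cells, and the names CellMemory, new_cell, reachable, and free_list are hypothetical. The usage lines rebuild a fragment in the spirit of Fig. 12.3: a new cell holding i replaces the null pointer in the cell holding c, and the cells holding g and h point to each other but are reachable from no variable.

```python
class CellMemory:
    """Equal-sized two-field cells. Atoms are str, pointers are int indices
    into the arrays, and None is the null pointer."""

    def __init__(self, size: int) -> None:
        self.left = [None] * size
        self.right = [None] * size
        for i in range(size - 1):       # thread the available space list
            self.right[i] = i + 1       # through the right fields
        self.avail = 0 if size > 0 else None

    def new_cell(self, left=None, right=None) -> int:
        """Remove the cell at the top of the available space list."""
        if self.avail is None:
            raise MemoryError("available space list is empty")
        i = self.avail
        self.avail = self.right[i]
        self.left[i], self.right[i] = left, right
        return i


def reachable(mem, roots):
    """Indices of all cells reachable from the given variables."""
    seen, stack = set(), [r for r in roots if r is not None]
    while stack:
        i = stack.pop()
        if i not in seen:
            seen.add(i)
            stack.extend(f for f in (mem.left[i], mem.right[i]) if isinstance(f, int))
    return seen


def free_list(mem):
    """Indices on the available space list, from the top down."""
    out, i = [], mem.avail
    while i is not None:
        out.append(i)
        i = mem.right[i]
    return out


mem = CellMemory(6)
c = mem.new_cell("c")
mem.right[c] = mem.new_cell("i")   # replace c's null pointer by a new cell holding i
g = mem.new_cell("g")
h = mem.new_cell("h", g)
mem.right[g] = h                   # g and h now point to each other
A = mem.new_cell("a", c)           # variable A points to a cell that points to c
inaccessible = set(range(6)) - reachable(mem, [A]) - set(free_list(mem))
```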
When cells are detached, and therefore are no longer needed by the program, it would be nice if they found their way back onto the available space list, so they could be reused. If we don't reclaim such cells, we shall eventually reach the unacceptable situation where the program is not using all the cells, yet it wants a cell from available space, and the available space list is empty. It is then that a time-consuming garbage collection must be performed. This garbage collection step is "implicit," in the sense that the program never explicitly called for it; it happens only because a request for space could not otherwise be satisfied.
Reference Counts
One seemingly attractive approach to detecting inaccessible cells is to include in each cell a reference count, that is, an integer-valued field whose value equals the number of pointers to the cell. It is easy to maintain reference counts. When making some pointer point to a cell, add one to the reference count for that cell. When a nonnull pointer is reassigned, first decrease by one the reference count for the cell pointed to. If a reference count reaches zero, the cell is inaccessible, and it can be returned to the available list.
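The bookkeeping just described can be sketched in Python. The details are assumptions of our own: cells live in parallel arrays, atoms are strings, pointers are integer indices, program variables are a dictionary, and the names RefCountedCells, new_cell, and assign are hypothetical. Two choices go slightly beyond the text: when a pointer is reassigned, the sketch counts the new target before releasing the old one, so that reassigning a pointer to the cell it already points to cannot free that cell; and when a cell's count reaches zero, the pointers stored in that cell are released too, which may free further cells.

```python
class RefCountedCells:
    """Two-field cells with reference counts. Atoms are str, pointers are int
    indices, None is nil; program variables live in a dict."""

    def __init__(self, size: int) -> None:
        self.left = [None] * size
        self.right = [None] * size
        self.refs = [0] * size
        self.avail = list(range(size))   # available cells, used as a stack
        self.vars = {}

    def _incr(self, p) -> None:
        if isinstance(p, int):
            self.refs[p] += 1

    def _decr(self, p) -> None:
        if isinstance(p, int):
            self.refs[p] -= 1
            if self.refs[p] == 0:        # no pointers left: the cell is inaccessible
                inner = (self.left[p], self.right[p])
                self.left[p] = self.right[p] = None
                self.avail.append(p)     # return it to the available list
                for q in inner:          # the pointers it held disappear as well
                    self._decr(q)

    def new_cell(self, left=None, right=None) -> int:
        """Take a free cell; its count stays 0 until some pointer is aimed at it."""
        i = self.avail.pop()
        self.left[i], self.right[i] = left, right
        self._incr(left)
        self._incr(right)
        return i

    def assign(self, name: str, p) -> None:
        """The assignment name := p for a pointer variable."""
        self._incr(p)                    # count the new pointer first,
        self._decr(self.vars.get(name))  # then release the old one
        self.vars[name] = p


heap = RefCountedCells(4)
heap.assign("A", heap.new_cell("x", heap.new_cell("y")))   # A -> (x, -> (y, nil))
in_use_before = 4 - len(heap.avail)
heap.assign("A", None)   # both cells become inaccessible and are returned
```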
Unfortunately, reference counts don't always work. The cells with g and h in Fig. 12.3 are inaccessible cells linked in a cycle. Their reference counts are each 1, so we would not return them to the available list. One can attempt to detect cycles of inaccessible cells in a variety of ways, but it is probably not worth doing so. Reference counts are useful for structures that do not have pointer cycles. One example of a structure with no possibility of cycles is a collection of variables pointing to blocks holding data, as in Fig. 12.1. There, we can do explicit garbage collection simply by collecting a block when its reference count reaches zero. However, when data structures allow pointer cycles, the reference count strategy is usually inferior to another approach, which we shall discuss in the next section, both in the space needed in cells and in the time spent dealing with inaccessible cells.
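The failure on cycles can be checked directly on a fragment like Fig. 12.3. To keep this Python sketch short, it recomputes reference counts from scratch instead of maintaining them incrementally; for this scenario the counts come out the same as incremental maintenance would give. Cells are named by the atom they hold and list their pointer fields, with None meaning nil; the functions count_refs and reachable are hypothetical names.

```python
def count_refs(cells, variables):
    """Number of pointers (from cells and from variables) to each cell."""
    refs = {name: 0 for name in cells}
    for targets in cells.values():
        for t in targets:
            if t is not None:
                refs[t] += 1
    for t in variables.values():
        if t is not None:
            refs[t] += 1
    return refs


def reachable(cells, variables):
    """Names of the cells that can be reached from some variable."""
    seen = set()
    stack = [t for t in variables.values() if t is not None]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(t for t in cells[name] if t is not None)
    return seen


# Each cell is named by its atom and lists its pointer fields (None is nil).
cells = {"c": ["g"], "g": ["h"], "h": ["g"]}
variables = {"A": "c"}
cells["c"] = [None]                   # c's pointer to g is overwritten
refs = count_refs(cells, variables)   # every count is 1
garbage = set(cells) - reachable(cells, variables)   # g and h
```

No count is zero, so reference counting would free nothing, even though the cells holding g and h can no longer be reached.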
12.3 Garbage Collection Algorithms for Equal-Sized Blocks
Let us now give an algorithm for finding which of a collection of cells of the types suggested in Fig. 12.3 are accessible from the program variables. We shall define the setting for the problem precisely by defining a cell type in Pascal that is a variant record type; the four variants, which we call PP, PA, AP, and AA, are determined by which of the two data fields are pointers and which are atoms. For example, PA means the left field is a pointer and the right field an atom. An additional boolean field in cells, called mark, indicates whether the cell has been found accessible. That is, by setting mark to true when garbage collecting, we "mark" the cell, indicating it is accessible. The important type definitions are shown in Fig. 12.4.
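type
    atomtype = { some appropriate type; preferably of the same size as pointers }
    patterns = (PP, PA, AP, AA);
    celltype = record
        mark: boolean;
        { Standard Pascal requires every field name in a record, including
          those in different variants, to be distinct; using left and right
          in all four variants is a notational convenience. }
        case pattern: patterns of
            PP: (left: ^celltype; right: ^celltype);
            PA: (left: ^celltype; right: atomtype);
            AP: (left: atomtype; right: ^celltype);
            AA: (left: atomtype; right: atomtype);
    end;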
Fig. 12.4. Definition of the type for cells.
We assume there is an array of cells, taking up most of the memory, and some collection of variables that are pointers to cells. For simplicity, we assume there is only one variable, called source, pointing to a cell, but the extension to many variables is not hard.† That is, we declare
var
    source: ^celltype;
    memory: array [1..memorysize] of celltype;
To mark the cells accessible from source, we first "unmark" all cells, accessible or not, by running down the array memory and setting the mark field to false. Then we perform a depth-first search of the graph emanating from source, marking all cells visited. The cells visited are exactly those that are accessible. We then traverse the array memory and add to the available space list all unmarked cells. Figure 12.5 shows a procedure dfs to perform the depth-first search; dfs is called by the procedure collect, which unmarks all cells and then marks the accessible cells by calling dfs. We do not show the code linking the available space list because of the peculiarities of Pascal. For example, since pointers and atoms are assumed to be the same size, we could link available cells through either all the left fields or all the right fields; however, we are not permitted to replace atoms by pointers in cells of variant type AA.
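The unmark, mark, and sweep steps can be sketched in Python. This is not a transcription of Fig. 12.5: the names Cell, pointers, dfs, collect, and allocate are our own, pointers are indices into a Python list standing in for the array memory, the depth-first search uses an explicit stack instead of recursion (deep structures would otherwise exceed Python's recursion limit), and the sweep returns a Python list of free indices rather than linking free cells, which side-steps the variant-record restriction just mentioned. The allocate function shows a request that finds no available cell triggering the collection implicitly.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Cell:
    pattern: str = "AA"   # "PP", "PA", "AP" or "AA": which fields are pointers
    left: Any = None      # a pointer is an index into memory; None is nil
    right: Any = None
    mark: bool = False


def pointers(cell: Cell):
    """Yield the non-nil pointer fields of a cell, as its pattern dictates."""
    if cell.pattern[0] == "P" and cell.left is not None:
        yield cell.left
    if cell.pattern[1] == "P" and cell.right is not None:
        yield cell.right


def dfs(memory: List[Cell], start: int) -> None:
    """Mark every cell reachable from start (iterative depth-first search)."""
    stack = [start]
    while stack:
        i = stack.pop()
        if not memory[i].mark:
            memory[i].mark = True
            stack.extend(pointers(memory[i]))


def collect(memory: List[Cell], roots: List[Any]) -> List[int]:
    """Unmark all cells, mark those reachable from roots, return the rest."""
    for cell in memory:
        cell.mark = False
    for r in roots:
        if r is not None:
            dfs(memory, r)
    return [i for i, cell in enumerate(memory) if not cell.mark]


def allocate(memory: List[Cell], avail: List[int], roots: List[Any]) -> int:
    """Take a free cell, garbage collecting first if none is available."""
    if not avail:
        avail.extend(collect(memory, roots))
        if not avail:
            raise MemoryError("every cell is accessible")
    return avail.pop()


memory = [
    Cell("PP", 1, None),   # cell 0: pointed to by source
    Cell("AA", "a", "b"),  # cell 1: reachable through cell 0
    Cell("PA", 3, "g"),    # cells 2 and 3 point to each other
    Cell("PA", 2, "h"),    #   but are not reachable from source
    Cell("AA", "x", "y"),  # cell 4: detached
]
source = 0
avail: List[int] = []      # the available space list has run dry
new = allocate(memory, avail, [source])
```

The cycle formed by cells 2 and 3 is reclaimed along with the detached cell 4, which is exactly the case reference counts could not handle.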
(1) procedure dfs ( currentcell: ^
