
Cooper K. Engineering a Compiler

6.5. MAPPING VALUES TO NAMES

Digression: The Hierarchy of Memory Operations in Iloc 9x

The iloc used in this book is abstracted from an ir named iloc 9x that was used in a research compiler project at Rice University. Iloc 9x includes a hierarchy of memory operations that the compiler uses to encode knowledge about values. These operations are:

Operation        Meaning

immediate load   loads a known constant value into a register

constant load    loads a value that does not change during
                 execution. The compiler does not know the
                 value, but can prove that it is not defined
                 by a program operation.

scalar load      operates on a scalar value, not an array
& store          element, a structure element, or a
                 pointer-based value.

general load     operates on a value that may be an array
& store          element, a structure element, or a
                 pointer-based value. This is the
                 general-case operation.

By using this hierarchy, the front end can encode knowledge about the target value directly into the iloc 9x code. As other passes discover additional information, they can rewrite operations to change a value from a general-purpose load to a more restricted form. Thus, constant propagation might replace a general load or a scalar load with an immediate load. If an analysis of definitions and uses discovers that some location cannot be defined by any executable store operation, it can be rewritten as a constant load.
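This rewriting can be sketched in C. The enum, struct, and function names below are illustrative; the text does not give the actual iloc 9x data structures, so this is only a sketch of the operation-strengthening idea, not the real representation:

```c
#include <stdbool.h>

/* The hierarchy, ordered from most restricted to most general. */
enum load_kind {
    IMMEDIATE_LOAD,   /* value known at compile time            */
    CONSTANT_LOAD,    /* value unknown, but provably unchanging */
    SCALAR_LOAD,      /* scalar value, never aliased            */
    GENERAL_LOAD      /* may be array/structure/pointer-based   */
};

struct load_op {
    enum load_kind kind;
    int value;        /* meaningful only for IMMEDIATE_LOAD */
};

/* Constant propagation proved the loaded value: rewrite the
 * operation as an immediate load. */
void strengthen_to_immediate(struct load_op *op, int proven_value) {
    op->kind = IMMEDIATE_LOAD;
    op->value = proven_value;
}

/* Definition-use analysis proved that no executable store defines
 * the location: a general or scalar load becomes a constant load. */
bool strengthen_to_constant(struct load_op *op) {
    if (op->kind == GENERAL_LOAD || op->kind == SCALAR_LOAD) {
        op->kind = CONSTANT_LOAD;
        return true;
    }
    return false;     /* already at least as restricted */
}
```

Each rewrite moves an operation down the hierarchy, never up, so later passes can trust the stronger classification.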

typically uses fewer registers than a modern processor provides. Thus, register allocation looks for memory-based values that it can hold in registers for longer periods of time. In this model, the allocator makes the code faster and smaller by removing some of the loads and stores.

Compilers for risc machines tend to use the register-to-register model for two different reasons. First, the register-to-register model more closely reflects the programming model of risc architecture. Risc machines rarely have a full complement of memory-to-memory operations; instead, they implicitly assume that values can be kept in registers. Second, the register-to-register model allows the compiler to encode subtle facts that it derives directly in the ir. The fact that a value is kept in a register means that the compiler, at some earlier point, had proof that keeping it in a register is safe.3

3 If the compiler can prove that only one name provides access to a value, it can keep that value in a register. If multiple names might exist, it must behave conservatively and keep the value in memory. For example, a local variable x can be kept in a register, unless the program takes its address (&x in c) or passes it as a call-by-reference parameter to another procedure.

[Figure: four front ends (Fortran, Scheme, Java, and Smalltalk), each connected by arrows to three back ends (target 1, target 2, and target 3), so that every front end pairs with every back end through a single shared ir.]

Figure 6.7: Developing a universal intermediate form

6.6 Universal Intermediate Forms

Because compilers are complex, programmers (and managers) regularly seek ways to decrease the cost of building them. A common, but profound, question arises in these discussions: can we design a “universal” intermediate form that will represent any programming language targeted for any machine? In one sense, the answer must be “yes”; we can clearly design an ir that is Turing-equivalent. In a more practical sense, however, this approach has not, in general, worked out well.

Many compilers have evolved from a single front end to having front ends for two or three languages that use a common ir and a single back end. This requires, of course, that the ir and the back end are both versatile enough to handle all the features of the different source languages. As long as they are reasonably similar languages, such as Fortran and C, this approach can be made to work.

The other natural direction to explore is retargeting the back end to multiple machines. Again, for a reasonably small collection of machines, this has been shown to work in practice.

Both of these approaches have had limited success in both commercial and research compilers. Taken together, however, they can easily lead to a belief that we can produce n front ends, m back ends, and end up with n × m compilers. Figure 6.7 depicts the result of following this line of thought too far. Several


projects have tried to produce compilers for dissimilar languages and multiple machines, with the notion that a single ir can bind together all of the various components.

The more ambitious of these projects have foundered on the complexity of the ir. For this idea to succeed, the separation between front end, back end, and ir must be complete. The front end must encode all language-related knowledge and must encode no machine-specific knowledge. The back end must handle all machine-specific issues and have no knowledge about the source language. The ir must encode all the facts passed between front end and back end, and must represent all the features of all the languages in an appropriate way. In practice, machine-specific issues arise in front ends; language-specific issues find their way into back ends; and “universal” irs become too complex to manipulate and use.

Some systems have been built using this model; the successful ones seem to be characterized by

- irs that are near the abstraction level of the target machine
- target machines that are reasonably similar
- languages that have a large core of common features

Under these conditions, the problems in the front end, back end, and ir remain manageable. Several commercial compiler systems fit this description; they compile languages such as Fortran, C, and C++ to a set of similar architectures.

6.7 Symbol Tables

As part of translation, the compiler derives information about the various entities that the program manipulates. It must discover and store many distinct kinds of information. It will encounter a variety of names that must be recorded—names for variables, defined constants, procedures, functions, labels, structures, files, and compiler-generated temporaries. For a given textual name, it might need a data type, the name and lexical level of its declaring procedure, its storage class, and a base address and offset in memory. If the object is an aggregate, the compiler needs to record the number of dimensions and the upper and lower bounds for each dimension. For records or structures, the compiler needs a list of the fields, along with the relevant information on each field. For functions and procedures, the compiler might need to know the number of parameters and their types, as well as any returned values; a more sophisticated translation might record information about the modification and use of parameters and externally visible variables.

Either the ir must store all this information, or the compiler must re-derive it on demand. For the sake of efficiency, most compilers record facts rather than recompute them. (The one common exception to this rule occurs when the ir is written to external storage. Such i/o activity is expensive relative to computation, and the compiler makes a complete pass over the ir when it reads the information. Thus, it can be cheaper to recompute information than to write it to external media and read it back.) These facts can be recorded


directly in the ir. For example, a compiler that builds an ast might record information about variables as annotations (or attributes) on the nodes representing each variable’s declaration. The advantage of this approach is that it uses a single representation for the code being compiled. It provides a uniform access method and a single implementation. The disadvantage of this approach is that the single access method may be inefficient—navigating the ast to find the appropriate declaration has its own costs. To eliminate this inefficiency, the compiler can thread the ir so that each reference has a link back to its declaration. This adds space to the ir and overhead to the ir-builder. (The next step is to use a hash table to hold the declaration link for each variable during ir construction—in effect, creating a symbol table.)

The alternative, as we saw in Chapter 4, is to create a central repository for these facts and to provide efficient access to it. This central repository, called a symbol table, becomes an integral part of the compiler’s ir. The symbol table localizes information derived from distant parts of the source code; it simplifies the design and implementation of any code that must refer to information derived earlier in compilation. It avoids the expense of searching the ir to find the portion that represents a variable’s declaration; using a symbol table often eliminates the need to represent the declarations directly in the ir. (An exception occurs in source-to-source translation. The compiler may build a symbol table for efficiency and preserve the declaration syntax in the ir so that it can produce an output program that closely resembles the input program.) It eliminates the overhead of making each reference contain a pointer back to the declaration. It replaces both of these with a computed mapping from the textual name back to the stored information. Thus, in some sense, the symbol table is simply an efficiency hack.

Throughout this text, we refer to “the symbol table.” In fact, the compiler may include several distinct, specialized symbol tables. These include variable tables, label tables, tables of constants, and reserved keyword tables. A careful implementation might use the same access methods for all these tables. (The compiler might also use a hash table as an efficient representation for some of the sparse graphs built in code generation and optimization.)

Symbol table implementation requires attention to detail. Because nearly every aspect of translation refers back to the symbol table, efficiency of access is critical. Because the compiler cannot predict, before translation, the number of names that it will encounter, expanding the symbol table must be both graceful and efficient. This section provides a high-level treatment of the issues that arise in designing a symbol table. It presents the compiler-specific aspects of symbol table design and use. For deeper implementation details and design alternatives, the reader is referred to Section B.4.

6.7.1 Hash Tables

In implementing a symbol table, the compiler writer must choose a strategy for organizing and searching the table. Myriad schemes for organizing lookup tables exist; we will focus on tables indexed with a “hash function.” Hashing,

as this technique is called, has an expected-case O(1) cost for both insertion and lookup. With careful engineering, the implementor can make the cost of expanding the table and of preserving it on external media quite reasonable. For the purposes of this chapter, we assume that the symbol table is organized as a simple hash table. Implementation techniques for hash tables have been widely studied and taught.

[Figure: a ten-slot hash table, indices 0 through 9, drawn as a vector of records; the names a, b, and c already occupy slots, and h(d) indexes the slot where d is being inserted.]

Figure 6.8: Hash-table implementation — the concept

Hash tables are conceptually elegant. They use a hash function, h, to map names into small integers, and take the small integer as an index into the table. With a hashed symbol table, the compiler stores all the information that it derives about the name n in the table at h(n). Figure 6.8 shows a simple ten-slot hash table. It is a vector of records, each record holding the compiler-generated description of a single name. The names a, b, and c have already been inserted. The name d is being inserted, at h(d) = 2.

The primary reason for using hash tables is to provide a constant-time lookup, keyed by a textual name. To achieve this, h must be inexpensive to compute, and it must produce a unique small integer for each name. Given an appropriate function h, accessing the record for n requires computing h(n) and indexing into the table at h(n). If h maps two or more symbols to the same small integer, a “collision” occurs. (In Figure 6.8, this would occur if h(d) = 3.) The implementation must handle this situation gracefully, preserving both the information and the lookup time. In this section, we assume that h is a perfect hash function—that is, it never produces a collision. Furthermore, we assume that the compiler knows, in advance, how large to make the table. Section B.4 describes hash-table implementation in more detail, including hash functions, collision handling, and schemes for expanding a hash table.
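The text leaves h unspecified. One plausible shape, shown only as a sketch, is a multiplicative string hash reduced modulo the table size; the multiplier 37 is an illustrative choice, not something the book prescribes:

```c
#include <stddef.h>

/* A sketch of a hash function h: fold the characters of the name
 * into an unsigned accumulator, then reduce modulo the table size
 * to obtain a slot index. */
unsigned h(const char *name, unsigned table_size) {
    unsigned hash = 0;
    for (size_t i = 0; name[i] != '\0'; i++)
        hash = hash * 37 + (unsigned char)name[i];
    return hash % table_size;   /* a small integer: a slot index */
}
```

A function of this shape is inexpensive to compute, but it is not a perfect hash: distinct names can still collide, which is why real implementations pair it with a collision-handling scheme.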

6.7.2 Building a Symbol Table

The symbol table defines two interface routines for the rest of the compiler.

LookUp(name) returns the record stored in the table at h(name) if one exists. Otherwise, it returns a value indicating that name was not found.

Insert(name,record) stores the information in record in the table at h(name). It may expand the table to accommodate the record for name.
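A minimal sketch of this two-routine interface in C. The text assumes a perfect hash and a table of known size; the sketch below instead uses linear probing so that collisions are at least handled. The record layout, table size, and internal hash are illustrative assumptions, not the book's implementation:

```c
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 64    /* assume the size is known in advance */

struct record { const char *name; int lexical_level; };

static struct record *table[TABLE_SIZE];

/* Internal hash; any inexpensive string hash would do here. */
static unsigned hash(const char *name) {
    unsigned h = 0;
    while (*name) h = h * 37 + (unsigned char)*name++;
    return h % TABLE_SIZE;
}

/* Returns the record stored for name, or NULL if name was not found. */
struct record *LookUp(const char *name) {
    for (unsigned i = hash(name), n = 0; n < TABLE_SIZE;
         i = (i + 1) % TABLE_SIZE, n++) {
        if (table[i] == NULL) return NULL;   /* empty slot: absent */
        if (strcmp(table[i]->name, name) == 0) return table[i];
    }
    return NULL;
}

/* Stores record at h(name), probing forward past collisions. */
void Insert(const char *name, struct record *rec) {
    unsigned i = hash(name);
    while (table[i] != NULL) i = (i + 1) % TABLE_SIZE;
    rec->name = name;
    table[i] = rec;
}
```

Note that LookUp deliberately does not insert on failure, preserving the separation between the two routines that the next paragraphs motivate.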


Digression: An Alternative to Hashing

Hashing is the most widely used method for organizing a compiler’s symbol table. Multiset discrimination is an interesting alternative that eliminates any possibility of worst-case behavior. The critical insight behind this technique is that the index can be constructed off-line in the scanner.

To use multiset discrimination for the symbol table, the compiler writer must take a different approach to scanning. Instead of processing the input incrementally, the compiler scans the entire program to find the complete set of identifiers. As it discovers each identifier, it creates a tuple ⟨name, position⟩, where name is the text of the identifier and position is its ordinal position in the list of all tokens. It enters all the tuples into a large multiset.

The next step lexicographically sorts the multiset. In effect, this creates a set of bags, one per identifier. Each bag holds the tuples for all of the occurrences of its identifier. Since each tuple refers back to a specific token, through its position value, the compiler can use the sorted multiset to rewrite the token stream. It makes a linear scan over the multiset, processing each bag in order. The compiler allocates a symbol table index for the entire bag, then rewrites the tokens to include that index. This augments the identifier tokens with their symbol table index. If the compiler needs a textual lookup function, the resulting table is ordered alphabetically for a binary search.

The price for using this technique is an extra pass over the token stream, along with the cost of the lexicographic sort. The advantages, from a complexity perspective, are that it avoids any possibility of hashing’s worst-case behavior, and that it makes the initial size of the symbol table obvious, even before parsing. This same technique can be used to replace a hash table in almost any application where an off-line solution will work.
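The sort-and-assign step of multiset discrimination can be sketched with the standard library's qsort; the tuple layout and function name below are illustrative, not the book's code:

```c
#include <stdlib.h>
#include <string.h>

/* One (name, position) tuple per identifier occurrence. */
struct tuple { const char *name; int position; };

static int by_name(const void *a, const void *b) {
    return strcmp(((const struct tuple *)a)->name,
                  ((const struct tuple *)b)->name);
}

/* Sorts the tuples so that identical names form adjacent "bags",
 * then assigns one symbol-table index per bag and writes it back
 * to each token position. index_of[position] receives the index
 * for that token; the return value is the number of distinct
 * identifiers, i.e. the initial symbol-table size. */
int discriminate(struct tuple *tuples, int n, int *index_of) {
    qsort(tuples, n, sizeof *tuples, by_name);
    int next_index = -1;
    for (int i = 0; i < n; i++) {
        /* a new name starts a new bag and gets a fresh index */
        if (i == 0 || strcmp(tuples[i].name, tuples[i - 1].name) != 0)
            next_index++;
        index_of[tuples[i].position] = next_index;
    }
    return next_index + 1;
}
```

After this pass the sorted tuple array doubles as an alphabetically ordered table, ready for binary search if textual lookup is still needed.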

The compiler needs separate functions for LookUp and Insert. (The alternative would have LookUp insert the name when it fails to find it in the table.) This ensures, for example, that a LookUp of an undeclared variable will fail—a property useful for detecting a violation of the declare-before-use rule in syntax-directed translation schemes, or for supporting nested lexical scopes.

This simple interface fits directly into the ad hoc syntax-directed translation scheme for building a symbol table, sketched in Section 4.4.3. In processing declaration syntax, the compiler builds up a set of attributes for the variable. When the parser reduces by a production that has a specific variable name, it can enter the name and attributes into the symbol table using Insert. If a variable name can appear in only one declaration, the parser can call LookUp first to detect a repeated use of the name. When the parser encounters a variable name outside the declaration syntax, it uses LookUp to obtain the appropriate information from the symbol table. LookUp fails on any undeclared name. The compiler writer, of course, may need to add functions to initialize the table, to store it and retrieve it using external media, and to finalize it. For a language with a single name space, this interface suffices.


6.7.3 Handling Nested Lexical Scopes

Few, if any, programming languages provide a single name space. Typically, the programmer manages multiple name spaces. Often, some of these name spaces are nested inside one another. For example, a C programmer has four distinct kinds of name space.

1. A name can have global scope. Any global name is visible in any procedure where it is declared. All declarations of the same global name refer to a single instance of the variable in storage.

2. A name can have a file-wide scope. Such a name is declared using the static attribute outside of a procedure body. A static variable is visible to every procedure in the file containing the declaration. If the name is declared static in multiple files, those distinct declarations create distinct run-time instances.

3. A name can be declared locally within a procedure. The scope of the name is the procedure itself. It cannot be referenced by name outside the declaring procedure. (Of course, the declaring procedure can take its address and store it where other procedures can reference the address. This may produce wildly unpredictable results if the procedure has completed execution and freed its local storage.)

4. A name can be declared within a block, denoted by a pair of curly braces. While this feature is not often used by programmers, it is widely used by macros to declare a temporary location. A variable declared in this way is only visible inside the declaring block.

Each distinct name space is called a scope. Language definitions include rules that govern the creation and accessibility of scopes. Many programming languages include some form of nested lexical scopes.

Figure 6.9 shows some fragments of c code that demonstrate its various scopes. The level zero scope contains names declared as global or file-wide static. Both example and x are global, while w is a static variable with file-wide scope. Procedure example creates its own local scope, at level one. The scope contains a and b, the procedure’s two formal parameters, and its local variable c. Inside example, curly braces create two distinct level two scopes, denoted as level 2a and level 2b. Level 2a declares two variables, b and z. This new incarnation of b overrides the formal parameter b declared in the level one scope by example. Any reference to b inside the block that created 2a names the local variable rather than the parameter at level one. Level 2b declares two variables, a and x. Each overrides a variable declared in an outer scope. Inside level 2b, another block creates level three and declares c and x.

All of this context goes into creating the name space in which the assignment statement executes. Inside level three, the following names are visible: a from 2b, b from one, c from three, example from zero, w from zero, and x from three. No incarnation of the name z is active, since 2a ends before three begins. Since

static int w;                /* level 0  */
int x;

void example(a,b)
    int a, b;                /* level 1  */
{
    int c;
    {                        /* level 2a */
        int b, z;
        ...
    }
    {                        /* level 2b */
        int a, x;
        ...
        {
            int c, x;        /* level 3  */
            b = a + b + c + x;
        }
    }
}

    Level    Names
      0      w, x, example
      1      a, b, c
      2a     b, z
      2b     a, x
      3      c, x

Figure 6.9: Lexical scoping example

example at level zero is visible inside level three, a recursive call on example can be made. Adding the declaration “int example” to level 2b or level three would hide the procedure’s name from level three and prevent such a call.

To compile a program that contains nested scopes, the compiler must map each variable reference back to a specific declaration. In the example, it must distinguish between the multiple definitions of a, b, c, and x to select the relevant declarations for the assignment statement. To accomplish this, it needs a symbol table structure that can resolve a reference to the lexically most recent definition. At compile-time, it must perform the analysis and emit the code necessary to ensure addressability for all variables referenced in the current procedure. At run-time, the compiled code needs a scheme to find the appropriate incarnation of each variable. The run-time techniques required to establish addressability for variables are described in Chapter 8.

The remainder of this section describes the extensions necessary to let the compiler convert a name like x to a static distance coordinate—a ⟨level, offset⟩ pair, where level is the lexical level at which x’s declaration appears and offset is an integer address that uniquely identifies the storage set aside for x. These same techniques can also be useful in code optimization. For example, the dvnt algorithm for discovering and removing redundant computations relies on a scoped hash table to achieve efficiency on extended basic blocks (see Section 12.1).

[Figure: a chain of symbol tables, one each for levels 3, 2b, 1, and 0; the current-level pointer indicates level 3, and each table links to the table for the surrounding scope. Level 3 holds c and x; level 2b holds a and x; level 1 holds a, b, and c; level 0 holds w, x, and example.]

Figure 6.10: Simple “sheaf-of-tables” implementation

The Concept To manage nested scopes, the parser must change, slightly, its approach to symbol table management. Each time the parser enters a new lexical scope, it can create a new symbol table for that scope. As it encounters declarations in the scope, it enters the information into the current table. Insert operates on the current symbol table. When it encounters a variable reference, LookUp must first search the table for the current scope. If the current table does not hold a declaration for the name, it checks the table for the surrounding scope. By working its way through the symbol tables for successively lower lexical levels, it will either find the most recent declaration for the name, or fail in the outermost scope—indicating that the variable has no declaration visible from the current scope.

Figure 6.10 shows the symbol table built in this fashion for our example program, at the point where the parser has reached the assignment statement. When the compiler invokes the modified LookUp function for the name b, it will fail in level three, fail in level two, and find the name in level one. This corresponds exactly to our understanding of the program—the most recent declaration for b is as a parameter to example, in level one. Since the first block at level two, block 2a, has already closed, its symbol table is not on the search chain. The level where the symbol is found, one in this case, forms the first part of the static distance coordinate for b. If the symbol table record includes a storage offset for each variable, then LookUp can easily return the static distance coordinate.

The Details To handle this scheme, two additional calls are required. The compiler needs a call that can initialize a new symbol table and one that can finalize the table for a level.

InitializeScope() increments the current level and creates a new symbol table for that level. It links the previous level’s symbol table to the new table, and updates the current level pointer used by both LookUp and Insert.

FinalizeScope() changes the current level pointer so that it points at the table for the scope surrounding the current level, and then decrements the current level. If the compiler needs to preserve the level-by-level tables for later use, FinalizeScope can either leave the table intact in memory, or it can write the table to external media and reclaim its space. (In a system with garbage collection, FinalizeScope should add the finalized table to a list of such tables.)

[Figure: the sheaf of tables after all scopes have closed; the tables for levels 0, 1, 2a, 2b, and 3 remain in memory, linked to reflect lexical nesting, and the current-level pointer is set to an invalid value.]

Figure 6.11: Final table for the example

To account for lexical scoping, the parser should call InitializeScope each time it enters a new lexical scope and FinalizeScope each time it exits a lexical scope.

With this interface, the program in Figure 6.9 would produce the following sequence of calls

 1. InitializeScope      9. InitializeScope     17. LookUp(b)
 2. Insert(a)           10. Insert(a)           18. LookUp(c)
 3. Insert(b)           11. Insert(x)           19. LookUp(x)
 4. Insert(c)           12. InitializeScope     20. FinalizeScope
 5. InitializeScope     13. Insert(c)           21. FinalizeScope
 6. Insert(b)           14. Insert(x)           22. FinalizeScope
 7. Insert(z)           15. LookUp(b)
 8. FinalizeScope       16. FinalizeScope

As it enters each scope, the compiler calls InitializeScope(). It adds each variable to the table using Insert(). When it leaves a given scope, it calls FinalizeScope() to discard the declarations for that scope. For the assignment statement, it looks up each of the names, as encountered. (The order of the LookUp() calls will vary, depending on how the assignment statement is traversed.)
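The sheaf-of-tables scheme above can be sketched in C. To keep the sketch short, a linked list stands in for each level's hash table; the routine names match the interface in the text, but everything else is an illustrative assumption:

```c
#include <stdlib.h>
#include <string.h>

struct entry { const char *name; struct entry *next; };
struct scope {
    int level;                 /* lexical level of this scope     */
    struct entry *entries;     /* names declared in this scope    */
    struct scope *enclosing;   /* table for the surrounding scope */
};

static struct scope *current = NULL;   /* current-level pointer */

void InitializeScope(void) {
    struct scope *s = malloc(sizeof *s);
    s->level = current ? current->level + 1 : 0;
    s->entries = NULL;
    s->enclosing = current;    /* link to the surrounding scope */
    current = s;
}

void FinalizeScope(void) {
    current = current->enclosing;  /* table could be kept or freed */
}

void Insert(const char *name) {
    struct entry *e = malloc(sizeof *e);
    e->name = name;
    e->next = current->entries;
    current->entries = e;
}

/* Searches the current table, then each surrounding table in turn.
 * Returns the level of the most recent visible declaration, or -1
 * if no declaration is visible from the current scope. */
int LookUp(const char *name) {
    for (struct scope *s = current; s != NULL; s = s->enclosing)
        for (struct entry *e = s->entries; e != NULL; e = e->next)
            if (strcmp(e->name, name) == 0)
                return s->level;
    return -1;
}
```

The level that LookUp returns is exactly the first component of the static distance coordinate; a real table would also store and return each name's offset.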

If FinalizeScope retains the symbol tables for finalized levels in memory, the net result of these calls will be the symbol table shown in Figure 6.11. The current level pointer is set to an invalid value. The tables for all levels are left in memory and linked together to reflect lexical nesting. The compiler can provide subsequent passes of the compiler with access to the relevant symbol
