

table information by storing a pointer to the appropriate table into the ir at the start of each new level.

6.7.4 Symbol Table Contents

So far, the discussion has focused on the organization and use of the symbol table, largely ignoring the details of what information should be recorded in the symbol table. The symbol table will include an entry for each declared variable and each procedure. The parser will create these entries. As translation proceeds, the compiler may need to create additional variables to hold values not named explicitly by the source code. For example, converting x − 2 × y into iloc creates a temporary name for the value 2 × y, and, perhaps, another for x − 2 × y. Often, the compiler will synthesize a name, such as t00017, for each temporary value that it creates. If the compiler names these values and creates symbol table records for them, the rest of the compiler can treat them in the same way that it handles programmer-declared names. This avoids special-case treatment for compiler-generated variables and simplifies the implementation. Other items that may end up in the symbol table, or in a specialized auxiliary table, include literal constants, literal strings, and source code labels.

For each entry in the symbol table, the compiler will keep some set of information that may include: its textual name, its source-language data type, its dimensions (if any), the name and level of the procedure that contains its declaration, its storage class (global, static, local, etc.), and its offset in storage from the start of its storage class. For global variables, call-by-reference parameters, and names referenced through a pointer, the table may contain information about possible aliases. For aggregates, such as structures in c or records in Pascal, the table should contain an index into a table of structure information. For procedures and functions, the table should contain information about the number and type of arguments that it expects.

6.8 Summary and Perspective

The choice of intermediate representations has a major impact on the design, implementation, speed, and effectiveness of a compiler. None of the intermediate forms described in this chapter is, definitively, the right answer for all compilers. The designer must consider the overall goals of the compiler project when selecting an intermediate form, designing its implementation, and adding auxiliary data structures such as symbol and label tables.

Contemporary compiler systems use all manner of intermediate representations, ranging from parse trees and abstract syntax trees (often used in source-to-source systems) through lower-than-machine level linear codes (used, for example, in the Gnu compiler systems). Many compilers use multiple irs—building a second or third ir to perform a particular analysis or transformation, then modifying the original, and definitive, ir to reflect the result.


CHAPTER 6. INTERMEDIATE REPRESENTATIONS

Questions

1. In general, the compiler cannot pay attention to issues that are not represented in the ir form of the code being compiled. For example, performing register allocation on one-address code is an oxymoron.

For each of the following representations, consider what aspects of program behavior and meaning are explicit and what aspects are implicit.

(a) abstract syntax tree

(b) static single assignment form

(c) one-address code

(d) two-address code

(e) three-address code

Show how the expression x − 2 × y might be translated into each form.

Show how the code fragment

if (c[i] ≠ 0)
   then a[i] ← b[i] ÷ c[i];
   else a[i] ← b[i];

might be represented in an abstract syntax tree and in a control-flow graph. Discuss the advantages of each representation. For what applications would one representation be preferable to the other?

2. Some part of the compiler must be responsible for entering each identifier into the symbol table. Should it be the scanner or the parser? Each has an opportunity to do so. Is there an interaction between this issue, declare-before-use rules, and disambiguation of subscripts from function calls in a language with the Fortran 77 ambiguity?

3. The compiler must store information in the ir version of the program that allows it to get back to the symbol table entry for each name. Among the options open to the compiler writer are pointers to the original character strings and subscripts into the symbol table. Of course, the clever implementor may discover other options.

What are the advantages and disadvantages of each of these representations for a name? How would you represent the name?

Symbol Tables: You are writing a compiler for your favorite lexically-scoped language.

Consider the following example program:


procedure main
   integer a, b, c;
   procedure f1(w,x);
      integer a,x,y;
      call f2(w,x);
   end;
   procedure f2(y,z)
      integer a,y,z;
      procedure f3(m,n)
         integer b, m, n;      <-- here
         c = a * b * m * n;
      end;
      call f3(c,z);
   end;
   . . .
   call f1(a,b);
end;

(a) Draw the symbol table and its contents at the point labelled here.

(b) What actions are required for symbol table management when the parser enters a new procedure and when it exits a procedure?

Chapter Notes

The literature on intermediate representations and experience with them is sparse. This is somewhat surprising because of the major impact that decisions about irs have on the structure and behavior of a compiler. The classic forms are described in textbooks dating back to the early 1970s. Newer forms like ssa are described in the literature as tools for analysis and transformation of programs.

In practice, the design and implementation of an ir has an inordinately large impact on the eventual characteristics of the completed compiler. Large, bulky irs seem to shape systems in their own image. For example, the large asts used in early 1980s programming environments like Rn limited the size of programs that could be analyzed. The rtl form used in lcc is rather low-level in its abstraction. Accordingly, the compiler does a fine job of managing details like those needed for code generation, but has few, if any, transformations that require source-like knowledge, such as loop blocking to improve memory hierarchy behavior.


Chapter 7

The Procedure Abstraction

7.1 Introduction

In an Algol-like language, the procedure is a programming construct that creates a clean, controlled, protected execution environment. Each procedure has its own private, named storage. Statements executed inside the procedure can access these private, or local, variables. Procedures execute when they are invoked, or called, by another procedure. The called procedure can return a value to its caller, in which case the procedure is termed a function. This interface between procedures lets programmers develop and test parts of a program in isolation; the separation between procedures provides some insulation against problems in other procedures.

Procedures are the base unit of work for most compilers. Few systems require that the entire program be presented for compilation at one time. Instead, the compiler can process arbitrary collections of procedures. This feature, known as separate compilation, makes it feasible to construct and maintain large programs. Imagine maintaining a one million line program without separate compilation. Any change to the source code would require a complete recompilation; the programmer would need to wait while one million lines of code compiled before testing a single line change. To make matters worse, all million lines would need to be consistent; this would make it difficult for multiple programmers to work simultaneously on different parts of the code.

The procedure provides three critical abstractions that allow programmers to construct non-trivial programs.

Control abstraction A procedure provides the programmer with a simple control abstraction; a standard mechanism exists for invoking a procedure and mapping its arguments, or parameters, into the called procedure’s name space. A standard return mechanism allows the procedure to return control to the procedure that invoked it, continuing the execution of this “calling” procedure from the point immediately after the call. This standardization lets the compiler perform separate compilation.


Digression: A word about time

This chapter deals with both compile-time and run-time mechanisms. The distinction between events that occur at compile time and those that occur at run time can be confusing. All run-time actions are scripted at compile time; the compiler must understand the sequence of actions that will happen at run time to generate the instructions that cause the actions to occur. To gain that understanding, the compiler performs analysis at compile time and builds moderately complex compile-time structures that model the run-time behavior of the program. (See, for example, the discussion of lexically-scoped symbol tables in Section 6.7.3.) The compiler determines, at compile time, much of the storage layout that the program will use at run time; it then generates the code necessary to create that layout, to maintain it during execution, and to access variables and procedures in memory.

Name space Each procedure creates a new, protected name space; the programmer can declare new variables (and labels) without concern for conflicting declarations in other procedures. Inside the procedure, parameters can be referenced by their local names, rather than their external names. This lets the programmer write code that can be invoked in many different contexts.

External interface Procedures define the critical interfaces between the different parts of large software systems. The rules on name scoping, addressability, and orderly preservation of the run-time environment create a context in which the programmer can safely invoke code written by other individuals. This allows the use of libraries for graphical user interfaces, for scientific computation, and for access to system services.1 In fact, the operating system uses the same interface to invoke an application program; it simply generates a call to some designated entry point, like main.

The procedure is, in many ways, the fundamental programming abstraction that underlies Algol-like languages. It is an elaborate facade created collaboratively by the compiler, the operating system software, and the underlying hardware. Procedures create named variables; the hardware understands a linear array of memory locations named with integer addresses. Procedures establish rules for visibility of names and addressability; the hardware typically provides several variants of a load and a store operation. Procedures let us decompose large software systems into components; these must be knit together into a complete program before the hardware can execute it, since the hardware simply advances its program counter through some sequence of individual instructions.

A large part of the compiler’s task is putting in place the various pieces of the procedure abstraction. The compiler must dictate the layout of memory

1One of the original motivations for procedures was debugging. The user needed a known, correct mechanism to dump the contents of registers and memory after a program terminated abnormally. Keeping a dump routine in memory avoided the need to enter it through the console when it was needed.


program Main(input, output);
var x,y,z: integer;
procedure Fee;
   var x: integer;
   begin { Fee }
      x := 1;
      y := x * 2 + 1
   end;
procedure Fie;
   var y: real;
   procedure Foe;
      var z: real;
      procedure Fum;
         var y: real;
         begin { Fum }
            x := 1.25 * z;
            Fee;
            writeln('x = ',x)
         end;
      begin { Foe }
         z := 1;
         Fee;
         Fum
      end;
   begin { Fie }
      Foe;
      writeln('x = ',x)
   end;
begin { Main }
   x := 0;
   Fie
end.

Call Tree

   Main
      Fie
         Foe
            Fee
            Fum
               Fee

Execution History

1. Main calls Fie
2. Fie calls Foe
3. Foe calls Fee
4. Fee returns to Foe
5. Foe calls Fum
6. Fum calls Fee
7. Fee returns to Fum
8. Fum returns to Foe
9. Foe returns to Fie
10. Fie returns to Main

Figure 7.1: Non-recursive Pascal program

and encode that layout into the generated program. Since it may compile the different components of the program at different times, without knowing of their relationship to one another, this memory layout and all the conventions that it induces must be standardized and uniformly applied. The compiler must also understand the various interfaces provided by the operating system, to handle input and output, to manage memory, and to communicate with other processes.

This chapter focuses on the procedure as an abstraction and the mechanisms that the compiler uses to establish its control abstraction, its name space, and its interface to the outside world.


7.2 Control Abstraction

The procedure is, fundamentally, an abstraction that governs the transfer of control and the naming of data. This section explores the control aspects of a procedure’s behavior. The next section ties this behavior into the naming disciplines imposed in procedural languages.

In Algol-like languages, procedures have a simple and clear call/return discipline. On exit from a procedure, control returns to the point in the calling procedure that follows its invocation. If a procedure invokes other procedures, they return control in the same way. Figure 7.1 shows a Pascal program with several nested procedures. The call tree and execution history to its right summarize what happens when it executes. Fee is called twice: the first time from Foe and the second time from Fum. Each of these calls creates an instance, or an invocation, of Fee. By the time that Fum is called, the first instance of Fee is no longer active. It has returned control to Foe. Control cannot return to that instance of Fee; when Fum calls Fee, it creates a new instance of Fee.

The call tree makes these relationships explicit. It includes a distinct node for each invocation of a procedure. As the execution history shows, the only procedure invoked multiple times in the example is Fee. Accordingly, Fee has two distinct nodes in the call tree.

When the program executes the assignment x := 1; in the first invocation of Fee, the active procedures are Fee, Foe, Fie, and Main. These all lie on the path from the first instance of Fee to the program’s entry in Main. Similarly, when it executes the second invocation of Fee, the active procedures are Fee, Fum, Foe, Fie, and Main. Again, they all lie on the path from the current procedure to Main.

The call and return mechanism used in Pascal ensures that all the currently active procedures lie along a single path through the call graph. Any procedure not on that path is uninteresting, in the sense that control cannot return to it. When it implements the call and return mechanism, the compiler must arrange to preserve enough information to allow the calls and returns to operate correctly. Thus, when Foe calls Fum, the calling mechanism must preserve the information needed to allow the return of control to Foe. (Foe may diverge, or not return, due to a run-time error, an infinite loop, or a call to another procedure that does not return.)

This simple call and return behavior can be modelled with a stack. As α calls β, it pushes the address for a return onto the stack. When β wants to return, it pops the address off the stack and branches to that address. If all procedures have followed the discipline, popping a return address off the stack exposes the next appropriate return address.

This mechanism is sufficient for our example, which lacks recursion. It works equally well for recursion. In a recursive program, the implementation must preserve a cyclic path through the call graph. The path must, however, have finite length—otherwise, the recursion never terminates. Stacking the return addresses has the effect of unrolling the path. A second call to procedure Fum would store a second return address in the location at the top of the stack—in


main() {
    printf("Fib(5) is %d.", fibonacci(5));
}

int fibonacci( ord )
int ord;
{
    int one, two;
    if (ord < 1)
    {
        puts("Invalid input.");
        return ERROR_VALUE;
    }
    else if (ord == 1)
        return 0;
    else
        return fib(ord,&one,&two);
}

int fib(ord, f0, f1)
int ord, *f0, *f1;
{
    int result, a, b;
    if (ord == 2)
    {   /* base case */
        *f0 = 0;
        *f1 = 1;
        result = 1;
    }
    else
    {   /* recurse */
        (void) fib(ord-1,&a,&b);
        result = a + b;
        *f0 = b;
        *f1 = result;
    }
    return result;
}

Figure 7.2: Recursion Example

effect, creating a distinct space to represent the second invocation of Fum. The same constraint applies to recursive and non-recursive calls: the stack needs enough space to represent the execution path.

To see this more clearly, consider the c program shown in Figure 7.2. It computes the fifth Fibonacci number using the classic recursive algorithm. When it executes, the routine fibonacci invokes fib, and fib invokes itself, recursively. This creates a series of calls:

   Procedure     Calls
   main          fibonacci(5)
   fibonacci     fib(5,*,*)
   fib           fib(4,*,*)
   fib           fib(3,*,*)
   fib           fib(2,*,*)

Here, the asterisk (*) indicates an uninitialized return parameter.

This series of calls has pushed five entries onto the control stack. The top three entries contain the address immediately after the call in fib. The next entry contains the address immediately after the call in fibonacci. The fourth entry contains the address immediately after the call to fibonacci in main.

After the final recursive call, denoted fib(2,*,*) above, fib executes the base case and the recursion unwinds. This produces a series of return actions:

   Call           Returns to      The result(s)
   fib(2,*,*)     fib(3,*,*)      1   (*one = 0; *two = 1;)
   fib(3,*,*)     fib(4,*,*)      1   (*one = 1; *two = 1;)
   fib(4,*,*)     fib(5,*,*)      2   (*one = 1; *two = 2;)
   fib(5,*,*)     fibonacci(5)    3   (*one = 2; *two = 3;)
   fibonacci(5)   main            3

The control stack correctly tracks these return addresses. This mechanism is sufficient for Pascal-style call and return. In fact, some computers have hardwired this stack discipline into their call and return instructions.

More complex control flow Some programming languages allow a procedure to return a procedure and its run-time context. When the returned object is invoked, the procedure executes in the run-time context from which it was returned. A simple stack is inadequate to implement this control abstraction. Instead, the control information must be saved in some more general structure, such as a linked list, where traversing the structure does not imply deallocation. (See the discussion of heap allocation for activation records in the next section.)

7.3 Name Spaces

Most procedural languages provide the programmer with control over which procedures can read and write individual variables. A program will contain multiple name spaces; the rules that determine which statements can legally access each name space are called scoping rules.

7.3.1 Scoping Rules

Specific programming languages differ in the set of name spaces that they allow the programmer to create. Figure 7.3 summarizes the name scoping rules of several languages. Fortran, the oldest of these languages, creates two name spaces: a global space that contains the names of procedures and common blocks, and a separate name space inside each procedure. Names declared inside a procedure’s local name space supersede global names for references within the procedure. Within a name space, different attributes can apply. For example, a local variable can be mentioned in a save statement. This has the effect of making the local variable a static variable—its value is preserved across calls to the procedure.

The programming language c has more complex scoping rules. It creates a global name space that holds all procedure names, as well as the names of global variables. It introduces a separate name space for all of the procedures in a single file (or compilation unit). Names in the file-level scope are declared with the attribute static; they are visible to any procedure in the file. The file-level scope holds both procedures and variables. Each procedure creates its own name space for variables and parameters. Inside a procedure, the programmer can create additional name spaces by opening a block (with { and }). A block can declare its own local names; it can also contain other blocks.
