

somehow this is more intuitive to most assembly programmers. This particular illusion—that the computer understands alphabetic names rather than binary numbers—is easily maintained by a lookup-table in a symbolic assembler.

The example expression showcases one particular abstraction that the compiler maintains, symbolic names. The example refers to values with the names w, x, y, and z. These names are not just values; a given name can take on multiple values as the program executes. For example, w is used on both the right-hand side and the left-hand side of the assignment. Clearly, w has one value before execution of the expression and another afterwards (unless x × y × z = 1/2). Thus, w refers to whatever value is stored in some named location, rather than a specific value, such as 15.

The memories of modern computers are organized by numerical addresses, not textual names. Within the address space of an executing program, these addresses correspond uniquely to storage locations. In a source-level program, however, the programmer may create many distinct variables that use the same name. For example, many programs define the variables i, j, and k in several different procedures; they are common names for loop index variables. The compiler has responsibility for mapping each use of the name j to the appropriate instance of j and, from there, into the storage location set aside for that instance of j. Computers do not have this kind of name space for storage locations; it is an abstraction created by the language designer and maintained by the compiler-generated code and its run-time environment.
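To make that bookkeeping concrete, the following is a minimal sketch, in Python, of the kind of scoped symbol table a compiler might use to map each use of a name to the storage location of the correct instance; the class and method names are illustrative, not taken from this book.

    # Hedged sketch: a scoped symbol table mapping each use of a name to the
    # storage location of the correct instance of that name.
    class Scope:
        def __init__(self, parent=None):
            self.parent = parent      # enclosing scope, if any
            self.locations = {}       # name -> storage offset in this scope

        def declare(self, name, offset):
            self.locations[name] = offset

        def lookup(self, name):
            # search this scope first, then walk outward through enclosing scopes
            scope = self
            while scope is not None:
                if name in scope.locations:
                    return scope.locations[name]
                scope = scope.parent
            raise KeyError("undeclared name: " + name)

    globals_ = Scope()
    proc_a = Scope(parent=globals_)   # two procedures, each declaring its own 'j'
    proc_b = Scope(parent=globals_)
    proc_a.declare("j", 0)
    proc_b.declare("j", 8)
    assert proc_a.lookup("j") != proc_b.lookup("j")

Each procedure's j resolves to its own offset, which is exactly the name-to-location mapping described above.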

To translate w ← w × 2 × x × y × z, the compiler must assign some storage location to each name. (We will assume, for the moment, that the constant two needs no memory location since it is a small integer and can probably be obtained using a load immediate instruction.) This might be done in memory, as in

    0:  | w | x | y | z |        (w at offset zero, with x, y, and z in the following slots)

or, the compiler might elect to keep the named variables in machine registers with a series of assignments:

r1 ← w; r2 ← x; r3 ← y; and r4 ← z;

The compiler must choose, based on knowledge of the surrounding context, a location for each named value. Keeping w in a register will likely lead to faster execution; if some other statement assigns w’s address to a pointer, the compiler would need to assign w to an actual storage location.

Names are just one abstraction maintained by the compiler. To handle a complete programming language, the compiler must create and support a variety of abstractions. Procedures, parameters, names, lexical scopes, and control-flow operations are all abstractions designed into the source language that the compiler creates and maintains on the target machine (with some help from the other system software). Part of this task involves the compiler emitting the appropriate instructions at compile time; the remainder involves interactions between that compiled code and the run-time environment that supports it.


    loadAI   rarp, @w  ⇒ rw        // load 'w'
    loadI    2         ⇒ r2        // constant 2 into r2
    loadAI   rarp, @x  ⇒ rx        // load 'x'
    loadAI   rarp, @y  ⇒ ry        // load 'y'
    loadAI   rarp, @z  ⇒ rz        // load 'z'
    mult     rw, r2    ⇒ rw        // rw ← w × 2
    mult     rw, rx    ⇒ rw        // rw ← (w×2) × x
    mult     rw, ry    ⇒ rw        // rw ← (w×2×x) × y
    mult     rw, rz    ⇒ rw        // rw ← (w×2×x×y) × z
    storeAI  rw        ⇒ rarp, 0   // write rw back to 'w'

Figure 1.1: Example in iloc

Thus, designing and implementing a compiler involves not only translation from some source language to a target language, but also the construction of a set of mechanisms that will create and maintain the necessary abstractions at run-time. These mechanisms must deal with the layout and allocation of memory, with the orderly transfer of control between procedures, with the transmission of values and the mapping of name spaces at procedure borders, and with interfaces to the world outside the compiler’s control, including input and output devices, the operating system, and other running programs.

Chapter 7 explores the abstractions that the compiler must maintain to bridge the gap between the programming model embodied in the source language and the facilities provided by the operating system and the actual hardware. It describes algorithms and techniques that compilers use to implement the various fictions contained in the language definitions. It explores some of the issues that arise on the boundary between the compiler's realm and that of the operating system.

1.3.3 Creating the Output Program

So far, all of the issues that we have addressed also arise in interpreters. The difference between a compiler and an interpreter is that the compiler emits executable code as its output, while the interpreter produces the result of executing that code. During code generation, the compiler traverses the internal data structures that represent the code and it emits equivalent code for the target machine. It must select instructions to implement each operation that appears in the code being compiled, decide when and where to move values between registers and memory, and choose an execution order for the instructions that both preserves meaning and avoids unnecessary hardware stalls or interlocks. (In contrast, an interpreter would traverse the internal data structures and simulate the execution of the code.)
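As a rough illustration of that difference, the sketch below (Python, with a toy tuple-based expression tree; none of it comes from the book) walks the same tree twice: once to compute a value directly, as an interpreter would, and once to emit ILOC-like instructions, as a compiler would.

    import itertools

    # toy expression tree for w * 2, written as nested tuples
    tree = ("*", ("var", "w"), ("const", 2))

    def interpret(node, env):
        # an interpreter produces the result of executing the code
        kind = node[0]
        if kind == "const":
            return node[1]
        if kind == "var":
            return env[node[1]]
        _, left, right = node
        return interpret(left, env) * interpret(right, env)

    def emit(node, code, regs):
        # a compiler produces equivalent code for the target machine
        kind = node[0]
        if kind == "const":
            r = f"r{next(regs)}"
            code.append(f"loadI  {node[1]} => {r}")
            return r
        if kind == "var":
            r = f"r{next(regs)}"
            code.append(f"loadAI rarp, @{node[1]} => {r}")
            return r
        _, left, right = node
        rl = emit(left, code, regs)
        rr = emit(right, code, regs)
        code.append(f"mult   {rl}, {rr} => {rl}")
        return rl

    print(interpret(tree, {"w": 5}))   # interpreter: prints the value, 10
    code = []
    emit(tree, code, itertools.count(1))
    print("\n".join(code))             # compiler: prints equivalent instructions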

Instruction Selection As part of code generation, the compiler must select a sequence of machine instructions to implement the operations expressed in the code being compiled. The compiler might choose the instructions shown in


Digression: About Iloc

Throughout the book, low-level examples are written in a notation that we call iloc—an acronym that William LeFebvre derived from “intermediate language for an optimizing compiler.” Over the years, this notation has undergone many changes. The version used in this book is described in detail in Appendix A.

Think of iloc as the assembly language for a simple risc machine. It has a standard complement of operations. Most operations take arguments that are registers. The memory operations, loads and stores, transfer values between memory and the registers. To simplify the exposition in the text, most examples assume that all data is integer data.

Each operation has a set of operands and a target. The operation is written in five parts: an operation name, a list of operands, a separator, a list of targets, and an optional comment. Thus, to add registers 1 and 2, leaving the result in register 3, the programmer would write

    add r1,r2 ⇒ r3 // example instruction

The separator, ⇒, precedes the target list. It is a visual reminder that information flows from left to right. In particular, it disambiguates cases like load and store, where a person reading the assembly-level text can easily confuse operands and targets.

Figure 1.1 to implement

w ← w × 2 × x × y × z

on the iloc virtual machine. Here, we have assumed the memory layout shown earlier, where w appears at memory address zero.

This sequence is straightforward. It loads all of the relevant values into registers, performs the multiplications in order, and stores the result back to the memory location for w. Notice that the registers have unusual names, like rw to hold w and rarp to hold the address where the data storage for our named values begins. Even with this simple sequence, however, the compiler makes choices that affect the performance of the resulting code. For example, if an immediate multiply is available, the instruction mult rw, r2 ⇒ rw could be replaced with multI rw, 2 ⇒ rw, eliminating the need for the instruction loadI 2 ⇒ r2 and decreasing the number of registers needed. If multiplication is slower than addition, the instruction could be replaced with add rw, rw ⇒ rw, avoiding the loadI and its use of r2 as well as replacing the mult with a faster add instruction.
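That choice can be written down as a tiny selection rule. The sketch below is a hedged illustration in Python, not the book's algorithm; the flags describing the target machine are invented for the example.

    # toy instruction selection for 'reg <- reg * constant'
    def select_multiply_by_constant(reg, const, has_multI=True, add_is_faster=True):
        if const == 2 and add_is_faster:
            # strength reduction: reg + reg computes reg * 2
            return [f"add   {reg}, {reg} => {reg}"]
        if has_multI:
            # an immediate multiply avoids the loadI and a scratch register
            return [f"multI {reg}, {const} => {reg}"]
        # otherwise materialize the constant, then multiply
        return [f"loadI {const} => r_tmp",
                f"mult  {reg}, r_tmp => {reg}"]

    print(select_multiply_by_constant("rw", 2))
    print(select_multiply_by_constant("rw", 10, add_is_faster=False))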

Register Allocation In picking instructions, we ignored the fact that the target machine has a finite set of registers. Instead, we assumed that “enough” registers existed. In practice, those registers may or may not be available; it depends on how the compiler has treated the surrounding context.


In register allocation, the compiler decides which values should reside in the registers of the target machine, at each point in the code. It then modifies the code to reflect its decisions. If, for example, the compiler tried to minimize the number of registers used in evaluating our example expression, it might generate the following code

    loadAI   rarp, @w  ⇒ r1        // load 'w'
    add      r1, r1    ⇒ r1        // r1 ← w × 2
    loadAI   rarp, @x  ⇒ r2        // load 'x'
    mult     r1, r2    ⇒ r1        // r1 ← (w×2) × x
    loadAI   rarp, @y  ⇒ r2        // load 'y'
    mult     r1, r2    ⇒ r1        // r1 ← (w×2×x) × y
    loadAI   rarp, @z  ⇒ r2        // load 'z'
    mult     r1, r2    ⇒ r1        // r1 ← (w×2×x×y) × z
    storeAI  r1        ⇒ rarp, @w  // write r1 back to 'w'

This sequence uses two registers, plus rarp, instead of five.

Minimizing register use may be counterproductive. If, for example, any of the named values, w, x, y, or z, are already in registers, the code should reference those registers directly. If all are in registers, the sequence could be implemented so that it required no additional registers. Alternatively, if some nearby expression also computed w × 2, it might be better to preserve that value in a register than to recompute it later. This would increase demand for registers, but eliminate a later instruction.

In general, the problem of allocating values to registers is np-complete. Thus, we should not expect the compiler to discover optimal solutions to the problem, unless we allow exponential time for some compilations. In practice, compilers use approximation techniques to discover good solutions to this problem; the solutions may not be optimal, but the approximation techniques ensure that some solution is found in a reasonable amount of time.
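One such approximation technique, sketched below in Python, is a greedy, linear-scan style assignment over live ranges. It is an illustration of the idea, not an algorithm from this chapter, and the intervals used in the example are made up to mirror the code above.

    # Hedged sketch: greedy allocation over precomputed live intervals,
    # one common approximation to the NP-complete allocation problem.
    def linear_scan(intervals, num_regs):
        """intervals: dict name -> (start, end) in instruction order."""
        active = []                       # (end, name) for values currently in registers
        free = [f"r{i}" for i in range(1, num_regs + 1)]
        assignment, spilled = {}, []
        for name, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
            # retire intervals that ended before this one starts
            for e, n in list(active):
                if e < start:
                    active.remove((e, n))
                    free.append(assignment[n])
            if free:
                assignment[name] = free.pop()
                active.append((end, name))
            else:
                spilled.append(name)      # no register left: value stays in memory
        return assignment, spilled

    intervals = {"w": (1, 9), "x": (3, 4), "y": (5, 6), "z": (7, 8)}
    print(linear_scan(intervals, num_regs=2))   # two registers suffice, as above

Because x, y, and z have short, non-overlapping live ranges, the sketch reuses one register for all three, echoing the two-register sequence shown earlier.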

Instruction Scheduling In generating code for a target machine, the compiler should be aware of that machine’s specific performance constraints. For example, we mentioned that an addition might be faster than a multiplication; in general, the execution time of the different instructions can vary widely. Memory access instructions (loads and stores) can take many cycles, while some arithmetic instructions, particularly mult, take several. The impact of these longer latency instructions on the performance of compiled code is dramatic.

Assume, for the moment, that a load or store instruction requires three cycles, a mult requires two cycles, and all other instructions require one cycle. With these latencies, the code fragment that minimized register use does not look so attractive. The Start column shows the cycle in which the instruction begins execution and the End column shows the cycle in which it completes.


    Start   End
      1      3    loadAI   rarp, @w  ⇒ r1        // load 'w'
      4      4    add      r1, r1    ⇒ r1        // r1 ← w × 2
      5      7    loadAI   rarp, @x  ⇒ r2        // load 'x'
      8      9    mult     r1, r2    ⇒ r1        // r1 ← (w×2) × x
     10     12    loadAI   rarp, @y  ⇒ r2        // load 'y'
     13     14    mult     r1, r2    ⇒ r1        // r1 ← (w×2×x) × y
     15     17    loadAI   rarp, @z  ⇒ r2        // load 'z'
     18     19    mult     r1, r2    ⇒ r1        // r1 ← (w×2×x×y) × z
     20     22    storeAI  r1        ⇒ rarp, @w  // write r1 back to 'w'

This nine-instruction sequence takes twenty-two cycles to execute.

Many modern processors have the property that they can initiate new instructions while a long-latency instruction executes. As long as the results of a long-latency instruction are not referenced until the instruction completes, execution proceeds normally. If, however, some intervening instruction tries to read the result of the long-latency instruction prematurely, the processor “stalls”, or waits until the long-latency instruction completes. Registers are read in the cycle when the instruction starts and written when it ends.

In instruction scheduling, the compiler reorders the instructions in an attempt to minimize the number of cycles wasted in stalls. Of course, the scheduler must ensure that the new sequence produces the same result as the original. In many cases, the scheduler can drastically improve on the performance of “naive” code. For our example, a good scheduler might produce

    Start   End
      1      3    loadAI   rarp, @w  ⇒ r1        // load 'w'
      2      4    loadAI   rarp, @x  ⇒ r2        // load 'x'
      3      5    loadAI   rarp, @y  ⇒ r3        // load 'y'
      4      4    add      r1, r1    ⇒ r1        // r1 ← w × 2
      5      6    mult     r1, r2    ⇒ r1        // r1 ← (w×2) × x
      6      8    loadAI   rarp, @z  ⇒ r2        // load 'z'
      7      8    mult     r1, r3    ⇒ r1        // r1 ← (w×2×x) × y
      9     10    mult     r1, r2    ⇒ r1        // r1 ← (w×2×x×y) × z
     11     13    storeAI  r1        ⇒ rarp, @w  // write r1 back to 'w'

This reduced the time required for the computation from twenty-two cycles to thirteen. It required one more register than the minimal number, but cut the execution time nearly in half. It starts an instruction in every cycle except eight and ten. This schedule is not unique; several equivalent schedules are possible, as are equal length schedules that use one more register.

Instruction scheduling is, like register allocation, a hard problem. In its general form, it is np-complete. Because variants of this problem arise in so many fields, it has received a great deal of attention in the literature.
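One common heuristic is greedy list scheduling: in each cycle, issue some instruction whose operands are already available. The Python sketch below is illustrative rather than the book's algorithm; it uses the latencies assumed above, models only true data dependences, and reproduces the thirteen-cycle schedule shown in the table.

    # Hedged sketch of greedy list scheduling under the assumed latencies:
    # load/store 3 cycles, mult 2 cycles, everything else 1 cycle.
    LATENCY = {"loadAI": 3, "storeAI": 3, "mult": 2}

    def list_schedule(ops):
        """ops: list of (name, opcode, set_of_predecessor_indices)."""
        finish = {}                      # index -> cycle in which the op completes
        cycle, schedule, done = 1, [], set()
        while len(done) < len(ops):
            for i, (name, opcode, preds) in enumerate(ops):
                if i in done:
                    continue
                # an op is ready once every predecessor's result is available
                if all(p in finish and finish[p] < cycle for p in preds):
                    finish[i] = cycle + LATENCY.get(opcode, 1) - 1
                    schedule.append((cycle, finish[i], name))
                    done.add(i)
                    break                # issue at most one op per cycle
            cycle += 1
        return schedule

    ops = [("load w", "loadAI", set()), ("load x", "loadAI", set()),
           ("load y", "loadAI", set()), ("w*2", "add", {0}),
           ("*x", "mult", {3, 1}), ("load z", "loadAI", set()),
           ("*y", "mult", {4, 2}), ("*z", "mult", {6, 5}),
           ("store", "storeAI", {7})]
    for start, end, name in list_schedule(ops):
        print(start, end, name)          # matches the Start/End columns above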

Interactions Most of the truly hard problems that occur in compilation arise during code generation. To make matters more complex, these problems interact.


Digression: Terminology

A careful reader will notice that we use the word “code” in many places where either “program” or “procedure” might naturally fit. This is a deliberate affectation; compilers can be invoked to translate fragments of code that range from a single reference through an entire system of programs. Rather than specify some scope of compilation, we will continue to use the ambiguous term “code.”

For example, instruction scheduling moves load instructions away from the arithmetic operations that depend on them. This can increase the period over which the value is needed and, correspondingly, increase the number of registers needed during that period. Similarly, the assignment of particular values to specific registers can constrain instruction scheduling by creating a “false” dependence between two instructions. (The second instruction cannot be scheduled until the first completes, even though the values in the overlapping register are independent. Renaming the values can eliminate this false dependence, at the cost of using more registers.)

Chapters 8 through 11 describe the issues that arise in code generation and present a variety of techniques to address them. Chapter 8 creates a base of knowledge for the subsequent work. It discusses “code shape,” the set of choices that the compiler makes about how to implement a particular source language construct. Chapter 9 builds on this base by discussing algorithms for instruction selection—how to map a particular code shape into the target machine’s instruction set. Chapter 10 looks at the problem of deciding which values to keep in registers and explores algorithms that compilers use to make these decisions. Finally, because the order of execution can have a strong impact on the performance of compiled code, Chapter 11 delves into algorithms for scheduling instructions.

1.3.4 Improving the Code

Often, a compiler can use contextual knowledge to improve the quality of code that it generates for a statement. If, as shown on the left side of Figure 1.2, the statement in our continuing example was embedded in a loop, the contextual information might let the compiler significantly improve the code. The compiler could recognize that the subexpression 2 × x × y is invariant in the loop – that is, its value does not change between iterations. Knowing this, the compiler could rewrite the code as shown on the right side of the figure. The transformed code performs many fewer operations in the loop body; if the loop executes more than once, this should produce faster code.

This process of analyzing code to discover facts from context and using that knowledge to improve the code is often called code optimization. Roughly speaking, optimization consists of two distinct activities: analyzing the code to understand its runtime behavior and transforming the code to capitalize on knowledge derived during analysis. These techniques play a critical role in the performance of compiled code; the presence of a good optimizer can change the kind of code that the rest of the compiler should generate.


    x ← · · ·                           x ← · · ·
    y ← · · ·                           y ← · · ·
    w ← 1                               ti ← 2 × x × y
    for i = 1 to n                      w ← 1
        read z                          for i = 1 to n
        w ← w × 2 × x × y × z               read z
    end                                     w ← w × z × ti
                                        end

    Surrounding context                 Improved code

Figure 1.2: Context makes a difference


Analysis Compilers use several kinds of analysis to support transformations. Data-flow analysis involves reasoning, at compile-time, about the flow of values at runtime. Data-flow analyzers typically solve a system of simultaneous set equations that are derived from the structure of the code being translated. Dependence analysis uses number-theoretic tests to reason about the values that can be assumed by subscript expressions. It is used to disambiguate references to elements of arrays and indexed structures.
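As a small, hedged example of the first kind of analysis (the code is illustrative, not taken from the text), the Python sketch below computes liveness by iterating the usual data-flow set equations to a fixed point over a three-block version of the loop in Figure 1.2.

    def liveness(blocks, succ, use, defs):
        live_in = {b: set() for b in blocks}
        live_out = {b: set() for b in blocks}
        changed = True
        while changed:                       # iterate to a fixed point
            changed = False
            for b in blocks:
                # LiveOut(b) = union of LiveIn(s) over the successors s of b
                out = set().union(*(live_in[s] for s in succ[b])) if succ[b] else set()
                # LiveIn(b) = Use(b) + (LiveOut(b) - Def(b))
                inn = use[b] | (out - defs[b])
                if out != live_out[b] or inn != live_in[b]:
                    live_out[b], live_in[b] = out, inn
                    changed = True
        return live_in, live_out

    blocks = ["entry", "loop", "exit"]
    succ = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
    use  = {"entry": set(), "loop": {"w", "x", "y"}, "exit": {"w"}}
    defs = {"entry": {"w", "x", "y"}, "loop": {"w", "z"}, "exit": set()}
    live_in, live_out = liveness(blocks, succ, use, defs)
    print(live_out["entry"])     # w, x, and y are still needed after the entry block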

Transformation Many distinct transformations have been invented that try to improve the time or space requirements of executable code. Some of these, like discovering loop-invariant computations and moving them to less frequently executed locations, improve the running time of the program. Others make the code itself more compact. Transformations vary in their effect, the scope over which they operate, and the analysis required to support them. The literature on transformations is rich; the subject is large enough and deep enough to merit a completely separate book.
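To connect this back to Figure 1.2: a hedged, one-line test (illustrative, not the book's) for whether a subexpression may be hoisted out of a loop is that none of its operands is assigned inside the loop.

    def is_loop_invariant(operands, defined_in_loop):
        return all(name not in defined_in_loop for name in operands)

    defined_in_loop = {"i", "z", "w"}                      # assigned in the loop body
    print(is_loop_invariant(["x", "y"], defined_in_loop))  # True: 2*x*y can be hoisted
    print(is_loop_invariant(["w", "z"], defined_in_loop))  # False: w*z must stay inside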

The final part of the book introduces the techniques that compilers use to analyze and improve programs. Chapter 13 describes some of the methods that compilers use to predict the runtime behavior of the code being translated. Chapter 14 then presents a sampling of the transformations that compilers apply to capitalize on knowledge derived from analysis.

1.4 Compiler Structure

Understanding the issues involved in translation is different than knowing their solutions. The community has been building compilers since 1955; over those years, we have learned a number of lessons about how to structure a compiler. At the start of this chapter, the compiler was depicted as a single box that translated a source program into a target program. Reality is, of course, more complex than that simple pictogram.


The discussion in Section 1.3 suggested a dichotomy between the task of understanding the input program and the task of mapping its functionality onto the target machine. Following this line of thought leads to a compiler that is decomposed into two major pieces, a front end and a back end.

    source code --> [ front end ] --ir--> [ back end ] --> target code
                          |                     |
                          +------- errors ------+

The decision to let the structure reflect the separate nature of the two tasks has several major implications for compiler design.

First, the compiler must have some structure that encodes its knowledge of the code being compiled; this intermediate representation (ir) or intermediate language becomes the definitive representation of the code for the back end. Now, the task of the front end is to ensure that the source program is well formed and to map that code into the ir, and the task of the back end is to map the ir onto the target machine. Since the back end only processes ir created by the front end, it can assume that the ir contains no errors.
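As a hedged illustration of such a contract (the record layout and field names are invented for this sketch, not drawn from the book), the ir handed from the front end to the back end might be as simple as a list of three-address records:

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass
    class IRInst:
        op: str        # e.g. "loadAI", "multI", "storeAI"
        args: Tuple    # operand names or literal values
        result: str    # the target of the operation

    # the front end's job: check the source and map it into well-formed ir
    ir_program = [
        IRInst("loadAI",  ("rarp", "@w"), "rw"),
        IRInst("multI",   ("rw", 2),      "rw"),
        IRInst("storeAI", ("rw",),        "@w"),
    ]
    # the back end may assume this list contains no errors; its job is to map
    # it onto the target machine's instructions, registers, and schedule.
    print(ir_program[1])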

Second, the compiler now makes multiple passes over the code before committing itself to target code. This should lead to better code; the compiler can, in effect, study the code in its first pass and record relevant details. Then, in the second pass, it can use these recorded facts to improve the quality of translation. (This idea is not new. The original Fortran compiler made several passes over the code [3]. In a classic 1961 paper, Floyd proposed that the compiler could generate better code for expressions if it made two passes over the code [31].) To achieve this, however, the knowledge derived in the first pass must be recorded in the ir, where the second pass can find it.

Finally, the two-pass structure may simplify the process of retargeting the compiler. We can easily envision constructing multiple back ends for a single front end; doing so would produce compilers that accepted the same language but targeted different machines. This assumes that the same ir program is appropriate for both target machines; in practice, some machine-specific details usually find their way into the ir.

The introduction of an ir into the compiler makes possible further passes over the code. These can be implemented as transformers that take as input an ir program and produce an equivalent, but improved, ir program. (Notice that these transformers are, themselves, compilers according to our definition in Section 1.1.) These transformers are sometimes called optimizations; they can be grouped together to form an optimizer or a middle end. This produces a structure that looks like:


    source code --> [ front end ] --ir--> [ middle end ] --ir--> [ back end ] --> machine code
                          |                      |                      |
                          +------------------ errors ------------------+

We will call this a three-pass compiler; it is often called an optimizing compiler. Both are misnomers. Almost all compilers have more than three passes. Still, the conceptual division into front end, middle end, and back end is useful. These three parts of the compiler have quite different concerns. Similarly, the term “optimization” implies that the compiler discovers an optimal solution to some problem. Many of the problems that arise in trying to improve compiled code are so complex that they cannot be solved to optimality in a reasonable amount of time. Furthermore, the actual speed of the compiled code depends on interactions among all of the techniques applied in the optimizer and the back end. Thus, even when a single technique can be proved optimal, its interactions with other techniques can produce less than optimal results. As a result, a good optimizing compiler can improve the quality of the code, relative to an unoptimized version. It will often fail to produce optimal code.

The middle end can be a monolithic structure that applies one or more techniques to improve the code, or it can be structured as a series of individual passes that each read and write ir. The monolithic structure may be more efficient, in that it avoids lots of input and output activity. The multi-pass structure may lend itself to a less complex implementation and a simpler approach to debugging the compiler. The choice between these two approaches depends on the constraints under which the compiler is built and operates.
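A hedged sketch of that multi-pass organization, with a toy tuple ir and two invented passes (neither is code from the book): each pass maps an ir program to an equivalent ir program, so the middle end is just the composition of whatever passes it is configured to run.

    # IR instruction: (op, arg1, arg2, result); "copy" moves arg1 into result
    def constant_fold(ir):
        out = []
        for op, a, b, r in ir:
            if op == "mult" and isinstance(a, int) and isinstance(b, int):
                out.append(("copy", a * b, None, r))   # compute it at compile time
            else:
                out.append((op, a, b, r))
        return out

    def dead_code_elim(ir, live_at_exit=frozenset({"w"})):
        needed, out = set(live_at_exit), []
        for op, a, b, r in reversed(ir):               # walk backwards over the code
            if r in needed:
                out.append((op, a, b, r))
                needed.discard(r)
                needed.update(x for x in (a, b) if isinstance(x, str))
        return list(reversed(out))

    def middle_end(ir, passes):
        for p in passes:                               # each pass is ir -> ir
            ir = p(ir)
        return ir

    ir = [("mult", 2, 3, "t1"),
          ("mult", "t1", "x", "t2"),
          ("mult", 4, 5, "t3"),                        # dead: t3 is never used
          ("copy", "t2", None, "w")]
    print(middle_end(ir, [constant_fold, dead_code_elim]))

Running the pipeline folds the constant multiply and drops the dead computation, leaving a shorter, equivalent ir program for the back end.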

1.5 Summary and Perspective

A compiler’s primary mission is to translate its input program into an equivalent output program. Many different programs qualify as compilers. Most of these can be viewed as either two-pass or three-pass compilers. They have a front end that deals with the syntax and meaning of the input language and a back end that deals with the constraints of the output language. In between, they may have a section that transforms the program in an attempt to “improve” it.

Different projects, of course, aim for different points in the compiler design space. A compiler that translates c code for embedded applications like automobiles, telephones, and navigation systems might be concerned about the size of the compiled code, since the code will be burned into some form of read-only memory. On the other hand, a compiler that sits inside the user-interface of a network browser and translates compressed application code to drive the display might be designed to minimize the sum of compile time plus execution time.


Questions

1. In designing a compiler, you will face many tradeoffs. What are the five qualities that you, as a user, consider most important in a compiler that you purchased? Does that list change when you are the compiler writer? What does your list tell you about a compiler that you would implement?

2. Compilers are used in many different circumstances. What differences might you expect in compilers designed for the following applications?

(a) a just-in-time compiler used to translate user interface code downloaded over a network

(b) a compiler that targets the embedded processor used in a cellular telephone

(c) a compiler used in the introductory programming course at a high school

(d) a compiler used to build wind-tunnel simulations that run on a massively parallel processor (where all the processors are identical)

(e) a compiler that targets numerically-intensive programs to a large network of diverse machines
