
Applications include real-world predictions (weather, pollution, etc.), and obtaining information to identify where to make further observations, so as to gain a more complete and faithful system description.

Such a computational model is often quite large, and is developed by several researchers and/or groups from different disciplines. Model building is a long-lasting and usually continuous process. If the model computes results disagreeing with observed values, a natural first step is to `tune' critical parameters such as exchange rates or conversion efficiencies, and then rerun the simulation in the hope of obtaining a better fit. (This is defensible since critical parameters are often physically unobservable.)

If this fails, parts of the mathematical model may have to be changed, for instance by modifying the differential equations to give a more sophisticated description of the phenomenon, or adding new equations to model processes not accounted for before. These are programmed, and the whole process is repeated.

This scenario offers many opportunities to exploit partial evaluation. First, such a system must necessarily be programmed in a modular way to separate scientific concerns and to allow different workers to concentrate on their specialities. As argued above, partial evaluation can gain efficiency.

Second, parts of the model and its parameters may change much less rapidly than others, so it may be worthwhile to specialize the model with respect to them. One example is the number and forms of the topographical cells used to model an ocean basin. This is naturally a user-definable parameter, refinable when more precise simulations are needed; but not a parameter that is changed often. Thus specializing a program suite with respect to the number and dimensions of topographical cells could increase efficiency by unfolding loops, and precomputing values that do not change when doing repeated runs to tune boundary conditions or exchange rates.

13.1.3 Improving recursive programs

The divide-and-conquer paradigm is used in constructing algorithms in a wide range of areas. A problem instance is classified as atomic or composite, and atomic problems are solved at once. A composite problem is decomposed into subproblems, each is solved separately, and their results are combined to yield the entire problem's solution.

The approach naturally leads to recursive algorithms. Efficiency is often obtained by decomposing composite problems into subproblems of nearly equal size, so binary decompositions often lead to a near-balanced binary tree of subproblems.

Atomic instances are usually solved quite quickly, so the time spent at the lower tree levels in calling functions, transmitting parameters, and related stack manipulation can be large relative to the amount of computation actually done. Further, in a binary tree half the nodes are leaves, and 15/16 are within distance 3 of a leaf.
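The 15/16 figure follows from a simple count: a complete binary tree with 2^k leaves has 2^(k+1) - 1 nodes in all, of which the bottom four levels contribute

    2^k + 2^(k-1) + 2^(k-2) + 2^(k-3) = 15 * 2^(k-3),

approximately 15/16 of the total.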

A consequence is that an optimizing computation at the bottom-most levels will, in some cases, speed up the entire computation by a significant factor. Just when this occurs will depend on the recurrence equations describing the algorithm's running time.

One strategy to exploit this phenomenon, assuming for simplicity that subproblems form a balanced binary tree, is to let the program maintain a counter measuring distance to the frontier. Partial evaluation can then transform this code by totally unfolding all calls made with sufficiently small counter values. For a simple example, begin by rewriting the definition

f(x) = if atomic?(x) then Base case code
       else g(f(part1(x)), f(part2(x)))

g(u,v) = ...

by adding a level counter, assumed 0 for atomic subproblems:

f(x,k) = if k=0 then Base case code
         else g(f(part1(x),k-1), f(part2(x),k-1))

g(u,v) = ...

Then add a new function f1, identical to f but called when k becomes 2 or less:

f(x,k)  = if k=2 then f1(x,2) else
          if k=1 then f1(x,1) else
          if k=0 then Base case code
          else g(f(part1(x),k-1), f(part2(x),k-1))

g(u,v)  = ...

f1(x,k1) = if k1=0 then Base case code
           else g(f1(part1(x),k1-1), f1(part2(x),k1-1))

Argument k1 of f1 is constant and so static, giving the partial evaluator opportunity for complete unfolding and simplification, and thus reducing call overhead.
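For illustration, a minimal Python rendering of what complete unfolding of f1 at k1 = 2 could produce (base, part1, part2, and g are stand-ins for the schematic Base case code and operators above, not names from the text):

    # Residual code: all counter tests, recursion, and counter passing
    # have been removed by the specializer.
    def f1_2(x):
        return g(g(base(part1(part1(x))), base(part2(part1(x)))),
                 g(base(part1(part2(x))), base(part2(part2(x)))))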

Suppose that solving an atomic problem takes time a, that a composite problem of size n > 1 involves two subproblems of size n/2, and that combining two subproblem solutions takes constant[2] time b. This leads to a recurrence equation whose solution is of the form (a + b)n + ... with coefficient additively dependent on a. The strategy above in effect reduces a and n, since more subproblems are solved without recursive calls, and problems for small n are solved faster.

[2] This is a correct assumption for some problems but not, for example, for sorting.
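Concretely, for n a power of 2 the recurrence and its solution are

    T(1) = a,    T(n) = 2*T(n/2) + b
    T(n) = (a + b)*n - b

so reducing a lowers the leading coefficient directly.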

13.1.4 Problems of an interpretive nature

It has become clear that partial evaluation is well suited to applications based on programming language interpreters. It is perhaps less clear that many problem solutions outside programming languages are also essentially interpretive in nature, and so susceptible to automatic optimization using our methods. A few examples follow, with some overlap with Sections 13.1.1 and 13.1.2, since the concepts of modularity, varying rates of parameter variation, and interpretation are hard to separate and often appear in the same program.

Interpretation evolves naturally in the quest for generality and modifiability in large-scale programming problems. We outline a common scenario. Once several related problems in an application area have been understood and solved individually, the next step is often to write a single general program able to solve any one of a family of related problems. This leads to a program with parameters, sometimes numerous, to specify problem instances.

Use of the program for new applications and by new user groups makes it desirable to devise a user-oriented language for specifying such parameters in a way more related to the problems being solved than to the programming language or the algorithms used to solve them. The existing general program will thus be modified to accept problem descriptions that are more user-oriented. The result is a flexible and problem-oriented tool which may, in comparison with the time spent on the underlying computational methods, spend a relatively large share of its time testing and/or computing on parameters, and deciphering commands in the user-oriented language. In other words, it is an interpreter, and as such subject to optimization by our methods.

Circuit simulation

Circuit simulators take as input an electrical circuit description, construct differential equations describing its behaviour, and solve these by numerical methods. This can be thought of as interpreting the circuit description. Berlin and Weise [21] cite large speedups resulting from specializing a general circuit simulator written in Scheme to a fixed circuit.

Neural networks

Training a neural network typically uses much computer time. Partial evaluation has been applied to a simulator written in C for training neural networks by backpropagation [126]. The resulting generator transforms a given network into a faster simulator, specialized to the fixed network topology. Observed speedups were from 25% to 50%: not dramatic, but significant given the amount of computer time that neural net training takes.

Computing in networks

Consider a problem to be solved by a MIMD (multiple instruction, multiple data) network of processors connected together for communication along some topological configuration. This models some physical phenomena directly, e.g. solution of heat equations or fluid dynamics, and is also a standard framework for general parallel processing not directed towards particular concrete problems.

What code is to be stored in each processor? A complicating factor is that not all processors appear in identical contexts; for example, those modelling boundary situations may have fewer neighbours than those placed more centrally in the network[3]. A simple approach is to write one piece of code which is given the processor's network location as a parameter, and which can compute as required for processors anywhere in the network. In our context it is natural to use partial evaluation to specialize this code to its location parameter, obtaining as many programs as there are differing network environments. Pingali and Rogers report significant efficiency gains using exactly this technique [225,217].

[3] An exception is the hypercube, where every processor has an identical environment; but even here asymmetry problems arise if some processors develop faults.

Table-directed input

Given a source program character sequence, it is well known that token scanning (or lexical analysis) can be done using a state transition table that associates with each pair (state, input-character-class) a transition, which is another pair (action, next-state).

For example, a scanner might go into state Number when a sign or digit is seen and, as long as digits are read, remain in that state, performing actions to accumulate the number's value; with similar state and action transitions for other token types such as identifiers or strings. To illustrate, suppose numerical tokens have the syntax:

⟨Integer⟩ ::= ⟨Space⟩* [ + | - ] ( ⟨Digit⟩ | ⟨Space⟩ )*
⟨Digit⟩   ::= 0 | 1 | ... | 9

Following is a state transition table for numbers.

State |  ⟨Space⟩   |  +           |  -            |  ⟨Digit⟩          |  End-of-line
------+------------+--------------+---------------+-------------------+--------------------
  0   |  no action |  sign := 1;  |  sign := -1;  |  sign := 1;       |
      |            |  sum := 0    |  sum := 0     |  sum := ⟨Digit⟩   |
      |  next: 0   |  next: 1     |  next: 1      |  next: 2          |  next: 0
------+------------+--------------+---------------+-------------------+--------------------
  1   |  no action |  error       |  error        |  sum := ⟨Digit⟩   |  result := sign*sum
      |  next: 1   |  next: 0     |  next: 0      |  next: 2          |  next: 0
------+------------+--------------+---------------+-------------------+--------------------
  2   |  no action |  error       |  error        |  sum := 10*sum    |  result := sign*sum
      |            |              |               |    + ⟨Digit⟩      |
      |  next: 2   |  next: 0     |  next: 0      |  next: 2          |  next: 0

In practice the transition table approach is useful because it ensures completeness, since all combinations of state and input character must be accounted for, and ease of modification, since many corrections involve only a single table entry or a single action routine.

This scheme is usually implemented by storing the transition table in memory as data, and writing a small interpreter to follow its directives (an instance by Oliver uses microprogramming [206]). Alternatively, one may `compile' the transition table directly into code, for example with one label for each state, and with a state transition being realized by a `goto'. The result is usually faster, but the program structure and logic may be more complex and so harder to modify.

Partial evaluation allows an automatic transformation of the first, data-directed program into the more efficient alternate form. Experience shows a substantial speedup (unless masked by input/output operations).
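For concreteness, here is a minimal Python sketch of the data-directed form for the integer table above (the names classify, TABLE, and scan, and the treatment of other characters as end-of-line, are illustrative assumptions, not from the text). Specializing scan to the fixed TABLE would turn the dispatch loop into direct code with one branch per state, as described above.

    def classify(c):
        # Map a character to one of the table's input-character classes.
        if c == '\n':   return 'eol'
        if c.isspace(): return 'space'
        if c == '+':    return '+'
        if c == '-':    return '-'
        if c.isdigit(): return 'digit'
        return 'eol'    # simplification: anything else ends the token

    # The action routines update the scanner's small environment e.
    def nop(e, c): pass
    def err(e, c): raise ValueError('misplaced sign')
    def plus(e, c):  e['sign'], e['sum'] = 1, 0
    def minus(e, c): e['sign'], e['sum'] = -1, 0
    def first(e, c): e['sign'], e['sum'] = 1, int(c)
    def start(e, c): e['sum'] = int(c)
    def acc(e, c):   e['sum'] = 10 * e['sum'] + int(c)
    def emit(e, c):  e['result'] = e['sign'] * e['sum']

    # TABLE[state][class] = (action, next state), mirroring the table above.
    TABLE = {
        0: {'space': (nop, 0), '+': (plus, 1), '-': (minus, 1),
            'digit': (first, 2), 'eol': (nop, 0)},
        1: {'space': (nop, 1), '+': (err, 0), '-': (err, 0),
            'digit': (start, 2), 'eol': (emit, 0)},
        2: {'space': (nop, 2), '+': (err, 0), '-': (err, 0),
            'digit': (acc, 2), 'eol': (emit, 0)},
    }

    def scan(line):
        # The interpreter: every character costs a table lookup and an
        # indirect call; this is the overhead specialization removes.
        e, state = {'sign': 1, 'sum': 0, 'result': None}, 0
        for c in line + '\n':
            action, state = TABLE[state][classify(c)]
            action(e, c)
        return e['result']

    # scan('  -42') == -42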

A related example showing quite significant speedup is due to Penello. His `very fast LR parsing' method takes as its starting point a parsing table as produced by the Yacc parser generator [216]. The method compiles the table into assembly code, yielding a specialized parser that runs 6 to 10 times faster than table interpretation.

Pattern matching

Section 12.2 showed that partial evaluation of a general regular expression matcher with respect to a fixed regular expression R gave very efficient residual programs. All parts of R were `compiled' away and the residual program was essentially a deterministic finite automaton.

Logical meta-systems

As seen in Chapter 6, partial evaluation can be of considerable use when one uses a high-level metalanguage to describe other systems or languages.

There are good reasons to believe that similar benefits will accrue from mechanical treatment of other high-level specification languages. For example, the Edinburgh/Carnegie-Mellon/Gothenburg Logical Framework activity involves a combined theorem prover and reduction engine which might be much improved in efficiency by specialization to particular theories.

13.2 When can partial evaluation be of benefit?

We now take another tack, trying to explore reasons for success or failure of partial evaluation for automatic program improvement.

Suppose program p computes function f(s, d), where input s is static, i.e. known at specialization time, and d is dynamic. Termination of partial evaluation, and the size and efficiency of the specialized program p_s, depend critically on the way p uses its static and dynamic inputs.

We now analyse these. A first case is that p has no static inputs. Even in this case partial evaluation can be of benefit, as discussed in Sections 13.1.1 and 13.2.1. A second case generalizes an idea from complexity theory to identify when partial evaluation can give predictably good results. An oblivious Turing machine is one whose read head motion depends only on the length of the machine's input tape, and is independent of its contents.

Oblivious algorithms. We call program p oblivious (with respect to its input division) if its control flow depends only on the values of static inputs, i.e. if it never tests dynamic data[4]. Such programs are common and are discussed further in Section 13.2.2.

[4] Alternative terms are `data independent' or `static' [21,173].

The absence of dynamic tests implies that partial evaluation of an oblivious program exactly parallels normal execution; just one thread of code need be accounted for. An important consequence is that termination depends on the values the static data assume in a single computation, and not on the many possible combinations of static values among which dynamic input would otherwise choose.

The size and run time of p_s are both proportional to the number of times code in p containing dynamic commands or expressions is encountered. Specialization time is proportional to the time to perform p's static computations plus the time to generate p_s.

Henceforth we shall assume offline partial evaluation, where every command or expression in p has been annotated as static or dynamic. Predicting the time and size of p_s appears to be harder when using online partial evaluation, since non-obliviousness manifests itself only during specialization and not before.

Non-oblivious algorithms. A non-oblivious program may follow many possible computation threads, depending on the values of dynamic inputs. A partial evaluator must account for all such possibilities, generating specialized code for each combination (concretely, it must specialize code for both branches of all dynamic tests). This can result in large specialized programs p_s, even though they are likely to be faster than p.

Interpreters are non-oblivious due to their need to implement tests in the program being interpreted. In later sections we shall discuss both non-oblivious programs and `weakly oblivious' ones.

13.2.1 Partial evaluation without static data

Partial evaluation can be of use even when there is no static program input at all. One example is its utility for improving modularly written, parametrized high-level programs, as described at the beginning of this chapter. Another is that partial evaluation encompasses a number of traditional compiler optimizations, as explained by A.P. Ershov [79].

Constant propagation is a familiar optimization, and arises in practical situations beyond user control. It is needed to generate efficient code for array accesses; e.g. intermediate code for A[I,1] := B[2,3] + A[I,1] will have many operations involving only constants. Constant folding is clearly an instance of partial evaluation, as are several other low-level optimizations.
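To see why, consider the address arithmetic such an assignment expands to; assume for illustration row-major arrays with 1-based indexing, 10 columns, and symbolic base addresses (details vary by compiler):

    addr(A[I,1]) = base_A + (I-1)*10 + (1-1) = base_A + (I-1)*10
    addr(B[2,3]) = base_B + (2-1)*10 + (3-1) = base_B + 12

Every operation in the second line involves only constants, so constant folding reduces addr(B[2,3]) to a single known offset, and the subexpression base_A + (I-1)*10, which occurs twice, need only be computed once.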

Partial evaluation also realizes `interprocedural optimizations', in some cases entirely eliminating procedures or functions. Finally, the technique of `procedure cloning' is clearly function specialization, and Appel's `re-opening closures' [61,13] is another example of partial evaluation without static input data.

13.2.2 Oblivious algorithms

The natural program to compute the matrix product prod(p, A, B), where A and B are p × p matrices, is oblivious in the dimension p.

prod(p,A,B):
  for i := 1 to p do
    for j := 1 to p do
      [ C[i,j] := 0;
        for k := 1 to p do
          C[i,j] := C[i,j] + A[i,k] * B[k,j];
        write C[i,j] ]

A sufficient test for obliviousness. First do a binding-time analysis. Then p is oblivious if the annotated program p_ann contains no tests on dynamic variables.

Consequences of obliviousness. Let p_s be the result of specializing program p to static s. If p is oblivious then p_s will contain no control transfers, since all tests are static and thus done at specialization time. In general the specialized matrix program p_n has size and running time O(n^3). For instance, p_2 could be

prod_2(A,B):
  write A[1,1] * B[1,1] + A[1,2] * B[2,1];
  write A[1,1] * B[1,2] + A[1,2] * B[2,2];
  write A[2,1] * B[1,1] + A[2,2] * B[2,1];
  write A[2,1] * B[1,2] + A[2,2] * B[2,2]

Compiling. Partial evaluation of oblivious programs (or functions) gives long sequences of straight-line code in an imperative language, or large expressions without conditionals in a functional language. This gives large `basic blocks', and for these there are well-developed compiling and optimization techniques [4].

In particular, good code can be generated for pipelined architectures due to the absence of tests and jumps. Basic blocks can also be implemented much more efficiently on parallel architectures than code with loops. Both points are mentioned by Berlin and Lisper [21,173]. Further, exploitation of distributive laws can lead to very short parallel computing times, for example O(log n) time algorithms for multiplying matrices of fixed dimension n.

An example in scientific computing. Oblivious programs are quite common; for example, numerical algorithms are often oblivious in dimensional parameters, and otherwise contain large oblivious parts. This makes them very suitable for improvement by partial evaluation. For a concrete example, consider a general Runge-Kutta program for approximate integration of ordinary differential equations[5] of the form

    dy_i(t)/dt = f_i'(t, y_1, ..., y_n),    i = 1, ..., n

where f_i'(t, y_1, ..., y_n) is the derivative of y_i with respect to t. Functions y_i(t) are often called state variables.
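For reference, one step of the classical fourth-order method for y' = f(t, y) with step size h combines four derivative evaluations (these are the standard textbook formulas, not code from the program discussed below):

    k1 = f(t, y)
    k2 = f(t + h/2, y + (h/2)*k1)
    k3 = f(t + h/2, y + (h/2)*k2)
    k4 = f(t + h,   y + h*k3)

    y(t + h) ≈ y + (h/6)*(k1 + 2*k2 + 2*k3 + k4)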

The goal is to tabulate the values of y_i(t), y_i(t + h), y_i(t + 2h), ... for a series of t values and i = 1, ..., n, given initial values of the state variables and t. One step of the commonly used fourth-order Runge-Kutta method involves computing f_i'(t, y_1, ..., y_n) for four different argument tuples, for each i = 1, ..., n. The inputs to an integration program Int might thus be

1. Eqns, the system of equations to be solved;

2. Coeffs, numerical coefficients used in the equations;

3. Step, the step size to be used for integration (called h above), and M, the number of steps to be performed; and

4. Init, initial values for the state variables and t.

Among the inputs, Eqns varies least frequently, and will either be interpreted, or represented by calls to a user-defined function, say Fprime(I,Ys), where Ys is the array of state variables and I indicates which function is to be called. If interpretation is used then, as seen for example with circuit simulators, specialization with respect to Eqns and Coeffs will remove the often substantial overhead.

If a user-defined Fprime(I,Ys) is used, two improvements can be realized automatically, as sketched below. The first is splitting: specialization can automatically transform the `bundled' code for Fprime(I,Ys) into n separate function definitions. Further, splitting the array Ys into n separate variables can reduce computation time. The second improvement is that the code for f_i'(t, y_1, ..., y_n) can be inserted inline in the integrator, avoiding function call, parameter passing, and return time.
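A minimal Python sketch of the splitting idea, using a made-up two-equation system (Fprime, the coefficients, and the split residual functions are all illustrative, not from the text):

    # Bundled form: one function, dispatching on the equation index i at
    # every call (the two equations and coefficients are hypothetical).
    def Fprime(i, ys, a=0.5, b=0.25):
        if i == 1: return a * ys[1]       # dy_1/dt =  a * y_2
        else:      return -b * ys[0]      # dy_2/dt = -b * y_1

    # After specialization to fixed Eqns and Coeffs, the dispatch on i is
    # gone, ys is split into scalar variables, and the bodies can be
    # inlined into the integrator:
    def fprime_1(y1, y2): return 0.5 * y2
    def fprime_2(y1, y2): return -0.25 * y1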

It often happens that the same equations are to be integrated, but with different coefficients, e.g. for experimental modelling. The generating extension of Int with respect to Eqns yields a program that, when given Coeffs, will produce a specialized integrator and precompute what is possible using the known coefficient values. Here optimizations such as x * 0 = 0, done at specialization time, can give significant speedups.

[5] Runge-Kutta integration is also used as an example in [21].

13.2.3 Weakly oblivious programs

We call p weakly oblivious if changes in dynamic inputs cannot effect changes in the sequences of values bound to static variables. This is a weaker condition than obliviousness, since p is allowed to contain dynamic tests.

A `bubble sort' program Bsort is weakly oblivious in the length n of the list to be sorted since, even though dynamic comparison and exchange operations exist, they do not affect the values assigned to any static variables. A specialized program Bsort_n is a linear sequence of comparisons and conditional element swaps, with size and running time O(n^2), as illustrated below.
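A minimal Python sketch of such a residual for n = 3 (the function name and list representation are illustrative): the static loop structure has been unfolded, and only the dynamic compare-and-swap operations remain.

    def bsort_3(x):
        # Straight-line residual of bubble sort specialized to length 3.
        if x[1] < x[0]: x[0], x[1] = x[1], x[0]
        if x[2] < x[1]: x[1], x[2] = x[2], x[1]
        if x[1] < x[0]: x[0], x[1] = x[1], x[0]
        return x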

Partial evaluation of a weakly oblivious program p terminates on s if and only if p terminates on this s and any d, since dynamic tests do not affect the value sequences assigned to static variables. As before, the size of p_s is proportional to the number of times code in p containing dynamic commands or expressions is encountered. Its run time may be larger, though, due to the presence of dynamic loops.

Weakly oblivious programs have much in common with oblivious ones. For example, although not yielding straight-line code, p_s still tends to have large basic blocks suitable for pipelined or parallel architectures; and its size is much more predictable than for non-oblivious programs.

A simple program that is not weakly oblivious is

double(x) = f(x,0)

f(x,y)    = if x = 0 then y else f(x-1, y+2)

where x is dynamic. The values of variable y are initially zero and thereafter incremented by a constant, so a naive binding-time analysis would classify y as static (though less naive analyses, as in Chapter 14, would classify it as dynamic).

Even though y does not directly depend on x, the sequence of values it assumes is in fact determined by x. The test x = 0 is dynamic, so a partial evaluator will have to account for both possible outcomes, leading to specialization with infinitely many values of y.
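To see the blow-up, here is a Python sketch of the residual functions a specializer would generate if y were (wrongly) treated as static (hypothetical names; one residual function per value of y):

    def f_0(x): return 0 if x == 0 else f_2(x - 1)
    def f_2(x): return 2 if x == 0 else f_4(x - 1)
    def f_4(x): return 4 if x == 0 else f_6(x - 1)
    # ... the chain of definitions never closes:
    # specialization does not terminate.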

For another example, let program p perform binary search in a table T[0], ..., T[2^n - 1], with initial call Find(T, 0, m, x) and m = 2^(n-1). The program is weakly oblivious if we assume delta is static and i is dynamic, since the comparison with x does not affect the value assigned to delta.

Find(T, i, delta, x) =
  Loop: if delta = 0 then
          if x = T[i] then return(i) else return(NOTFOUND);
        if x >= T[i+delta] then i := i + delta;
        delta := delta/2;
        goto Loop

Specializing with respect to static delta = 4 and dynamic i gives


if x >= T[i+4] then i := i + 4;
if x >= T[i+2] then i := i + 2;
if x >= T[i+1] then i := i + 1;
if x = T[i] then return(i) else return(NOTFOUND)

In general p_n runs in time O(log n), with a better constant coefficient than the general program. Moreover, it has size O(log n).

13.2.4 Non-oblivious algorithms

Many programs are not oblivious in either sense, and this can lead to unpredictable results from partial evaluation. We have seen that p_s can become enormous or even infinite, since all possible combinations of static variable values must be accounted for, even though few of these may occur in any one computation of [[p]] [s,d] for any one value of d.

To illustrate the problems that can occur, reconsider the binary search program above with n static. One may certainly classify i as static, since it ranges over 0, 1, ..., n-1. The resulting program is, however, not oblivious, since the test on x affects the value of static i.

Specialization with respect to static delta = 4 and i = 0 now gives

if x >= T[4] then
  if x >= T[6] then
    if x >= T[7] then
         [if x = T[7] then return(7) else return(NOTFOUND)]
    else [if x = T[6] then return(6) else return(NOTFOUND)]
  else if x >= T[5] then
         [if x = T[5] then return(5) else return(NOTFOUND)]
  else [if x = T[4] then return(4) else return(NOTFOUND)]
else if x >= T[2] then
    if x >= T[3] then
         [if x = T[3] then return(3) else return(NOTFOUND)]
    else [if x = T[2] then return(2) else return(NOTFOUND)]
else if x >= T[1] then
         [if x = T[1] then return(1) else return(NOTFOUND)]
else [if x = T[0] then return(0) else return(NOTFOUND)]

The specialized program again runs in time O(log n), with a still better constant coefficient than above. On the other hand it has size O(n), exponentially larger than the weakly oblivious version!

However, the consequences are not always negative. Following are two case studies illustrating some problems and ways to overcome them.