

Pathfinding in a graph

Suppose one is given a program p to compute Find(G, A, B), where G is a graph and A, B are a source and a target node. The result is to be some path from A to B if one exists, and an error report otherwise. The result of specializing p with respect to statically known G and B is a program good for finding paths from the various A to B. This could be useful, for example, for finding routes between various cities and one's home.

Naively specializing p would probably give a slow algorithm since, for example, Dijkstra's algorithm would trace all paths starting at the dynamic A until the static B was encountered. Alternatively, one can use the fact that A is of bounded static variation to get better results. The idea is to embed p in a program that calls the Find function only on fully static arguments:

function Paths-to(G, A, B) =
  let nodes = Node-list(G) in
  forall A1 ∈ nodes do
    if A = A1 then Find(G, A1, B)

function Find(G, A, B) = ...

Note that nodes, and so A1, are static. The result of specializing to G, B could thus be a program of the form

function Paths-to-Copenhagen(A) =
  if A = Hamburg
    then [Hamburg, C1, ..., Copenhagen]
    else if A = London
      then [London, D1, ..., Copenhagen]
      else ...
             else if A = Paris
               then [Paris, E1, ..., Copenhagen]
               else NOPATH

in which all path traversal has been done at specialization time. In fact, most partial evaluators, for example Similix, would share the list-building code, resulting in a specialized program with size proportional to the size of a shortest-path spanning tree beginning at A.
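To make the mechanism concrete, here is a minimal Python sketch (an illustration added here, not Similix and not the book's code): with G and B static, a generating extension can do all path traversal up front and emit a residual program that merely dispatches on the dynamic A. The graph encoding and the helper name specialize_paths_to are assumptions of this sketch.

from collections import deque

def specialize_paths_to(G, B):
    """G: dict mapping node -> list of successor nodes (static).
    B: target node (static).
    Returns a residual function equivalent to Paths-to(G, A, B)."""
    # Breadth-first search backwards from B over reversed edges,
    # recording for each node its successor on some path towards B.
    reverse = {}
    for n, succs in G.items():
        for s in succs:
            reverse.setdefault(s, []).append(n)
    next_hop = {B: None}
    queue = deque([B])
    while queue:
        n = queue.popleft()
        for pred in reverse.get(n, []):
            if pred not in next_hop:
                next_hop[pred] = n
                queue.append(pred)
    # One full path per source node; sharing the common suffixes would
    # correspond to Similix sharing the list-building code.
    table = {}
    for n in next_hop:
        path, cur = [], n
        while cur is not None:
            path.append(cur)
            cur = next_hop[cur]
        table[n] = path
    # The residual program: a finite dispatch on the dynamic A.
    def paths_to_B(A):
        return table.get(A, "NOPATH")
    return paths_to_B

# Example (hypothetical graph):
#   g = {"Hamburg": ["Copenhagen"], "London": ["Hamburg"], "Copenhagen": []}
#   find_cph = specialize_paths_to(g, "Copenhagen")
#   find_cph("London")  # -> ["London", "Hamburg", "Copenhagen"]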

Sorting

Consider specializing a sorting algorithm with respect to the number of elements to be sorted. This can be profitable when sorting variable numbers of elements. One can use traditional methods, e.g. merge sort, until the number of elements to be sorted becomes less than, say, 100, at which point a specialized sorter is called. Figure 13.2 contains an example, run on Similix (syntax rewritten for clarity), resulting from specializing merge sort to n = 4.

The first program version used a function merge to merge two lists. This had only dynamic arguments, so very little speedup resulted. To gain speedup, the lengths of the two lists were added as statically computable parameters, giving code like


function merge-sort-4(A); A,B: array[0..3] of integer;
  if A0 <= A1 then [B0:=A0; B1:=A1] else [B0:=A1; B1:=A0];
  if A2 <= A3 then [B2:=A2; B3:=A3] else [B2:=A3; B3:=A2];
  if B0 <= B2 then
    [A0:=B0;
     if B1<=B2 then [A1:=B1; A2:=B2; A3:=B3]
               else [A1:=B2; merge-4(B,A)]]
  else
    [A0:=B2;
     if B0<=B3 then [A1:=B0; merge-4(B,A)]
               else [A1:=B3; A2:=B0; A3:=B1]];
  merge-sort-4 := A
end

procedure merge-4(A,B);
  if A1<=A3 then [B2:=A1; B3:=A3] else [B2:=A3; B3:=A1]
end

Figure 13.2: Specialized merge sorting program.

procedure merge(A, Alength, B, Blength);
  merge :=
    if Alength = 0 then B
    else if Blength = 0 then A
    else if first(A) < first(B)
         then cons(first(A), merge(rest(A), Alength - 1, B, Blength))
         else cons(first(B), merge(A, Alength, rest(B), Blength - 1))
end

The length arguments of merge are static so all calls can be unfolded, resulting in essentially the specialized code seen in Figure 13.2.

The good news is that this program is between 3 and 4 times faster than the recursive version. The bad news is that specializing to successively larger values of n gives program size growing as O(n²), making the approach useless in practice.

What went wrong? In general the question of which of Alength or Blength is decreased depends on a dynamic test, so mix must account for all possible outcomes. Each length can range from 0 to n. There are O(n²) possible outcomes, so the specialized program will have size O(n²). (Its run time will still be of order n log n but with a smaller constant, so something has been gained.)
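A hedged Python sketch (an illustration, not Similix) of this size analysis: a tiny generating extension that emits one specialized merge variant per reachable pair of static lengths. The dynamic comparison forces both outcomes to be generated, so the number of variants grows quadratically. All names here (gen_merge, the emitted code shape) are assumptions of the sketch.

def gen_merge(la, lb, emitted):
    """Emit (as source text) a merge specialized to static lengths la, lb."""
    name = f"merge_{la}_{lb}"
    if name in emitted:
        return name                      # variant already generated
    if la == 0:
        body = "return B"
    elif lb == 0:
        body = "return A"
    else:
        # The comparison is dynamic, so BOTH outcomes must be specialized.
        left = gen_merge(la - 1, lb, emitted)
        right = gen_merge(la, lb - 1, emitted)
        body = (f"if A[0] < B[0]: return [A[0]] + {left}(A[1:], B)\n"
                f"    else: return [B[0]] + {right}(A, B[1:])")
    emitted[name] = f"def {name}(A, B):\n    {body}"
    return name

emitted = {}
gen_merge(4, 4, emitted)
print(len(emitted))   # 24 variants for two length-4 lists: Theta(n^2) growth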

This problem is entirely due to non-obliviousness of the sorting algorithm. It leads directly to the question: does there exist a comparison-optimal weakly oblivious sorting algorithm?

Batcher's sorting algorithm [17] is both weakly oblivious and near optimal. It runs in time O(n log² n), and so yields specialized programs with size and speed O(n log² n). Ajtai's sorting algorithm [5] is in principle still better, achieving the lower bound of O(n log n). Unfortunately it is not usable in practice due to an enormous constant factor, yielding extremely large specialized sorters or sorting networks.
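For concreteness, here is a hedged Python sketch of Batcher's odd-even mergesort in its standard iterative formulation, assuming n is a power of two. The comparator positions depend only on n, never on the data; a specializer given a static n could therefore unfold sort_oblivious into a straight-line network of O(n log² n) compare-exchanges.

def batcher_comparators(n):
    """Compare-exchange pairs of Batcher's odd-even mergesort, n = 2^k."""
    pairs = []
    p = 1
    while p < n:
        k = p
        while k >= 1:
            for j in range(k % p, n - k, 2 * k):
                for i in range(min(k, n - j - k)):
                    # Only compare within the same block of size 2p.
                    if (i + j) // (2 * p) == (i + j + k) // (2 * p):
                        pairs.append((i + j, i + j + k))
            k //= 2
        p *= 2
    return pairs

def sort_oblivious(A):
    # The comparator schedule depends only on len(A): fully static.
    for i, j in batcher_comparators(len(A)):
        if A[i] > A[j]:
            A[i], A[j] = A[j], A[i]
    return A

# batcher_comparators(4) -> [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]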

Interpreters

Interpreters are necessarily non-oblivious if the interpreted language contains tests, but we have seen that interpreters as a rule specialize quite well. This is at least partly because much experience has shown us how to write them so they give good results. Following are a few characteristics that seem important.

First, interpreters are usually written in a nearly compositional way, so the actions performed on a composite source language construction are a combination of the actions performed on its subconstructions. Compositionality is a key assumption for denotational semantics, where its main motivation is to make possible proofs based on structural induction over source language syntax.

From our viewpoint, compositionality implies that an interpreter manipulates only pieces of the original program. Since there is a fixed number of these, they can be used for function specialization.
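As a concrete illustration (a hypothetical mini-language, not from the book), here is a compositional interpreter in Python: each case of eval_exp recurses only on subtrees of the expression it receives, so for a fixed source program the interpreter is applied to only finitely many program pieces, giving one residual function per piece under function specialization.

def eval_exp(exp, env):
    """Compositional evaluator for expressions encoded as tagged tuples."""
    tag = exp[0]
    if tag == "lit":                     # ("lit", value)
        return exp[1]
    elif tag == "var":                   # ("var", name)
        return env[exp[1]]
    elif tag == "add":                   # ("add", e1, e2)
        return eval_exp(exp[1], env) + eval_exp(exp[2], env)
    elif tag == "if":                    # ("if", cond, then, else)
        if eval_exp(exp[1], env):
            return eval_exp(exp[2], env)
        else:
            return eval_exp(exp[3], env)
    else:
        raise ValueError(f"unknown construct: {tag}")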

In fact, compositionality may be relaxed, as long as all static data is of bounded static variation, meaning that for any fixed static input to the interpreter program, all variables classified as static can take on only finitely many values, thus guaranteeing termination. A typical example is the list of names appearing in an environment binding source variables to values: for any given source program this list can grow, but not unboundedly in a language with static name binding (which fails for Lisp).

Interpreters written in other ways, for example ones which construct new bits of source code on the fly, can be difficult or impossible to specialize with good speedup.

Second, well-written interpreters do not contain call duplication. An example problem concerns the implementation of while E do Command. A poorly written interpreter might contain two calls to evaluate E or to perform Command, giving target code duplication (especially bad for nested while commands), as the sketch below illustrates.
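A hedged Python sketch of the pitfall, continuing the eval_exp example above and assuming a hypothetical exec_cmd that executes a command and returns the updated environment:

def exec_while_bad(E, C, env):
    # Poorly written: two textual calls to evaluate E and two to perform
    # C. A specializer that unfolds them duplicates the generated target
    # code, and the duplication compounds for nested while commands.
    if eval_exp(E, env):
        env = exec_cmd(C, env)
        while eval_exp(E, env):      # second copy of the test
            env = exec_cmd(C, env)   # second copy of the body
    return env

def exec_while_good(E, C, env):
    # Well written: a single call site for the test and one for the body.
    while eval_exp(E, env):
        env = exec_cmd(C, env)
    return env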

13.3 Exercises

Exercise 13.1 Suggest three applications of partial evaluation to problems not discussed in this book.

□

Exercise 13.2 The following program p stores the smallest among A[i], ..., A[j] in A[i], and the largest in A[j], assuming for simplicity that j − i + 1 is a power of 2. It uses 3n/2 − 2 comparisons, provably the least number of comparisons sufficient to find both the minimum and the maximum among n elements.


procedure Minmax(i,j);
  if j - i = 1 then
    [if A[i] > A[j] then
       [tem := A[i]; A[i] := A[j]; A[j] := tem]]
  else
    [i1 := (i+j-1)/2; j1 := (i+j+1)/2;
     Minmax(i,i1); Minmax(j1,j);
     A[i] := Min(A[i],A[j1]); A[j] := Max(A[i1],A[j])]
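For experimenting with the questions below, here is a direct Python transcription of p (a convenience added here, not part of the original exercise):

def minmax(A, i, j):
    """Place the smallest of A[i..j] in A[i] and the largest in A[j],
    assuming j - i + 1 is a power of 2."""
    if j - i == 1:
        if A[i] > A[j]:
            A[i], A[j] = A[j], A[i]
    else:
        i1 = (i + j - 1) // 2
        j1 = (i + j + 1) // 2
        minmax(A, i, i1)
        minmax(A, j1, j)
        A[i] = min(A[i], A[j1])
        A[j] = max(A[i1], A[j])

# minmax(A, 0, 7) leaves min(A) in A[0] and max(A) in A[7] for len(A) == 8.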

1. Hand specialize p to i = 0, j = 7, unfolding all calls, to obtain program p07.

2. Compare the run time of p with that of p0n for n = 2^m − 1, as a function of n, including a constant time c to perform one procedure call.

3. How large is program p0n for n = 2^m − 1, as a function of n? Given your conclusion, under what circumstances would specialization of p be worthwhile?

4. Let p^k_ij be p specialized to j − i + 1 ≤ 2^k as in Section 13.1.3. Note that for each k, program p^k_ij has a fixed size independent of i, j. Compare the run time for p^k_0n with those of p and p_ij. Does the speedup for fixed k `propagate' to arrays of arbitrarily large size?

5. Does a similar speedup propagation occur when specializing a merge sort program to a fixed array size?

□

Exercise 13.3 The `table-directed input' of Section 13.1.4 can be implemented by at least three methods:

1. by a general interpreter, taking as parameters the table, its dimensions, and an array of action routine addresses;

2. by an interpreter tailored to a fixed table with known dimensions and known action routines; or

3. by a `compiled' version of the table, realized by tests and goto's with inline code for the actions.

Compare the run time of these three approaches. Which method is used by parser generators such as Yacc?

□

Exercise 13.4 Residual program size explosions as seen in Section 13.2.4 can make partial evaluation unprofitable. Can the size explosion problem always be solved by choosing a more conservative binding-time analysis (i.e. one with fewer static variables)? Suggest a BTA tactic for avoiding such size explosions.

□

Part V

Advanced Topics

Chapter 14

Termination of Partial Evaluation

Many partial evaluators have imperfect termination properties, the most serious being that they are not guaranteed to terminate on all static input. Partial evaluators do speculative evaluation on the basis of incomplete information, giving them a tendency to loop infinitely more often than a standard evaluator would. For example, a non-trivial partial evaluator reduces both branches of a conditional when it cannot resolve the guarding condition. Another way to put this is that partial evaluation is more eager than standard evaluation.

Non-termination is a most unfortunate behaviour in an automatic tool intended to improve programs. The problem is exacerbated if a compiler generated from the partial evaluator inherits its dubious termination properties. Such a compiler would be close to worthless: a non-expert user would be without a clue as to how to revise the source program that made the compiler loop. An objection: some languages, e.g. PL/I, may have static semantics that admit compile-time looping, but in this chapter our concern will be to ban non-termination of partial evaluation. In another setting, one could imagine that a cognizant user might be allowed to override the conservative assumptions of a specializer to obtain extra static computation.

After briefly describing termination strategies used in online partial evaluators, we analyse the problem of non-terminating partial evaluation in the offline framework of Chapter 4. We then develop a binding-time analysis that solves these termination problems.

14.1 Termination of online partial evaluators

Online partial evaluators employ a number of techniques to ensure termination. Most consult some form of computational history, maintained during the specialization process, to make folding and unfolding decisions. When a call is encountered during specialization, the decisions are: should this call be unfolded, and if not, how specialized a residual call should be generated?



There are several well-known tradeoffs. Unfolding too liberally can cause infinite specialization-time loops, even without generating any residual code. Generating residual calls that are too specialized (i.e. contain too much static data) can lead to an infinitely large residual program, while the other extreme of generating too general residual calls can lead to little or no speedup.

A variety of heuristics have been devised to steer online call unfolding, beginning with the very first partial evaluation articles. Infinite unfolding cannot occur without recursion, so specializers often compare the sizes and/or structures of arguments encountered in a function or procedure call with those of its predecessors, and use the outcome to decide whether to unfold and, if not, how much to generalize the call. A variety of strategies, some rather sophisticated, have been described [19,75,112,158,178,230,235,267,269,281].
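As one concrete (and much simplified) instance of such heuristics, here is a hedged Python sketch: compare a call's argument sizes against all earlier calls of the same function on the current call chain, and unfold only if something shrank. The size measure and the function names are assumptions of the sketch.

def size(v):
    """Crude size measure on nested lists/tuples (an assumption here)."""
    if isinstance(v, (list, tuple)):
        return 1 + sum(size(x) for x in v)
    return 1

def should_unfold(call_args, predecessor_args_history):
    """Unfold only if the arguments are smaller than at every earlier
    call to the same function on this chain; otherwise residualize
    (and possibly generalize) the call."""
    s = sum(size(a) for a in call_args)
    return all(s < sum(size(a) for a in prev)
               for prev in predecessor_args_history)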

14.2 Termination of offline partial evaluators

In offline partial evaluation, essentially the same decisions have to be taken, with the same tradeoffs. A difference is that this is preprocessing work, done by the BTA (binding-time analysis). BTA computes a division for all program variables on the basis of a division of the input variables. At specialization time this classification of variables (and thereby computations) as static or dynamic is blindly obeyed, so all-important decisions of when to specialize are encapsulated within the computed division.

In the literature on termination of offline partial evaluation, including this chapter, the emphasis is on the distinction between increasing and decreasing static variables [117,130,246]. (An exception is Holst's poor man's generalization, which generalizes all variables that do not have control decisions depending on them. This does not guarantee termination, but the heuristic might have some practical merit.)

14.2.1 Problem formulation

How much freedom is there in the choice of division? An indispensable requirement is that it must be congruent: any variable that depends on a dynamic variable must itself be classified as dynamic. (Without this, code generation is impossible.) Further, some congruent divisions are bad in that they lead to infinite residual programs, as seen in the example of Section 4.4.5.

The division may also be used to make annotations indicating when it is safe to compress transitions (unfold) without causing code duplication or computation duplication. Usefulness is also practically relevant: if a variable is dead, i.e. if no computation depends on it, then it should be classified as dynamic. This principle was found to be crucial for specialization of larger, imperative programs in Chapter 4.


Since ensuring finite specialization is by far the hardest problem, we shall concentrate exclusively on it, and ignore the other problems. Recall from Chapter 4 that the specialized program will be finite if and only if its set of specialized program points poly is finite. We thus ignore questions of code generation and transition compression. Consequently we have reduced the problem to the following:

Given a division of the program inputs, find a division of all variables that

1. is congruent;

2. is finite, so for all input data, the set poly of reachable specialized program points will be finite; and in which

3. as many variables as possible are classified as `static'.

A congruent finite division is an achievable goal, since classifying all variables as dynamic is congruent and will indeed ensure termination (a trivial solution that yields no specialization at all). The main problem is thus to classify `just' enough variables as dynamic to ensure congruence and finiteness. Point 3 ensures performance of a maximal amount of computation by the specializer, thus increasing residual program efficiency and avoiding the trivial solution when possible.
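The congruence half of the problem is easy to make concrete. The following hedged Python sketch (all names assumed) computes the most static congruent division extending a given classification, by repeatedly reclassifying as dynamic every variable that depends on a dynamic one. Ensuring finiteness (point 2) is the genuinely hard part, addressed in the rest of this chapter.

def congruent_division(dependencies, initially_dynamic):
    """dependencies: dict mapping a variable to the set of variables it
    depends on. Returns the dynamic set of the least congruent division
    containing initially_dynamic."""
    dynamic = set(initially_dynamic)
    changed = True
    while changed:
        changed = False
        for var, deps in dependencies.items():
            if var not in dynamic and deps & dynamic:
                dynamic.add(var)
                changed = True
    return dynamic

# Example: with y depending on x and x dynamic, y becomes dynamic too:
#   congruent_division({"x": set(), "y": {"x"}}, {"x"})  ->  {"x", "y"}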

14.2.2 Problem analysis

Given a program p, a division, and static program input vs0, the set poly of all reachable specialized program points was defined in Chapter 4 to be the smallest set such that

(pp0, vs0) is in poly, where pp0 is p's initial program point; and

if (pp, vs) ∈ poly, then successors((pp, vs)) is a subset of poly

where successors((pp, vs)) = {(pp1, vs1), ..., (ppn, vsn)} is the set of static parts of program points reachable in computations beginning at (pp, vs) and continuing to the end of the basic block begun by pp. Clearly, poly is finite if and only if all static variables assume only finitely many different values.
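Operationally, poly is a least fixed point and can be computed by a standard worklist algorithm, as in the following hedged Python sketch (the successors function is taken as given). If some static variable assumes infinitely many values, the loop fails to terminate; this is precisely the finiteness problem under discussion.

def compute_poly(pp0, vs0, successors):
    """Least set containing (pp0, vs0) and closed under successors."""
    poly = {(pp0, vs0)}
    worklist = [(pp0, vs0)]
    while worklist:
        point = worklist.pop()
        for succ in successors(point):
            if succ not in poly:
                poly.add(succ)
                worklist.append(succ)
    return poly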

Bounded static variation

Certain source program variables can only assume finitely many different values. One example is a static program input that is never changed. Another is a variable that can change during execution, but always assumes values that are substructures of a static input. The idea can be formalized as follows.

The binding-time analysis algorithm in Section 4.4.6 constructs a division div which is congruent but not always finite. Let us say that a variable xk is of bounded static variation if (1) it is classified as static by div; and (2) for any static program input vs0, the following set is finite:


{ vk | (pp, (v1, ..., vk, ..., vn)) ∈ poly }

Our goal is thus a program analysis to construct a better division by recognizing certain variables as of bounded static variation, classifying them as `static', and classifying all other variables as `dynamic'.

Example 14.1 Consider the following program and assume that x is known to be static. How should y be classified?

      y := 0;
loop: x := x-1;
      y := y+2;
      if x ≠ 0 goto loop else exit;
exit: ...

Classifying y as static violates neither congruence nor finiteness, as the assignment y:=y+2 is performed only n times, where n is the initial value of x. The value of y throughout the computation is thus bounded by 2n. Observe that though for any one value of x there is a bound for y, there exists no uniform bound for y.

Things look different if x is dynamic: we lose the bound on the number of iterations. Thus y is unbounded, even though the binding-time analysis in Section 4.4.6 would call it static. Hence, to comply with the finiteness criterion, y should be classified as dynamic. (The sketch after this example makes both cases concrete.)

□
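To make Example 14.1 concrete, the following hedged Python sketch (an illustration added here) enumerates the reachable specialized points:

def points_static_x(n):
    """Reachable points (pp, x, y) when x and y are both static; n >= 1."""
    x, y = n, 0
    points = {("loop", x, y)}
    while True:
        x, y = x - 1, y + 2        # the static computation is executed
        points.add(("loop", x, y))
        if x == 0:
            break
    return points                   # n + 1 points; y is bounded by 2n

# With x dynamic, the static part of a point is y alone, and the points
# ("loop", 0), ("loop", 2), ("loop", 4), ... grow without bound, so y
# must be reclassified as dynamic.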

Finite downwards closure

How can we choose the division to make poly finite? A program analysis to recognize such properties as bounded static variation can be done using the fact that many value domains used in practice are finitely downwards closed.

Definition 14.1 A set D with partial ordering < is finitely downwards closed iff ∀x ∈ D : {y | y < x} is finite.

□

A trivial consequence is that there exists no infinite descending chain, that is, a sequence v1, v2, v3, ... with vi ∈ D and vi > vi+1 for i ≥ 1.

Examples

The set of natural numbers N with the usual ordering < is finitely downwards closed.

The set of integers Z with the usual ordering < is not finitely downwards closed, since ∀x ∈ Z : {y | y < x} is infinite.

Defining x < y to mean that x is a substructure of y, the set of finite trees is finitely downwards closed. So are the sets of finite lists and S-expressions, together with other common finite algebraic structures.
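As a closing illustration, here is a hedged Python sketch of the substructure ordering on finite trees, encoded as nested tuples (an encoding assumed here): for any tree x, the set {y | y < x} can be enumerated outright, witnessing finite downwards closure.

def strict_substructures(x):
    """All proper substructures of a tree encoded as nested tuples."""
    result = set()
    if isinstance(x, tuple):
        for child in x:
            result.add(child)
            result |= strict_substructures(child)
    return result

# Example: strict_substructures((("a", "b"), "c"))
#   == {("a", "b"), "a", "b", "c"}  (a finite set, as required;
#   contrast with the integers, where {y | y < x} is infinite).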