Jones, N.D. Partial Evaluation and Automatic Program Generation. 1999.

When can partial evaluation be of benefit? 291
Pathfinding in a graph
Suppose one is given a program p to compute Find(G, A, B), where G is a graph and A, B are a source and target node. The result is to be some path from A to B if one exists, and an error report otherwise. The result of specializing p with respect to statically known G and B is a program good for finding paths from the various A to B. This could be useful, for example, for finding routes between various cities and one's home.
Naively specializing p would probably give a slow algorithm since, for example, Dijkstra's algorithm would trace all paths starting at dynamic A until static B was encountered. Alternatively, one can use the fact that A is of bounded static variation to get better results. The idea is to embed p in a program that calls the Find function only on fully static arguments:
function Paths-to(G, A, B) =
    let nodes = Node-list(G) in
    forall A1 ∈ nodes do
        if A = A1 then Find(G, A1, B)

function Find(G, A, B) = ...
Note that nodes, and so A1, are static. The result of specializing to G and B could thus be a program of the form
function Paths-to-Copenhagen(A) =
    if A = Hamburg
    then [Hamburg, C1, ..., Copenhagen]
    else if A = London
         then [London, D1, ..., Copenhagen]
         else ...
                   if A = Paris
                   then [Paris, E1, ..., Copenhagen]
                   else NOPATH
in which all path traversal has been done at specialization time. In fact, most partial evaluators, for example Similix, would share the list-building code, resulting in a specialized program with size proportional to that of a shortest-path spanning tree rooted at B.
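The effect of such specialization can be sketched in Python. The function name, city graph, and closure-based encoding below are invented for illustration (a real partial evaluator would emit residual source code rather than a closure): a breadth-first search from B over reversed edges does all path traversal once, and the returned function merely follows the shared shortest-path tree, mirroring the sharing described above.

```python
from collections import deque

def specialize_paths_to(G, B):
    """Sketch of specializing Paths-to with respect to static G and B:
    all path traversal happens here, at 'specialization time'."""
    # BFS from B over reversed edges builds a shortest-path tree rooted at B.
    next_hop = {B: None}
    queue = deque([B])
    while queue:
        node = queue.popleft()
        for src, dst in G:
            if dst == node and src not in next_hop:
                next_hop[src] = node
                queue.append(src)

    def paths_to_B(A):
        # Stand-in for the residual program: a chain of lookups in the
        # shared tree, whose size is that of the spanning tree, not the
        # sum of all path lengths.
        if A not in next_hop:
            return "NOPATH"
        path = [A]
        while path[-1] != B:
            path.append(next_hop[path[-1]])
        return path

    return paths_to_B

# Hypothetical route data, not taken from the book:
G = [("Hamburg", "Copenhagen"), ("London", "Hamburg"), ("Paris", "Hamburg")]
find = specialize_paths_to(G, "Copenhagen")
```

Calling `find("London")` then only walks the precomputed tree; no graph search happens at "run time".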
Sorting
Consider specializing a sorting algorithm with respect to the number of elements to be sorted. This can be profitable when sorting variable numbers of elements. One can use traditional methods, e.g. merge sort, until the number of elements to be sorted becomes less than, say, 100, at which point a specialized sorter is called. Figure 13.2 contains an example, run on Similix (syntax rewritten for clarity), resulting from specializing merge sort to n = 4.
The first program version used a function merge to merge two lists. This had only dynamic arguments, so very little speedup resulted. To gain speedup, the lengths of the two lists were added as statically computable parameters, giving code like
function merge-sort-4(A);
    A, B: array[0..3] of integer;
    if A0 <= A1 then [B0:=A0; B1:=A1] else [B0:=A1; B1:=A0];
    if A2 <= A3 then [B2:=A2; B3:=A3] else [B2:=A3; B3:=A2];
    if B0 <= B2 then
        [A0:=B0;
         if B1<=B2 then [A1:=B1; A2:=B2; A3:=B3]
         else [A1:=B2; merge-4(B,A)]]
    else
        [A0:=B2;
         if B0<=B3 then [A1:=B0; merge-4(B,A)]
         else [A1:=B3; A2:=B0; A3:=B1]];
    merge-sort-4 := A
end

procedure merge-4(A,B);
    if A1<=A3 then [B2:=A1; B3:=A3] else [B2:=A3; B3:=A1]
end
Figure 13.2: Specialized merge sorting program.
procedure merge(A, Alength, B, Blength);
merge :=
    if Alength = 0 then B
    else if Blength = 0 then A
    else if first(A) < first(B)
         then cons(first(A), merge(rest(A), Alength - 1, B, Blength))
         else cons(first(B), merge(A, Alength, rest(B), Blength - 1))
end
The length arguments of merge are static, so all calls can be unfolded, resulting in essentially the specialized code seen in Figure 13.2.
The good news is that this program is between 3 and 4 times faster than the recursive version. The bad news is that specializing to successively larger values of n gives programs whose size grows as O(n²), making the approach useless in practice.
What went wrong? In general, which of Alength or Blength is decreased depends on a dynamic test, so mix must account for all possible outcomes. Each length can range from 0 to n, so there are O(n²) possible outcomes, and the specialized program will have size O(n²). (Its run time will still be of order n log n, but with a smaller constant, so something has been gained.)
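The O(n²) count can be checked mechanically. The sketch below is illustrative (not Similix): it enumerates the distinct (Alength, Blength) pairs for which a specializer must generate a variant of merge, starting from a merge of two halves of length m. Since the dynamic comparison forces both branches to be specialized, every pair except (0, 0) is reached.

```python
def reachable_length_pairs(la, lb, seen=None):
    """Collect the static (Alength, Blength) pairs a specializer must
    produce merge code for, starting from lengths (la, lb).  The test
    first(A) < first(B) is dynamic, so both decrements are possible."""
    if seen is None:
        seen = set()
    if (la, lb) in seen:
        return seen
    seen.add((la, lb))
    if la > 0 and lb > 0:          # both runtime branches must be covered
        reachable_length_pairs(la - 1, lb, seen)
        reachable_length_pairs(la, lb - 1, seen)
    return seen

# Merging two halves of length m reaches (m+1)^2 - 1 pairs
# (every pair except (0, 0)): quadratically many specialized variants.
for m in range(1, 10):
    assert len(reachable_length_pairs(m, m)) == (m + 1) ** 2 - 1
```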
This problem is entirely due to non-obliviousness of the sorting algorithm. It leads directly to the question: does there exist a comparison-optimal weakly oblivious sorting algorithm?
Batcher's sorting algorithm [17] is both weakly oblivious and nearly optimal. It runs in time O(n log² n), and so yields specialized programs with size and
speed O(n log² n). Ajtai's sorting algorithm [5] is in principle still better, achieving the lower bound of O(n log n). Unfortunately it is not usable in practice due to an enormous constant factor, yielding extremely large specialized sorters or sorting networks.
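Batcher's construction can be sketched as a generator of compare-exchange pairs. Because the pair sequence depends only on n (here assumed a power of two, using the standard odd-even merge recursion), the "residual program" is exactly this straight-line list of comparisons, with no data-dependent control flow:

```python
def oddeven_merge(lo, n, r):
    """Comparators merging the two sorted halves of a[lo:lo+n],
    considering elements at stride r."""
    step = r * 2
    if step < n:
        return (oddeven_merge(lo, n, step)
                + oddeven_merge(lo + r, n, step)
                + [(i, i + r) for i in range(lo + r, lo + n - r, step)])
    return [(lo, lo + r)]

def oddeven_merge_sort(lo, n):
    """Compare-exchange list sorting a[lo:lo+n], n a power of two."""
    if n <= 1:
        return []
    m = n // 2
    return (oddeven_merge_sort(lo, m)
            + oddeven_merge_sort(lo + m, m)
            + oddeven_merge(lo, n, 1))

def run_network(network, a):
    a = list(a)
    for i, j in network:          # data-independent sequence: oblivious
        if a[i] > a[j]:
            a[i], a[j] = a[j], a[i]
    return a

net4 = oddeven_merge_sort(0, 4)   # 5 comparators for n = 4
```

Specializing a sorter to n then amounts to emitting one conditional swap per pair, giving residual size proportional to the O(n log² n) length of the list.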
Interpreters
Interpreters are necessarily non-oblivious if the interpreted language contains tests, but we have seen that interpreters as a rule specialize quite well. This is at least partly because much experience has shown us how to write them so they give good results. Following are a few characteristics that seem important.
First, interpreters are usually written in a nearly compositional way, so the actions performed on a composite source language construction are a combination of the actions performed on its subconstructions. Compositionality is a key assumption for denotational semantics, where its main motivation is to make possible proofs based on structural induction over source language syntax.
From our viewpoint, compositionality implies that an interpreter manipulates only pieces of the original program. Since there is a fixed number of these, they can be used for function specialization.
In fact, compositionality may be relaxed, as long as all static data is of bounded static variation, meaning that for any fixed static interpreter input, the variables classified as static can take on only finitely many values, thus guaranteeing termination. A typical example is the list of names appearing in an environment binding source variables to values: for any given source program it can grow, but not unboundedly in a language with static name binding (false for Lisp).
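As a sketch of what compositionality looks like in code (the construct names and tuple representation here are invented for illustration), consider a tiny expression evaluator in which the action for each construct combines only the actions for its immediate subconstructs, and the environment's set of names is bounded for any fixed source program:

```python
def evaluate(expr, env):
    """Compositional evaluator: each case recurses only on immediate
    subexpressions, so a specializer sees a fixed set of program pieces.
    For a fixed source program, env's name set is of bounded static
    variation even though bindings change during evaluation."""
    op = expr[0]
    if op == "lit":                      # ("lit", n)
        return expr[1]
    if op == "var":                      # ("var", name)
        return env[expr[1]]
    if op == "add":                      # ("add", e1, e2)
        return evaluate(expr[1], env) + evaluate(expr[2], env)
    if op == "let":                      # ("let", name, e1, e2)
        return evaluate(expr[3], {**env, expr[1]: evaluate(expr[2], env)})
    raise ValueError(f"unknown construct {op!r}")

program = ("let", "x", ("lit", 3), ("add", ("var", "x"), ("lit", 4)))
```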
Interpreters written in other ways, for example ones which construct new bits of source code on the y, can be di cult or impossible to specialize with good speedup.
Second, well-written interpreters do not contain call duplication. An example problem concerns the implementation of while E do Command. A poorly written interpreter might contain two calls to evaluate E, or two calls to perform Command, giving target code duplication (especially bad for nested while commands).
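A minimal sketch of the well-behaved shape (the statement representation is hypothetical, not the book's interpreter): the while case contains exactly one call site for the condition and one for the body, so unfolding it cannot duplicate their residual code even for nested loops.

```python
def eval_expr(e, store):
    """Tiny expression evaluator: a variable name or a constant."""
    return store[e] if isinstance(e, str) else e

def exec_stmt(stmt, store):
    kind = stmt[0]
    if kind == "assign":                 # ("assign", name, update_fn)
        store[stmt[1]] = stmt[2](store)
    elif kind == "seq":                  # ("seq", s1, s2, ...)
        for s in stmt[1:]:
            exec_stmt(s, store)
    elif kind == "while":                # ("while", cond, body)
        # One call site for cond and one for body.  A poorly written
        # interpreter might, e.g., test cond once before the loop and
        # again inside it, duplicating residual code for nested whiles.
        while eval_expr(stmt[1], store):
            exec_stmt(stmt[2], store)

store = {"n": 3, "total": 0}
exec_stmt(("while", "n",
           ("seq",
            ("assign", "total", lambda s: s["total"] + s["n"]),
            ("assign", "n", lambda s: s["n"] - 1))), store)
```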
13.3 Exercises
Exercise 13.1 Suggest three applications of partial evaluation to problems not discussed in this book. □
Exercise 13.2 The following program p stores the smallest among Ai, ..., Aj in Ai, and the largest in Aj, assuming for simplicity that j − i + 1 is a power of 2. It uses 3n/2 − 2 comparisons, provably the least number of comparisons sufficient to find both minimum and maximum among n elements.
Termination of Partial Evaluation
There are several well-known tradeoffs. Unfolding too liberally can cause infinite specialization-time loops, even without generating any residual code. Generating residual calls that are too specialized (i.e. contain too much static data) can lead to an infinitely large residual program; while the other extreme of generating too general residual calls can lead to little or no speedup.
A variety of heuristics have been devised to steer online call unfolding, beginning with the very first partial evaluation articles. Infinite unfolding cannot occur without recursion; so specializers often compare the sizes and/or structures of arguments encountered in a function or procedure call with those of its predecessors, and use the outcome to decide whether to unfold and, if not, how much to generalize the call. A variety of strategies, some rather sophisticated, have been described [19,75,112,158,178,230,235,267,269,281].
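One simple guard in this family can be sketched as follows (an illustrative heuristic in the spirit of size-comparison strategies, not a faithful rendering of any of the cited papers): unfold a call only if no argument grew and at least one strictly shrank relative to the most recent pending call of the same function; otherwise residualize or generalize.

```python
def size(v):
    """Structural size of an argument value (numbers and nested tuples)."""
    return 1 + sum(size(x) for x in v) if isinstance(v, tuple) else 1

def may_unfold(call_args, ancestor_args):
    """Conservative online unfolding guard: permit unfolding only when
    the argument vector is strictly smaller than at the nearest pending
    call of the same function, so unfolding chains cannot be infinite."""
    sizes_now = [size(a) for a in call_args]
    sizes_then = [size(a) for a in ancestor_args]
    return (all(n <= t for n, t in zip(sizes_now, sizes_then))
            and any(n < t for n, t in zip(sizes_now, sizes_then)))
```

Such a guard is safe but conservative: it may refuse unfoldings that would in fact have terminated, trading speedup for guaranteed finiteness.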
14.2 Termination of offline partial evaluators
In offline partial evaluation, essentially the same decisions have to be taken, with the same tradeoffs. A difference is that this is preprocessing work, done by the BTA (binding-time analysis). BTA computes a division for all program variables on the basis of a division of the input variables. At specialization time this classification of variables (and thereby computations) as static or dynamic is blindly obeyed, so the all-important decisions of when to specialize are encapsulated within the computed division.
In the literature on termination of offline partial evaluation, including this chapter, emphasis is on the distinction between increasing and decreasing static variables [117,130,246]. (An exception is Holst's poor man's generalization, which generalizes all variables which do not have control decisions depending on them. This does not guarantee termination, but the heuristic might have some practical merit.)
14.2.1 Problem formulation
How much freedom is there in the choice of division? An indispensable requirement is that it must be congruent: any variable that depends on a dynamic variable must itself be classified as dynamic. (Without this, code generation is impossible.) Further, some congruent divisions are bad in that they lead to infinite residual programs, as seen in the example of Section 4.4.5.
The division may also be used to make annotations indicating when it is safe to compress transitions (unfold) without causing code duplication or computation duplication. Usefulness is also practically relevant: if a variable is dead, i.e. if no computation depends on it, then it should be classified as dynamic. This principle was found to be crucial for specialization of larger, imperative programs in Chapter 4.
Since ensuring finite specialization is by far the hardest problem, we shall concentrate exclusively on it, and ignore the other problems. Recall from Chapter 4 that the specialized program will be finite if and only if its set of specialized program points poly is finite. We thus ignore questions of code generation and transition compression. Consequently we have reduced the problem to the following:
Given a division of the program inputs, find a division of all variables that

1. is congruent;
2. is finite, so that for all input data, the set poly of reachable specialized program points will be finite; and in which
3. as many variables as possible are classified as `static'.
A congruent, finite division is an achievable goal, since classifying all variables as dynamic is congruent and will indeed ensure termination (a trivial solution that yields no specialization at all). The main problem is thus to classify `just' enough variables as dynamic to ensure congruence and finiteness. Point 3 ensures that a maximal amount of computation is performed by the specializer, thus increasing residual program efficiency and avoiding the trivial solution when possible.
14.2.2 Problem analysis
Given a program p, a division, and static program input vs0, the set poly of all reachable specialized program points was defined in Chapter 4 to be the smallest set such that

1. (pp0, vs0) is in poly, where pp0 is p's initial program point; and
2. if (pp, vs) ∈ poly, then successors((pp, vs)) is a subset of poly

where successors((pp, vs)) = {(pp1, vs1'), ..., (ppn, vsn')} is the set of static parts of program points reachable in computations beginning at (pp, vs) and continuing to the end of the basic block begun by pp. Clearly, poly is finite if and only if all static variables assume only finitely many different values.
Bounded static variation
Certain source program variables can only assume finitely many different values. One example is a static program input that is never changed. Another is a variable that can change during execution, but always assumes values that are substructures of a static input. The idea can be formalized as follows.
The binding-time analysis algorithm in Section 4.4.6 constructs a division div which is congruent but not always finite. Let us say that a variable xk is of bounded static variation if (1) it is classified as static by div; and (2) for any static program input vs0, the following set is finite:
{ vk | (pp, (v1, ..., vk, ..., vn)) ∈ poly }
Our goal is thus a program analysis that constructs a better division by recognizing certain variables as being of bounded static variation, classifying them as `static', and classifying all other variables as `dynamic'.
Example 14.1 Consider the following program and assume that x is known to be static. How should y be classi ed?
      y := 0;
loop: x := x-1;
      y := y+2;
      if x ≠ 0 goto loop else exit;
exit: ...
Classifying y as static violates neither congruence nor finiteness, as the assignment y:=y+2 is performed only n times, where n is the initial value of x. The value of y throughout the computation is thus bounded by 2n. Observe that though for any one value of x there is a bound for y, there exists no uniform bound for y.
Things look different if x is dynamic: we lose the bound on the number of iterations. Thus y is unbounded, even though the binding-time analysis in Section 4.4.6 would call it static. Hence, to comply with the finiteness criterion, y should be classified as dynamic. □
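The set poly for this example can be computed directly from the definition above by a worklist iteration. The sketch below uses a hypothetical encoding of the example's flowchart, classifies both x and y as static, and assumes an initial x ≥ 1; it shows poly is finite for each fixed initial value of x, though its size grows with x.

```python
def poly_for_example(x0):
    """Reachable specialized program points for Example 14.1 with both
    x and y static, starting from x = x0 >= 1 (worklist version of the
    poly definition from Chapter 4)."""
    def successors(point):
        pp, (x, y) = point
        if pp == "start":                  # y := 0; fall through to loop
            return [("loop", (x, 0))]
        if pp == "loop":                   # x := x-1; y := y+2; test x
            x, y = x - 1, y + 2
            return [("loop", (x, y))] if x != 0 else [("exit", (x, y))]
        return []                          # exit: no successors
    seen, work = set(), [("start", (x0, 0))]
    while work:
        point = work.pop()
        if point not in seen:
            seen.add(point)
            work.extend(successors(point))
    return seen
```

For each fixed x0 the set is finite (x0 + 2 points here), but there is no bound uniform in x0, matching the discussion in the example.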
Finite downwards closure
How can we choose the division to make poly finite? A program analysis to recognize such properties as bounded static variation can be done using the fact that many value domains used in practice are finitely downwards closed.
Definition 14.1 A set D with partial ordering < is finitely downwards closed iff ∀x ∈ D : { y | y < x } is finite. □
A trivial consequence is that there exists no infinite descending chain, that is, a sequence v1, v2, v3, ... with vi ∈ D and vi > vi+1 for i ≥ 1.
Examples
The set of natural numbers N with the usual ordering < is finitely downwards closed.

The set of integers Z with the usual ordering < is not finitely downwards closed, since ∀x ∈ Z : { y | y < x } is infinite.

Defining x < y to mean that x is a substructure of y, the set of finite trees is finitely downwards closed. So are the sets of finite lists and S-expressions, together with other common finite algebraic structures.
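A small sketch of the last point: modelling finite trees as nested Python tuples (an illustrative encoding) and taking < to be the substructure relation, the set { y | y ≤ t } for any tree t is finite and easy to enumerate.

```python
def substructures(t):
    """All substructures y with y < t or y = t, for a finite tree
    modelled as nested tuples; this set is finite for every such t,
    so the substructure ordering is finitely downwards closed."""
    result = {t}
    if isinstance(t, tuple):
        for child in t:
            result |= substructures(child)
    return result

tree = ((1, 2), (1, (2, 3)))
```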