Jones N.D., Partial Evaluation and Automatic Program Generation, 1999 (PDF).

Part IV

Partial Evaluation in Practice

Chapter 12

Binding-Time Improvements

Two programs that are semantically, and even operationally, equivalent with respect to time or space usage may specialize very differently, giving residual programs with large differences in efficiency, size, or runtime memory usage. Thus partial evaluator users must employ a good programming style to make their programs specialize well. The purpose of this chapter is to give some hints about `good style', based on examples and experience.

Good style depends on the particular partial evaluator being used, but it is a common pattern that binding times should be mixed as little as possible: partially static items are harder to handle than fully static, a dynamic choice between static items is harder to use than a static choice, etc. A simple example (in SML):

fun f1 x y = (x+1)+y;
fun f2 x y = (x+y)+1;

If x is static and y is dynamic, a partial evaluator will typically manage to reduce x+1 in f1 but be unable to reduce the body of f2. To do the latter, commutative and associative laws for addition must be applied, either during partial evaluation or in a prepass. A suitable prepass could, for example, transform function definition f2 into f1, semantically equivalent but more amenable to partial evaluation.
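Such a prepass can be sketched concretely. The following Python fragment (a hypothetical illustration; the representation and the name improve_add are invented, not from the book) flattens nested additions and regroups the operands so that static ones associate together, turning the body of f2 into that of f1. The rewrite is sound only because + is associative and commutative.

```python
def improve_add(expr, static_vars):
    """Regroup a nested '+'-expression so static operands meet first.
    Expressions are tuples ('+', e1, e2), variable names (str), or int literals."""
    def leaves(e):
        # flatten the '+'-tree into its list of operands
        if isinstance(e, tuple) and e[0] == '+':
            return leaves(e[1]) + leaves(e[2])
        return [e]
    def is_static(leaf):
        return isinstance(leaf, int) or leaf in static_vars
    ls = leaves(expr)
    # static leaves first: a specializer can then fold them into one constant
    ordered = [l for l in ls if is_static(l)] + [l for l in ls if not is_static(l)]
    result = ordered[0]
    for leaf in ordered[1:]:
        result = ('+', result, leaf)
    return result
```

With x static, improve_add(('+', ('+', 'x', 'y'), 1), {'x'}) regroups (x+y)+1 into (x+1)+y, which a simple specializer can then reduce.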

A program transformation that preserves semantics but makes the program more suited for partial evaluation is called a binding-time improvement. Binding-time improvements are transformations applied, automatically or by hand, to a source program prior to the specialization phase. We do not, however, consider transformations such as car(cons E1 E2) ⇒ E1 that may change program semantics (E2 could loop, assuming call-by-value).

For one example, we outlined in Section 4.9.3 how a simple program transformation could make a simple partial evaluator mimic the use of polyvariant divisions. This is a natural example of a binding-time improvement which is often applied manually, and occurs automatically in the Schism system [60].

Binding-time improvements are rapidly being automated and incorporated into systems (Similix and Schism in particular), so there are floating boundaries among hand improvements; automated improvements achieved by preprocessing; and automated improvements that are incorporated into the specialization algorithm itself. For instance, the introduction of a binding-time improvement prepass provides a very modular way to squeeze more power out of a simple specializer. Thus just which binding-time improvements can make programs run faster depends critically on the strength of the specializer and its BTA.

This complicates objective discussion, since the same example may specialize differently on different systems. In order to clarify the problems involved and their solution, we assume the improvements are applied by hand, and that a very simple specializer is used.

Below we describe several strategies. All the examples presented are in Scheme and have been run with Similix¹. When residual programs or fragments thereof are presented, these are shown exactly as they were produced by Similix.

The role of binding-time analysis

Binding-time improvements are of course relevant to both online and offline specializers. Binding-time analysis is especially helpful for seeing global improvements, since the information it provides is visible (in the form of annotations) and can help determine where changes really make a difference in binding times. One strategy could be to obtain good offline binding-time separation with the aid of BTA, and then specialize the same program by online methods to get still better results.

12.1 A case study: Knuth, Morris, Pratt string matching

This section shows how partial evaluation and binding-time improvements can generate Knuth, Morris, and Pratt's pattern matching algorithm [150] from a naive pattern matcher. The original version of this much-referenced example of binding-time improvements is due to Consel and Danvy [54]. We present a simpler version which produces the same result.

The Scheme program of Figure 12.1 implements the first (very) naive attempt. It takes a pattern p and a subject string d and returns yes if the pattern occurs inside the subject string, otherwise no. The variable pp is a copy of the original pattern and dd a copy of the rest of the string from the point where the current attempt to match started. Its time is O(m·n), where m is the length of the pattern and n is the length of the subject string.

If the function kmp is specialized with respect to some static pattern p and dynamic string d, the result still takes time O(m·n). A better result can be obtained by exploiting the information that when matching fails, the characters up to the mismatch point in d and p are identical. The trick is to collect this information, i.e. the common static prefix of p and d that is known at a given time, and to first compare the pattern against this prefix, switching over to the rest of d only when the prefix is exhausted. Improvement is possible because the test against the prefix is static.

¹Version 5.0

(define (kmp p d) (loop p d p d))

(define (loop p d pp dd)
  (cond
    ((null? p) 'yes)
    ((null? d) 'no)
    ((equal? (car p) (car d)) (loop (cdr p) (cdr d) pp dd))
    (else (kmp pp (cdr dd)))))

Figure 12.1: Naive string matcher.
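For readers more comfortable with an imperative rendering, the naive matcher of Figure 12.1 can be transliterated into Python roughly as follows (the function name naive_match is invented; Python strings play the role of the Scheme lists):

```python
def naive_match(p, d):
    """Return 'yes' if pattern p occurs inside subject string d, else 'no'.
    pp and dd are the copies kept by the Scheme loop; worst case O(m*n)."""
    pp, dd = p, d
    while True:
        if not p:
            return 'yes'
        if not d:
            return 'no'
        if p[0] == d[0]:
            p, d = p[1:], d[1:]   # advance both on a matching character
        else:
            dd = dd[1:]           # restart one position later in the subject
            p, d = pp, dd
```

On a mismatch the matcher forgets everything it has seen and restarts from scratch, which is exactly the redundancy the binding-time improvements below remove.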

In the improved version of Figure 12.2, prefix ff is clearly of bounded static variation. The variable f plays the same role in relation to ff as d in relation to dd. The function snoc adds an element to the end of a list and is not shown.

(define (kmp p d) (loop p d p '() '()))

(define (loop p d pp f ff)
  (cond
    ((null? p) 'yes)
    ((null? f)
     (cond
       ((null? d) 'no)
       ((equal? (car p) (car d))
        (loop (cdr p) (cdr d) pp '() (snoc ff (car p))))
       ((null? ff)
        (kmp pp (cdr d)))
       (else
        (loop pp d pp (cdr ff) (cdr ff)))))
    ((equal? (car p) (car f))
     (loop (cdr p) d pp (cdr f) ff))
    (else
     (loop pp d pp (cdr ff) (cdr ff)))))

Figure 12.2: String matcher good for specialization.

Because the character causing the mismatch is ignored we can expect some redundant tests in the residual program. These can be eliminated by a minor change in loop, making it possible to exploit this `negative' information as well, i.e. that a certain character is definitely not equal to a known static value. Figure 12.3 shows a program that does this, where variable neg is a list of symbols that the first symbol of d cannot match.

(define (kmp p d)
  (loop p d p '() '() '()))

(define (loop p d pp f ff neg)
  (cond
    ((null? p) 'yes)
    ((null? f)
     (cond
       ((and (not (null? neg)) (member (car p) neg))
        (if (null? ff)
            (kmp pp (cdr d))
            (loop pp d pp (cdr ff) (cdr ff) neg)))
       ((and (null? neg) (null? d)) 'no)
       ((equal? (car p) (car d))
        (loop (cdr p) (cdr d) pp '() (snoc ff (car p)) '()))
       ((null? ff)
        (kmp pp (cdr d)))
       (else
        (loop pp d pp (cdr ff) (cdr ff) (cons (car p) neg)))))
    ((equal? (car p) (car f))
     (loop (cdr p) d pp (cdr f) ff neg))
    (else
     (loop pp d pp (cdr ff) (cdr ff) neg))))

Figure 12.3: Matcher with negative information.

Figure 12.4 shows the result of specializing the matcher with negative information to p = (a b a b). The residual program is identical in structure to that yielded by Knuth, Morris, and Pratt's clever technique [150]. The complexity of the specialized algorithm is O(n), where n is the length of the string. The naive algorithm has complexity O(m·n), where m is the length of the pattern. Perhaps counterintuitively, this speedup is considered linear since for each static m it is constantly faster than the naive algorithm (see Section 6.2).
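The residual program of Figure 12.4 can be checked against the specification by transliterating it into Python (a hypothetical rendering; function names mirror the Similix-generated loop-0-1, loop-0-2, loop-0-5). Each function is one state of the KMP automaton for the pattern abab, and the subject string is consumed exactly once.

```python
def loop_0_1(d):
    # state: no prefix of 'abab' matched yet
    if not d: return 'no'
    if d[0] == 'a': return loop_0_2(d[1:])
    return loop_0_1(d[1:])

def loop_0_2(d):
    # state: prefix 'a' matched
    if not d: return 'no'
    if d[0] == 'b':
        d1 = d[1:]                      # prefix 'ab' matched
        if not d1: return 'no'
        if d1[0] == 'a':
            d2 = d1[1:]                 # prefix 'aba' matched
            if not d2: return 'no'
            if d2[0] == 'b': return 'yes'
            return loop_0_5(d2)         # mismatch after 'aba', char != 'b'
        return loop_0_1(d1[1:])         # mismatch after 'ab', char != 'a'
    return loop_0_5(d)                  # mismatch after 'a', char != 'b'

def loop_0_5(d):
    # state: last char was not 'b' (negative information); d is non-empty here
    if d[0] == 'a': return loop_0_2(d[1:])
    return loop_0_1(d[1:])

def kmp_0(d):
    return loop_0_1(d)
```

Comparing kmp_0 with a direct substring check confirms that the residual program decides occurrence of abab in linear time.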

This example is particularly interesting because a clever algorithm is generated automatically from a naive one using binding-time improvements and partial evaluation. This is thought-provoking even though such binding-time improvements may be hard to automate.

(define (kmp-0 d_0)
  (define (loop-0-1 d_0)
    (cond ((null? d_0) 'no)
          ((equal? 'a (car d_0)) (loop-0-2 (cdr d_0)))
          (else (loop-0-1 (cdr d_0)))))
  (define (loop-0-2 d_0)
    (cond ((null? d_0) 'no)
          ((equal? 'b (car d_0))
           (let ((d_1 (cdr d_0)))
             (cond ((null? d_1) 'no)
                   ((equal? 'a (car d_1))
                    (let ((d_2 (cdr d_1)))
                      (cond ((null? d_2) 'no)
                            ((equal? 'b (car d_2)) (begin (cdr d_2) 'yes))
                            (else (loop-0-5 d_2)))))
                   (else (loop-0-1 (cdr d_1))))))
          (else (loop-0-5 d_0))))
  (define (loop-0-5 d_0)
    (if (equal? 'a (car d_0)) (loop-0-2 (cdr d_0)) (loop-0-1 (cdr d_0))))
  (loop-0-1 d_0))

Figure 12.4: Specialized matcher to pattern `abab'.

12.2 Bounded static variation

A more easily systematized technique was applied in earlier chapters to both flow chart and Scheme0 programs (Sections 4.8.3 and 5.4.3 respectively). The technique² can be employed when a dynamic variable d is known to assume one of a finite set F of statically computable values. To see how it works, consider an expression context C[d] containing d. Assuming F has already been computed, the idea is to replace C[d] by

1. code that compares d with all the elements of F, certain to yield a successful match d = d1 ∈ F; followed by

2. code to apply the context C[·] to the static d1.
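The two steps can be sketched generically. In this hypothetical Python rendering (the name trick and the parameters are illustrative, not from the book), F and the context C are statically known while d is dynamic:

```python
def trick(d, F, C):
    """Replace C[d] (d dynamic) by a static dispatch over the finite set F."""
    for d1 in F:          # step 1: compare d with every element of F
        if d == d1:
            return C(d1)  # step 2: apply the context to the *static* d1
    raise ValueError("d was assumed to take a value in F")
```

A specializer can unfold the loop over the static F and reduce each C(d1) at specialization time, so the residual program contains only a chain of comparisons on d, each followed by fully reduced code.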

We shall see later (Section 12.3) that the same effect can sometimes be realized by conversion to continuation passing style.

As an example of its use we show how a general regular expression matcher can be specialized with respect to a specific regular expression to obtain a dedicated matcher in the form of a DFA (deterministic finite automaton). The example from Bondorf's PhD thesis [27] was developed by Mogensen, Jørgensen, and Bondorf, and appears in Prolog in Section 9.1.

²So popular among partial evaluation users that it is sometimes called `The Trick'.


Regular expressions are built up as in [4] from symbols and the empty string ε, using concatenation, union, and the Kleene star *. For example, the regular expression aε(b*|c) will generate the strings abb and ac, but not the string aa.

The programs below work on a concrete Scheme representation of regular expressions. The concrete representation is less readable than the abstract one, so in the descriptions of regular expression operations we use the abstract form. The symbol r denotes a regular expression in abstract syntax, whereas the typewriter-font r used in program texts denotes a regular expression in concrete syntax. The same distinction is made between the abstract sym and the concrete sym. Dually, any operation Op working on the abstract forms corresponds to a concrete operation Op working on the concrete forms; the former is used in the descriptions, the latter in program texts.

The Scheme program below interprets regular expressions. It takes a regular expression r and a string s as input and returns the boolean true (#t) if the regular expression generates the string, otherwise false (#f).

(define (match r s)
  (if (null? s)
      (generate-empty? r)
      (let ((sym (car s)))
        (and (member sym (first r))
             (match (next r sym) (cdr s))))))

We assume that certain functions are available without including their definition: generate-empty?, first, and next. Function generate-empty? checks whether a regular expression generates the empty string. Function first computes the list of all symbols that appear as the first symbol of some string generated by the regular expression. Given a regular expression r and a symbol sym that is the first in some generable string, (next r a0) computes a regular expression r1 which generates all strings a1a2...an such that r generates a0a1a2...an.

Some examples of the use of these functions (in the abstract notation):

generate-empty? a*            =  true
generate-empty? (a | c)       =  false
first ((a | ε) (c d)*)        =  {a, c}
next ((a | ε) (c d)*) c       =  d (c d)*
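One possible implementation of these three operations is via Brzozowski-style derivatives. The Python sketch below is an assumption-laden illustration, not the book's Scheme code: the tuple encoding, the extra 'empty' tag for the empty language, and the spelling deriv for next are all invented here.

```python
# Hypothetical tuple encoding of regular expressions:
#   ('empty',)       the empty language      ('eps',)          the empty string
#   ('sym', a)       a single symbol         ('cat', r1, r2)   concatenation
#   ('alt', r1, r2)  union                   ('star', r)       Kleene star

def generate_empty(r):
    """Does r generate the empty string?  (generate-empty?)"""
    tag = r[0]
    if tag in ('eps', 'star'):
        return True
    if tag in ('empty', 'sym'):
        return False
    if tag == 'cat':
        return generate_empty(r[1]) and generate_empty(r[2])
    return generate_empty(r[1]) or generate_empty(r[2])      # 'alt'

def first(r):
    """Set of symbols that can start a generated string.  (first)"""
    tag = r[0]
    if tag in ('empty', 'eps'):
        return set()
    if tag == 'sym':
        return {r[1]}
    if tag == 'cat':
        s = set(first(r[1]))
        if generate_empty(r[1]):
            s |= first(r[2])
        return s
    if tag == 'alt':
        return first(r[1]) | first(r[2])
    return first(r[1])                                       # 'star'

def deriv(r, a):
    """Regular expression generating all w with r generating a.w  (the role of next)."""
    tag = r[0]
    if tag in ('empty', 'eps'):
        return ('empty',)
    if tag == 'sym':
        return ('eps',) if r[1] == a else ('empty',)
    if tag == 'cat':
        d = ('cat', deriv(r[1], a), r[2])
        if generate_empty(r[1]):
            return ('alt', d, deriv(r[2], a))
        return d
    if tag == 'alt':
        return ('alt', deriv(r[1], a), deriv(r[2], a))
    return ('cat', deriv(r[1], a), r)                        # 'star'

def match(r, s):
    """Direct transliteration of the Scheme interpreter above."""
    if not s:
        return generate_empty(r)
    sym = s[0]
    return sym in first(r) and match(deriv(r, sym), s[1:])
```

Under this encoding the examples in the text come out as expected: first((a|ε)(cd)*) is {a, c}, and generate-empty? holds for a* but not for (a|c).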

If we specialize the match program above with respect to some static regular expression r and dynamic string s, the resulting target program is not very good. The problem is that sym is a dynamic value because it is computed from s. Therefore, the regular expression (next r sym) cannot be computed at partial evaluation time, and all static information is `lost'.

We therefore wish to improve the binding times of match. Observe that sym is a member of the statically computable (first r). Applying `the trick', the program is rewritten as:


(define (match r s)
  (if (null? s)
      (generate-empty? r)
      (let ((f (first r)))
        (and (not (null? f))
             (let ((sym (car s)))
               (let loop ((f f))
                 (and (not (null? f))
                      (let ((A (car f)))
                        (if (equal? A sym)
                            (match (next r A) (cdr s))
                            (loop (cdr f)))))))))))

Now the static A is used instead of the dynamic sym. We have sneaked another very common improvement into this program: if one can statically determine that some dynamic computation need not be performed, this can be used to improve the size and speed of the residual program. If the list f is empty, there is clearly no need to perform the operation (car s), which explains the first test of (not (null? f)) in match.

Let us take, as an example, the regular expression (ab|bab)*. When specializing the binding-time improved interpreter just given with respect to this regular expression, the target program of Figure 12.5 is generated by Similix.

(define (match-0 s_0)
  (define (match-0-1 s_0)
    (if (null? s_0)
        #t
        (let ((sym_1 (car s_0)))
          (cond ((equal? 'a sym_1) (match-0-3 (cdr s_0)))
                ((equal? 'b sym_1)
                 (let ((s_2 (cdr s_0)))
                   (and (not (null? s_2))
                        (equal? 'a (car s_2))
                        (match-0-3 (cdr s_2)))))
                (else #f)))))
  (define (match-0-3 s_0)
    (and (not (null? s_0))
         (equal? 'b (car s_0))
         (match-0-1 (cdr s_0))))
  (match-0-1 s_0))

Figure 12.5: Specialized matching program.

There are no r variables in the target program since r was static and has vanished at partial evaluation time. The specialized versions of match correspond to different values of the static r. All operations on r have been reduced, so the target program contains no generate-empty?, first, or next operations.


The target program corresponds exactly to a three-state deterministic finite³ automaton as derived by standard methods [4]. There are, however, only two procedures (besides the start procedure) in the target program, not three. This is because the procedure representing the state with only one in-going arrow is unfolded.

12.3 Conversion into continuation passing style

A simple method that often improves a program's binding-time separation is to convert it into continuation passing style [219,68], introduced in Section 10.5 and from now on abbreviated to CPS. Although indiscriminate CPS conversion is to be avoided (see a counterexample below), it is more easily automated than many other binding-time improvements.

CPS conversion has the effect of linearizing program execution, and can bring together parts of a computation that are far apart in its direct (non-continuation) form. Further, continuation style has practical advantages for functions that return multiple values, as these can be programmed without the need to package their results together before return, just to be unpackaged by the caller.

We mentioned earlier that the relevance of a given binding-time improvement depends on the partial evaluator being used. As explained in Section 10.5, Similix implicitly performs some CPS conversion during specialization, so not all the improvements discussed in this section are relevant for Similix.

Plotkin describes a general CPS transformation for lambda expressions, and Danvy and Filinski do it for Scheme [219,68]. Other work includes a simple first-order CPS transformation for binding-time improvements by Holst and Gomard and a detailed discussion of CPS for binding-time improvements by Consel and Danvy [118,56].

A very simple example. Consider the expression

(+ 7 (if (= x 0) 9 13))

where x is dynamic. As the expression stands, the + operation is dynamic. Conversion into CPS yields:

(let ((k (lambda (temp) (+ 7 temp))))
  (if (= x 0)
      (k 9)
      (k 13)))

Using Lambdamix notation, the continuation k has binding time S → S and the + operation is now static. Partial evaluation yields the residual expression:

(if (= x 0) 16 20)
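The same improvement can be mimicked in Python (an illustrative sketch with invented names; a real specializer would perform the reduction of k automatically rather than by hand):

```python
# Direct style: the addition is dynamic because its argument is a dynamic if.
def f_direct(x):
    return 7 + (9 if x == 0 else 13)

# CPS style: the continuation k meets each static branch value separately,
# so a specializer can reduce k(9) to 16 and k(13) to 20.
def f_cps(x):
    k = lambda temp: 7 + temp
    return k(9) if x == 0 else k(13)

# Residual program after specialization: both additions have been folded.
def f_residual(x):
    return 16 if x == 0 else 20
```

All three functions compute the same result; only their binding-time separation, and hence the quality of the residual program, differs.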

³Finiteness follows from the fact that any regular expression has only finitely many different `derivatives', where a derivative is either r itself, or an expression obtained from a given derivative r' by computing (next r' a).