
Jeffery, C. The Implementation of Icon and Unicon. 2004.
Note that asgn assigns the value of its second argument to j and overwrites Arg0 with a variable descriptor, which is left on the top of the stack.
Similarly, the virtual machine instructions for
z := x
are
pnull
local 0
arg 0
asgn
The states of the stack are
[Figures: the stack after pnull, after local 0, after arg 0, and after asgn]
8.2.3 Operators
There is a virtual machine instruction for each of the forty-two operators in Icon. The instructions random and asgn described previously are examples. Casting Icon operators as virtual machine instructions masks a considerable amount of complexity, since few Icon operators are simple. For example, although x + y appears to be a straightforward computation, it involves checking the types of x and y, converting them to numeric types if they are not already numeric, and terminating with an error message if this is not possible. If x and y are numeric or convertible to numeric, addition is performed. Even this is not simple, since the addition may be integer or floating-point, depending on the types of the arguments. For example, if x is an integer and y is a real number, the integer is converted to a real number. None of these computations is evident in the virtual machine instructions produced for this expression, which are
pnull
local x
local y
plus
In the instructions given previously, the indices that are used to access identifiers have been replaced by the names of the identifiers, which are assumed to be local. This convention is followed in subsequent virtual machine instructions for ease of reading.
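The work hidden behind plus can be sketched in C. This is a minimal illustration, not the Icon run-time system: the value type, the to_numeric helper, and the return-code convention are all assumptions made for the example, and integer overflow is ignored.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical value representation; Icon's real descriptors differ. */
typedef enum { T_INT, T_REAL, T_STR } vtype;

typedef struct {
    vtype type;
    long i;          /* valid when type == T_INT  */
    double r;        /* valid when type == T_REAL */
    const char *s;   /* valid when type == T_STR  */
} value;

/* Convert v to a numeric value in *out; return 0 if it is not convertible. */
int to_numeric(value v, value *out) {
    char *end;
    if (v.type == T_INT || v.type == T_REAL) { *out = v; return 1; }
    /* A string is tried as an integer first, then as a real. */
    long i = strtol(v.s, &end, 10);
    if (end != v.s && *end == '\0') { out->type = T_INT; out->i = i; return 1; }
    double r = strtod(v.s, &end);
    if (end != v.s && *end == '\0') { out->type = T_REAL; out->r = r; return 1; }
    return 0;
}

/* Add x and y into *result; return 0 where Icon would report an error. */
int plus(value x, value y, value *result) {
    value a, b;
    assert(result != NULL);
    if (!to_numeric(x, &a) || !to_numeric(y, &b)) return 0;
    if (a.type == T_INT && b.type == T_INT) {
        result->type = T_INT;             /* integer addition */
        result->i = a.i + b.i;
    } else {
        /* Mixed or real operands: convert any integer to real first. */
        double ar = (a.type == T_INT) ? (double)a.i : a.r;
        double br = (b.type == T_INT) ? (double)b.i : b.r;
        result->type = T_REAL;
        result->r = ar + br;
    }
    return 1;
}
```

With these definitions, adding the integer 2 to the string "3" yields the integer 5, while adding 2 to the real 1.5 yields the real 3.5.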
Augmented assignment operations do not have separate virtual machine instructions. Instead, the instruction dup first pushes a null descriptor and then pushes a duplicate of the descriptor that was previously on top of the stack. For example, the virtual machine instructions for
i +:= 1
are
pnull
local i
dup
int 1
plus
asgn
The stack after the execution of local is
[Figure: the stack after local i]
The execution of dup produces
[Figure: the stack after dup]
The dup instruction simply takes the place of the pnull and second local instructions in the virtual machine instructions for
i := i + 1
which are
pnull
local i
pnull
local i
int 1
plus
asgn
In this case, only a single local instruction is avoided. If the variable to which the assignment is made is not just an identifier but, instead, a more complicated construction, as in
a[j] +:= 1
substantial computation may be saved by duplicating the result of the first argument expression instead of recomputing it.
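The effect of dup on the stack can be sketched as follows. The two-field descrip type, the fixed-size stack, and the helper names are assumptions made for illustration; they are not the interpreter's actual data structures.

```c
#include <assert.h>

/* Hypothetical, simplified descriptor: a kind tag and one payload word. */
typedef enum { D_NULL, D_VAR, D_INT } dkind;
typedef struct { dkind kind; long val; } descrip;

#define STACK_MAX 64
descrip stack[STACK_MAX];
int top = -1;                 /* index of the top descriptor; -1 when empty */

void push(descrip d) {
    assert(top + 1 < STACK_MAX);
    stack[++top] = d;
}

/* pnull: push a null descriptor */
void op_pnull(void) { descrip d = { D_NULL, 0 }; push(d); }

/* local: push a variable descriptor for the given local slot */
void op_local(long slot) { descrip d = { D_VAR, slot }; push(d); }

/* dup: push a null descriptor, then a copy of the descriptor
   that was on top of the stack before dup was executed */
void op_dup(void) {
    assert(top >= 0);
    descrip prev = stack[top];
    op_pnull();
    push(prev);
}
```

After op_pnull(), op_local(0), and op_dup(), the stack holds four descriptors (null, variable, null, variable), the same layout that pnull, local i, pnull, local i would produce.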
8.2.4 Functions
While the meaning of an operation is fixed and can be translated into a specific virtual machine instruction, the meaning of a function call can change during program execution. The value of the function also can be computed, as in
(p[i])(x, y)
The general form of a call is
expr0(expr1, expr2, ..., exprn)
The corresponding virtual machine instructions are
code for expr0
code for expr1
code for expr2
...
code for exprn
invoke n
The invoke instruction is relatively complicated, since the value of expr0 may be a procedure, an integer (for mutual evaluation), or even a value that is erroneous. Function invocation is discussed in detail in Chapter 10.
8.3 The Interpreter Proper
8.3.1 The Interpreter Loop
The interpreter, which is called interp, is basically simple in structure. It maintains a location in the icode (ipc) and begins by fetching the instruction pointed to by ipc and incrementing ipc to the next location. It then branches to a section of code for processing the virtual machine instruction that it fetched. The interpreter loop is
for (;;) {
   op = GetWord;
   switch (op) {
      ...
      case Op_Asgn:
         ...
      case Op_Plus:
         ...
      }
   continue;
   }
where GetWord is a macro that is defined to be (*ipc++).
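The fetch-and-dispatch structure can be tried out with a much smaller machine. The sketch below keeps the loop shape and the GetWord macro, but everything else is an assumption made for illustration: the opcode numbering, the stack of plain words instead of descriptors, and the Op_Int and Op_Quit cases as written here.

```c
#include <assert.h>

typedef long word;

enum { Op_Int, Op_Plus, Op_Quit };   /* hypothetical opcode numbering */

#define GetWord (*ipc++)

/* Execute icode starting at ipc; return the top of the stack at Op_Quit. */
word interp(const word *ipc) {
    word stack[32];
    int depth = 0;                   /* number of words on the stack */
    word op;

    for (;;) {
        op = GetWord;                /* fetch and advance ipc */
        switch (op) {
        case Op_Int:                 /* push the operand that follows */
            assert(depth < 32);
            stack[depth++] = GetWord;
            break;
        case Op_Plus:                /* replace the top two words by their sum */
            assert(depth >= 2);
            stack[depth - 2] += stack[depth - 1];
            depth--;
            break;
        case Op_Quit:
            assert(depth >= 1);
            return stack[depth - 1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
        continue;
    }
}
```

Running interp on the words Op_Int, 2, Op_Int, 3, Op_Plus, Op_Quit returns 5.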
Macros are used extensively in the interpreter to avoid repetitious coding and to make the interpreter easier to read. The coding is illustrated by the case clause for the instruction plus:
case Op_Plus:   /* e1 + e2 */
   Setup_Op(2);
   DerefArg(1);
   DerefArg(2);
   Call_Op;
   break;
Setup_Op(n) sets up a pointer to the address of Arg0 on the interpreter stack. The resulting code is
rargp = (struct descrip *)(sp - 1) - n;
The value of n is the number of arguments on the stack.
DerefArg(n) dereferences argument n. If it is a variable, it is replaced by its value. Thus, dereferencing is done in place by changing descriptors on the interpreter stack.
Call_Op calls the appropriate C function with a pointer to the interpreter stack as provided by Setup_Op(n). The function itself is obtained by looking up op in an array of pointers to functions. The code produced by Call_Op is
(*(optab[op]))(rargp);
sp = (word *)rargp + 1;
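How the three macros cooperate can be seen in a reduced model. The sketch below assumes a two-word descriptor, three made-up type codes, an operator table with a single entry, and a plus function that skips type checking; like the interpreter, it overlays descriptors on a stack of words, which relies on a word being large enough to hold a pointer.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t word;

/* Simplified two-word descriptor (hypothetical layout). */
struct descrip { word dword; word vword; };

enum { D_Null, D_Integer, D_Var };   /* hypothetical type codes */
enum { Op_Plus };                    /* hypothetical opcode numbering */

word stack[64];
word *sp = stack;                    /* last word in use; stack[0] is unused */
struct descrip *rargp;

/* Push one descriptor as two consecutive words. */
void push_desc(word dword, word vword) {
    assert(sp + 2 < stack + 64);
    *++sp = dword;
    *++sp = vword;
}

/* Operator function: leaves Arg1 + Arg2 in Arg0 (no type checking here). */
int op_plus(struct descrip *cargp) {
    cargp[0].dword = D_Integer;
    cargp[0].vword = cargp[1].vword + cargp[2].vword;
    return 0;
}

int (*optab[])(struct descrip *) = { op_plus };

#define Setup_Op(n) (rargp = (struct descrip *)(sp - 1) - (n))
#define DerefArg(n) do { \
        if (rargp[n].dword == D_Var) \
            rargp[n] = *(struct descrip *)rargp[n].vword; \
    } while (0)
#define Call_Op do { (*(optab[op]))(rargp); sp = (word *)rargp + 1; } while (0)

void exec_op(word op) {
    switch (op) {
    case Op_Plus:                    /* e1 + e2 */
        Setup_Op(2);
        DerefArg(1);                 /* dereferencing is done in place */
        DerefArg(2);
        Call_Op;                     /* afterwards sp is at the end of Arg0 */
        break;
    }
}
```

If the stack holds a null descriptor, a variable descriptor that refers to the integer 40, and the integer 2, then exec_op(Op_Plus) leaves a single integer descriptor with value 42 in the position of Arg0.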

Chapter 9: Expression Evaluation
PERSPECTIVE: The preceding chapter presents the essentials of the interpreter and expression evaluation as it might take place in a conventional programming language in which every expression produces exactly one result. For example, expressions such as
i := j
k := i + j
i +:= ?k
each produce a single result: they can neither fail nor can they produce sequences of results.
The one feature of Icon that distinguishes it most clearly from other programming languages is the capacity of its expression-evaluation mechanism to produce no result at all or to produce more than one result. From this capability come unconventional methods of controlling program flow, novel control structures, and goal-directed evaluation.
The generality of this expression-evaluation mechanism alone sets Icon apart from other programming languages. While generators, in one form or another, exist in a number of programming languages, such as IPL-V (Newell 1961), CLU (Liskov 1981), Alphard (Shaw 1981), and SETL (Dewar, Schonberg, and Schwartz 1981), such generators are limited to specific constructs, designated contexts, or restricted types of data. Languages with pattern-matching facilities, such as SNOBOL4 (Griswold, Poage, and Polonsky 1971), InterLisp (Teitelman 1974), and Prolog (Clocksin and Mellish 1981), generate alternative matches, but only within pattern matching.
Just as Icon's expression-evaluation mechanism distinguishes it from other programming languages, it is also one of the most interesting and challenging aspects of Icon's implementation. Its applicability in every context and to all kinds of data has a pervasive effect on the implementation.
9.1 Bounded Expressions
A clear understanding of the semantics of expression evaluation in Icon is necessary to understand the implementation. One of the most important concepts of expression evaluation in Icon is that of a bounded expression, within which backtracking can take place. However, once a bounded expression has produced a result, it cannot be resumed for another result. For example, in
write(i = find(s1,s2))
find may produce a result and may be resumed to produce another result if the comparison fails. On the other hand, in
write(i = find(s1, s2))
write(i = find(s1, s3))
the two lines constitute separate expressions. Once the evaluation of the expression on the first line is complete, it cannot be resumed. Likewise, the evaluation of the expression on the second line is not affected by whether the expression on the first line succeeds or fails. However, if the two lines are joined by a conjunction operation, as in
write(i = find(s1, s2)) & write(i = find(s1, s3))
they are combined into a single larger expression, and the expression on the second line is not evaluated if the expression on the first line fails. Similarly, if the expression on the first line succeeds, but the expression on the second line fails, the expression on the first line is resumed.
The reason for the difference in the two cases is obscured by the fact that the Icon translator automatically inserts a semicolon at the end of a line on which an expression is complete and for which a new expression begins on the next line.
Consequently, the first example is equivalent to
write(i = find(s1, s2)); write(i = find(s1, s3))
The difference between the semicolon and the conjunction operator is substantial. A semicolon bounds an expression, while an operator binds its operands into a single expression.
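The behavior of i = find(s1, s2) inside one bounded expression, and the difference between joining two such expressions with a semicolon or with a conjunction, can be imitated in C. This is only a sketch under stated assumptions: the function names are hypothetical, positions are 1-based as in Icon, failure is represented by 0, and write is left out. Resumption of the first operand of the conjunction is not modeled, because find produces increasing positions and so cannot produce one equal to i a second time.

```c
#include <assert.h>
#include <string.h>

/* Next 1-based position at or after start where s1 occurs in s2, or 0. */
int find_from(const char *s1, const char *s2, int start) {
    if (start < 1 || (size_t)start > strlen(s2) + 1) return 0;
    const char *p = strstr(s2 + start - 1, s1);
    return p ? (int)(p - s2) + 1 : 0;
}

/* i = find(s1, s2): resume find until a position equals i or find fails. */
int eq_find(int i, const char *s1, const char *s2) {
    int pos = 0;
    while ((pos = find_from(s1, s2, pos + 1)) != 0) {
        if (pos == i) return i;      /* the comparison succeeds */
        /* the comparison failed, so the loop resumes find */
    }
    return 0;                        /* find has no more results */
}

/* expr1; expr2: the second expression is evaluated whatever the first does. */
int with_semicolon(int i, const char *s1, const char *s2, const char *s3) {
    (void)eq_find(i, s1, s2);
    return eq_find(i, s1, s3);
}

/* expr1 & expr2: the second expression is evaluated only if the first succeeds. */
int with_conjunction(int i, const char *s1, const char *s2, const char *s3) {
    if (eq_find(i, s1, s2) == 0) return 0;
    return eq_find(i, s1, s3);
}
```

For example, eq_find(5, "ab", "xxabab") first obtains position 3 from find, fails the comparison, resumes find, and succeeds with position 5.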
Bounded expressions are enclosed in ovals in the following examples to make the extent of backtracking clear (the ovals are not reproduced here). A compound expression, for example, has the following bounded expressions, namely expr1, expr2, and so on up to, but not including, exprn:
{expr1; expr2; ...; exprn}
Note that exprn is not, of itself, a bounded expression. However, it may be part of a larger bounded expression, as in
({expr1; expr2; ...; exprn} = ...);
Here exprn is part of the bounded expression for the comparison operator. The entire enclosing bounded expression is a consequence of the final semicolon. In the absence of the context provided by this semicolon, the entire expression might be part of a larger enclosing bounded expression, and so on.
The separation of a procedure body into a number of bounded expressions, separated by semicolons (explicit or implicit) and other syntactic constructions, is very important. Otherwise, a procedure body would consist of a single expression, and failure of any component would propagate throughout the entire procedure body. Instead, control backtracking is limited in scope to a bounded expression, as is the lifetime (and hence stack space) for temporary computations.
Bounded expressions are particularly important in control structures. For example, in the if-then-else control structure, the control expression is bounded but the other expressions are not:
if expr1 then expr2 else expr3
As with the compound expression illustrated earlier, expr2 or expr3 (whichever is selected) may be part of a larger bounded expression. An example is
write( if i < j then i to j else j to i )
If the control expression were not a separate bounded expression, the failure of expr2 or expr3 would result in backtracking into it and the if-then-else expression would be equivalent to
(expr1 & expr2) | expr3
which is hardly what is meant by if-then-else.
In a while-do loop, the control expression and the expression in the do clause are both bounded:
while expr1 do expr2
The two bounded expressions ensure that the expressions are evaluated independently of each other and any surrounding context. For example, if expr2 fails, there is no control backtracking into expr1.
9.1.1 Expression Frames
In the implementation of Icon, the scope of backtracking is delineated by expression frames. The virtual machine instruction
mark L1
starts an expression frame. If the subsequent expression fails, ipc is set to the location in the icode that corresponds to L1. The value of ipc for a label is relative to the location of the icode that is read in from the icode file. For simplicity in the description that follows, the value of ipc is referred to just by the name of the corresponding label.
The mark instruction pushes an expression frame marker onto the stack and sets the expression frame pointer, efp, to it. Thus, efp indicates the beginning of the current expression frame. There is also a generator frame pointer, gfp, which points to another kind of frame that is used to retain information when an expression suspends with a result and is capable of being resumed for another. Generator frames are described in Sec. 9.3. The mark instruction sets gfp to zero, indicating that there is no suspended generator in a new expression frame.
An expression frame marker consists of four words: the value of ipc for the argument of mark (called the failure ipc), the previous efp, the previous gfp, and ilevel, which is related to suspended generators.
An expression frame marker is declared as a C structure:
struct ef_marker {              /* expression frame marker */
   word *ef_failure;            /*   failure ipc */
   struct ef_marker *ef_efp;    /*   efp */
   struct gf_marker *ef_gfp;    /*   gfp */
   word ef_ilevel;              /*   ilevel */
   };
This structure is overlaid on the interpreter stack in order to reference its components. The code for the mark instruction is
case Op_Mark:   /* create expression frame marker */
   newefp = (struct ef_marker *)(sp + 1);
   opnd = GetWord;
   opnd += (word)ipc;
   newefp->ef_failure = (word *)opnd;
   newefp->ef_gfp = gfp;
   newefp->ef_efp = efp;
   newefp->ef_ilevel = ilevel;
   sp += Wsizeof(*efp);
   efp = newefp;
   gfp = 0;
   break;
The macro Wsizeof(x) produces the size of x in words.
An expression frame is removed by the virtual machine instruction
unmark
which restores the previous efp and gfp from the current expression frame marker and removes the current expression frame by setting sp to the word just above the frame marker.
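The pairing of mark and unmark can be exercised outside the interpreter with a small model. The sketch below makes several assumptions: op_mark receives the failure ipc directly instead of computing it from an operand, the stack is a fixed array, the definition of Wsizeof is a guess at what the macro does, and "the word just above the frame marker" is taken to mean the word at the next lower address, that is, the word that was on top before mark executed.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t word;

struct gf_marker;                    /* generator frames are not modeled */

struct ef_marker {                   /* expression frame marker */
    word *ef_failure;                /*   failure ipc */
    struct ef_marker *ef_efp;        /*   previous efp */
    struct gf_marker *ef_gfp;        /*   previous gfp */
    word ef_ilevel;                  /*   ilevel */
};

/* Size of x in words, rounded up (assumed definition). */
#define Wsizeof(x) ((sizeof(x) + sizeof(word) - 1) / sizeof(word))

word stack[128];
word *sp = stack;                    /* last word in use; stack[0] is unused */
struct ef_marker *efp = 0;
struct gf_marker *gfp = 0;
word ilevel = 0;

/* mark: overlay a marker on the stack and make it the current frame. */
void op_mark(word *failure_ipc) {
    struct ef_marker *newefp = (struct ef_marker *)(sp + 1);
    /* The overlay assumes pointers and words have the same size. */
    assert(sizeof(struct ef_marker) == 4 * sizeof(word));
    assert(sp + Wsizeof(*efp) < stack + 128);
    newefp->ef_failure = failure_ipc;
    newefp->ef_gfp = gfp;
    newefp->ef_efp = efp;
    newefp->ef_ilevel = ilevel;
    sp += Wsizeof(*efp);
    efp = newefp;
    gfp = 0;                         /* no suspended generator in a new frame */
}

/* unmark: restore efp and gfp and pop the frame, marker included. */
void op_unmark(void) {
    assert(efp != 0);
    gfp = efp->ef_gfp;
    sp = (word *)efp - 1;
    efp = efp->ef_efp;
}
```

Two nested calls of op_mark followed by two calls of op_unmark return sp and efp to their initial values, which is the property that lets expression frames nest.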
The use of mark and unmark is illustrated by
if expr1 then expr2 else expr3
for which the virtual machine instructions are
mark L1
code for expr1
unmark
code for expr2
goto L2
L1:
code for expr3
L2:
The mark instruction creates an expression frame for the evaluation of expr1. If expr1 produces a result, the unmark instruction is executed, removing the expression frame for expr1, along with the result produced by expr1. Evaluation then proceeds in expr2.
If expr1 fails, control is transferred to the location in the icode corresponding to L1 and the unmark instruction is not executed. In the absence of generators, failure also removes the current expression frame, as described in Sec. 9.2.
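The two paths through the code for if-then-else can be traced with a toy machine. The sketch below is an illustration only: the opcodes SUCCEED, FAIL, and EMIT stand in for the code of expr1, expr2, and expr3, labels are absolute indices instead of relative offsets, and an expression frame is reduced to its failure ipc.

```c
#include <assert.h>

/* Hypothetical opcodes for the toy machine. */
enum { MARK, SUCCEED, FAIL, UNMARK, GOTO, EMIT, HALT };

/* Run a program; return the operand of the last EMIT that was executed. */
int run(const int *code) {
    int fail_ipc[16];                /* failure ipc of each open frame */
    int depth = 0;
    int ipc = 0;
    int out = 0;

    for (;;) {
        switch (code[ipc++]) {
        case MARK:                   /* open a frame, remember the failure ipc */
            assert(depth < 16);
            fail_ipc[depth++] = code[ipc++];
            break;
        case SUCCEED:                /* the expression produces a result */
            break;
        case FAIL:                   /* remove the frame, go to the failure ipc */
            assert(depth > 0);
            ipc = fail_ipc[--depth];
            break;
        case UNMARK:                 /* remove the frame after success */
            assert(depth > 0);
            depth--;
            break;
        case GOTO:
            ipc = code[ipc];
            break;
        case EMIT:                   /* stands in for expr2 or expr3 */
            out = code[ipc++];
            break;
        case HALT:
            return out;
        default:
            assert(!"unknown opcode");
            return -1;
        }
    }
}

/* if expr1 then expr2 else expr3, with expr2 emitting 2 and expr3 emitting 3. */
int if_then_else(int expr1_succeeds) {
    int code[] = {
        MARK, 8,                             /*     mark L1       */
        expr1_succeeds ? SUCCEED : FAIL,     /*     expr1         */
        UNMARK,                              /*     unmark        */
        EMIT, 2,                             /*     expr2         */
        GOTO, 10,                            /*     goto L2       */
        EMIT, 3,                             /* L1: expr3 (at 8)  */
        HALT                                 /* L2: (at 10)       */
    };
    return run(code);
}
```

When expr1 succeeds, unmark is executed and expr2 runs. When expr1 fails, control goes straight to L1, unmark is skipped, and expr3 runs.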
It is necessary to save the previous value of efp in a new expression marker, since expression frames may be nested. This occurs in interesting ways in some generative control structures, which are discussed in Sec. 9.4. Nested expression frames also occur as a result of evaluating compound expressions, such as
while expr1 do if expr2 then expr3
9.2 Failure
The interesting aspects of implementing expression evaluation in Icon can be divided into two cases: without generators and with generators. The possibility of failure in the absence of generators is itself of interest, since it occurs in other programming languages, such as SNOBOL4. This section describes the handling of failure and assumes, for the moment, that there are no generators. The next section describes generators.
In the absence of generators, if failure occurs anywhere in an expression, the entire expression fails without any further evaluation. For example, in the expressions
i := numeric(s)
line := read(f)
if numeric(s) fails in the first line, the assignment is not performed and evaluation continues immediately with the second line. In the implementation, this amounts to