PREV UP next SCM

6.2.12: Evaluation

SCM uses its type representations to speed evaluation. All of the subr types (see Subr Cells) are tc7 types. Since the tc7 field is in the low order bit position of the CAR it can be retrieved and dispatched on quickly by dereferencing the SCM pointer pointing to it and masking the result.

All the SCM Special Forms get translated to immediate symbols (isym) the first time they are encountered by the interpreter (ceval). The representation of these immediate symbols is engineered to occupy the same bits as tc7. All the isyms occur only in the CAR of lists.

If the CAR of a expression to evaluate is not immediate, then it may be a symbol. If so, the first time it is encountered it will be converted to an immediate type ILOC or GLOC (see Immediates). The codes for ILOC and GLOC lower 7 bits distinguish them from all the other types we have discussed.

Once it has determined that the expression to evaluate is not immediate, ceval need only retrieve and dispatch on the low order 7 bits of the CAR of that cell, regardless of whether that cell is a closure, header, or subr, or a cons containing ILOC or GLOC.

In order to be able to convert a SCM symbol pointer to an immediate ILOC or GLOC, the evaluator must be holding the pointer to the list in which that symbol pointer occurs. Turning this requirement to an advantage, ceval does not recursively call itself to evaluate symbols in lists; It instead calls the macro EVALCAR. EVALCAR does symbol lookup and memoization for symbols, retrieval of values for ILOCs and GLOCs, returns other immediates, and otherwise recursively calls itself with the CAR of the list.

ceval inlines evaluation (using EVALCAR) of almost all procedure call arguments. When ceval needs to evaluate a list of more than length 3, the procedure eval_args is called. So ceval can be said to have one level lookahead. The avoidance of recursive invocations of ceval for the most common cases (special forms and procedure calls) results in faster execution. The speed of the interpreter is currently limited on most machines by interpreter size, probably having to do with its cache footprint. In order to keep the size down, certain EVALCAR calls which don't need to be fast (because they rarely occur or because they are part of expensive operations) are instead calls to the C function evalcar.

There was some discussion a year ago about a "Forth" style Scheme interpreter. This is the only improvement I know of which would beat SCM in speed.

Provided there is still type code space available in SCM, if we devote some of the IMCAR codes to "inlined" operations, we should get a significant performance boost. What is eliminated is the having to look up a GLOC or ILOC and then dispatch on the subr type. The IMCAR operation would be dispatched to directly. Another way to view this is that we make available special form versions of CAR, CDR, etc. Since the actual operation code is localized in the interpreter, it is much easier than uncompilation and then recompilation to handle (trace car); For instance a switch gets set which tells the interpreter to instead always look up the values of the associated symbols.
Variable: symhash
Top level symbol values are stored in the symhash table. symhash is an array of lists of ISYMs and pairs of symbols and values.
Immediate: ILOC
Whenever a symbol's value is found in the local environment the pointer to the symbol in the code is replaced with an immediate object (ILOC) which specifies how many environment frames down and how far in to go for the value. When this immediate object is subsequently encountered, the value can be retrieved quickly.
Immediate: GLOC
Pointers to symbols not defined in local environments are changed to one plus the value cell address in symhash. This incremented pointer is called a GLOC. The low order bit is normally reserved for GCmark; But, since references to variables in the code always occur in the CAR position and the GCmark is in the CDR, there is no conflict.

If the compile FLAG CAUTIOUS is #defined then the number of arguments is always checked for application of closures. If the compile FLAG RECKLESS is #defined then they are not checked. Otherwise, number of argument checks for closures are made only when the function position (whose value is the closure) of a combination is not an ILOC or GLOC. When the function position of a combination is a symbol it will be checked only the first time it is evaluated because it will then be replaced with an ILOC or GLOC.

Macro: EVAL expression env
Macro: SIDEVAL expression env
EVAL Returns the result of evaluating expression in env. SIDEVAL evaluates expression in env when the value of the expression is not used.

Both of these macros alter the list structure of expression as it is memoized and hence should be used only when it is known that expression will not be referenced again. The C function eval is safe from this problem.

Function: SCM eval (SCM expression)
Returns the result of evaluating expression in the top-level environment. eval copies expression so that memoization does not modify expression.