Sometimes we want to write stereotyped code, not just stereotyped data structures. As with data, we sometimes want part of our stereotyped piece of code to vary. We can do this with syntax extensions, also known as macros.
(If you're familiar with macros from C, don't scoff. Macros in C are stunningly lame and hard to use compared to Lisp or Scheme macros. Read on to find out what you've been missing. If you're familiar with Lisp macros, but have never done advanced programming with them, you probably don't realize how powerful they are--Lisp macros are so error-prone that people often avoid them. Scheme macros are very powerful, but automate away some of the tricky parts.)
Macros are syntax extensions to a programming language, expressed as a translation of expressions. By writing a macro, what you're really doing is extending the functionality of the compiler or interpreter--you're telling it how to compile (or interpret) a new construct, by telling it how to rewrite that construct into something it already knows how to compile or interpret.
(Conceptually, defining a macro is extending the compiler--you're telling the parser how to recognize a new construct, to change the grammar of the language, and also specifying how to generate code for the new construct. This is something you can't do in most languages, but it's easy in Scheme.)
Scheme recognizes macro definitions, and then uses them to recognize and translate the new constructs into other constructs. The interpreter or compiler's process of translating a level constructs is often called "macro expansion," despite the fact that the resulting expression may not be bigger than the original expression. Macroexpansion can be recursive, because macros can use macros, and a macro can even use itself, like a recursive procedure.
Syntax extension is powerful, and hence somewhat dangerous when used too casually. Be aware that when you write a macro, you can change the syntax of your programming language, and that can be a bad thing--you and others may no longer be able to easily understand what the program does. Used judiciously, however, such syntactic extensions are often just what you need to simplify your programs. They are especially useful for writing programs that write programs, so that you can avoid a lot of tedious repetitive coding.
Macros are so useful that they're usually used in the implementation of Scheme itself. Most Scheme compilers actually understand only a few special forms, and the rest are written as macros.
In a later chapter, I'll describe some advanced uses of macros, which let your "roll your own" language with powerful new features.
Why do we want macros? In Scheme, the main code abstraction mechanism
is procedural abstraction, e.g. using define
or lambda
to
write procedures that do stereotyped things. In a sense, we
"specialize" procedures by passing them argument values--a procedure
can do different things depending on the values it's given to work
with. We can also "specialize" procedures by creating closures
in different environments. Isn't this enough?
Not in general. While procedural abstraction is very powerful, there are times when we may want to write stereotyped routines that can't be written as procedures.
Suppose, for example, you have a Scheme system which gives you things
like let
and if
, but not or
. (Real Schemes all
provide or
, but pretend they don't. It makes a nice, simple
example.)
You want an or
construct (rather like the one actually built
into Scheme). This or
can take two arguments; it evaluates
the first one and returns the result if it's a true value, otherwise it
evaluates the second one and returns that result.
Notice that you can't write or
as a procedure. If or
were a procedure, both of its arguments would always be evaluated before
the actual procedure call. Since or
is only supposed to evaluate
its second argument if the first one returns #f
, it just wouldn't
work.
If Scheme didn't have or
, you could fake it at any given point
in your program, by writing an equivalent let
expression with
an if
statement in it.
For example, suppose you wanted to write the equivalent of
(or (foo?) (bar?))
.
As a first try, you might do this:
(if (foo?) (foo?) (bar?))
That is, test (foo?)
, and return its value if it's a true value.
That's not really quite right though, because this if statement
evaluates foo?
twice: once to test it, and once to return it.
We really only want to evaluate it once--if (foo?)
is an
expression with side effects, evaluating it twice could make the program
incorrect as well as inefficient.
To avoid this, we can always evaluate the first expression just once to get the value, and store that value in a temporary variable so that we can return it without evaluating it again:
You could instead write
(let ((temp (foo?))) (if temp temp (bar?)))
This let
expression gives the same effect as
(or (foo?) (bar?))
, because it evaluates foo
exactly once,
and then tests the value; if the value is true, it returns that value.
(The use of a let
variable to stash the value allows us to test
it without evaluating (foo?)
again.) If the value is #f
,
it evaluates (bar?)
and returns the result.
This is the transformation we'd like to be able to automate by
defining or
as a macro.
Here's a simple version of or
written as a macro. I've called
it or2
to distinguish it from Scheme's normal or
.
(define-syntax or2 (syntax-rules () ((or2 a b) ; pattern (let ((temp a)) ; template (if temp temp b)))))
What we're saying to Scheme is that we want to define the syntax of
or
by giving syntax rules for recognizing it and translating
it. For this simple version of or
, we only need one rule, which says
to translate (or
a b)
into the desired let
expression.
(or a b)
is called the rule's pattern, which specifies
what counts as a call to the or
macro, and the let
expression is the rule's template, which specifies
the equivalent code that Scheme should translate calls into.
The variables a
and b
are called pattern variables.
They stand for the actual expressions passed as arguments to the
macro. They are "matched" to the actual expressions when the
pattern is recognized, and when the template is interpreted or compiled,
the actual expressions are used where the pattern variables occur.
You can think of this in two steps. When a macro is used,
(It's really not quite this simple, but that's the basic idea.)
In some ways, macro arguments are a lot like procedure arguments, but in other ways they're very different. The pattern variables are not bound at run time, and don't refer to storage locations. They're only used in translating a macro call into the equivalent expression.
Always remember that arguments to a macro are expressions used in
transforming the code, and then the code is executed. (For example,
the output of the or
macro doesn't contain a variable
named a
; a
is just a shorthand for whatever expression
is passed as an argument to the macro. In the example use
(or (foo?) (bar?))
, the expression (foo?)
is
what gets evaluated at the point where a
is used in the
macro body.)
This is why our macro has to use a temporary variable, like the
hand-written equivalent of or
. If we tried to write
the macro like a procedure, without using a temporary variable,
like this
(define-syntax or (syntax-rules () ((or a b) (if a a b))))
then (or (foo?) (bar?))
would be translated into
(if (foo?) (foo?) (bar?))
As with the buggy handwritten version, (foo?)
would be evaluated
twice when this expression was evaluated.
(This is the most common mistake in writing macros--forgetting that while macros give you the ability to control when argument expressions are evaluated, they also require you to control it. It's safe to use a procedure argument multiple times, because that's just referring to a value in a run-time binding. Using a macro argument causes evaluation of the entire argument expression at that point.)
We can make a better or
by using more rules. We might want or
to work with any number of arguments, so that
or
of zero arguments returns #f
, because it has zero true
arguments,
or
of one argument is equivalent to that argument--it's true
if and only if that argument is true.
or
of two or more arguments evaluates its first argument, and
returns its value if it's true. Otherwise, it computes the
or
of the rest of its arguments, and returns its
result.
Here's the Scheme definition with these three rules:
(define-syntax or (syntax-rules () ((or) ; OR of zero arguments #f) ; is always false ((or a) ; OR of one argument a) ; is equivalent to the argument expression ((or a b c ...) ; OR of two or more arguments (let ((temp a)) ; is the first or the OR of the rest (if temp temp (or b c ...))))))
Notice that this definition is recursive. (The third rule's template
uses the or
macro recursively.) If we hand or
four
arguments, like this: (or foo bar baz quux)
, it should
be equivalent to (or foo (or bar (or baz (or quux))))
.
Scheme will use recursion to translate the expression, one step at a time. When Scheme encounters a macro call, it transforms the call into the equivalent code, using the appropriate rule. It then interprets (or compiles) the resulting expression. If the result itself includes a macro call, then the interpreter (or compiler) calls itself recursively to translate that before evaluating it. For a correctly written macro, the recursive translation will eventually "bottom out" when no more macro calls result, and the code will be evaluated in the usual way.
(As I'll show later, this fits in very neatly with the interpreter's or compiler's recursive evaluation of expressions.)
This recursion is recursion in Scheme's transformation of the call expression into equivalent code--it doesn't mean that the resulting code is recursive. A Scheme compiler will do all of the recursive transformation at compile time, so there's no runtime overhead. Of course, the recursion has to terminate, or the compiler will not be able to finish the translation.
In this definition of or
, the third rule contains the symbols
c ...
. The Scheme identifier ...
is treated specially,
to help you write recursive rules. (In previous examples, I used
...
as an ellipsis to stand for code I didn't want to write out,
but here we're usuing the actual Scheme identifier ...
;
it's actually used in the Scheme code for macros.)
Scheme treats a pattern variable followed by ...
as matching
zero or more
subexpressions. In this or
macro, c ...
matches all of the arguments after the first two.
Scheme matches (or foo bar baz quux)
by the third rule,
whose pattern (or a b c ...)
, because it has at least two
arguments. In applying the rule, Scheme matches a
to foo
,
b
to bar
, and c ...
to the sequence of
expressions baz bleen
.
[ This is similar to how you use unquote-splicing
inside
backquote--you can splice a list into a list at the same
level, rather than nesting it. ]
The result of processing this or
is
(let ((temp foo)) (if temp temp (or bar baz quux)))
Now Scheme evaluates this expression.
But there's another or
in there--when Scheme gets to
(or bar baz quux)
it will match the third rule again,
with a
matched to bar
, b
matched to baz
,
and c ...
being matched to just quux
.
The result of this macro-processing step is
(let ((temp foo)) (if temp temp (let ((temp bar)) (if temp temp (or baz quux)))))
And the new let expression is evaluated.
There or
is again, so Scheme will treat (or baz quux)
the same way, again using the third rule--this time matching a
to
baz
, b
to quux
, and c ...
to nothing at all,
producing
(let ((temp foo)) (if temp temp (let ((temp bar)) (if temp temp (let ((temp baz) (if temp temp (or quux))))))))
And this will be evaluated.
Now the resulting or
matches the second rule in the or
macro, because it has only one argument quux
, which is matched to
a
. The whole translation is therefore:
(let ((temp foo)) (if temp temp (let ((temp bar)) (if temp temp (let ((temp baz) (if temp temp quux)))))))
There are no more macro calls here, so the recursion terminates.
You might have noticed that the example translation of
(or foo bar baz quux)
has several different variables
named temp
in it. You might have wondered if this
could cause problems--is there a potential for accidentally
referring to the wrong variable in the wrong place in the
code generated by a macro?
The answer no. Scheme's macro system actually does some magic to
avoid this, which I'll discuss later in a later section. Scheme
actually keeps track of which variables are introduced by different
applications of macros, and keeps them distinct--the different
variables named temp
are treated as though they had
different names, so that macros follow the same scope rules as
procedures. (Scheme macros are said to be hygienic;
what that really means is that they respect lexical scope.)
You can think of this as a renaming, as though Scheme had sneakily changed the names each time the macro was applied to transform the expression, and the result were
(let ((temp_1 foo)) (if temp_1 temp_1 (let ((temp_2 bar)) (if temp_2 temp_2 (let ((temp_3 baz) (if temp_3 temp_3 quux)))))))
Scheme implements the same scoping rule for macros and their arguments as for procedures and their arguments. When you call a procedure, the argument expressions are evaluated at the call site, i.e., in the call site environment, and the values are passed to the procedure--the environment inside the called procedure doesn't affect the meaning of the argument expressions. Likewise
In writing macros like or
, we want to control when
and whether the arguments are evaluated, but otherwise
we want them to mean the same thing they would if they were
arguments to a procedure.
For example, suppose we call or
with an argument expression
that happens to use a name that's used inside or
. or
uses a local variable named temp
, and we might just happen
to pass it an expression using the name temp
.
Consider the following procedure, which uses local variables perm
and temp
, and calls or
in their scope.
(define (employee? person) (let ((perm (member person permanent-employees)) (temp (member person temporary-employees))) (or perm temp))
If we translated the or macro naively, without worrying about accidental naming conflicts, we'd get this:
(define (employee? person) (let ((perm (member person permanent-employees)) (temp (member person temporary-employees))) (let ((temp perm) (if temp temp temp)))
(This is not what R5RS Scheme macros do!)
Note what's wrong here. The name temp
was passed into the
macro from the call site, but it appeared in the body of the macro
inside the let
binding of temp
. At the call site,
it referred to the "outer" temp
, but inside the macro,
it turne out to refer to something else--in the process of moving
the expression around, we accidentally changed its meaning.
As examples of Scheme macros, I'll show how to implement several
special forms in terms of lambda
. This is how most real
Scheme compilers work--the compiler itself only "understands"
how to compile a few special forms directly, but the others can
be defined as macros.
Traditionally, the compiler understands
lambda
, and all other binding forms are implemented in
terms of lambda
and procedure calling. The compiler must
also understand a few other special forms, if
, set!
,
quote
, a simple version of define
[ did I leave
one out? ].)
let
Recall that in chapter [ whatever ], I explained how the semantics
of let
can be explained in terms of lambda
. For
any let
expression, which binds variables and evaluates
body expressions in that scope, there is an exactly equivalent
expression using lambda
and procedure calling. The lambda
creates a procedure which will bind the variables as its argument
variables, and execute the body of the let
. This lambda
is then used in a combination--calling the procedure makes it
bind variables when it accepts arguments.
(define-syntax let () (syntax-rules ((_ ((var value-expr) ...) body-expr ...) ; pattern ((lambda (var ...) body-expr ...) (value-expr ...)))))
Here I've used an underscore to stand for the keyword let
in the
macro call pattern. This is allowable, and recommended, because it
avoids having to write the keyword in several places. (If you had
to write out the keyword in each pattern, it would make it more difficult
and error-prone to change the name of a macro.)
I've also taken advantage of the fact that Scheme is pretty smart about
patterns using the ...
(ellipsis) symbol. The pattern has
two ellipses. One matches any number of binding forms (variable names
and initial value expressions); the other matches any number of body
expressions.
The body expressions matched by body-expr ...
are simply
used in the body of the lambda
expression.
The expressions matched by (var value-expr) ...
are used differently,
however--they are not simply substituted into the macro template.
Instead, (var ...)
is used to generate the argument list
for the lambda
, and value-expr ...
is used to generate
the list of initial expressions.
Scheme's macro system is smart enough to figure out what's going on.
If the pattern contains an ellipsis following a compound form
(like (var init-expr) ...
, and the template uses one of the
pattern variables from that compound form (followed by an ellipsis),
then Scheme assumes you want the corresponding part of
each form matched by the pattern form.
If we think of the expressions as s-expressions, we've matched a
pattern that is one list of two-element lists, and restructured
it into two separate lists of elements. (That is, we're going from a
list of car
s and cadr
s to a list of car
s
and a list of cadr
s.)
As an example of use, consider
(let ((var-a (some-procedure foo)) (var-b (some-procedure bar))) (quux var-a) (quux var-b))
which translates to
((lambda (var-a var-b) (quux var-a) (quux var-b)) (some-procedure foo) (some-procedure bar))
[ The following is out of place--here I should just be showing some
uses of macros. The problem is that I don't want to lie and pretend
that it's all very simple--Scheme does something sophisticated when
you write binding contstructs as macros...
This stuff will all be clearer after I've talked about hygiene
problems with Lisp macros, and laziness and call-by-name... how to
fwd ref gracefully? ]
An extraordinarily astute and thoughtful reader might wonder if there's
something wrong here. (Luckily, there's actually nothing to worry about.)
Recall that when discussing or
, I said
that Scheme is careful to treat names introduced by a macro as though
they were distinct, effectively renaming variables introduced in a macro.
What about the argument variables to lambda
in this example?
One might think var-a
and var-b
would just be renamed and
we'd get:
((lambda (var-a-1 var-b-1) (quux var-a) (quux var-b)) (some-procedure foo) (some-procedure bar))
Clearly, this isn't what we want--we want var-a
and
var-b
in the lambda
body to refer to the variables introduced
in by lambda
---that's what it's for.
Scheme's macro processor is smart enough to infer that this is what
you want. When you write a macro that accepts a name as an
argument and binds it, Scheme assumes you're doing that for a good
reason. If you then take another argument to the same macro and
use it in the scope of that new variable, Scheme assumes you want
occurrences of the name to refer to the new
variable.
That is, Scheme uses an algorithm that checks what you do
with names in forms that get passed as arguments into a macro. If
you just use them in the normal ways, evaluating or assigning to
them as variable names, Scheme assumes you mean to refer to
whatever those names refer to at the call site of the macro.
(That's normal lexical scope.) But if you take the name and
use it as the name of a new variable
, Scheme assumes
you're defining a binding construct, and that any other
arguments you put in that scope should see the new binding,
instead of being scoped at the call site.)
Scheme can generally assume this, because if you're not implementing
a scoping binding form (like let
or do
), there's no
reason for a macro to accept a name as an argument and then turn
around and bind it.
let*
Once we have let
, we can implement let*
in terms
of that. We simply write a recursive macro that peels off
one binding form at a time and generates a let
, so that
we get a nested set of let
s that each bind one variable.
(define-syntax let* () (syntax_rules ((_ () body-expr ...) (begin body-expr ...)) ((_ ((var1 value-expr1)(var value-expr) ...) (let ((var1 value-expr)) (_ ((var value-expr) ...) body-expr ...)))))
This macro uses two syntax rules. The first is the termination
case for recursive macroexpansion. A let*
that has an empty
binding form (i.e., binds zero variables) should be translated into
a begin
; it will just sequence the body
expressions.
The recursive rule says that a let*
with one or more binding
subforms should translate into a let
that performs the first
binding and another let*
to bind the rest and evaluate the
body. (Note that I've used the _
shorthand for let*
in the recursive call, as well as in the pattern.)
As with let
, Scheme recognizes this as a binding construct,
and does the right thing--it notices that the var
argument
passed into the macro is used as a name of a new binding in the
macro, so it assumes that the new binding should be visible to
the body expressions.
cond
Scheme macros also have several features I haven't demonstrated,
to make it easier to write more sophisticated macros than or
,
and I'll demonstrate those later, too.
In the next section, though, I will discuss a different and simpler kind of macro system, which is not standard Scheme, and does have problems with variable names.
In this section, I'll discuss a simple kind of macro system that isn't standard in Scheme (and you might be able to skim this section without too much loss) but is interesting for several reasons.
quasiquote
to do most of our work for us. Then I'll show how to implement
quasiquote
, too.)
The classic macro system is the Lisp macro system, which allows
the user to define an arbitrary Lisp procedure to rewrite a
new construct. (Most dialects of Lisp, e.g., Common Lisp, have
a macro facility of the same general kind, called defmacro
.)
We'll talk for a moment about a simplified version of Lisp-style
macros. Later we'll explain why and how Scheme macros are better,
at least for most purposes.
Suppose we have a macro system that we can use to tell the interpreter or compiler that when it sees an expression that's a list starting with a particular symbol, it should call a particular routine to rewrite that expression, and use the rewritten version in its place.
For the or
example, we want to tell the compiler that if it
sees an expression of the form (or
a b)
it should rewrite that into an expression of the form
(let ((temp a) (if temp temp b))
So now we want to tell the compiler how to rewrite expressions
like that. Since Lisp expressions are represented as lists, we
can use normal list operations to examine the expression and
generate the new expression. Let's assume our system has
a special form called define-rewriter
that lets us specify
a procedure of one argument to write a particular kind of
expression.
Here's a rather ugly rewriter macro for or
:
; OR with subtle scope bug (define-rewriter 'or ; tell compiler how to rewrite (or ...) (lambda (expr) (let ((a (cadr expr)) (b (caddr expr))) (cons 'let ; make LET form (cons (list (list 'temp a))) ; make let binding form (append '(if temp temp) ; make IF form (list b))
There's actually a scoping problem with this macro, which I'll ignore for now--it's the problem that define-syntax fixes. Later, I'll show what's wrong and fix it, but for a while I just want to talk about basic syntax of Lisp-style macros.
Now when the interpreter or compiler is about to evaluate an
expression represented as a list, it will check to see if it starts
with or
. If so, it will pass the expression to the above
rewriter procedure, and interpret or compile the resulting list
instead.
(Actually, macroexpansion doesn't have to happen just before interpreting or compiling a particular expression. The system might rewrite all of the macro calls in a whole procedure, or a whole program, before feeding the procedure or program to the normal part of the compiler. It's easier to understand macros if they're interleaved with expression evaluation or compilation, though--it's just an extra case in the main dispatch of your interpreter or compiler.)
Implementing define-rewriter
is easy. (We'll show an
implementation for our example interpreter in a later section.)
We only need to do two simple things:
(
symbol ...)
begin
with the name of a rewriter macro, and if so, to call the
rewriter to transform the expression before interpreting
(or compiling) it.
That's all.
The above system works, but it has several awkwardnesses. One
is that it is tedious to write routines that construct s-expressions
directly. We can use quasiquote
to make this easier. It
will allow us to simply write the s-expression we want the macro
to produce, and use unquote to fill in the parts we get from
the arguments to the macro.
; OR with subtle scope bug (define-rewriter 'or ; tell compiler how to rewrite (or ...) (lambda (expr) (let ((a (cadr expr)) (b (caddr expr))) `(let ((temp ,a)) ; return an s-expression of this form (if temp temp ,b))
This is much easier to read. The backquoted expression is now readable as code--it tells us the general structure of the code produced by the macro, and the commas indicate the parts that vary depending on the arguments passed to the macro.
Note that there is no magic here: define-rewriter
and quasiquotation
can be used independently, and are very different things. It just happens
that quasiquotation is often very useful for the things you want to
do in macros--returning an s-expression of a particular form.
This simple rewriting system is still rather tedious to use, for several
of reasons. First, we always have to quote the name of the special
form we're defining. Second, it's tedious to write a lambda
every time. Third, it's tedious to always have to destructure the
expression we're rewriting to get the parts we want to put into
the expression we generate. ("Destructure" means take apart
to get at the components--in this case, subexpressions.)
It would be nice if the macro facility allowed you to declare the
argument pattern to the macro, and automatically destructured it for you.
Most Lisp systems have a special form called defmacro
that does
this for you, and avoids the need to write a lambda
expression
every time. For consistency with Scheme naming conventions,
we'll call our equivalent define-macro
.
define-macro
implicitly creates a transformation procedure whose
body is the body of the define-macro form. It also implicitly destructures
the expression to be transformed, and passes the subexpressions to the
transformation procedure.
Using define-macro
, we can write or
this way, specifying
that or
takes two arguments:
; OR with subtle scope bug (define-macro (or a b) `(let ((temp ,a)) (if temp temp ,b))
We didn't have to write the code that destructures the form into
a
and b
---define-macro
did that for us. We also
didn't have to explicitly write a lambda
to generate the
transformation procedure; define-macro
did that too.
This makes the syntax of define-macro
similar to a procedure-defining
define
form. Still, you should always remember that you're
not writing a normal procedure: you're writing a procedure to transform
code before it is interpreted or compiled. The combination of automatic
argument destructuring and template-filling (using backwuote and comma)
makes it easier to do this in many cases.
Like a procedure, a macro can take a variable number of arguments,
with the non-required ones automatically packaged up into a rest list.
We can define a variable-arity or
with define-macro
:
[ need to check this example--it's off the top of my head ]
; variable arity OR with subtle scope bug (define-macro (or . args) (if (null? args) ; zero arg or? #f (if (null? (cdr? arg-exprs)) ; one arg or? (car arg-exprs) `(let ((temp ,(car arg-exprs))) (if temp temp (or ,@(cdr arg-exprs)))))))
Here we're just accepting the list of argument expressions to the
or
expression as the rest list args
.
If it's an empty list, we return #f
. Keep in mind that we're
returning the #f
object, which will be used in place of the or
expression, i.e. as the literal #f
to use in the resulting code.
(Conceptually, it's a fragment of a program code, even though that program
fragment will in fact return the value #f when it executes, because #f is
self-evaluating. We could have quoted it to make that clearer.)
If it's a one-element list, we just return the code (s-expression) for the first argument.
If it's a list of more than one argument expression, we return
an s-expression for the let
with a nested if
. (Note
the use of unquote-splicing
(,@
) to splice the cdr
of the expression list into the or
form as its whole list
of arguments.)
You should be aware, though, that what you're really doing is specifying a procedure for transforming expressions before they're compiled or interpreted, and that quasiquote is just syntactic sugar for procedural code that constructs an s-expression.
define-macro
is easy to write, once we've got define-rewriter
;
we don't have to modify the interpreter or compiler at all.
We just use define-rewriter
to write define-macro
as
a simple macro. We'll make define-macro
a macro that generates
transformation procedures, and uses define-rewriter
to register
them with the interpreter.
Earlier we alluded to a lurking bug in our define-rewriter
and define-macro
definitions for or
.
Suppose we use the or
macro this way--we check to see if someone
is employed as either a permanent or temporary employee, and generate
a w2 tax form if either of those is true.
(let ((temp (member x temporary-employees)) (perm (member x permanent-employees))) (if (or temp perm) (generate-w2 x)))
The expansion of this is:
(let ((temp (member x temporary-employees)) (perm (member x permanent-employees))) (if (let ((temp temp)) (if temp temp temp))) ;BUG! (should refer to outer temp, not inner) (generate-w2 x)))
The problem here is that we happened to use the same name, temp
, at
both the call site and inside the macro definition. The reference to
temp
in (or temp perm)
gets "captured" by the binding
of temp introduced in the macro.
This occurs because a normal macro facility does not understand issues
of name binding--the name temp
refers to one program variable
at the call site, and another at the site of its use inside the macro--and
the macroexpander doesn't know the difference. To the macroexpansion
mechanism, the symbol temp
is just a symbol object, not a name of
anything in particular, i.e., a particular program variable.
There are two ways to get around this problem. One is for the
macro-writer to be very careful to use names that are very unlikely
to conflict with other names. This makes code very ugly, because
of the unnatural names given to variables, but more importantly, it's
harder to get right than it may seem. The other way around the
problem is to get a much smarter macro facility, like the new Scheme
define-syntax
macro system.
One way to avoid name conflicts is to pick names for variables used inside macros in such a way that they're unlikely to conflict with names that users of the macros might pick, e.g.,
(define-macro (or first-arg second-arg) `(let ((temp!in!or!macro ,first-arg) (if temp!in!or!macro temp!in!or!macro ,second-arg)))
It's unlikely that anyone will name a different variable
temp!in!or!macro
someplace else, so the problem is solved,
right? Not necessarily.
Besides the fact that this is incredibly tacky, there's still
a situation where this kind of solution is likely to fail--when
people nest calls to the same macro. Each nested call will
use the same name for different variables, and things can go
nuts. (Food for thought: is this true of the or
macro above?
Does it nest properly?)
The standard hack around that problem is to have each use of the
macro use a different name for each local variable that might get
captured. This requires some extra machinery from the underlying
system--there has to be a procedure that generates new, unique
symbols, and which can be called by the macro code each time the
macro is expanded. The traditional Lisp name for this procedure
is gensym
, but we'll call it generate-symbol
for
clarity.
Now we can write a fixed version of the two-argument OR
macro.
; Version of 2-arg OR with scope bug fixed (define-macro (or first-arg second-arg) (let ((temp-name (generate-symbol))) `(let ((,temp-name ,first-arg) (if ,temp-name ,temp-name ,second-arg)))
Notice that the outer let
is outside the backquote--it will be
executed when the macro is used (i.e., once each time an or
expression is rewritten; the quasiquoted part is the code to be interpreted
or compiled (after the comma'd holes are filled in).
Each time a call to or
is processed by the compiler (or interpreter),
this let
will generate a new symbol before translating it;
quasiquote will fill in the holes for the new symbol. (Be sure to get
your metalevels right here: temp-name
is the name of a variable in
the macro transformation procedure, whose binding will hold a pointer
to the the actual name symbol that will be used for the variable.)
Isn't this ugly? To some degree, Lisp macros are nice because you can use the same language (Lisp) in macros as you can in normal code. But due to these funky scoping issues, you effectively end up having to learn a new language--one with lots of generate-symbol calls and commas.
On the other hand, maybe it builds character and abstract reasoning abilities, because you have to think a lot about names of names and things like that. Fun, maybe, but not for the faint of heart.
quasiquote
and unquote
[ This section is particularly rough and needs to be reworked. Sorry. ]
quasiquote
is a special form that (like quote
) has a very
special sugared syntax. Part of this syntax is recognized by the reader,
rather than the compiler or interpreter proper; the rest of the work is
done by the compiler or interpreter.
quasiquote
A backquote character is interpreted very specially by the Scheme (or
Lisp) reader, and backquoted expressions are converted into quasiquote
expressions with a normal-looking nested-prefix-expression syntax. The
expression `(a b c)
is actually just shorthand for
(quasiquote (a b c))
Similarly, comma'd expressions are converted, e.g. `(a ,b ,c) is
read in as (quasiquote (a (unquote b) (unquote c)))
. Notice that as
far as the reader is concerned, these are just lists--it is up to
the compiler or interpreter to process them further, and the reader
just preprocesses them into lists that the compiler or interpreter can
deal with.
quasiquote
The quasiquote
special form may be built into the
compiler or interpreter, but it can be implemented as a macro, in Scheme.
That's the easy way to do it, and it's what we'll do.
I'll show a simplified version of quasiquote
that only deals with
commas at the top level of a list, e.g.,
(quasiquote (foo ,bar (baz x y)))
but not
(quasiquote (foo ,bar (baz ,x y)))
Notice that (quasiquote (foo ,bar (baz x y)))
should expand to
something like
(list 'foo bar '(baz x y))
We'll actually generate an expression that uses cons
instead of
list
, because we want to write quasiquote
recursively;
if its argument is a list, it will peel one element at a time of off
the list of arguments, and either quote it or not before using it in
the resulting expression that is the rewritten version of the
macro call.
Given this strategy, (quasiquote (foo ,bar (baz x y)))
should
expand to
(cons 'foo (cons bar (cons '(baz x y)) '()))
Notice that what we've done is generate an expression to generate
a list whose components are explicitly quoted where necessary,
as opposed to the original backquoted list where things are quoted
by default and explicitly unquoted.
And since '
thing is just a shorthand for
(quote
thing)
, we'll really generate an ugly expression
like
(cons (quote foo) (cons bar (cons (quote baz x y) '())))
written as a straighforward low-level macro. We'll define it as a trivial
macro that just calls a procedure quasiquote1
to do the actual
transformation.
[ NEED TO DEBUG THIS... PRW ]
(define-macro (quasiquote expr) (quasiquote1 expr))
(define (quasiquote1 expr) (if (not (pair? expr)) ; if quoted expr is not a list, it's just expr ; a literal ; else we'll grab a subexpression and cons it (appropriately ; quoted or not) onto the result of recursively quasiquoting ; the remaining arguments (let ((first-subexpr (car expr)) (rest-subexprs (cdr expr))) (if (and (pair? next-subexpr) (eq? (car first-subexpr) 'unquote))) (list 'cons first-subexpr ; gen expr to eval this subexpr (quasiquote1 rest-subexprs)) (list 'cons (list 'quote first-subexpr) ; quote this subexpr (quasiquote1 rest-subexprs)))))
A full implementation of quasiquote is a little trickier, because it
must deal with nested uses of quasiquote
and unquote
;
each subexpression that is not unquoted must be traversed and treated
similarly to the top-level list--i.e., rather than just using the
subexpressions as literals and quoting them, an equivalent expression
should be constructed to create a similarly-structured list with the
unquoted holes filled in. Also, a full implementation should handle
unquote-splicing
as well as unquote
.
define-rewriter
In Chapter [ whatever ], I showed the code for an interpretive evaluator that was designed to support macros. In this section, I'll explain how to implement the macro processor and install it in the interpreter.
Recall that when eval
encounters an expression that's represented
as a list, it must determine whether the list represents a
combination (procedure call), a built-in special form, or a
macro call. It calls eval-list
to do this dispatching.
Also recall that we implemented environments that can hold different kinds of bindings--of normal variables or macros. A macro binding holds a transformation procedure that can be used to rewrite an expression before it is interpreted.
eval-list
checks to see if the list begins with a symbol,
which might be the name of a macro, or the name of a procedure.
It looks in the environment to find the current binding of
the symbol.
If it's a syntax (macro) binding, eval-list
it extracts the
transfromer procedure from the binding information, and calls
eval-macro-call
to evaluate the list expression.
Here's eval-macro-call
:
(define (eval-macro-call transformer expr envt) (eval (transformer expr) envt))
All it does is apply the transformation procedure to the expression,
and call eval
recursively to evaluate the result.
This is sufficient to be able to use macros, once they're defined. We also need to be able to define macros, so that we can use them.
For that, we'll add one special form to our interpreter,
define-rewriter
, which takes a name symbol and a transformation
procedure as its arguments.
[ Show define-rewriter
... has to accept a closure in our
language, not the underlying Scheme ]
define-macro
Once we've added define-rewriter to our interpreter, we don't have to
modify the interpreter at all to add define-macro
. We can
simply define it as a macro in the languauge we interpret, using
define-rewriter
"from the inside." We had to add
define-rewriter
to the language implementation itself, but
once that's done, we can bootstrap a better macro system with
no extra help from the interpreter.
define-macro
does three things:
Bear in mind that the following code is not code in the interpreter,
but code to be interpreted, to create a define-macro
macro, from
inside our language.
[ show define-macro
... pattern matching on arg form and creating
a routine to destructure and bind... ]