Exploring the boundaries of outer space
Danny Yoo <[email protected]>
... well, except this isn’t always true! In particular, we might override or shadow a binding by setting up a new one:
(define (f x)
  (define (g)
    (define x 'objection!) ;; Overruled!
    x)                     ;; This is g's x, not f's.
  (g))
Within the body of f, the internal definition of g sets up its own binding for x, which blankets the one from f.
(define (f x)
  (define (g x)
    (define (h x)
      (... x ;; within here, x refers to h's x.
       ...))
    ... x ;; within here, x refers to g's x.
    ...)
  ... x ;; within here, x refers to f's x.
  ...)
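To make the nesting concrete, here is a small runnable sketch of the same shape. The bodies and the name result are made up for illustration; each function tags the x it sees so we can tell the bindings apart.

```racket
#lang racket

;; Each nested function rebinds x; every reference resolves to the
;; nearest enclosing binding.
(define (f x)
  (define (g x)
    (define (h x)
      (list 'h x))                     ; h's x
    (list (list 'g x) (h (* x 10))))   ; g's x
  (list (list 'f x) (g (* x 10))))     ; f's x

(define result (f 1))
;; result is '((f 1) ((g 10) (h 100)))
```

Each layer only ever sees its own x, even though all three share the same name.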
This document should have extensive hyperlinks that reference into the official documentation. Many of the words in typewriter font are hyperlinks. For example, “define” should be a hyperlink to the documentation for Racket’s binding form.
Please let me know if you have any suggestions or comments!
1 A brief macro tutorial
A common perception about Racket is that it’s an interpreter-based implementation, since it has a REPL and can dynamically evaluate programs. However, this perception is not quite correct.
Racket does use a compiler to translate programs into an internal bytecode format for optimization and ease of execution. However, Racket hides this compiler from most users because, under normal usage, Racket first quietly runs its compiler across a program, and then immediately executes the compiled in-memory bytecode. In fact, tools like raco make allow us to launch the compilation phase up front and save the bytecode to disk. If we use raco make, then program execution can pick up immediately from the on-disk bytecode.
One thing that makes Racket an interesting language is that it allows its users to hook expressions and functions into the compiler, so that these compile-time expressions get evaluated and called during the compilation phase. And unlike a purely textual pre-processor, these compile-time expressions can use the full power of Racket.
"date-at-compile-time.rkt"
#lang racket
(require (for-syntax racket/date
                     (planet dyoo/stardate)))
(begin-for-syntax
  (printf "This program is being compiled at Stardate ~a\n"
          (date->stardate (current-date))))
If we run this from DrRacket, we may see somewhat more unusual output, because DrRacket can apply several program transformations that may cause "date-at-compile-time.rkt" to be compiled multiple times.
$ racket date-at-compile-time.rkt
This program is being compiled at Stardate 65741.5
$
This output supports the idea that, under normal circumstances, Racket interposes a compilation phase since it doesn’t see any stored bytecode on disk.
$ raco make date-at-compile-time.rkt
This program is being compiled at Stardate 65741.6
$
What is different is that the bytecode has been written to disk, under a "compiled" subdirectory. Now let’s try running the program with the bytecode having just been saved to disk:
$ racket date-at-compile-time.rkt
$
It looks like it’s not doing anything. That’s because it’s not doing anything.
The point is that our Racket programs can express both run-time and compile-time computations, and they run in distinct phases.
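As a rough sketch of a compile-time computation that leaves a visible trace at run time, we can do some arithmetic while the program is being compiled. The macro name compile-time-sum and the variable answer are made up for this example, and the macro assumes both of its arguments are literal numbers.

```racket
#lang racket

;; compile-time-sum is a hypothetical macro: it adds two literal numbers
;; while the program is being compiled, so only the total remains in the
;; expanded program.
(define-syntax (compile-time-sum stx)
  (syntax-case stx ()
    [(_ a b)
     (let ([total (+ (syntax-e #'a) (syntax-e #'b))])
       ;; Turn the plain number back into a syntax object, borrowing
       ;; lexical context from the original form.
       (datum->syntax stx total))]))

;; By the time this definition runs, the addition has already happened.
(define answer (compile-time-sum 40 2))
```

At run time, answer is simply 42; the call to + happened during the compilation phase.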
1.1 Macros are compile-time functions
For the gory details about Racket’s expansion process, see the reference manual.
#lang racket

(begin-for-syntax
  ;; We can define a compile-time function:
  ;;
  ;; repeat-three: syntax -> syntax
  (define (repeat-three stx)
    (syntax-case stx ()
      [(_ thing)
       (syntax (begin thing thing thing))])))

;; and we can hook this compile-time function up to the macro expander:
(define-syntax blahblahblah repeat-three)

;; Example:
(blahblahblah (displayln "blah"))
Racket uses an abstract syntax tree structure called a syntax object to represent programs. It provides a variety of tools to manipulate these structured values. We can pattern-match and pull apart a syntax object with “syntax-case”, and create a new syntax object with “syntax”. The two forms cooperate with each other: when we pattern-match a syntax object with syntax-case, it exposes the components of the pattern so that they can be referenced by syntax.
(define-syntax (blahblahblah stx)
  (syntax-case stx ()
    [(_ thing)
     #'(begin thing thing thing)]))
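As a further sketch of how syntax-case and syntax cooperate, here is a macro with more than one pattern variable. The names swap-args and difference are made up for this illustration; the pattern pulls a two-argument call apart, and the template puts it back together with the arguments reversed.

```racket
#lang racket

;; swap-args is a hypothetical macro: it matches a call with exactly two
;; arguments and rebuilds it with those arguments in the opposite order.
(define-syntax (swap-args stx)
  (syntax-case stx ()
    [(_ (f a b))
     #'(f b a)]))

;; Expands to (- 10 1).
(define difference (swap-args (- 1 10)))
```

Here difference is 9, since the subtraction that actually runs is (- 10 1).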
1.2 Syntax objects are more than s-expressions
; Turn on line/column counting for all new ports:
> (port-count-lines-enabled #t)
; Read a syntax object:
> (define a-stx
    (read-syntax #f (open-input-string
                     "(Racket is my favorite language on the Citadel)")))
; And inspect the individual syntax objects in the structure:
> (for ([piece (syntax->list a-stx)])
    (printf "~a at line ~a, column ~a, position ~a, span ~a\n"
            piece
            (syntax-line piece)
            (syntax-column piece)
            (syntax-position piece)
            (syntax-span piece)))
#<syntax:1:1 Racket> at line 1, column 1, position 2, span 6
#<syntax:1:8 is> at line 1, column 8, position 9, span 2
#<syntax:1:11 my> at line 1, column 11, position 12, span 2
#<syntax:1:14 favorite> at line 1, column 14, position 15, span 8
#<syntax:1:23 language> at line 1, column 23, position 24, span 8
#<syntax:1:32 on> at line 1, column 32, position 33, span 2
#<syntax:1:35 the> at line 1, column 35, position 36, span 3
#<syntax:1:39 Citadel> at line 1, column 39, position 40, span 7
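Underneath the source locations there is still an ordinary s-expression, and we can peel the wrapping off when we want to. A minimal sketch, where the names stx, pieces, and plain are ours:

```racket
#lang racket

;; A syntax object wrapping a three-element list.
(define stx #'(moo 1 "two"))

;; syntax->list unwraps one layer: the result is a list whose
;; elements are themselves syntax objects.
(define pieces (syntax->list stx))

;; syntax->datum strips the wrapping all the way down.
(define plain (syntax->datum stx))
;; plain is '(moo 1 "two")
```

So pieces is a list of three syntax objects, while plain is just the quoted list with every bit of extra information thrown away.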
More importantly, syntax objects hold lexical information, a key element that allows programs to bind and refer to variables. At the beginning of compilation, the program’s syntax object has little lexical information. As the expander walks through the syntax object, though, it can encounter forms that introduce new bindings. When the expander encounters define, it enriches the lexical information of the syntax objects in scope.
(define (cow x) (string-append "moooo?" x))
(probe-1 (define (cow x) (probe-2 (string-append "moooo?" x))))
First, let’s define the initial probe-1 macro:
> (define-syntax (probe-1 stx)
    (syntax-case stx ()
      [(_ (d (f i)
             (p2 (op rand-1 rand-2))))
       (begin
         (printf "at probe-1: ~a's binding is ~a\n"
                 #'rand-2
                 (identifier-binding #'rand-2))
         #'(d (f i)
              (p2 (op rand-1 rand-2))))]))
It will tell us what the binding of x looks like in the body of the function; the expander does a top-down walk over the structure of the syntax object, so x shouldn’t report any lexical information at this point.
> (define-syntax (probe-2 stx)
    (syntax-case stx ()
      [(_ (op rand-1 rand-2))
       (begin
         (printf "at probe-2: ~a's binding is ~a\n"
                 #'rand-2
                 (identifier-binding #'rand-2))
         #'(op rand-1 rand-2))]))
> (probe-1 (define (cow x) (probe-2 (string-append "moooo?" x))))
at probe-1: #<syntax:7:0 x>'s binding is #f
at probe-2: #<syntax:7:0 x>'s binding is lexical
As we can see, the expansion process enriches the syntax objects in the definition of cow; probe-2 shows us that, at the point where the expander reaches probe-2, x knows it is lexically bound.
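The same question can be asked from inside a module rather than at the REPL. The sketch below is not one of the probes above: binding-kind is a hypothetical helper that turns the answer from identifier-binding into a symbol we can look at when the program runs.

```racket
#lang racket

;; binding-kind is a hypothetical helper macro: it asks the expander how
;; an identifier is bound at the point of use and expands to a symbol.
(define-syntax (binding-kind stx)
  (syntax-case stx ()
    [(_ id)
     (let ([b (identifier-binding #'id)])
       (cond
         [(eq? b 'lexical) #''lexical]   ; a local variable
         [(list? b) #''module]           ; defined in or imported from a module
         [else #''unbound]))]))          ; identifier-binding returned #f

;; By the time the body is expanded, x has been enriched with the
;; binding introduced by the function's argument list.
(define (inside-function x)
  (binding-kind x))

(define local-kind (inside-function 1))
(define imported-kind (binding-kind car))
```

Here local-kind is 'lexical and imported-kind is 'module: the same macro gets a different answer depending on what the expander has already learned about the identifier it is handed.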
1.3 Moooo?
Lexical information isn’t just stored in a symbolic syntax object like x, but rather it’s present in every syntax object. To demonstrate this, we can make a probe-3 that’s a bit more disruptive to cow: it will take the "moooo?" out of the cow and put something else in its place.
We’ll use a combination of two tools to perform this surgery: datum->syntax and with-syntax. datum->syntax lets us create syntax objects with arbitrary lexical information, and with-syntax acts like a let whose bindings we can then reference from inside syntax. Just like syntax-case, with-syntax cooperates with syntax to make it easy to construct new syntax objects.
> (define-syntax (probe-3 stx)
    (syntax-case stx ()
      [(_ (op rand-1 rand-2))
       (with-syntax ([new-rand-1
                      (datum->syntax #'rand-1 '(string-append x x))])
         #'(op new-rand-1 rand-2))]))
> (define (cow x) (probe-3 (string-append "moooo?" x)))
> (cow "blah")
"blahblahblah"
The use of datum->syntax here takes the lexical information from "moooo?", and pushes it into a fresh syntax object that we construct from '(string-append x x).
And now our cow has been transmogrified into something... familiar, yet unnatural. How unfortunate.
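The same pair of tools shows up in a well-known idiom, the "anaphoric if". This is a swapped-in example rather than part of the cow surgery: the macro name aif and the variable name it are conventions from elsewhere, and found and missing are names made up here. The macro uses datum->syntax to build an identifier named it that carries the lexical information of the place where the macro is used, so the user's own code can refer to it.

```racket
#lang racket

;; aif binds the result of the test to `it`, and makes that binding
;; visible to the branches written by the macro's user.
(define-syntax (aif stx)
  (syntax-case stx ()
    [(_ test then-expr else-expr)
     ;; Build `it` with the lexical information of the use site (stx),
     ;; so that an `it` written by the user refers to this binding.
     (with-syntax ([it (datum->syntax stx 'it)])
       #'(let ([it test])
           (if it then-expr else-expr)))]))

(define found
  (aif (assoc 'b '((a . 1) (b . 2))) (cdr it) 'none))
(define missing
  (aif (assoc 'z '((a . 1) (b . 2))) (cdr it) 'none))
```

Here found is 2 and missing is 'none. Had we written a plain it in the template instead of going through datum->syntax, the user's it would not have matched the one introduced by the macro.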
It’s instructive to see what happens if we neglect to preserve the lexical information when we create syntax objects with datum->syntax. What happens if we just put #f in there?
> (define-syntax (probe-4 stx)
    (syntax-case stx ()
      [(_ (op rand-1 rand-2))
       (with-syntax ([new-rand-1
                      (datum->syntax #f '(string-append x x))])
         #'(op new-rand-1 rand-2))]))
> (define (cow x)
    (probe-4 (string-append "moooo?" x)))
compile: unbound identifier (and no #%app syntax transformer is bound) at: string-append in: (string-append x x)
Poor cow. What’s important to see is that '(string-append x x) has no inherent meaning: it depends on what we mean by string-append and x, and that is precisely what lexical information is: it associates meaning to meaningless symbols.
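We can watch the difference without breaking any cows. In this small sketch (all four names are ours), the same symbol is wrapped twice: once borrowing lexical information from a syntax object written in the module, and once with #f, meaning no lexical information at all.

```racket
#lang racket

;; The symbol car, given the lexical information of this module...
(define with-context (datum->syntax #'here 'car))

;; ...and the same symbol with no lexical information whatsoever.
(define without-context (datum->syntax #f 'car))

;; identifier-binding returns #f when an identifier has no binding.
(define bound? (and (identifier-binding with-context) #t))
(define unbound? (not (identifier-binding without-context)))
```

We would expect both bound? and unbound? to be #t: the symbol is identical in the two cases, and only the lexical information decides whether it means anything.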
Now that we’re finished probing cow, let’s go back and see how to define outer in the remaining space we have.
2 Defining def
We now have a better idea of how macros and lexical scope work. Syntax objects accumulate lexical information through the actions of the expander. Now let’s break lexical scope.
To qualify: we’d like to define an outer form that lets us break lexical scoping in a controlled fashion: we’ll allow outer to poke holes along scope boundaries. Let’s say that the boundaries will be at the outskirts of a function definition. In fact, let’s make these boundaries explicit, by introducing our own def form. It will behave similarly to define.
#lang racket

(define-syntax (def stx)
  (syntax-case stx ()
    [(_ (name args ...) body ...)
     #'(define (name args ...) body ...)]))
def gives us a function definition syntax. Let’s try it.
> (def (f x) (* x x))
> (f 3)
9
We want to amend def so that it stores the syntax object representing the function as a whole. We want this information to be accessible to other macros that expand when the body of the function is compiled. That way, when we’re in an outer, we might take that stored syntax object and use it as the source of lexical information in constructing a new syntax, as we did with probe-3.
We might use syntax-parameterize, except that if we do so, we interfere with how define needs to be used in a context that permits definitions.
#lang racket
(require racket/stxparam   ;; syntax parameters are defined in
         racket/splicing)  ;; racket/stxparam and racket/splicing

;; Let's make a compile-time parameter called current-def that
;; remembers the innermost def that's currently being compiled.
(define-syntax-parameter current-def #f)

(define-syntax (def stx)
  (syntax-case stx ()
    [(_ (name args ...) body ...)
     (with-syntax ([fun-stx stx])
       #'(splicing-syntax-parameterize ([current-def #'fun-stx])
           (define (name args ...)
             body ...)))]))
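If syntax parameters are new, it may help to see one in isolation first. The following is a minimal sketch, separate from def: current-label, with-label, and label are hypothetical names, and the sketch uses the plain syntax-parameterize form in expression position rather than the splicing variant.

```racket
#lang racket
(require racket/stxparam)

;; current-label is a hypothetical compile-time parameter; #f means
;; "not inside any with-label form".
(define-syntax-parameter current-label #f)

;; with-label sets the parameter while its body is being expanded.
(define-syntax (with-label stx)
  (syntax-case stx ()
    [(_ name body ...)
     #'(syntax-parameterize ([current-label 'name])
         body ...)]))

;; label reads the parameter at expansion time and expands to a quoted
;; copy of whatever value it finds.
(define-syntax (label stx)
  (syntax-case stx ()
    [(_)
     (with-syntax ([value (syntax-parameter-value #'current-label)])
       #''value)]))

(define outside (label))
(define inside (with-label kitchen (label)))
(define nested (with-label kitchen (with-label pantry (label))))
```

We would expect outside to be #f, inside to be 'kitchen, and nested to be 'pantry: a macro expanded inside the body sees the value set by the innermost enclosing form, which is the behavior def relies on to remember the innermost function being compiled.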
2.1 The outer limits
In production code, we’d probably use the replace-context function from the syntax/strip-context library instead.
(define-syntax (outer stx)
  (syntax-case stx ()
    [(_ id)
     (datum->syntax (syntax-parameter-value #'current-def)
                    (syntax-e #'id)
                    stx)]))
> (def (f x)
    (def (g x) (* (outer x) x))
    (g 4))
> (f 2)
8
Hurrah!
2.2 Timing is everything
(define-syntax (bad-def stx)
  (syntax-case stx ()
    [(_ (name args ...) body ...)
     (with-syntax ([fun-stx stx])
       #'(define (name args ...)
           (splicing-syntax-parameterize ([current-def #'fun-stx])
             body ...)))]))
then we end up accidentally placing the splicing-syntax-parameterize in the scope of the define. This wouldn’t be so bad, except that, when Racket processes the define, the expander enriches the syntax objects within the function body with lexical scoping information for its arguments.
In particular, it enriches the syntax object that we’re intending to assign to the current-def parameter later on. Ooops. So we need to take care to keep the splicing-syntax-parameterize outside of the function’s body, or else our pristine source of outside scope will get muddied.
3 Acknowledgements and thanks
This tutorial arose from my confusion on how macro expansion works. Special thanks to Robby Findler, Matthew Flatt, and Ryan Culpepper for helping resolve my mental confusion about lexical enrichment.