This post is mirrored on my blog.
It has been a while since I felt the need to add features and make some more significant changes to Cakelisp.
Two came in this past month: defer and CRC builds.
defer
The defer feature is one I had been wanting for a while, but I was unsure how to implement it cleanly.
Here's an example usage of defer:
    (defun main (&return int)
      (var file (* FILE) (fopen "File.txt" "rb"))
      (unless file (return 1))
      (defer (fclose file))
      ;; Do file operations...
      (return 0))
By putting defer there, I am guaranteed to have the file closed if the function ever returns. This removes the need to copy-paste (fclose file) before every return, which can be very cumbersome. It also makes the program more reliable, because I might forget to paste the fclose.
This feature can be found in many other new languages, including Zig and Go. It is a simple way to have some automatic actions without needing to add C++-style constructors and destructors, which can be quite complicated.
In Cakelisp, macros and compile-time code modification make it tricky to know when the code is the final state that will be compiled.
For defer, I needed to know two things:
- The commands that should be deferred, which can be specified in many separate blocks
- Everywhere a scope exit occurs, so that the commands can be executed before exit
Scopes are sequences of code that will always be executed together. An if clause can have a scope executed if the condition is true, and (optionally) one that should be executed when the condition is false. Loop constructs like C's for and while enter and exit the loop body scope on each iteration. Finally, functions themselves constitute a function-body scope.
How Cakelisp code generation works
In Cakelisp, code generation happens through either a macro or a generator. Macros output tokens. There are only four kinds of tokens: strings, like "Hello, world!"; symbols, like defun; open parenthesis; and close parenthesis. Cakelisp macros can run arbitrary code, including custom validation, creating and setting compile-time variables, etc. I have written about macros many times here.
This extremely restricted world makes it simple to write the "evaluator": When the evaluator encounters an open parenthesis token, it expects the next token to be a symbol. If it isn't, that is a syntax error. Otherwise, the evaluator looks up the symbol by name in its list of known macros and generators. If one is found, it is evaluated immediately. If none is found, the evaluator creates a "reference" which it hopes to resolve eventually.
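A minimal sketch of that dispatch rule, written in Python for brevity. It is not Cakelisp's actual code: the tokenizer regex, the names `tokenize`, `find_close`, and `evaluate`, and the idea of handing the whole invocation to a handler function are assumptions made for illustration.

```python
import re

OPEN, CLOSE, SYMBOL, STRING = "open", "close", "symbol", "string"

# Hypothetical tokenizer: parentheses, double-quoted strings, everything else is a symbol.
TOKEN_PATTERN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def tokenize(source):
    tokens = []
    for text in TOKEN_PATTERN.findall(source):
        if text == "(":
            tokens.append((OPEN, text))
        elif text == ")":
            tokens.append((CLOSE, text))
        elif text.startswith('"'):
            tokens.append((STRING, text[1:-1]))
        else:
            tokens.append((SYMBOL, text))
    return tokens


def find_close(tokens, open_index):
    """Return the index of the parenthesis that closes tokens[open_index]."""
    depth = 0
    for index in range(open_index, len(tokens)):
        kind = tokens[index][0]
        if kind == OPEN:
            depth += 1
        elif kind == CLOSE:
            depth -= 1
            if depth == 0:
                return index
    raise SyntaxError("unbalanced parentheses")


def evaluate(tokens, known):
    """Evaluate top-level invocations; `known` maps names to handler functions."""
    outputs, references = [], []
    index = 0
    while index < len(tokens):
        if tokens[index][0] != OPEN:
            raise SyntaxError("expected '(' at top level")
        if index + 1 >= len(tokens) or tokens[index + 1][0] != SYMBOL:
            raise SyntaxError("expected a symbol after '('")
        name = tokens[index + 1][1]
        close = find_close(tokens, index)
        if name in known:
            # Known macro or generator: evaluate it immediately.
            outputs.append(known[name](tokens[index:close + 1]))
        else:
            # Unknown for now: remember the reference and hope to resolve it later.
            references.append(name)
        index = close + 1
    return outputs, references
```

In this sketch the handler receives every token of its invocation and is responsible for anything nested inside it, so an argument list such as (&return int) is never mistaken for an invocation.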
Generators output C or C++ code in the form of "string operations". These operations have various flags such as "double quote" or "newline after", which are processed by the writer. The writer simply goes operation by operation, following its flags and outputting text into a file as requested.
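The writer loop can be pictured roughly like this. This is a Python sketch under assumptions: the `StringOperation` class and its field names are hypothetical, modeled only on the two flags named above, and it returns a string instead of writing to a file.

```python
from dataclasses import dataclass


@dataclass
class StringOperation:
    text: str
    double_quote: bool = False    # wrap the text in double quotes
    newline_after: bool = False   # emit a newline once the text is written


def write(operations):
    """Go operation by operation, following each operation's flags."""
    parts = []
    for operation in operations:
        if operation.double_quote:
            parts.append('"' + operation.text + '"')
        else:
            parts.append(operation.text)
        if operation.newline_after:
            parts.append("\n")
    return "".join(parts)
```

A generator for a print call might then emit three operations: the text printf(, a double-quoted string, and a closing ); with the newline flag set.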
How defer was implemented
Implementing defer involved three major parts.
First, the defer statement itself was implemented as a generator. The generator outputs the body of the defer into a splice.
Splices are special string operations that say, "output the array of string operations at this address". Splices accomplish a few things:
- They create "holes" that can be later filled. This is used by invocations where Cakelisp doesn't yet know whether you are trying to call a C function or a macro/generator that has not been defined yet. Cakelisp will generate everything in that state as if it were a C function call, then if the macro/generator is later defined, it will clear the splice's operations and replace it with the macro/generator output.
- They make it possible to change the output later. This enables code modification, which is when a function has already finished being generated, then a second pass is done at compile-time which rewrites that function with modifications. For example, GameLib has a compile-time function which rewrites every Cakelisp function to add performance profiling instrumentation.
- They create a place to stow code for other operations. This is how defer uses them.
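The "hole" behavior of a splice can be sketched as an operation that points at another list of operations. The Python below is illustrative only: `StringOperation`, its `splice` field, and the function names in the generated C text are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StringOperation:
    text: str = ""
    # When set, the writer outputs this list of operations instead of text.
    splice: Optional[List["StringOperation"]] = None


def write(operations):
    parts = []
    for operation in operations:
        if operation.splice is not None:
            parts.append(write(operation.splice))
        else:
            parts.append(operation.text)
    return "".join(parts)


def demo():
    # First guess: treat the unknown invocation as a C function call.
    hole = [StringOperation("not_yet_known(42);")]
    output = [
        StringOperation("int main() { "),
        StringOperation(splice=hole),
        StringOperation(" }"),
    ]
    before = write(output)
    # Later a macro or generator with that name is defined: clear the
    # splice and refill it with the new output.
    hole.clear()
    hole.append(StringOperation('printf("%d", 42);'))
    after = write(output)
    return before, after
```

Because the surrounding operations only hold a reference to the splice's list, refilling that list changes what is written without touching the rest of the output.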
The defer generator outputs a single splice string operation with a flag telling the writer that it should output the contents of that splice on every scope exit.
Second, I needed to mark all the places where scopes enter and exit. I was worried this would be complex, but it turned out simpler than I expected. I had to audit all existing control flow generators (if, cond, return, break, continue, while, for, etc.) and mark up their Open and Close operations as scope-entering and scope-exiting operations. return, break, and continue statements needed special markings.
Third, the writer needed to have a stack of scopes as well as discovered defer splices. When a scope enter operation is encountered, it adds a scope to the stack. When a defer is encountered, it adds a pointer to its splice to the current scope on the stack.
Scope exits are when the defer statements need to be output. The writer has three different ways to handle scope exits:
- If the exit is "natural", e.g. the end of an if true block is reached, the writer simply outputs all defer splices in the current scope before the if block's closing bracket.
- If the exit is from a return, the writer must output the defer splices for all scopes currently on the stack, because return exits all scopes.
- If the exit is from a continue or break, the writer outputs all defer splices on all scopes until it hits a "continue breakable scope", which is the start of a for or while.
Finally, the writer pops the most recently entered scope off the stack to finish the exit.
One subtle detail is that the writer always outputs separate defer splices in reverse order within the scope. This ensures that the first defer is always the last to be executed, in case subsequent defers are dependent on it.
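The scope stack and the three kinds of exit can be sketched as follows. This is a simplified Python model, not Cakelisp's writer: operations are plain (kind, text) pairs, deferred code is stored as text rather than as a pointer to a splice, and the `ended_by_jump` flag is an assumption added here so that deferred code is not repeated after a return, break, or continue that ends a scope. The post does not say how the real writer handles that case.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    is_loop: bool = False                 # a "continue breakable scope"
    defers: list = field(default_factory=list)
    ended_by_jump: bool = False           # assumption, see the lead-in


def deferred_lines(scopes, stop_at_loop):
    """Innermost scope first; within a scope, last defer first."""
    lines = []
    for scope in reversed(scopes):
        lines.extend(reversed(scope.defers))
        if stop_at_loop and scope.is_loop:
            break
    return lines


def write(operations):
    """Assumes every operation appears inside at least one scope."""
    output, scopes = [], []
    for kind, text in operations:
        if kind in ("return", "break", "continue"):
            output.extend(deferred_lines(scopes, stop_at_loop=(kind != "return")))
            output.append(text)
            scopes[-1].ended_by_jump = True
            continue
        if kind == "exit":
            scope = scopes.pop()          # natural exit: pop after writing defers
            if not scope.ended_by_jump:
                output.extend(reversed(scope.defers))
            output.append(text)
            continue
        if scopes:
            scopes[-1].ended_by_jump = False
        if kind == "defer":
            scopes[-1].defers.append(text)
        elif kind in ("enter", "enter_loop"):
            output.append(text)
            scopes.append(Scope(is_loop=(kind == "enter_loop")))
        else:
            output.append(text)
    return output


EXAMPLE = [
    ("enter", "int main() {"),
    ("text", 'FILE* file = fopen("File.txt", "rb");'),
    ("defer", "fclose(file);"),
    ("enter_loop", "while (running) {"),
    ("text", "char* buffer = malloc(64);"),
    ("defer", "free(buffer);"),
    ("enter", "if (done) {"),
    ("break", "break;"),
    ("exit", "}"),
    ("enter", "if (failed) {"),
    ("return", "return 1;"),
    ("exit", "}"),
    ("exit", "}"),
    ("return", "return 0;"),
    ("exit", "}"),
]
```

Running write(EXAMPLE) puts free(buffer); before the break, both free(buffer); and fclose(file); before return 1;, free(buffer); again at the natural end of the loop body, and fclose(file); before return 0;.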
defer did make the writer more complex, but not significantly. I implemented it in the writer because I didn't want to add an extra evaluator stage; as implemented, defer costs very little at compile time.
It is limited in that there is no compile-time place where the user could analyze the final code after defer has been applied, then make changes to it. This is because it happens in the writing stage, which is after any compile-time code generation or modification can occur. I will keep it implemented as is until I find I need to do that, in which case it will need to be moved into an evaluator stage.
CRC builds
My work on distributed-automation was disturbed when I had problems with stale builds. I was trying to create an auto-update build for the distributed-automation worker on Windows, but the executable wasn't being updated.
Cakelisp used file modification times to decide whether an "artifact" (an executable, object file, etc.) needed to be rebuilt. If the source (a .c file, header file, etc.) had a file modification time later than the artifact, the artifact was considered out of date and was rebuilt.
The problem was that my Windows clock wasn't the correct time--it had drifted into the future.[^1] When I ran a build, all the artifacts were marked as being built at that future time. Once I set the clock to the correct time, no artifacts would be built, because they were already marked as being more recently modified than their source.
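To make the failure concrete, here is a small worked example in Python. The timestamps are made-up numbers, and `is_stale_by_mtime` is a hypothetical stand-in for the timestamp rule rather than Cakelisp's actual check.

```python
def is_stale_by_mtime(source_mtime, artifact_mtime):
    """Timestamp-only rule: rebuild when the source is newer than the artifact."""
    return source_mtime > artifact_mtime


def demo():
    true_time = 1_000      # actual time of the first build, in seconds
    clock_drift = 4_000    # how far ahead the drifted clock was running
    # The artifact gets stamped with the drifted (future) time.
    artifact_mtime = true_time + clock_drift
    # The clock is corrected, then the source is edited a little later.
    source_mtime = true_time + 200
    return is_stale_by_mtime(source_mtime, artifact_mtime)
```

Here the edited source has timestamp 1200 while the artifact claims 5000, so the rule reports the artifact as up to date even though the source changed after it was built.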
This might be obvious to someone who has already written a build system. I knew it was an issue when I wrote the timestamp system, but I figured the clocks were reliable enough that it wouldn't matter.
Now, Cakelisp takes the CRC of every source and header file and records it in a cache. On the next build, Cakelisp checks the source files against the recorded CRCs. If they do not match, the artifact is rebuilt. This is slower and more cumbersome than just checking modification times, but it is necessary if the modification times cannot be trusted completely.
I now invalidate artifacts if the CRC is different or the source has a newer timestamp. This lets the user e.g. touch a file without changing its contents to force a rebuild, for whatever reason. I may remove all modification time code in the future, because it's not really providing value past this.
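The combined rule can be sketched like this in Python, using the standard library's zlib.crc32. The cache layout (a dictionary from source path to CRC), the function names, and the file names in the demo are assumptions for illustration; Cakelisp's real cache format is not described here.

```python
import os
import tempfile
import time
import zlib


def file_crc(path):
    with open(path, "rb") as source_file:
        return zlib.crc32(source_file.read())


def record_build(source_path, crc_cache):
    """Remember the CRC of the source that the artifact was built from."""
    crc_cache[source_path] = file_crc(source_path)


def needs_rebuild(source_path, artifact_path, crc_cache):
    if not os.path.exists(artifact_path):
        return True
    if crc_cache.get(source_path) != file_crc(source_path):
        return True   # contents changed, whatever the clocks say
    # Same contents: still rebuild if the source was touched more recently.
    return os.path.getmtime(source_path) > os.path.getmtime(artifact_path)


def demo():
    with tempfile.TemporaryDirectory() as directory:
        source = os.path.join(directory, "main.c")
        artifact = os.path.join(directory, "main.o")
        cache = {}
        with open(source, "w") as handle:
            handle.write("int main() { return 0; }\n")
        with open(artifact, "w") as handle:
            handle.write("object code")
        record_build(source, cache)
        # Pretend the artifact was built while the clock was an hour fast.
        future = time.time() + 3600
        os.utime(artifact, (future, future))
        unchanged = needs_rebuild(source, artifact, cache)
        # Edit the source: the CRC differs although the artifact looks newer.
        with open(source, "w") as handle:
            handle.write("int main() { return 1; }\n")
        edited = needs_rebuild(source, artifact, cache)
        # Record the new CRC, then "touch" the source past the artifact's time.
        record_build(source, cache)
        os.utime(source, (future + 60, future + 60))
        touched = needs_rebuild(source, artifact, cache)
        return unchanged, edited, touched
```

In the demo, the untouched source does not trigger a rebuild, the edited source does despite the artifact's future timestamp, and the touched source does even though its CRC matches the cache.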
Conclusion
Neither of these features is flashy, but defer is a big quality-of-life feature, and the CRC builds are an important fix for what was an untrustworthy build system.
I don't have anything specific planned for Cakelisp in the near future. I am still following the strategy where I only implement things when I have a pressing need for them, so I can't say what I'll do next.
[^1]: The clock problem consistently happens because I dual-boot Windows and Linux on that machine. The two operating systems don't agree on how the hardware clock should keep time, so I must manually tell Windows to reset the clock to the network time after I've booted. I know I could solve this problem by configuring one or the other, but haven't gotten to it yet. It's good to have solved the problem with timestamps either way, because time in general shouldn't be relied on for this kind of system.