This is mirrored on my blog.

I have invested a significant amount of time in building Cakelisp for several reasons, but I think the main one was easier access to, and better integration of, code generation.

In this article I'm going to give a high-level view of what I mean by "code generation" as well as a taste of how it can be used.

What is code generation?

By "code generation", I mean a piece of code which outputs more code.
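As a minimal sketch of that definition, here is a C function whose only job is to write more C. The function name `emit_square_table` and the generated table are hypothetical, made up purely for illustration; the sketch assumes `count` is at least 1.

```c
#include <stdio.h>

/* Hypothetical generator: writes the C source for a lookup table of squares
 * to `out`. Assumes count >= 1, since a zero-length array is not valid C.
 * Returns the number of characters written. */
int emit_square_table(FILE *out, int count)
{
    int written = fprintf(out, "static const int squares[%d] = {", count);
    for (int i = 0; i < count; i++)
        written += fprintf(out, "%s%d", i ? ", " : " ", i * i);
    written += fprintf(out, " };\n");
    return written;
}
```

Calling `emit_square_table(stdout, 4)` prints `static const int squares[4] = { 0, 1, 4, 9 };`, which could be saved to a header and compiled into another program.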

A similar concept in another field might be "jigs". Jigs are essentially tools, often built by the maker in the process of building another thing. An example of a jig in woodworking would be a table saw sled. These sleds make a variety of cuts safer, easier, and faster on a table saw.

One satisfying aspect of jigs is that they are often custom built, and sometimes even built using the same machine they will be used on. Some are useless after the project is complete, and others are products built by a 3rd party and designed to be used for a wide variety of projects.

With code generation, the jig is a tool that helps you build another thing, the software. The core realization with code generation is that you can apply the same rigorous automation computers allow to the creation of the software itself.

How can it be used?

Before I worked at my previous job, which relied heavily on code generation, I was unaware of how immensely useful it can be.

The simplest example is basic rote repetition. A computer can copy and paste text with greater speed and reliability than a human can, so you really ought to rely on it as much as possible. If you find yourself copy-pasting a block of code that cannot be in a function, even a basic C preprocessor macro should be considered. A macro also gives the code a name, and naming should not be underappreciated.
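Here is a small sketch of a block that cannot be a function: a check that returns from the calling function and prints the source text of its argument. The names `REQUIRE_NOT_NULL` and `copy_name` are hypothetical, invented for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical macro: bail out of the *calling* function when a required
 * pointer is missing. A function could not do this, because it cannot
 * return on behalf of its caller or see the argument's source text. */
#define REQUIRE_NOT_NULL(ptr)                                          \
    do {                                                               \
        if ((ptr) == NULL) {                                           \
            fprintf(stderr, "%s: '%s' is NULL\n", __func__, #ptr);     \
            return -1;                                                 \
        }                                                              \
    } while (0)

/* Hypothetical caller: each check is one named line instead of a pasted
 * block. Returns 0 on success, -1 if an argument was NULL. */
int copy_name(char *dest, size_t dest_size, const char *src)
{
    REQUIRE_NOT_NULL(dest);
    REQUIRE_NOT_NULL(src);
    snprintf(dest, dest_size, "%s", src);
    return 0;
}
```

The `do { ... } while (0)` wrapper is the usual trick for making a multi-statement macro behave like a single statement after an `if`.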

The next step in code generation complexity would be code generation based on simple ordered data. The X macro is an example of this in C.
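A minimal X macro sketch might look like the following. The item list and every name in it (`ITEM_TYPES`, `ItemType`, `item_name`) are hypothetical; the point is only that one ordered list generates both an enum and a matching string table, so the two cannot drift apart.

```c
/* Hypothetical data: one X(...) entry per item type, in order. */
#define ITEM_TYPES(X) \
    X(SWORD)          \
    X(SHIELD)         \
    X(POTION)

/* Expansion 1: generate the enum values ITEM_SWORD, ITEM_SHIELD, ... */
#define AS_ENUM(name) ITEM_##name,
typedef enum { ITEM_TYPES(AS_ENUM) ITEM_COUNT } ItemType;

/* Expansion 2: generate a string table in the same order. */
#define AS_STRING(name) #name,
static const char *const item_names[] = { ITEM_TYPES(AS_STRING) };

/* Look up the generated name, guarding against out-of-range values. */
const char *item_name(ItemType type)
{
    return ((unsigned)type < (unsigned)ITEM_COUNT) ? item_names[type]
                                                   : "UNKNOWN";
}
```

Adding a new item is then a one-line change to `ITEM_TYPES`; the enum, the count, and the name table all follow automatically.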

A more advanced example would be generating serialization code based on a schema or structure definition. I've written about this in The awesome power of type introspection, which often goes hand-in-hand with code generation.
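To give a rough idea of what generating serialization from a structure definition can look like in plain C, here is a sketch that uses the X macro technique on a field list. The schema, the `Player` struct, and the `name=value;` output format are all assumptions made for illustration, not the approach from the linked article.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical schema: X(type, name, printf_format) for each field. */
#define PLAYER_FIELDS(X)      \
    X(int,   health, "%d")    \
    X(int,   level,  "%d")    \
    X(float, speed,  "%.2f")

/* Generate the struct definition from the schema. */
#define AS_MEMBER(type, name, fmt) type name;
typedef struct { PLAYER_FIELDS(AS_MEMBER) } Player;

/* Generate the serializer from the same schema. Writes "name=value;" for
 * each field and returns the length the full text needs; a result that is
 * >= out_size means the output was truncated. */
int player_serialize(const Player *p, char *out, size_t out_size)
{
    int total = 0;
#define WRITE_FIELD(type, name, fmt)                                  \
    if ((size_t)total < out_size)                                     \
        total += snprintf(out + total, out_size - (size_t)total,      \
                          #name "=" fmt ";", p->name);
    PLAYER_FIELDS(WRITE_FIELD)
#undef WRITE_FIELD
    return total;
}
```

For a `Player` initialized as `{ 100, 7, 1.5f }`, the buffer ends up holding `health=100;level=7;speed=1.50;`. A new field added to `PLAYER_FIELDS` is serialized without touching `player_serialize`.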

A must-read article on coding style is Casey Muratori's Semantic Compression. The idea of semantic compression can be taken even further when using code generation, because you are not limited by the language's syntax constraints. You can write more compact forms, then let a code generator create the proper syntax for you.

Additionally, code generation can greatly improve the "ergonomics", or human ease in describing desired behavior in code. I have previously written about the lack of that ease as interface friction. By using code generation, you can write code the way that makes the most sense to you, then have the computer create the repetitive and error-prone parts.

The idea of a domain-specific language is to create a language tailored to the problem. DSLs can be more compact, easier to read, and provide more context-specific errors than any general-purpose language could. I made a DSL to generate XML, for example.

There are many examples in the game industry of code generation used in production to great effect:

Unreal Engine uses it for UCLASS and various other features
Naughty Dog procedurally generates C++ header files from Racket code
I would write about the amazing extent to which code generation was taken at my last job, but damned NDAs make a public discussion of it too risky for me.

Why isn't it used more?

Paul Graham believes writing in Lisp gave his company an edge, and code generation via macros was a significant contributor:

The source code of the Viaweb editor was probably about 20-25% macros. Macros are harder to write than ordinary Lisp functions, and it's considered to be bad style to use them when they're not necessary. So every macro in that code is there because it has to be. What that means is that at least 20-25% of the code in this program is doing things that you can't easily do in any other language.

He does mention that writing macros is harder than writing "ordinary" functions. There is an upfront hurdle that must be cleared in order to understand code generation, but I believe that once it is cleared, writing macros is much the same as writing functions that manipulate any other data.

There is another possible cause for code generation skepticism. The "macro" got a bad reputation after extremely limited and difficult-to-debug C preprocessor macros impaired code quality. Taken too far, macros can be used to define completely alien languages atop the underlying language.[^1]

C-style macros are simple text pasting, whereas Lisp-style full-power macros are what I would consider true code generation. In Lisp, new code can be generated as easily as assembling data into a list, because Lisp has an environment which allows runtime code generation. In C, once the program is compiled, the runtime does not typically create new code[^2].
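A tiny sketch shows what "simple text pasting" means in practice. The macro and function names here are hypothetical; the behavior is standard C preprocessor substitution.

```c
/* Naive macro: the preprocessor pastes the argument text in as-is. */
#define DOUBLE_NAIVE(x) x + x

/* Defensive version: parentheses keep the pasted text grouped. */
#define DOUBLE_SAFE(x) ((x) + (x))

/* Expands to `3 * 2 + 2`, which is 8, not the 12 a reader might expect. */
int naive_times_three(void) { return 3 * DOUBLE_NAIVE(2); }

/* Expands to `3 * ((2) + (2))`, which is 12. */
int safe_times_three(void) { return 3 * DOUBLE_SAFE(2); }
```

The preprocessor has no idea that `x + x` was meant to be one expression. It only sees tokens, which is a large part of why such macros are hard to debug.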

One difference between the two is that full-power macro code generation can include problem-specific input validation. Arguments can be sanity checked and custom error messages produced. Anyone who has worked with complex C++ template code is familiar with the deluge of template errors that simple mistakes can trigger. A hand-tuned code generator should be able to provide much more specific and helpful error messages.
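As a rough illustration of generator-side validation, here is a sketch of a standalone generator step written in C. It is not Cakelisp's macro API; `is_identifier` and `emit_field` are hypothetical helpers, and the identifier check deliberately ignores C keywords to stay short.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical check: is `name` shaped like a C identifier?
 * (Simplified: does not reject reserved keywords.) */
static int is_identifier(const char *name)
{
    if (!(isalpha((unsigned char)name[0]) || name[0] == '_'))
        return 0;
    for (const char *c = name + 1; *c; c++)
        if (!(isalnum((unsigned char)*c) || *c == '_'))
            return 0;
    return 1;
}

/* Hypothetical generator step: emit one struct member to `out`, or report a
 * problem-specific error to `err`. Returns 0 on success, -1 on bad input. */
int emit_field(FILE *out, FILE *err, const char *type, const char *name)
{
    if (!is_identifier(name)) {
        fprintf(err, "error: field name '%s' is not a valid identifier\n",
                name);
        return -1;
    }
    fprintf(out, "    %s %s;\n", type, name);
    return 0;
}
```

Because the generator knows it is building a struct field, it can say exactly which field name is wrong, instead of leaving the user to decode whatever the compiler reports about the generated output.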

The idea of code generation is not often explored in books and schools. Object-oriented programming and languages which typically do not sport code generation features are taught instead.

Conclusion

In practice, code generation can be the best way to solve some problems, such as serialization in C. The "meta" nature of code generating code cannot be replicated with other techniques.

In Cakelisp, I provide facilities to generate arbitrary code at compile time. Cakelisp allows you to effectively leverage code generation without paying the high performance cost of a dynamic runtime.

[^1]: The most popular horror story is that of a lone engineer architecting a whole undocumented language in macros and building the business' foundations on it, then promptly leaving the company. I think the risk of this in practice is greatly exaggerated. An experienced engineer should know how much complexity is reasonable, and an inexperienced engineer is unlikely to have the skills or vision to implement complex code generation schemes. Not only that, but this same story can happen in languages without any code generation features! In other words, yes, bad programmers, including those who are too clever, can screw you.

[^2]: It is of course possible with things like just-in-time compilation, dynamic loading, and whatnot, but these are not a built-in part of the language.

The code generation X-factor
