This article is mirrored on my blog.
I'm writing a linker. It's an unusual linker. Its focus is not on producing executable files. Instead, its focus is to facilitate rapid iteration on a program without having to re-link or re-open it after making changes. In short, it hot-loads code at object-file granularity.
I realized from reading Andreas Fredriksson's Hot Runtime Linking (archive.org) that the dynamism I've been wanting in an ahead-of-time compiled environment could be achieved by taking over the linking and loading stages. I made an attempt to support hot-reloading in Cakelisp, but it was fragile and limited. Not only was working at the link/load stage a better fit, but it also meant I could hot-reload more than just code written in Cakelisp.
The goal of this linker is to bring a dynamic environment to compiled languages. I'm calling it a "linker/loader" because its purpose isn't to create an executable like most linkers---its purpose is to link, load, and execute code, and allow doing that on a continuously running process.
I believe that keeping the program running while you make changes to it will change how you think about the program. If iteration times are extremely low, you are more willing to try experiments. Over long development periods the time saved will add up, and the low friction to making changes will result in a better product.
The world without this linker
Hot-reloading in C family languages is typically done via dynamic loading. On Windows, dynamic loading happens via .dll files, while GNU/Linux uses libdl to load .so ("shared object") files. However, this approach has many limitations:
- You must structure your project differently such that the reloadable parts are in dynamic libraries. Your build system and potentially your file organization get more complex.
- Dynamic library variables aren't persisted across loads.[^1]
- Memory management across DLL boundaries can cause issues on Windows. You may need to refactor your code which previously wasn't split into DLLs.
- Different operating systems implement dynamic loading differently. On Windows, you even have to annotate all your code with __declspec(dllexport), which is painful if you are used to Unix-style implicit exposure.
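For comparison, here is a minimal sketch of the conventional libdl approach on GNU/Linux. The HotLibrary struct and the hot_library_* helpers are hypothetical names made up for this illustration; dlopen, dlsym, dlclose, and dlerror are the real libdl calls. It assumes the reloadable code has already been built into a shared object, which is exactly the project restructuring described above.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical wrapper around one reloadable shared object. */
typedef struct {
    const char *path; /* e.g. "./libgame.so"; NULL means the main program itself */
    void *handle;     /* handle returned by dlopen, or NULL when nothing is loaded */
} HotLibrary;

/* Drop the previous copy (if any) and load the library again.
 * Returns 0 on success and -1 on failure. */
int hot_library_reload(HotLibrary *library)
{
    if (library->handle) {
        /* Once the library is really unloaded, its static and global
         * variables are gone; this is the persistence limitation from
         * the list above. */
        dlclose(library->handle);
        library->handle = NULL;
    }
    library->handle = dlopen(library->path, RTLD_NOW | RTLD_LOCAL);
    if (!library->handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }
    return 0;
}

/* Look up a symbol by name in the currently loaded copy.
 * Returns NULL if nothing is loaded or the symbol is missing. */
void *hot_library_symbol(const HotLibrary *library, const char *name)
{
    if (!library->handle)
        return NULL;
    return dlsym(library->handle, name);
}
```

Every function pointer obtained through hot_library_symbol has to be looked up again after each reload, because the old addresses may no longer be valid. On older glibc versions the program also has to be linked with -ldl.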
Another alternative is just-in-time (JIT) compilation. The primary qualms I have with JIT systems are:
- They typically require limited interfaces to JITed code. This means you cannot change any code in the project, only e.g. the "scripts" part of it. It is difficult to get the boundary right between code which can and cannot be dynamically loaded. This is the same issue with using embedded dynamic scripting languages like Lua or Python---how much of your application should be written in them?
- JIT compilation requires generating machine code, which is usually a complex process and a large maintenance burden. In practice this means shipping out to a 3rd party library for JIT, and the libraries are typically very large dependencies.[^2]
What you will be able to do with this linker
The current interface I have in mind is as follows:
- Compile your project however you want, so long as you produce a bunch of object files or object file archives (and eventually, dynamic libraries).[^3]
- Rather than using GNU ld, link.exe, etc. to link your objects into an executable, call linker-loader [your list of objects...].
- linker-loader will then do one of the following:
  - On the first invocation, it will link and load the objects immediately, then execute the entry point (main or something you decide). This means your program will start running without having ever been compiled into a single executable.
  - On subsequent invocations, if the program is still running[^4], the linker will only re-link and load object files that have changed. The new invocation of linker-loader will see that the program is still running and instead tell the existing linker-loader process to reload changed objects. This organization alone should save you time because the whole program no longer needs to be re-linked every time you make a change.
- linker-loader will do its best to retain all state across reloads without any special markup on your part. If you make changes it cannot resolve, it will need to prompt you with how you would like to proceed. For example, if the data changes size but you are fine with the old data simply being discarded, it will be able to handle that.
The interface shouldn't require much up-front work to use with an existing project. Additional work would be necessary to allow the project to self-modify or introspect on its own image, however.
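To make the "only reload what changed" step concrete, here is one way the change detection could be sketched. This is an assumption about how such a check might work, not the linker's actual code: TrackedObject and the three functions are hypothetical, and comparing modification time and file size is only one possible heuristic (hashing the file contents would be another).

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical bookkeeping for one object file given on the command line. */
typedef struct {
    const char *path;    /* e.g. "build/game.o" */
    bool loaded;         /* false until the object has been loaded once */
    time_t loaded_mtime; /* modification time recorded at the last load */
    off_t loaded_size;   /* file size recorded at the last load */
} TrackedObject;

/* Does the file on disk differ from what was last loaded? */
bool object_needs_reload(const TrackedObject *object, const struct stat *current)
{
    if (!object->loaded)
        return true; /* first invocation: everything must be loaded */
    return current->st_mtime != object->loaded_mtime ||
           current->st_size != object->loaded_size;
}

/* Remember what was loaded so the next invocation can compare against it. */
void object_mark_loaded(TrackedObject *object, const struct stat *current)
{
    object->loaded = true;
    object->loaded_mtime = current->st_mtime;
    object->loaded_size = current->st_size;
}

/* Writes the indices of objects that need (re)loading to out_indices and
 * returns how many there are. Objects whose file cannot be read are skipped
 * so that the code already in memory keeps running. */
size_t collect_changed_objects(const TrackedObject *objects, size_t count,
                               size_t *out_indices)
{
    size_t num_changed = 0;
    for (size_t i = 0; i < count; i++) {
        struct stat current;
        if (stat(objects[i].path, &current) != 0)
            continue;
        if (object_needs_reload(&objects[i], &current))
            out_indices[num_changed++] = i;
    }
    return num_changed;
}
```

On the first invocation every object reports that it needs loading. On later invocations only the objects the compiler has rewritten since the last load are returned.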
Inspirations and similar projects
I had many different ideas that resulted in my starting this project:
- Andreas Fredriksson's Hot Runtime Linking (archive.org) is where I got the idea to make a linker and loader specifically.
- Naughty Dog's Game Oriented Assembly Lisp (GOAL) was a compiled language with a dynamic environment. It was built for the PlayStation 2 in 2001. If Andy Gavin could build this functionality then, why can't we have it today?
- Malleable Systems got me motivated to pursue more "malleability" in my programs. I'm not sure if this project would be the right way to ship this kind of functionality to users, but it's something I'll keep in mind while working on it.
- My experience with Emacs helped me realize how immensely valuable easy program customization can be. I'm surprised how far ahead Emacs still is on this front, despite being one of the oldest open source projects in history.
- Stephen Kell's Liberating the Smalltalk lurking in C and Unix talk opened my mind to the possibility of bringing more dynamism and introspection to C.
The following projects are similar to mine, though do not take exactly the same approach:
- Runtime Compiled C++ uses marked-up C++ to learn about your code, then compiles and loads it via DLL. See How it works.
- Live++ "works at the binary level using .PDB, .EXE, .DLL, .LIB, and .OBJ directly. It extracts and reverses most of the needed information from executable and object files." This sounds the most similar to my approach. The two major drawbacks to this project to me are A) no GNU/Linux support, which is where I develop my software, and B) it is closed source and proprietary, whereas I'm a believer in Free (as in freedom) Software.
- Visual Studio Edit and Continue is intended to let you live edit any code in your project and magically apply the edit. However, I have never gotten it to work, and none of my coworkers have either. The rumor among us is that it is not well supported, especially not on large projects like games (which are what I work on professionally).
This linker can facilitate program introspection. I plan on having symbols the linker itself provides to the program image that allow the program to inspect its own symbols. This opens the door to a whole variety of interesting things:
- Call any function in your program in an interactive read-evaluate-print loop
- Visualize function compiled sizes
- Visualize function references[^5]
- Introspect on program data
- ...and more things I haven't thought of yet!
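As a rough sketch of the first bullet, a read-evaluate-print loop needs little more than a name-to-address table. Everything below is hypothetical: FunctionSymbol, image_functions, find_function, and call_int_function are invented for this example, and the table is written by hand here, whereas the idea above is that the linker would provide it. The sketch also assumes every callable function takes one int and returns an int; a real REPL would need type information, for instance from debug symbols.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Generic function pointer type used for storage only; it is cast back to
 * the real signature before calling. */
typedef void (*GenericFunction)(void);

/* Hypothetical entry of a linker-provided symbol table. */
typedef struct {
    const char *name;
    GenericFunction address;
} FunctionSymbol;

/* Two stand-in program functions. */
static int square(int x) { return x * x; }
static int negate(int x) { return -x; }

/* Written by hand here; the linker/loader would fill this in. */
static const FunctionSymbol image_functions[] = {
    {"square", (GenericFunction)square},
    {"negate", (GenericFunction)negate},
};
static const size_t num_image_functions =
    sizeof(image_functions) / sizeof(image_functions[0]);

/* Linear search by name; returns NULL when the symbol is unknown. */
const FunctionSymbol *find_function(const char *name)
{
    for (size_t i = 0; i < num_image_functions; i++) {
        if (strcmp(image_functions[i].name, name) == 0)
            return &image_functions[i];
    }
    return NULL;
}

/* Call a function by name, assuming it has the signature int(int).
 * Returns false if no such function exists. */
bool call_int_function(const char *name, int argument, int *result)
{
    const FunctionSymbol *symbol = find_function(name);
    if (!symbol)
        return false;
    *result = ((int (*)(int))symbol->address)(argument);
    return true;
}
```

A REPL built on this would parse a line such as "square 7", pass the name and argument to call_int_function, and print the result.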
Things I'm still figuring out
I haven't yet touched the debugging aspect of this. I want certain features in my linker/loader which will necessitate my program image being unique from a normally linked executable. That means I will need to do something custom to help debuggers find the debug symbols from wherever my loader has decided to place the executing code in memory. I've only glanced at the DWARF debugging info, and it's pretty complicated.
The intent with this linker/loader was primarily to aid during development, so I have been focused on supporting my primary development architecture, x86-64 (a.k.a. AMD64). Linkers are machine architecture-dependent, so each architecture would need to be added one-by-one once support for them is desired. This doesn't mean your program would only work on x86-64; it could support a superset of the architectures my linker supports, and you would need to use a different linker to create executables for other architectures.
With my software, I do all initial development on GNU/Linux, then port to Windows after I have proven to myself that the concept is valuable. This means I have not done any work towards Windows Portable Executable or Common Object File Formats. If I find I can do the things I want on GNU/Linux (which uses ELF format executables), I will port the linker to Windows.
There are complexities around the data sections of the program image that I need to figure out. For example, you should be able to change functions as much as you want while still persisting data across reloads. However, if you change the presence or size of items in data, the linker will need to do some work to try to persist data which hasn't been affected. This will likely require some help from the debug symbols to determine where things are in data and guess at whether they have changed since last load. I need to do more experimentation before I can find the limitations of this system, but ideally, you can change data without needing to restart the program in many cases.[^6]
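One conservative policy for the data problem can be sketched as follows: carry a value over only when a symbol with the same name and the same size exists in both the old and the new image, and leave everything else at its freshly initialized value. This is an illustrative assumption, not the linker's actual strategy. DataSymbol and preserve_unchanged_data are hypothetical names, and matching on size alone cannot detect a layout change that happens to keep the size the same, which is where debug symbols would have to help.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one item in a data section. */
typedef struct {
    const char *name; /* symbol name from the object file */
    void *address;    /* where the item lives in the loaded image */
    size_t size;      /* size in bytes */
} DataSymbol;

/* For every new symbol that has an old symbol with the same name and size,
 * copy the old bytes into the new location. Symbols that are new, removed,
 * or resized are left untouched. Returns the number of values preserved. */
size_t preserve_unchanged_data(const DataSymbol *old_symbols, size_t old_count,
                               const DataSymbol *new_symbols, size_t new_count)
{
    size_t preserved = 0;
    for (size_t n = 0; n < new_count; n++) {
        for (size_t o = 0; o < old_count; o++) {
            if (strcmp(new_symbols[n].name, old_symbols[o].name) != 0)
                continue;
            if (new_symbols[n].size == old_symbols[o].size) {
                memcpy(new_symbols[n].address, old_symbols[o].address,
                       new_symbols[n].size);
                preserved++;
            }
            break; /* names are assumed unique within an image */
        }
    }
    return preserved;
}
```

With this policy an unchanged counter survives a reload, while a buffer that grew from 8 to 16 bytes simply starts over with its new initial contents.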
Get in touch!
Let me know what you think by emailing me: macoy [at] macoy [dot] me.
You can see the current code here. As of publishing this article, it can load an ELF format object file for x86-64, process the file's relocations, and call into the object file correctly. It's not near release; I'll write a new blog post once that happens.
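For readers unfamiliar with what "processing relocations" involves, here is a small sketch of the two most basic x86-64 cases. It is not taken from the linked code. The function name, its parameters, and the RELOC_* constants are made up for this example, though the constant values mirror R_X86_64_64 and R_X86_64_PC32 from the x86-64 ELF ABI. S is the symbol's address, A is the addend stored in the relocation entry, and P is the address of the place being patched. The sketch assumes a little-endian host, which holds when the loader itself runs on x86-64.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values mirror R_X86_64_64 and R_X86_64_PC32 from the x86-64 ELF ABI. */
enum {
    RELOC_X86_64_64 = 1,   /* 64-bit absolute address: S + A */
    RELOC_X86_64_PC32 = 2  /* 32-bit PC-relative offset: S + A - P */
};

/* Patch one relocation inside a loaded section.
 * section / section_size: the bytes being patched
 * section_address:        address the section occupies when executing
 * offset:                 position of the patched field within the section
 * Returns false for unsupported types, out-of-bounds offsets, or a
 * PC-relative value that does not fit in 32 bits. */
bool apply_relocation(uint8_t *section, size_t section_size,
                      uint64_t section_address, uint64_t offset,
                      uint32_t type, uint64_t symbol_address, int64_t addend)
{
    size_t width = (type == RELOC_X86_64_64) ? 8 : 4;
    if (offset > section_size || section_size - offset < width)
        return false;

    uint8_t *place = section + offset;
    uint64_t place_address = section_address + offset; /* P */

    switch (type) {
    case RELOC_X86_64_64: {
        uint64_t value = symbol_address + (uint64_t)addend;
        memcpy(place, &value, sizeof value);
        return true;
    }
    case RELOC_X86_64_PC32: {
        int64_t value =
            (int64_t)(symbol_address + (uint64_t)addend - place_address);
        if (value < INT32_MIN || value > INT32_MAX)
            return false; /* target too far away for a 32-bit displacement */
        int32_t displacement = (int32_t)value;
        memcpy(place, &displacement, sizeof displacement);
        return true;
    }
    default:
        return false;
    }
}
```

For a call instruction at address 0x1000 whose 4-byte operand starts at 0x1001, a target at 0x2000 with the usual addend of -4 yields a displacement of 0xFFB, which the processor adds to the address of the next instruction (0x1005) to reach 0x2000.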
[^1]: I solved this in Cakelisp by automatically converting static variables into heap allocations instead, but it's a dirty and fragile solution.
[^2]: An example of a JIT library for C is libgccjit (GCC-based). You could also build one based on LLVM. Both GCC and LLVM are enormous dependencies by my standards. Tiny C Compiler would be an example of a small library, but still a complex dependency.
[^3]: In theory any compiled language which produces object files should automatically work with this linker. In practice, I think there are going to be incompatibilities with some languages which would need to be supported case-by-case. Anything which does link-time code generation, for example, would not work with this system until support is added.
[^4]: There are some additional complications here. My first pass would
be to require the program to itself return control to the
linker-loader so that the program's image can be safely edited.
This means the program would need some code to recognize that it has
been requested to reload and return all the way up the stack on all
threads so that its code can be modified. Eventually once I learn
more I may be able to do something better. Note that in this
condition the program doesn't need to close its window or free all
its state to be reloaded, it only needs to not be executing any code
in sections which are going to be reloaded. It can then pick back up
where it left off with all the same data.
[^5]: This is limited by compiler optimizations. For example, a
module-local function (a function marked
static in C) will be
relatively referenced by the compiler and no record of the reference
will reach the linker.
[^6]: One clear exception would be changing the size or structure of data which is referenced by other data. The linker would need a user-written migration function to know how to handle that, which is likely not worth writing when compared to the time required to re-launch the process.