This article is mirrored on my blog.
I recently released File Helper, a file organization application I wrote using Cakelisp.
This application had only two external files that were necessary for it to fully function:
- A font
- An application icon
I packaged File Helper in a .zip or .tar.gz for Windows or Linux respectively. These archives contain the platform executable as well as a license file and the two necessary font and icon files.
However, wouldn't it be nice if I instead shipped a single executable, thereby eliminating the extraction step?
It might sound trivial, but eliminating that extra step has many benefits:
- Less technical users won't get confused. Double-clicking an archive usually opens it in an archive viewer rather than extracting it, which might confuse them enough that they never run my product.
- The application has no risk of breaking if the executable is moved.
- The user doesn't have to delete or move the archive after they extract it.
Bundling files into executables
An executable is just a file in a format which your operating system understands. It is essentially a header and a whole bunch of sections filled with binary data.
Typically, a linker converts a collection of object files into a single executable. Because executables are containers which can hold various kinds of data, we can package data only our application understands in the same container as the application code.
The operating system is fine with this because it only needs to map the executable into memory and start executing code at a designated entry point. It is then up to the program to decide how to interpret the various executable sections.
Platform differences
There are many different file formats for executables. Usually, an operating system only supports one executable file format. On Windows, it's the Win32 Portable Executable format, typically with extension .exe. On Linux, it's usually ELF.
I am only targeting those two platforms, so I can add code to specifically support those formats when building Cakelisp programs.
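To make the two containers concrete, here is a minimal sketch that tells them apart by their leading magic bytes. ELF files start with the bytes 0x7F 'E' 'L' 'F', and PE files start with a DOS stub header whose first two bytes are "MZ". The names `ExecutableFormat` and `identifyExecutableFormat` are made up for this illustration and are not part of Cakelisp or File Helper.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names, for illustration only */
typedef enum ExecutableFormat
{
    ExecutableFormat_Unknown,
    ExecutableFormat_Elf,
    ExecutableFormat_Pe
} ExecutableFormat;

/* Guess the executable container format from the first bytes of a file.
 * This only inspects the magic number; it does not validate the rest of the header. */
ExecutableFormat identifyExecutableFormat(const unsigned char* bytes, size_t numBytes)
{
    static const unsigned char elfMagic[4] = {0x7f, 'E', 'L', 'F'};

    if (numBytes >= 4 && memcmp(bytes, elfMagic, sizeof(elfMagic)) == 0)
        return ExecutableFormat_Elf;

    /* PE executables begin with a DOS stub header, which starts with "MZ" */
    if (numBytes >= 2 && bytes[0] == 'M' && bytes[1] == 'Z')
        return ExecutableFormat_Pe;

    return ExecutableFormat_Unknown;
}
```

Reading the first four bytes of any program on disk and passing them to this function is enough to see which of the two formats you are dealing with.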
On Windows, data is added to executables via Resource Files. I wrote a tutorial on how to do this.
On Linux, data can be added by dumping the data into an object file which defines a couple of symbols. This is a great tutorial on how to do that.
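The symbols in question follow a predictable naming scheme. When GNU objcopy (or ld) reads a file as raw binary input, it defines `_binary_<name>_start`, `_binary_<name>_end`, and `_binary_<name>_size`, where `<name>` is the input path as given on the command line with every non-alphanumeric character replaced by an underscore. The sketch below reproduces that naming rule; `makeBinarySymbolName` is a hypothetical helper of mine, and whether the real build passes the path in exactly this form is an assumption.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Build the symbol name GNU objcopy derives from a binary input path.
 * suffix is "start", "end", or "size".
 * Returns 1 on success, 0 if the output buffer is too small.
 * Hypothetical helper, for illustration only. */
int makeBinarySymbolName(const char* inputPath, const char* suffix, char* out, size_t outSize)
{
    size_t written = 0;
    int numPrinted = snprintf(out, outSize, "_binary_");
    if (numPrinted < 0 || (size_t)numPrinted >= outSize)
        return 0;
    written = (size_t)numPrinted;

    /* Every character which is not a letter or digit becomes an underscore */
    for (const char* c = inputPath; *c; ++c)
    {
        if (written + 1 >= outSize)
            return 0;
        out[written++] = isalnum((unsigned char)*c) ? *c : '_';
    }

    numPrinted = snprintf(out + written, outSize - written, "_%s", suffix);
    if (numPrinted < 0 || (size_t)numPrinted >= outSize - written)
        return 0;
    return 1;
}
```

In C, the application would then declare the resulting names as external arrays, for example `extern const char _binary_MyFont_ttf_start[];`, and let the linker resolve them to the embedded bytes.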
Good and bad ways
Like everything in programming, you'll hear different advice on how to bundle data.
The most common alternative method is to convert your data to a C-style array definition. This has many limitations, and in my opinion should be avoided:
- Some compilers (MSVC included) limit the number of elements in an array, which therefore limits the size of the bundled data.
- Your compiler has to do extra processing (tokenization, parsing, etc.) to that data which it should actually just treat as a giant binary blob. Extra unnecessary processing means longer build times.
- An extra stage has to be created and compiled as part of your build system, which adds complexity.
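For reference, this is roughly what that extra stage looks like. The sketch below is a generic generator of my own, not code from any particular tool; `writeCArray` is a hypothetical name. It shows how every byte of data turns into several characters of source text which the compiler must then tokenize and parse.

```c
#include <stddef.h>
#include <stdio.h>

/* Write data as a C array definition to out.
 * Returns 0 on success, -1 on failure or if size is zero
 * (a zero-length array is not valid standard C).
 * Hypothetical helper, for illustration only. */
int writeCArray(FILE* out, const char* arrayName, const unsigned char* data, size_t size)
{
    if (size == 0)
        return -1;

    if (fprintf(out, "const unsigned char %s[%lu] = {", arrayName, (unsigned long)size) < 0)
        return -1;

    for (size_t i = 0; i < size; ++i)
    {
        /* Each byte costs at least four characters of source, plus a separator */
        if (fprintf(out, "%s0x%02x", i ? ", " : "", (unsigned int)data[i]) < 0)
            return -1;
    }

    return fprintf(out, "};\n") < 0 ? -1 : 0;
}
```

A font of a few hundred kilobytes becomes well over a megabyte of generated source under this scheme, all of which has to go through the compiler front end.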
We are going to proceed with the platform-dependent but much more robust approach, which is to convert our data to object files without using a C/C++ compiler.
Integrated build system
Whether we are on Windows or Linux, we need to process our data file into some other form in order for the linker to properly understand the data package. This means adding a step to our build to process the data, because we want it to automatically stay up-to-date when linked in the executable.
Cakelisp includes a simple C/C++ build system as well as compile-time code execution. We need to create a new build step to process our binary data into object files. In order to do that, we use a compile-time build hook to execute a function which performs the conversion.
The full code is here.
The end-user interface is simply:
(import "DataBundle.cake")
(bundle-file data-start data-end (const char) "../data/MyFont.ttf")
We declare data-start and data-end to represent pointers to the symbols associated with our data.
That bundle-file invocation is a macro that adds the data file to a list. It also generates the variables we can use to refer to the data.
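Since only a start and an end pointer are exposed, the size of the data is the distance between them, with the end pointer sitting one past the last byte. The sketch below shows that arithmetic in plain C. `BundledData` and `bundledDataSize` are hypothetical names of mine, and in a real build the two pointers would come from the linker-defined symbols rather than from an ordinary array.

```c
#include <stddef.h>

/* A start/end pair describing one bundled file. Hypothetical type, for illustration. */
typedef struct BundledData
{
    const char* start; /* First byte of the data */
    const char* end;   /* One past the last byte of the data */
} BundledData;

/* The size is not stored anywhere; it is the distance between the two pointers */
size_t bundledDataSize(BundledData data)
{
    return (size_t)(data.end - data.start);
}
```

Most loading APIs which accept an in-memory buffer want exactly this pair of values: a pointer to the first byte and a byte count.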
Finally, a compile-time function convert-all-bundle-files invokes objcopy (or the Resource Compiler on Windows[^1]) to generate the actual object file for each bundle-file. It only does this if the data files have changed or the object files don't already exist in the cache.
The function also adds each generated object file to the linker command line, so that the objects are linked into the executable alongside our code object files.
This function is integrated into the Cakelisp build sequence like so:
(add-compile-time-hook-module pre-build convert-all-bundle-files)
Conclusion
This is pretty great: we extended our build system to support bundling arbitrary data files, all without touching Cakelisp's internals.
Not only that, we extended the system in the same language as our application code, and within the same invocation, so we didn't need to create some other phase. We were also able to provide the user with an extremely simple interface for bundling files.
[^1]: On Windows, we need to generate a .rc file with a list of all the resources that should be compiled into a single object file. Because Cakelisp allows arbitrary compile-time code execution, we can easily do this by writing the filenames out to the generated .rc file, then invoking the Resource Compiler on that file. This platform-specific step can be completely automated!
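As a rough idea of what writing out that file involves, the sketch below emits one line per bundled file, using the raw-data resource type `RCDATA` followed by a quoted filename. The resource type, the resource names, and the `BundleEntry` and `writeResourceScript` names are all assumptions of mine; the generated script in the real project may look different. Paths are assumed to contain no quotes or backslashes, since those would need escaping.

```c
#include <stddef.h>
#include <stdio.h>

/* One file to embed. Hypothetical type, for illustration only. */
typedef struct BundleEntry
{
    const char* resourceName; /* Identifier used to look the resource up at runtime */
    const char* filePath;     /* File whose contents get embedded */
} BundleEntry;

/* Write one resource-definition line per entry to out.
 * Returns 0 on success, -1 if writing fails. */
int writeResourceScript(FILE* out, const BundleEntry* entries, size_t numEntries)
{
    for (size_t i = 0; i < numEntries; ++i)
    {
        if (fprintf(out, "%s RCDATA \"%s\"\n", entries[i].resourceName,
                    entries[i].filePath) < 0)
            return -1;
    }
    return 0;
}
```

Because all entries go into one script, a single Resource Compiler invocation can produce one object file covering every bundled file.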