Building Linux in a Jail
The problem of compiling and packaging Linux distributions in a reproducible and automated manner is harder than it seems. Several different organizations have developed tools to handle such builds. This paper describes an alternative meta-RPM build tool which significantly improves on existing practice. Variations between environments on developer, test and production machines can lead to builds succeeding on some machines but not others, or succeeding but producing subtly different output. A standard approach has been to write tools that jail the entire build in a common virtualized environment image, usually via chroot, thus producing nearly identical build output on machines which have in common only a kernel ABI. However, these tools have to date properly addressed only the issue of reproducibility, and have severe deficiencies in the areas of automation and usability. We describe a tool that is used both by nightly production builds and by developers. Setting up a build environment on a new machine takes about five minutes. The tool uses rsync and unionfs copy-on-write to achieve a pristine chroot target before every package build, orders of magnitude faster than tools from SuSE. It creates an installable output image automatically at the end of the build, including for cross-built targets, by intercepting install scripts. It automatically generates a report that proves which libraries each binary linked against, for legal verification. It natively supports the concept of inheriting a new mini-distribution from a base distribution. It provides the ability to drop to a shell in a pre-patched chroot build environment to fix spec file and patch problems. It is integrated with source code version control, allowing developers to work from branches and allowing builds to be automatically reproduced from tags. It automatically chooses high-bandwidth data sources and uses linear backoff retry to recover from network errors. It also aggressively uses dynamic programming and caching to support incremental production builds of large distributions.
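As an illustration of the linear backoff retry mentioned above, the following is a minimal sketch and not the tool's actual code. The function name `retry_linear`, its parameters, and the choice to treat `OSError` as the network failure are all assumptions made for the example. The wait before each retry grows by a fixed step (2 s, 4 s, 6 s, ...) rather than doubling as it would under exponential backoff.

```python
import time


def retry_linear(operation, max_attempts=5, step_seconds=2.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    Hypothetical helper: after the n-th failed attempt, wait
    n * step_seconds before trying again (linear backoff).
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except OSError as err:  # assumed stand-in for a network error
            last_error = err
            if attempt < max_attempts:
                # Delay grows linearly with the number of failures so far.
                sleep(attempt * step_seconds)
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

The `sleep` parameter is injectable only so that the delay schedule can be observed without actually waiting.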
Finally, we present current work to extend the tool to implement a lightweight distributed model in which every developer and production machine transparently participates in a self-managing swarm. Once the swarm has built a certain package on two distinct machines and verified that the output is identical, future build requests for that package for the same architecture are satisfied by the swarm cache: each package is built only twice per architecture across the entire developer and release teams over the life of the distribution. In addition to the obvious advantage of reduced build time, this model provides better quality guarantees than the common model of using only blessed production build machines.
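The two-build verification rule can be sketched as follows. This is a hypothetical in-memory model, not the swarm's real protocol: the class `SwarmCache`, its method names, and the use of SHA-256 to compare outputs are assumptions for illustration. The abstract does not say how mismatched outputs are handled, so the sketch simply leaves the package unverified in that case.

```python
import hashlib
from collections import defaultdict


class SwarmCache:
    """Hypothetical model of a cache keyed by (package, architecture)."""

    def __init__(self):
        # (package, arch) -> {machine name: digest of that machine's output}
        self._reports = defaultdict(dict)
        # (package, arch) -> verified build output
        self._verified = {}

    def lookup(self, package, arch):
        """Return the verified output, or None if a build is still needed."""
        return self._verified.get((package, arch))

    def report_build(self, package, arch, machine, output):
        """Record a build result; return True once the package is verified."""
        key = (package, arch)
        digest = hashlib.sha256(output).hexdigest()
        reports = self._reports[key]
        reports[machine] = digest
        # Verified only when a *different* machine produced identical output.
        if any(m != machine and d == digest for m, d in reports.items()):
            self._verified[key] = output
        return key in self._verified
```

Under this model, a second report from the same machine does not count toward verification; only agreement between two distinct machines populates the cache.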
Amos Waterland is a programmer working for IBM.