Reproducible Builds & Supply Chain Integrity

Introduction

A build is bit-for-bit reproducible when the same source code, compiled with the same toolchain under the same conditions, always produces exactly identical binary output -- down to every byte. This is not the common case in software development. Typical builds embed timestamps, absolute filesystem paths, locale-dependent data, random build IDs, and host-specific metadata into their output. Two developers building the same source code on different machines will almost certainly produce different binaries.

Reproducibility is a security property, not merely a quality-of-life convenience. Without it, users who cannot build from source themselves must trust the party that provides the binary. That party could be compromised, malicious, or simply mistake-prone. With reproducibility, any third party can independently verify that a published binary is the authentic output of the claimed source code. Trust is replaced by verifiable evidence.

StageX treats reproducibility as a mandatory precondition for package inclusion. Every package in the distribution builds deterministically. Non-reproducible software is rejected. StageX is the first Linux distribution to require that every artifact be independently reproduced by multiple maintainers on diverse hardware before it is signed and published.

Three Levels of Build Determinism

The software engineering community often uses "reproducible builds" as an umbrella term, but the whitepaper distinguishes three related but distinct properties:

Hermetic. A hermetic build has hash-locked inputs and no network access during compilation. It depends only on explicitly declared sources and toolchains, not on external resources that could change or disappear. StageX enforces hermeticity through RUN --network=none in every Containerfile, meaning no build step can download dependencies from the internet at compile time. Dependencies are declared via COPY --from=stagex/<dep> . / and resolved at build time from the local OCI layout cache.

Deterministic. A deterministic build produces the same output every time from the same inputs. This requires controlling the factors that commonly introduce non-determinism: file timestamps, locale settings, filesystem ordering, random seeds, and embedded build metadata. StageX achieves determinism through the environment variables in src/global.mk: TZ=UTC, LANG=C.UTF-8, LC_ALL=C, and SOURCE_DATE_EPOCH=1. SOURCE_DATE_EPOCH tells build tools that honor the convention to use a fixed timestamp (one second after the Unix epoch) instead of the current clock time. The BuildKit flags rewrite-timestamp=true and force-compression=true further ensure that layer timestamps and compression are deterministic.
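To make the sources of non-determinism concrete, here is a minimal Python sketch (not StageX code) that packs files into a .tar.gz whose bytes depend only on file names and contents. The function name deterministic_tar_gz is hypothetical; the technique is the generic one described above: fixed ordering, fixed timestamps, and fixed ownership metadata.

```python
import gzip
import io
import tarfile

SOURCE_DATE_EPOCH = 1  # the value the text attributes to src/global.mk


def deterministic_tar_gz(files: dict[str, bytes], epoch: int = SOURCE_DATE_EPOCH) -> bytes:
    """Pack files into a .tar.gz whose bytes depend only on names and contents."""
    tar_buf = io.BytesIO()
    # Pin the tar format so the library default cannot change the output.
    with tarfile.open(fileobj=tar_buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name in sorted(files):  # fixed order instead of filesystem order
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = epoch          # fixed timestamp instead of the clock
            info.mode = 0o644
            info.uid = info.gid = 0     # no host-specific ownership
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(files[name]))
    # The gzip header also stores a timestamp, so pin that as well.
    return gzip.compress(tar_buf.getvalue(), compresslevel=9, mtime=epoch)
```

Calling the function twice with the same files, in any insertion order, yields identical bytes on the same interpreter. The sketch does not address differences between compression library versions on different hosts.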

Reproducible. A reproducible build is portable -- it can be rebuilt on different systems and produce identical artifacts. Determinism alone is not enough if the build depends on a specific CPU model, kernel version, or host toolchain. StageX validates reproducibility across machines from different CPU vendors (AMD and Intel) to confirm that the output is not hardware-dependent. As the maintainer table in the reproduce-builds guide shows, Lance Vick builds on AMD Ryzen Threadripper and EPYC processors, while Danny Grove builds on Intel Core i7 and AMD Ryzen 7 -- all producing identical digests for the same release branch.

How StageX Achieves Reproducibility

StageX's reproducibility is the result of deliberate engineering across the entire build pipeline:

Environment normalization. Every build runs with the following environment variables set by src/global.mk:

| Variable | Value | Purpose |
|---|---|---|
| TZ | UTC | Eliminates timezone-dependent behavior |
| LANG | C.UTF-8 | Fixed locale for sorting and formatting |
| LC_ALL | C | Overrides all locale categories |
| SOURCE_DATE_EPOCH | 1 | Fixed timestamp for all build-time clocks |
| BUILDKIT_MULTI_PLATFORM | 1 | Deterministic multi-platform builds |
| DOCKER_BUILDKIT | 1 | Enables BuildKit frontend |
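As an illustration of what environment normalization means in practice, the following Python sketch applies the values from the table to a child process. StageX sets these in src/global.mk; the helper names normalized_env and run_normalized are hypothetical and exist only for this example.

```python
import os
import subprocess
import sys

# Values copied from the table above.
NORMALIZED = {
    "TZ": "UTC",
    "LANG": "C.UTF-8",
    "LC_ALL": "C",
    "SOURCE_DATE_EPOCH": "1",
    "BUILDKIT_MULTI_PLATFORM": "1",
    "DOCKER_BUILDKIT": "1",
}


def normalized_env(base=None):
    """Return a copy of the environment with the normalized values forced."""
    env = dict(os.environ if base is None else base)
    env.update(NORMALIZED)  # override whatever the host had set
    return env


def run_normalized(argv):
    """Run a command under the normalized environment and return its stdout."""
    result = subprocess.run(
        argv, env=normalized_env(), check=True, capture_output=True, text=True
    )
    return result.stdout


if __name__ == "__main__":
    # The child sees the fixed value regardless of the host's settings.
    print(run_normalized([sys.executable, "-c",
                          "import os; print(os.environ['SOURCE_DATE_EPOCH'])"]))
```

Overriding rather than merely defaulting matters here: a host that already exports TZ or LC_ALL would otherwise leak its settings into the build.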

BuildKit flags. The --output flag in every build target includes rewrite-timestamp=true (normalizes layer timestamps to SOURCE_DATE_EPOCH) and force-compression=true (ensures identical compressed output regardless of host zlib version). The annotation org.opencontainers.image.created=1970-01-01T00:00:01Z fixes the image creation timestamp. The output specification generated by src/targets.py includes all of these parameters for every package.
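The creation timestamp in that annotation is simply SOURCE_DATE_EPOCH=1 rendered as a UTC timestamp. The sketch below shows the derivation and one way the pieces could be joined into an output specification. It is not the contents of src/targets.py: the exporter type (type=oci) and the annotation. key prefix are assumptions made for illustration, and the function names are hypothetical.

```python
from datetime import datetime, timezone


def created_annotation(source_date_epoch: int) -> str:
    """Render SOURCE_DATE_EPOCH as a UTC timestamp string."""
    ts = datetime.fromtimestamp(source_date_epoch, tz=timezone.utc)
    return ts.strftime("%Y-%m-%dT%H:%M:%SZ")


def output_spec(source_date_epoch: int) -> str:
    """Join the parameters named in the text into one comma-separated spec."""
    parts = [
        "type=oci",                # assumption: exporter type is not stated in the text
        "rewrite-timestamp=true",  # from the text
        "force-compression=true",  # from the text
        # assumption: annotations are passed with an "annotation." key prefix
        "annotation.org.opencontainers.image.created="
        + created_annotation(source_date_epoch),
    ]
    return ",".join(parts)


if __name__ == "__main__":
    print(created_annotation(1))  # 1970-01-01T00:00:01Z
    print(output_spec(1))
```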

Network isolation. The --network=none flag on every RUN instruction prevents build steps from accessing the network during compilation. This eliminates the possibility of network-dependent non-determinism (e.g., a build that embeds the current public IP or fetches a timestamp from a remote server) and closes the dependency substitution attack vector.
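A policy like this can be checked mechanically. The following is a hypothetical lint sketch, not part of StageX, that reports RUN instructions lacking --network=none. It is deliberately naive: it looks only at lines that begin with RUN and does not understand line continuations or heredocs, so a continuation line that happens to start with the word "run" would be misreported.

```python
def run_lines_without_isolation(containerfile_text: str) -> list[int]:
    """Return 1-based line numbers of RUN instructions missing --network=none."""
    offenders = []
    for lineno, line in enumerate(containerfile_text.splitlines(), start=1):
        tokens = line.split()
        if not tokens or tokens[0].upper() != "RUN":
            continue
        # Instruction flags sit directly after RUN and start with "--".
        flags = []
        for token in tokens[1:]:
            if not token.startswith("--"):
                break
            flags.append(token)
        if "--network=none" not in flags:
            offenders.append(lineno)
    return offenders


if __name__ == "__main__":
    example = "FROM scratch\nRUN --network=none make\nRUN make install\n"
    print(run_lines_without_isolation(example))  # [3]
```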

Pinned source digests. Every source tarball referenced in package.toml includes a SHA-256 hash. The src/fetch.py script verifies this hash before making the source available to the build. The Containerfile and package.toml of each package are enumerated with git ls-files, which pins them to specific committed versions, and they are recorded as build dependencies in the auto-generated out/targets.mk.
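The hash check itself is small. Here is a minimal sketch of verifying a downloaded file against a pinned SHA-256 value; it is not the code in src/fetch.py, and the names sha256_file and verify_source are hypothetical.

```python
import hashlib


def sha256_file(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large tarballs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_source(path, expected_hex: str) -> None:
    """Raise ValueError unless the file matches the pinned hash."""
    actual = sha256_file(path)
    if actual != expected_hex.lower():
        raise ValueError(
            f"hash mismatch for {path}: expected {expected_hex}, got {actual}"
        )
```

Failing closed is the important design choice: a source that does not match its pinned hash never reaches the build.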

Locked toolchain digests. Dependencies are referenced by their OCI layout digest (oci-layout://./out/<dep>), not by mutable tags. This means a build always uses exactly the same toolchain image that was used in the previous build, even if a newer version has been published.
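Digest-based references work because an OCI image layout is content-addressed: index.json lists manifest digests, and each blob is stored under blobs/<algorithm>/<hex>. The sketch below reads those digests and checks that a blob really hashes to its name. It assumes a standard OCI image layout directory and is not StageX tooling; the function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def manifest_digests(layout_dir) -> list[str]:
    """Return the digests listed in an OCI image layout's index.json."""
    index_path = Path(layout_dir) / "index.json"
    index = json.loads(index_path.read_text(encoding="utf-8"))
    return [entry["digest"] for entry in index.get("manifests", [])]


def blob_matches_digest(layout_dir, digest: str) -> bool:
    """Check that blobs/<algorithm>/<hex> actually hashes to <hex>."""
    algorithm, _, expected_hex = digest.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm!r}")
    blob = Path(layout_dir) / "blobs" / algorithm / expected_hex
    return hashlib.sha256(blob.read_bytes()).hexdigest() == expected_hex
```

A mutable tag can be repointed at new content without any visible change, whereas a digest cannot refer to different bytes without the check above failing.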

Why Reproducibility Enables Accountability

If only one party can produce a binary, users must trust that party. The binary could differ from the source -- embedding backdoors, omitting security fixes, or adding telemetry -- and there would be no way for an external observer to detect the discrepancy.

Reproducibility transforms this dynamic. If anyone can rebuild the same source and verify that the published binary matches their own build, trust becomes verifiable. The user does not need to trust the distribution's build infrastructure. They can build it themselves, compare the digest, and confirm identity independently.

StageX extends this principle through multi-party verification. Two or more maintainers independently build every package on diverse hardware (different CPU vendors, different machines, different geographical locations) and confirm that the resulting digests match. A single compromised maintainer cannot publish a different binary without the reproduction step catching the discrepancy. As the whitepaper states, "even if one maintainer or build server is compromised, the release will not proceed without matching hashes and a second signature."
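The release rule described here can be stated as a small predicate. The sketch below models only the digest-agreement part (signature verification is out of scope); the function name release_digest and the maintainer labels in the example are hypothetical.

```python
def release_digest(reported: dict[str, str], required: int = 2) -> str:
    """Return the agreed digest, or raise ValueError if the release must not proceed.

    reported maps a maintainer identifier to the digest that maintainer built.
    """
    if len(reported) < required:
        raise ValueError(
            f"need at least {required} independent builds, got {len(reported)}"
        )
    distinct = set(reported.values())
    if len(distinct) != 1:
        # Any disagreement blocks the release; there is no majority vote.
        raise ValueError(f"digest mismatch between maintainers: {sorted(distinct)}")
    return distinct.pop()


if __name__ == "__main__":
    print(release_digest({"maintainer-a": "sha256:aaaa", "maintainer-b": "sha256:aaaa"}))
```

Requiring unanimity rather than a majority means one divergent build is treated as a signal to investigate, not as noise to be outvoted.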

The Verification Chain

The end-to-end verification workflow for a StageX user proceeds as follows:

  1. Pull the published image from an OCI registry by digest.
  2. Check the published digest against the digests/<stage>.txt file committed to the StageX repository.
  3. Clone the StageX repository and reproduce the build locally using the same release branch.
  4. Compare the local digest against the published digest. A match confirms the published image was built from the claimed source.
  5. Verify the multi-signature attestations on the digest, confirming that at least two independent maintainers signed identical output.

Each step is cryptographically verifiable. The user needs only the StageX source repository (which contains maintainer key fingerprints in the MAINTAINERS file), an OCI-compatible build tool, and the published maintainer GPG keyring from sigs.stagex.tools. No external trust anchors are required beyond the 181-byte hex0-seed root of trust.
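Steps 2 and 4 of the workflow above reduce to a lookup and a comparison. The following sketch assumes that each line of a digests/<stage>.txt file has the form "<hex digest> <package name>"; that layout is an assumption made for illustration and should be checked against the repository. The function names are hypothetical, and signature verification (step 5) is not modeled.

```python
def parse_digest_file(text: str) -> dict[str, str]:
    """Map package name to digest, assuming '<hex digest> <name>' lines."""
    digests = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, name = line.split(maxsplit=1)
        digests[name] = digest
    return digests


def matches_published(name: str, local_digest: str, digest_file_text: str) -> bool:
    """True only if the locally built digest equals the committed one."""
    published = parse_digest_file(digest_file_text).get(name)
    if published is None:
        return False  # an unknown package is never treated as verified
    return published == local_digest.removeprefix("sha256:")
```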

Industry Context

The Reproducible Builds project has been tracking the reproducibility of major Linux distributions for several years. As of the whitepaper's data:

  • Debian: approximately 94% reproducible (Bullseye release)
  • Fedora: approximately 90% reproducible
  • Arch Linux: approximately 89% reproducible
  • openSUSE Tumbleweed: greater than 95% reproducible

These figures represent a significant engineering achievement. However, they measure reproducibility against builds that are not fully bootstrapped from source. A package is counted as reproducible if its binary can be rebuilt identically from the declared build inputs -- but those inputs typically include pre-compiled toolchain binaries inherited from previous distributions. The whitepaper notes this framing "creates ambiguity, making it difficult to have confidence in the provided completion percentages."

StageX differs in two critical respects:

  • 100% reproducibility is required, not aspirational. Non-reproducible packages are rejected outright. No package is published unless it builds deterministically and has been independently reproduced by multiple maintainers.
  • 100% full-source bootstrapping is required. Every build traces its provenance to the 181-byte hex0-seed. The toolchain itself, including the compiler (built from LLVM source), is rebuilt from source. This means StageX's reproducibility guarantee covers the entire software stack, not just the final application layer.

What Happens When Builds Aren't Reproducible

The SolarWinds compromise of 2020 is the most prominent illustration of the risks of non-reproducible builds. An attacker injected malicious code into the Orion IT monitoring platform's build environment. The resulting binary included a backdoor that went undetected for months, compromising thousands of organizations including multiple US federal agencies.

Because SolarWinds builds were not reproducible, there was no way for external auditors to detect the tampering by comparing a clean build against the distributed binary. Detection required discovering the backdoor in the running software -- a process that took months and caused widespread damage. SolarWinds subsequently announced a "Next-Generation Build System" incorporating reproducible builds and parallel verification, but the system remains proprietary and closed-source. Users cannot independently verify its claims.

Reproducible builds are not a theoretical nicety. They are the only known mechanism that allows users to verify that a distributed binary matches the source code it claims to represent. Without them, supply chain security reduces to trust in the party that built and signed the binary -- and that party may be the one that was compromised.

See Also