Generating Software Bills of Materials (SBOMs) with SPDX at Microsoft

Adrian

The U.S. Presidential Executive Order on Improving the Nation’s Cybersecurity, released on May 12, 2021, came in response to the SolarWinds supply chain attack. It calls for sweeping improvements to modernize Federal Government cybersecurity and enhance software supply chain security, and one of its requirements is a Software Bill of Materials (SBOM).

SBOMs aren’t new to Microsoft. In fact, we have been generating our own proprietary build manifests for years. Since September 2019, Microsoft has also led and co-chaired the Consortium for Information & Software Quality (CISQ) Tool-to-Tool (3T) SBOM cross-industry working group to define a new standard SBOM schema. What is new is that Microsoft, along with the other members of 3T, has chosen to merge the 3T effort with the Linux Foundation’s work and to use Software Package Data Exchange (SPDX) for all SBOMs we generate, and we have set out to do this for all the software we produce. This means we have had to convert our existing manifest generation tools to output JSON files in the SPDX 2.2.1 format (standardized as ISO/IEC 5962:2021), and we need to roll out this capability across our core engineering systems.

Why have an SBOM?

An SBOM is useful to producers and consumers of software, as it provides software transparency, software integrity, and software identity benefits. Here is a bit about each:

  • Software transparency: SBOMs provide a list of ingredients used in the creation of a piece of software, such as open source software, components, and potentially even build tools. This enables producers and consumers to better inventory and evaluate license and vulnerability risk.
  • Software integrity: While code signing is still the industry standard for trusting software and its integrity, SBOMs contain package and file checksums that let consumers validate hashes, which is useful in scenarios where signatures aren’t present.
  • Software identity: When vulnerabilities (CVEs) are published, they are associated with Common Platform Enumeration (CPE) identifiers, and it can be difficult to attribute a CPE to a specific piece of software. Software IDs within SBOMs provide a much more accurate way to identify software.

Designing executive order-compliant SBOMs

The NTIA report on the minimum elements for an SBOM, published in response to the executive order, outlines which fields must be included in our SBOMs, so we mapped the NTIA minimum fields to SPDX 2.2.1:

| NTIA field | NTIA description | SPDX 2.2.1 field |
|---|---|---|
| Supplier Name | The name of an entity that creates, defines, and identifies components | Package Supplier |
| Component Name | Designation assigned to a unit of software defined by the original supplier | Package Name |
| Version of the Component | Identifier used by the supplier to specify a change in software from a previously identified version | Package Version |
| Other Unique Identifiers | Other identifiers that are used to identify a component, or serve as a look-up key for relevant databases | Package SPDX Identifier |
| Dependency Relationship | Characterizing the relationship that an upstream component X is included in software Y | Relationship |
| Author of SBOM Data | The name of the entity that creates the SBOM data for this component | Creator |
| Timestamp | Record of the date and time of the SBOM data assembly | Created |
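
To make the mapping concrete, here is a minimal Python sketch that assembles an SPDX 2.2 JSON document carrying each NTIA field in the SPDX location listed in the table. It is illustrative only, not the output of Microsoft's tool: the function names, the `sbom.example.com` namespace host, the `example-sbom-generator` tool name, and all values are hypothetical placeholders. License fields are set to `NOASSERTION`, and dependency packages carry only a name and version for brevity.

```python
import json
import uuid
from datetime import datetime, timezone


def _package(spdx_id, name, version, supplier=None, checksums=None):
    """Build one SPDX 2.2 package; filesAnalyzed=False means no verification code is needed."""
    pkg = {
        "SPDXID": spdx_id,         # NTIA: Other Unique Identifiers
        "name": name,              # NTIA: Component Name
        "versionInfo": version,    # NTIA: Version of the Component
        "downloadLocation": "NOASSERTION",
        "filesAnalyzed": False,
        "licenseConcluded": "NOASSERTION",
        "licenseDeclared": "NOASSERTION",
        "copyrightText": "NOASSERTION",
    }
    if supplier:
        pkg["supplier"] = f"Organization: {supplier}"  # NTIA: Supplier Name
    if checksums:
        pkg["checksums"] = [
            {"algorithm": alg, "checksumValue": value} for alg, value in checksums.items()
        ]
    return pkg


def make_minimal_sbom(supplier, name, version, checksums, dependencies=()):
    """Return a minimal SPDX 2.2 JSON document (as a dict) for one root package."""
    root_id = "SPDXRef-RootPackage"
    packages = [_package(root_id, name, version, supplier, checksums)]
    relationships = [{
        "spdxElementId": "SPDXRef-DOCUMENT",
        "relationshipType": "DESCRIBES",
        "relatedSpdxElement": root_id,
    }]
    for i, (dep_name, dep_version) in enumerate(dependencies):
        dep_id = f"SPDXRef-Dependency-{i}"
        packages.append(_package(dep_id, dep_name, dep_version))
        # NTIA: Dependency Relationship -> SPDX Relationship
        relationships.append({
            "spdxElementId": root_id,
            "relationshipType": "DEPENDS_ON",
            "relatedSpdxElement": dep_id,
        })
    return {
        "spdxVersion": "SPDX-2.2",
        "dataLicense": "CC0-1.0",
        "SPDXID": "SPDXRef-DOCUMENT",
        "name": f"{name}-{version}",
        # Must be a unique URI; the host is a hypothetical placeholder.
        "documentNamespace": f"https://sbom.example.com/{name}/{version}/{uuid.uuid4()}",
        "creationInfo": {
            # NTIA: Timestamp -> SPDX Created (UTC, ISO 8601)
            "created": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            # NTIA: Author of SBOM Data -> SPDX Creator
            "creators": [f"Organization: {supplier}", "Tool: example-sbom-generator"],
        },
        "packages": packages,
        "relationships": relationships,
    }


if __name__ == "__main__":
    doc = make_minimal_sbom(
        "Contoso", "contoso-app", "1.2.0",
        {"SHA256": "0" * 64, "SHA1": "0" * 40},
        dependencies=[("example-lib", "3.1.0")],
    )
    print(json.dumps(doc, indent=2))
```

The `DESCRIBES` relationship ties the document to the root package, while `DEPENDS_ON` carries the NTIA dependency relationship between packages.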

This helped define the first phase of our implementation of the SPDX spec. We needed to include all mandatory fields from the SPDX 2.2 specification, plus selected optional fields, to establish a baseline for our first implementation. While the supplier name, package version, package checksum, and relationship fields are optional in SPDX, we are making them mandatory for Microsoft products.
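
A stricter-than-SPDX policy like the one above can be expressed as a simple check over a parsed SBOM. This is a minimal sketch assuming the SPDX 2.2 JSON document has already been loaded into a Python dict; `find_policy_gaps` is a hypothetical helper, and reading "relationship is mandatory" as "every package appears in at least one relationship" is our interpretation rather than a stated Microsoft rule.

```python
def find_policy_gaps(doc):
    """List packages missing fields that SPDX 2.2 treats as optional but the policy requires.

    `doc` is an SPDX 2.2 JSON document already parsed into a dict.
    """
    # Every SPDX element that appears on either side of some relationship.
    related = set()
    for rel in doc.get("relationships", []):
        related.add(rel.get("spdxElementId"))
        related.add(rel.get("relatedSpdxElement"))

    gaps = []
    for pkg in doc.get("packages", []):
        pid = pkg.get("SPDXID", "<missing SPDXID>")
        if not pkg.get("supplier"):
            gaps.append(f"{pid}: missing supplier")
        if not pkg.get("versionInfo"):
            gaps.append(f"{pid}: missing versionInfo")
        if not pkg.get("checksums"):
            gaps.append(f"{pid}: missing checksums")
        if pid not in related:
            gaps.append(f"{pid}: not referenced by any relationship")
    return gaps


if __name__ == "__main__":
    sample = {
        "packages": [
            {"SPDXID": "SPDXRef-Pkg-a", "supplier": "Organization: Contoso",
             "versionInfo": "1.0.0",
             "checksums": [{"algorithm": "SHA256", "checksumValue": "00" * 32}]},
            {"SPDXID": "SPDXRef-Pkg-b", "versionInfo": "2.1.0"},
        ],
        "relationships": [
            {"spdxElementId": "SPDXRef-DOCUMENT", "relationshipType": "DESCRIBES",
             "relatedSpdxElement": "SPDXRef-Pkg-a"},
        ],
    }
    # Prints three gaps, all for SPDXRef-Pkg-b: missing supplier,
    # missing checksums, and not referenced by any relationship.
    for gap in find_policy_gaps(sample):
        print(gap)
```

Returning a list of gaps, rather than failing on the first one, lets a build report every missing field at once.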

Generating SBOMs at scale across Microsoft

Microsoft cares deeply about developer productivity and wants to minimize the impact on build times, especially since we run an average of ~500,000 builds on any given day. With this in mind, here’s how we’re planning to roll this out to the thousands of Microsoft products we build:

  1. Design tooling to automate SBOM generation at build time.
  2. Produce SBOMs for all official builds.
  3. Pilot this capability with a small customer base to incorporate feedback.
  4. Leverage existing CI/CD capabilities to intelligently inject our SBOM generation tool into build pipelines, aspiring to have SBOM generation “on by default.”
  5. Expand this capability out to our various engineering systems in a phased rollout.
  6. Provide a cross-platform executable binary for non-standard build environments.

Capabilities of our SBOM generator

Our SPDX SBOM generator tool is cross-platform, supporting Windows, Linux, and Mac environments (and will be open sourced soon). It also detects open source software (OSS) for inclusion in the SBOM across NPM, NuGet, PyPI, CocoaPods, Maven, Golang, Rust Crates, RubyGems, containers (and their Linux packages), Gradle, Ivy, GitHub public repositories, and more. It generates two checksums for each package and file: SHA256 (a strong, collision-resistant hash) and SHA1 (required by the SPDX specification). Our tool also automates digitally signing each SBOM to protect its integrity, and it creates a new folder called _manifest at the root of the build drop, where the SPDX JSON file is stored. An example is shown below:

[Figure: example build drop with the _manifest folder containing the SPDX JSON file]
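
The checksum and output steps can be sketched in Python as follows: a single read of each file feeds both SHA1 and SHA256, and the results become SPDX 2.2 `files` entries written under `_manifest`. This is an illustrative sketch, not Microsoft's generator. The output file name `manifest.spdx.json`, the `SPDXRef-File-<n>` identifiers, and the helper names are assumptions; the rest of the SPDX document (creation info, packages, relationships) is omitted, and SBOM signing is not shown.

```python
import hashlib
import json
from pathlib import Path


def file_checksums(path, chunk_size=1 << 20):
    """Compute SHA1 and SHA256 of a file in a single streaming pass."""
    sha1, sha256 = hashlib.sha1(), hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha1.update(chunk)
            sha256.update(chunk)
    return [
        {"algorithm": "SHA1", "checksumValue": sha1.hexdigest()},
        {"algorithm": "SHA256", "checksumValue": sha256.hexdigest()},
    ]


def file_entries(drop_root):
    """Build SPDX 2.2 'files' entries for every file in the drop, skipping _manifest."""
    root = Path(drop_root)
    paths = sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.relative_to(root).parts[0] != "_manifest"
    )
    entries = []
    for i, path in enumerate(paths):
        entries.append({
            "fileName": f"./{path.relative_to(root).as_posix()}",
            "SPDXID": f"SPDXRef-File-{i}",  # hypothetical ID scheme
            "checksums": file_checksums(path),
            "licenseConcluded": "NOASSERTION",
            "copyrightText": "NOASSERTION",
        })
    return entries


def write_manifest(drop_root, document):
    """Write the SPDX JSON under <drop_root>/_manifest/ (file name is hypothetical)."""
    out_dir = Path(drop_root) / "_manifest"
    out_dir.mkdir(exist_ok=True)
    out_path = out_dir / "manifest.spdx.json"
    out_path.write_text(json.dumps(document, indent=2), encoding="utf-8")
    return out_path
```

Excluding `_manifest` from the walk keeps the SBOM from listing itself, so regenerating it over an existing drop yields the same file set.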

Adding build provenance information to the SBOM

SBOMs primarily provide transparency about the contents of the build output. At Microsoft, we wanted to go a step further and provide provenance information about the build system where the SBOM was generated and make the SBOM itself tamper-evident. To achieve this, we integrated a signing service with our SBOM generation tool, which performs the following workflow:

  1. At the start of a build, the build service creates a session token that includes claims describing the build (e.g., source code commit ID, build ID, the repository URL) which uniquely identify a build run.
  2. The build service sends this token to the build agent/runner to use during the build.
  3. The build runs as normal, creating outputs.
  4. An SBOM is created that describes the outputs.
  5. The build agent calls into the signing service, providing both the session token and a hash of the SBOM.
  6. The signing service creates a catalog file with a signature that attests that the hash of the SBOM came from the build described by the claims in the session token.
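
The trust chain in this workflow can be sketched in Python with HMAC-SHA256 standing in for real signatures. This is a toy model under stated assumptions, not Microsoft's signing service: the post describes a catalog file with a signature, whereas here both the session token and the attestation are HMACs over canonical JSON, and the keys, claim names, and function names are all hypothetical. A production system would use asymmetric signatures with protected keys so verifiers never hold a signing secret.

```python
import hashlib
import hmac
import json

# Hypothetical demo secrets; real services would keep keys in an HSM and use
# asymmetric signatures rather than shared HMAC keys.
BUILD_SERVICE_KEY = b"build-service-demo-key"
SIGNING_SERVICE_KEY = b"signing-service-demo-key"


def _mac(key, payload):
    """HMAC-SHA256 over canonical (sorted, compact) JSON."""
    data = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def issue_session_token(commit_id, build_id, repo_url):
    """Step 1: the build service mints a token whose claims identify the build run."""
    claims = {"commit": commit_id, "build_id": build_id, "repo": repo_url}
    return {"claims": claims, "mac": _mac(BUILD_SERVICE_KEY, claims)}


def sign_sbom(session_token, sbom_bytes):
    """Steps 5-6: check the token, then attest the SBOM hash under the token's claims."""
    expected = _mac(BUILD_SERVICE_KEY, session_token["claims"])
    if not hmac.compare_digest(expected, session_token["mac"]):
        raise PermissionError("session token was not issued by the build service")
    statement = {
        "claims": session_token["claims"],
        "sbom_sha256": hashlib.sha256(sbom_bytes).hexdigest(),
    }
    return {"statement": statement, "signature": _mac(SIGNING_SERVICE_KEY, statement)}


def verify_attestation(attestation, sbom_bytes):
    """True only if the signature is valid and the SBOM matches the attested hash."""
    expected = _mac(SIGNING_SERVICE_KEY, attestation["statement"])
    return (
        hmac.compare_digest(expected, attestation["signature"])
        and attestation["statement"]["sbom_sha256"] == hashlib.sha256(sbom_bytes).hexdigest()
    )


if __name__ == "__main__":
    token = issue_session_token("3f2a9c1", "20210901.4", "https://example.com/contoso/app")
    sbom = b'{"spdxVersion": "SPDX-2.2"}'
    attestation = sign_sbom(token, sbom)
    print(verify_attestation(attestation, sbom))         # True
    print(verify_attestation(attestation, sbom + b" "))  # False: SBOM was altered
```

Because the attestation binds the SBOM hash to the build's claims, changing either the SBOM or the claimed build breaks verification.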

Validating our SBOMs at release

One key scenario we’ve added is the ability to validate the hashes of all files listed in the SBOM against the hashes of the files in the build drop itself, and to validate that the digital signature on the SBOM is the trusted signature from Microsoft. If the SBOM validation tool detects a hash mismatch or an incorrect signature, it blocks the deployment. This ensures that nothing was tampered with between build and release. Going forward, we would also like to add further checks, for example, that the signature shows the build came from the expected build definition.
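
The hash half of this release check can be sketched in Python: recompute SHA256 for every file listed in the SBOM, flag mismatches and missing files, and flag files in the drop that the SBOM does not list. This is a minimal sketch, not Microsoft's validation tool; the helper names are hypothetical, signature verification is out of scope here, and `str.removeprefix` requires Python 3.9 or later. A non-zero exit code is one common way for a pipeline step to block a deployment.

```python
import hashlib
import json
import sys
from pathlib import Path


def sha256_of(path):
    """Stream a file through SHA256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def validate_drop(drop_root, sbom_path):
    """Compare a build drop against the 'files' section of an SPDX 2.2 JSON SBOM.

    Returns a list of problems; an empty list means every listed file matches
    and no unlisted files were found (the _manifest folder is ignored).
    """
    root = Path(drop_root)
    doc = json.loads(Path(sbom_path).read_text(encoding="utf-8"))
    problems, listed = [], set()
    for entry in doc.get("files", []):
        rel = entry["fileName"].removeprefix("./")
        listed.add(rel)
        expected = {c["algorithm"]: c["checksumValue"] for c in entry.get("checksums", [])}
        path = root / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
        elif "SHA256" not in expected:
            problems.append(f"no SHA256 recorded: {rel}")
        elif sha256_of(path) != expected["SHA256"]:
            problems.append(f"hash mismatch: {rel}")
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root).as_posix()
        if path.is_file() and not rel.startswith("_manifest/") and rel not in listed:
            problems.append(f"not in SBOM: {rel}")
    return problems


if __name__ == "__main__" and len(sys.argv) == 3:
    issues = validate_drop(sys.argv[1], sys.argv[2])
    for issue in issues:
        print(issue)
    sys.exit(1 if issues else 0)  # non-zero exit fails the release gate
```

Reporting unlisted files as well as mismatches catches an attacker who adds a file to the drop without touching any file the SBOM already describes.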

We anticipate more exciting announcements to follow on this topic in the future, so stay tuned!

3 comments

  • Aaron Smith

    Thanks for this write-up. I’m sure other companies will use this as a model. Do you treat open source license compliance as a separate effort from SBOM, or do you integrate the SBOM results with your legal license compliance efforts?

  • Carsten Krüger

    How do you plan to protect the signing secret for the SBOM generator?
    A build pipeline is Turing-complete. As soon as an attacker can modify any pipeline, he can generate arbitrary SBOMs.

    The signing key must be at least product-specific and not organisation-specific.

    Example:
    Team A creates software A with SBOM A.

    An attacker who can control pipeline B takes software A, removes SBOM A, modifies software A to produce A’, and generates SBOM A’.

    Anyone who checks A’ against SBOM A’ will find no problem.