April 15, 2024
  • We’ve open sourced DotSlash, a tool that makes large executables available in source control with a negligible impact on repository size, thus avoiding I/O-heavy clone operations.
  • With DotSlash, a set of platform-specific executables is replaced with a single script containing descriptors for the supported platforms. DotSlash handles transparently fetching, decompressing, and verifying the appropriate remote artifact for the current operating system and CPU.
  • At Meta, the vast majority of DotSlash files are generated and committed to source control via automation, so we’re also releasing a complementary GitHub Action to assemble a comparable setup outside of Meta.
  • DotSlash is written in Rust for performance and is cross-platform.

At Meta, we have a vast array of first-party and third-party command line tools that need to be available across a diverse range of developer environments. Reliably getting the right version of each tool to the right place can be a challenging task.

For example, the source code for many of our first-party tools lives alongside the projects that leverage them inside our massive monorepo. For such tools, the standard practice is to use buck2 run to build and run executables from source, as necessary. This has the advantage that tools and the projects that use them can be updated atomically in a single commit.

While we use extensive caching and remote execution to provide our developers with fast builds, there will always be cases where buck2 run is considerably slower than running the prebuilt binary directly. While we leverage a virtual filesystem that reduces the drawbacks of checking large binaries into source control compared to a traditional physical filesystem, there are still pathological cases that are best avoided by keeping such files out of the repository in the first place. (This practice also eliminates a large class of code provenance issues.)

Further, not everything we use is built from source, nor do all of our tools live in source control. For example, there is the case of buck2 itself, which must be pre-built for developers and readily available on the $PATH for convenience. For core developer tools like Buck2 and Sapling, we use a Chef recipe to deploy new versions, installing them in /usr/local/bin (or somewhere within the appropriate %PATH% on Windows) across a variety of developer environments.

While this approach is reasonable for commonly-used executables, it’s not a great fit for the long tail of tools. That is, while it might be convenient to install everything a developer might need in /usr/local/bin by default, this could easily add up to tens or hundreds of gigabytes of disk, very little of which will end up being executed in practice. In turn, this makes Chef runs more expensive and prone to failure.

Introducing DotSlash

DotSlash attempts to solve many of the problems described in the previous section. While we do not claim it is a silver bullet, we have found it to be the right solution for many of our internal use cases. At Meta, DotSlash is executed hundreds of millions of times per day to deliver a mix of first-party and third-party tools to end-user developers as well as hermetic build environments.

The idea is fairly simple: we replace the contents of a set of platform-specific, heavyweight executables with a single lightweight text file that can be read by the dotslash command line tool (which must be installed on the user’s $PATH). We call such a file a DotSlash file. It contains the information DotSlash needs to fetch and run the executable it replaces for the host platform. By convention, a DotSlash file keeps the name of the original file rather than calling attention to itself with a custom file extension. Instead, it aspires to be a transparent wrapper for the original executable. To that end, a DotSlash file is required to start with #!/usr/bin/env dotslash (even on Windows) to help maintain this illusion.

The following is a hypothetical DotSlash file named node that is designed to run v18.19.0 of Node.js. Note that users across x86 Linux, x86 macOS, and ARM macOS can all run the same DotSlash file, as DotSlash will take care of selecting the appropriate executable for the host on which it is being run. In this way, DotSlash simplifies the work of cross-platform releases:

#!/usr/bin/env dotslash

// The data in a DotSlash file is encoded as a loose superset of JSON
// that allows for comments and trailing commas.
{
  "name": "node-v18.19.0",
  "platforms": {
    "linux-x86_64": {
      "size": 11351600,
      "hash": "blake3",
      "digest": "7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069",
      "format": "zst",
      "providers": [
        {
          "url": "https://example.com/bin/node/v18.19.0/node-linux.zst"
        }
      ]
    },
    "macos-x86_64": {
      "size": 11360000,
      "hash": "blake3",
      "digest": "94f217b7156fbcdc2285b1b216900b781a71dc19181d52e5477738af7d280f62",
      "format": "zst",
      "providers": [
        {
          "url": "https://example.com/bin/node/v18.19.0/node-macos-x86_64.zst"
        }
      ]
    },
    "macos-aarch64": {
      "size": 11310000,
      "hash": "sha256",
      "digest": "eb3ede9ff516d6061d859ebdfa8e692255012bc91209946717c2c2e3204b6347",
      "format": "zst",
      "providers": [
        {
          "url": "https://example.com/bin/node/v18.19.0/node-macos-aarch64.zst"
        }
      ]
    }
  }
}

In this example, the workflow DotSlash runs through when executing node is described in the paragraphs that follow.

See the How DotSlash Works documentation for details.

Because of how #! works on Mac and Linux, when a user runs ./node --version, the invocation effectively becomes dotslash ./node --version. DotSlash requires that its first argument is a file that starts with #!/usr/bin/env dotslash, as mentioned above. Once it verifies the header, it uses a lenient JSON parser to read the rest of the file. DotSlash finds the entry in the "platforms" section that corresponds to the host it is running on.
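To make the header check, lenient parse, and platform lookup concrete, here is a minimal Python sketch. It is not DotSlash’s actual parser (which is written in Rust): the function names are hypothetical, the comment and trailing-comma stripping is a simplified stand-in for a real lenient JSON parser, and the platform-key mapping is an assumption based only on the keys shown in the examples in this post.

```python
import json
import platform

HEADER = "#!/usr/bin/env dotslash"


def strip_comments_and_trailing_commas(text):
    """Remove // and /* */ comments and trailing commas that appear outside strings."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character so an escaped quote does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            end = text.find("\n", i)
            i = n if end == -1 else end
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        elif ch in "}]":
            # Drop a comma that directly precedes a closing bracket.
            j = len(out) - 1
            while j >= 0 and out[j].isspace():
                j -= 1
            if j >= 0 and out[j] == ",":
                del out[j]
            out.append(ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def parse_dotslash_file(text):
    """Verify the required first line, then parse the rest as lenient JSON."""
    first_line, _, rest = text.partition("\n")
    if first_line.rstrip() != HEADER:
        raise ValueError("not a DotSlash file: missing '#!/usr/bin/env dotslash' header")
    return json.loads(strip_comments_and_trailing_commas(rest))


def host_platform_key():
    """Assumed mapping from the host OS and CPU to keys such as 'linux-x86_64'."""
    system = {"Linux": "linux", "Darwin": "macos", "Windows": "windows"}.get(
        platform.system(), platform.system().lower()
    )
    machine = platform.machine().lower()
    arch = {"amd64": "x86_64", "arm64": "aarch64"}.get(machine, machine)
    return f"{system}-{arch}"


def select_entry(config, platform_key):
    """Return the "platforms" entry for the given key, or raise if it is absent."""
    try:
        return config["platforms"][platform_key]
    except KeyError:
        raise LookupError(f"no entry for platform {platform_key!r}") from None
```

A caller would typically combine these as select_entry(parse_dotslash_file(text), host_platform_key()).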

DotSlash uses the information in this entry and hashes it to compute a corresponding file path (that doubles as a key) in the user’s local DotSlash cache. DotSlash attempts to exec the corresponding file, replacing argv0 with the path to the DotSlash file and forwarding the remaining command line arguments (--version, in this example) to the exec invocation.
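The following Python sketch illustrates the general idea of deriving a cache location from an entry. It is an illustration under stated assumptions, not DotSlash’s real scheme: which fields feed the hash, the use of SHA-256 over canonical JSON, and the two-level directory layout are all hypothetical choices made here for clarity.

```python
import hashlib
import json
from pathlib import Path

# Assumption: only the fields that identify the artifact feed the key, so that
# entries differing only in "path" or "providers" map to the same cache directory.
IDENTITY_FIELDS = ("size", "hash", "digest", "format")


def artifact_cache_dir(entry, cache_root):
    """Hash the identifying fields of an entry into a directory under cache_root."""
    identity = {name: entry[name] for name in IDENTITY_FIELDS}
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    key = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Hypothetical layout: shard by the first two hex characters of the key.
    return Path(cache_root) / key[:2] / key[2:]


def executable_path(entry, cache_root):
    """Location of the executable to exec, assuming the entry has a "path"."""
    return artifact_cache_dir(entry, cache_root) / entry["path"]


def exec_argv(dotslash_file, args):
    """argv for the exec call: argv0 is the DotSlash file, followed by the user's args."""
    return [str(dotslash_file), *args]
```

With this shape, exec_argv("./node", ["--version"]) yields the argument list that would be handed to exec alongside the cached executable path.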

If the target executable is in the cache, the user immediately runs Node.js as originally intended. In the event of a cache miss (indicated by exec failing with ENOENT), DotSlash uses the information from the DotSlash file to determine the URL it should use to fetch the artifact containing the executable as well as the size and digest information it should use to verify the contents. If this succeeds, the verified artifact is atomically mv’d into the appropriate location in the DotSlash cache and the exec invocation is performed again. Note that DotSlash uses advisory file locking to avoid making duplicate requests even if DotSlash files requiring the same artifact are run concurrently.
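The exec, fetch, verify, and retry sequence can be sketched in Python as follows. This is a simplified illustration rather than DotSlash’s implementation: the function names are hypothetical, SHA-256 from the standard library stands in for BLAKE3, decompression is skipped, and the advisory file locking mentioned above is omitted. The execv parameter exists only so the flow can be exercised without replacing the current process.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def verify_artifact(data, size, digest):
    """Check the fetched bytes against the expected size and hex digest."""
    if len(data) != size:
        raise ValueError(f"size mismatch: expected {size}, got {len(data)}")
    actual = hashlib.sha256(data).hexdigest()  # assumption: SHA-256 instead of BLAKE3
    if actual != digest:
        raise ValueError(f"digest mismatch: expected {digest}, got {actual}")


def install_atomically(data, dest):
    """Write to a temp file in the destination directory, then rename into place."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=dest.parent)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        os.chmod(tmp, 0o755)
        # os.replace is an atomic rename when source and destination share a filesystem.
        os.replace(tmp, dest)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def run_with_cache(exe, argv, fetch, size, digest, execv=os.execv):
    """Try to exec the cached file; on ENOENT, fetch, verify, install, and retry."""
    try:
        execv(exe, argv)
    except FileNotFoundError:  # raised when exec fails with ENOENT
        data = fetch()
        verify_artifact(data, size, digest)
        install_atomically(data, exe)
        execv(exe, argv)
```

Verifying before the rename means a partially downloaded or corrupted artifact never becomes visible at the cache path.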

Note that it is common to have multiple DotSlash files refer to the same artifact, such as a .tar.zst file, while each DotSlash file maps to a distinct entry within the archive. For example, suppose node-v18.19.0-darwin-arm64.tar.gz is a compressed tar file that contains many entries, including node, npm, and npx. The DotSlash file for node would be as follows:

#!/usr/bin/env dotslash

{
  "name": "node-v18.19.0",
  "platforms": {
    "macos-aarch64": {
      "size": 40660307,
      "hash": "blake3",
      "digest": "6e2ca33951e586e7670016dd9e503d028454bf9249d5ff556347c3d98c347c34",
      // Note the difference from the previous example where "format": "zst" has been
      // replaced with "format": "tar.gz", which specifies what type of decompression
      // logic to use as well as the path within the decompressed archive to run when
      // this DotSlash file is executed.
      "format": "tar.gz",
      // Assuming node-v18.19.0-darwin-arm64.tar.gz contains node, npm, and npx in the
      // node-v18.19.0-darwin-arm64/bin/ folder within the archive, the following
      // is the only line that has to change in the DotSlash files that represent
      // those other executables.
      "path": "node-v18.19.0-darwin-arm64/bin/node",
      "providers": [
        {
          "url": "https://nodejs.org/dist/v18.19.0/node-v18.19.0-darwin-arm64.tar.gz"
        }
      ]
    },
    /* other platforms omitted for brevity */
  }
}

As noted in the comments, the only change in the DotSlash files for npm and npx would be the "path" entry. Because the artifact for all three DotSlash files would be the same, whichever DotSlash file was run first would fetch the artifact and put it in the cache, while all subsequent runs of any of the three DotSlash files would leverage the cached entry.
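As a small illustration of how the three files relate, the Python sketch below derives the npm and npx variants from the node entry by rewriting only "path". The helper names are hypothetical and this is not part of DotSlash itself; it simply emits plain JSON after the required header.

```python
import copy
import json

HEADER = "#!/usr/bin/env dotslash"


def with_executable(platforms, executable):
    """Copy every platform entry, changing only the final component of "path"."""
    updated = copy.deepcopy(platforms)
    for entry in updated.values():
        directory, _, _ = entry["path"].rpartition("/")
        entry["path"] = f"{directory}/{executable}" if directory else executable
    return updated


def render_dotslash_file(name, platforms):
    """Serialize a DotSlash file: the required header line followed by JSON."""
    body = json.dumps({"name": name, "platforms": platforms}, indent=2)
    return f"{HEADER}\n\n{body}\n"


node_platforms = {
    "macos-aarch64": {
        "size": 40660307,
        "hash": "blake3",
        "digest": "6e2ca33951e586e7670016dd9e503d028454bf9249d5ff556347c3d98c347c34",
        "format": "tar.gz",
        "path": "node-v18.19.0-darwin-arm64/bin/node",
        "providers": [
            {"url": "https://nodejs.org/dist/v18.19.0/node-v18.19.0-darwin-arm64.tar.gz"}
        ],
    }
}

# One file per executable; each would be written to a file named after the tool.
files = {
    tool: render_dotslash_file("node-v18.19.0", with_executable(node_platforms, tool))
    for tool in ("node", "npm", "npx")
}
```

Because size, hash, digest, and format are untouched, all three generated files describe the same artifact.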

This technique is often used to ensure that a set of complementary executables is released together. Further, because the archive will be decompressed in its own directory, it can also contain resource files (or library files, such as .dll files that need to live alongside .exe files on Windows) that will be unpacked using the directory structure specified by the archive. This also makes DotSlash a good fit for distributing executables that are not binaries, but trees of script files, which is common for Node.js or Python.

Generating DotSlash files

At Meta, most DotSlash files are produced as part of an automated build pipeline. Our continuous integration (CI) system supports special configuration for DotSlash jobs where a user must specify:

  • A set of builds to run (these can span a number of platforms).
  • The resulting generated artifacts to publish to an internal blobstore.
  • The DotSlash files in source control to update with entries for the new artifacts.
  • The conditions under which the job should be triggered (this is analogous to workflow triggers on GitHub).

The result of such a job is a proposed change to the codebase containing the updated DotSlash files. At Meta, we call such a change a “diff,” though on GitHub, this is known as a pull request. Just like an ordinary human-authored diff at Meta, putting it up for review triggers a number of jobs that include linters, automated tests, and other tools that provide signal on the proposed change. For a DotSlash diff, if all of the signals come back clean, the diff is automatically committed to the codebase without further human intervention.

See the Generating DotSlash Files at Meta documentation for details.

The script we use to generate DotSlash files injects metadata about the build job that makes it simple to trace the provenance of the underlying artifacts. The following is a hypothetical example of a generated DotSlash file for the CodeCompose LSP built from source at a specific commit in clang-opt mode. Note that the "metadata" entries in the DotSlash file will be ignored by the dotslash CLI, but we include them as structured data so they can be parsed by other tools to facilitate programmatic audits:

#!/usr/bin/env dotslash

// @generated SignedSource<<d8621e8ccbd7a595a3018e6a070be9c0>>
// https://yarnpkg.com/package?name=signedsource can be used to
// generate and verify the above signature to flag tampering
// in generated code.

{
  "name": "code-compose-lsp",
  // Added by automation.
  "metadata": {
    "build-info": {
      "job-repo": "fbsource",
      "job-src": "dotslash/code-compose-lsp.star",
      // It is considered best practice to build the artifacts for
      // all platforms from the same commit within a DotSlash file.
      "commit": {
        "repo": "fbsource",
        "scm": "sapling",
        "hash": "0f9e3d9e189bf393f7f9d0b6879361cd76fcdcd0",
        "date": "2024-01-03 20:07:54 PST",
        "timestamp": 1704341274
      }
    }
  },
  "platforms": {
    "linux-x86_64": {
      "size": 2740736,
      "hash": "blake3",
      "digest": "fc8a3ade56a97a6e73469ade1575e8f8e33fda99fbf6df429d555e480d6453d0",
      "format": "zst",
      "providers": [
        {
          "type": "meta-cas",
          "key": "fc8a3ade56a97a6e73469ade1575e8f8e33fda99fbf6df429d555e480d6453d0:2740736"
        }
      ],
      // Added by automation.
      "metadata": {
        "build-command": [
          "buck2",
          "build",
          "--config-file",
          "//buildconfig/clang-opt",
          "//codecompose/lsp/cli:code-compose-lsp"
        ]
      }
    },
    // additional platforms...
  }
}

Without DotSlash, a developer would have to run buck2 build --config-file //buildconfig/clang-opt //codecompose/lsp/cli:code-compose-lsp to build the LSP from source before running it, which could be a slow operation depending on the size of the build, the state of the build cache, etc. With DotSlash, the developer can run the optimized LSP as quickly as they can fetch and decompress it from the specified location, which is likely much faster than doing a build.

Another thing you may have noticed about this example is that the "key" is not an ordinary URL, but an identifier that happens to be the concatenation of the BLAKE3 hash and the size of the specified artifact. This is because "type": "meta-cas" indicates that this artifact must be fetched via a custom provider in DotSlash, which is specialized fetching logic built into DotSlash that has its own identifier scheme. In this case, the artifact would be fetched from Meta’s in-house content-addressable storage (CAS) system, which uses the artifact hash+size as a key.
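The key format seen in the example (hex digest, a colon, then the size in bytes) can be built and split with a couple of lines of Python. The helper names below are hypothetical, and the format is inferred only from the single example in this post; the meta-cas provider itself is internal to Meta.

```python
def cas_key(digest, size):
    """Join a hex digest and a byte size in the 'digest:size' form shown above."""
    return f"{digest}:{size}"


def parse_cas_key(key):
    """Split a 'digest:size' key back into its digest and integer size."""
    digest, _, size = key.rpartition(":")
    if not digest or not size.isdigit():
        raise ValueError(f"malformed CAS key: {key!r}")
    return digest, int(size)


example = cas_key(
    "fc8a3ade56a97a6e73469ade1575e8f8e33fda99fbf6df429d555e480d6453d0", 2740736
)
```

A content-addressed key of this kind lets the storage system locate an artifact from its contents alone, which is why no URL is needed in the provider entry.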

While we don’t provide the code for the meta-cas provider in the open source version of DotSlash, we do include one custom provider out-of-the-box beyond the default http provider.

Using DotSlash with GitHub releases

While DotSlash is generally useful for fetching an executable from an arbitrary URL and running it, we have found the combination of DotSlash and CI to be particularly powerful. To that end, we include custom tooling to facilitate generating DotSlash files for GitHub releases. To ensure DotSlash can fetch artifacts from private GitHub repositories as well as GitHub Enterprise instances, DotSlash includes a custom provider for GitHub releases that includes an appropriate authentication token when fetching artifacts.

For example, suppose you have existing workflows that build your release artifacts and publish them via gh release upload. For simplicity, let’s assume these are named linux-release, macos-release, and windows-release. To create a single DotSlash file that includes the artifacts from all three platforms, you would introduce a new GitHub Action that leverages the workflow_run trigger so it fires whenever one of these release workflows succeeds. (Note that GitHub’s documentation states: “You can’t use workflow_run to chain together more than three levels of workflows,” so check the depth of your workflow graph if your workflow is not firing.)

The .yml file to define the new workflow would look like this:

name: Generate DotSlash File

on:
  workflow_run:
    # These must match the names of the workflows that publish
    # artifacts to your GitHub Release.
    workflows: [linux-release, macos-release, windows-release]
    types:
      - completed

jobs:
  create-dotslash-file:
    name: Generating DotSlash File
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    steps:
      - uses: facebook/dotslash-publish-release@v1
        env:
          # This is necessary because the action uses
          # `gh release upload` to publish the generated DotSlash file(s)
          # as part of the release.
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          # Additional file that lives in your repo that defines
          # how your DotSlash file(s) should be generated.
          config: .github/workflows/dotslash-config.json
          # Tag for the release to target.
          tag: ${{ github.event.workflow_run.head_branch }}

Because inputs to GitHub Actions are limited to string values, facebook/dotslash-publish-release takes config, which is a path to a JSON file in the repo that supports a rich set of configuration options for generating the DotSlash files. The other required input is the ID of the release, which in GitHub is defined by a Git tag. When the action is run, it will check to see whether all of the artifacts specified in the config are present in the release, and if so, will generate the appropriate DotSlash files and add them to the release.

For example, consider an open source project like Hermes where a release includes a number of platform-specific .tar.gz files, each containing a handful of executables (hermes, hdb, etc.). To create a separate DotSlash file for each executable, the JSON configuration for the action would be:

{
  "outputs": {

    "hermes": {
      "platforms": {
        "macos-x86_64": {
          "regex": "^hermes-cli-darwin-",
          "path": "hermes"
        },
        "macos-aarch64": {
          "regex": "^hermes-cli-darwin-",
          "path": "hermes"
        },
        "linux-x86_64": {
          "regex": "^hermes-cli-linux-",
          "path": "hermes"
        },
        "windows-x86_64": {
          "regex": "^hermes-cli-windows-",
          "path": "hermes.exe"
        }
      }
    },

    "hdb": {
      "platforms": {
        "macos-x86_64": {
          "regex": "^hermes-cli-darwin-",
          "path": "hdb"
        },
        "macos-aarch64": {
          "regex": "^hermes-cli-darwin-",
          "path": "hdb"
        },
        "linux-x86_64": {
          "regex": "^hermes-cli-linux-",
          "path": "hdb"
        },
        "windows-x86_64": {
          "regex": "^hermes-cli-windows-",
          "path": "hdb.exe"
        }
      }
    },

    // Additional entries for hvm, hbcdump, and hermesc...
  }
}

Each entry in "outputs" corresponds to the name of a DotSlash file that will be added to the release. The "platforms" for each entry defines the "platforms" that should be present in the generated DotSlash file. The action uses the "regex" to identify the file in the GitHub release that should be used as the backing artifact for the entry. Assuming the artifact is an “archive” of some sort (.tar.gz, .tar.zst, etc.), the "path" indicates the path within the archive that the DotSlash file should run.

In this particular case, Hermes doesn’t provide an ARM-specific binary for macOS, so the "macos-aarch64" entry is the same as the "macos-x86_64" one. If that changes in the future, a simple update to "regex" to distinguish the two binaries is all that’s needed.
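To illustrate how a "regex" could select the backing artifact for each platform, here is a hedged Python sketch. It is not the action’s actual code: the function name is hypothetical, requiring exactly one match per platform is an assumption, and the darwin and windows asset names are made up by analogy with the Linux asset name that appears later in this post.

```python
import re


def match_assets(platforms, asset_names):
    """Map each platform key to the single release asset its regex selects."""
    matched = {}
    for platform_key, spec in platforms.items():
        pattern = re.compile(spec["regex"])
        hits = [name for name in asset_names if pattern.search(name)]
        # Assumption: zero or multiple matches is treated as a configuration error.
        if len(hits) != 1:
            raise LookupError(
                f"{platform_key}: expected exactly one asset matching "
                f"{spec['regex']!r}, found {len(hits)}"
            )
        matched[platform_key] = hits[0]
    return matched


hermes_platforms = {
    "macos-x86_64": {"regex": "^hermes-cli-darwin-", "path": "hermes"},
    "macos-aarch64": {"regex": "^hermes-cli-darwin-", "path": "hermes"},
    "linux-x86_64": {"regex": "^hermes-cli-linux-", "path": "hermes"},
    "windows-x86_64": {"regex": "^hermes-cli-windows-", "path": "hermes.exe"},
}

# Hypothetical asset list; only the Linux name is taken from this post.
release_assets = [
    "hermes-cli-darwin-v0.12.0.tar.gz",
    "hermes-cli-linux-v0.12.0.tar.gz",
    "hermes-cli-windows-v0.12.0.tar.gz",
]

selected = match_assets(hermes_platforms, release_assets)
```

Both macOS keys resolve to the same darwin asset here, mirroring the shared entry described above.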

Note that the action will take responsibility for computing the digest for each binary. In this example, the resulting DotSlash file for hermes would be:

#!/usr/bin/env dotslash

{
  "name": "hermes",
  "platforms": {
    "linux-x86_64": {
      "size": 47099598,
      "hash": "blake3",
      "digest": "8d2c1bcefc2ce6e278167495810c2437e8050780ebb4da567811f1d754ad198c",
      "format": "tar.gz",
      "path": "hermes",
      "providers": [
        {
          "url": "https://github.com/facebook/hermes/releases/download/v0.12.0/hermes-cli-linux-v0.12.0.tar.gz"
        },
        {
          "type": "github-release",
          "repo": "facebook/hermes",
          "tag": "v0.12.0",
          "name": "hermes-cli-linux-v0.12.0.tar.gz"
        }
      ],
    },
    // additional platforms...
  }
}

Note that there are two entries in the "providers" section for the Linux artifact. When DotSlash fetches an artifact, it will try the providers in order until one succeeds. Regardless of which provider is used, the downloaded binary will be verified against the specified "hash", "digest", and "size" values.

In this case, the first provider is an ordinary, public URL that can be fetched using curl --location, but the second is an example of a custom provider mentioned earlier. The "type": "github-release" line indicates that the GitHub provider for DotSlash should be used, which shells out to the GitHub CLI (gh, which must be installed separately from DotSlash) to fetch the artifact instead of curl. Because facebook/hermes is a public GitHub repository, the first provider should be sufficient here. However, if the repository were private and the fetch required authentication, we would expect the first provider to fail and DotSlash would fall back to the GitHub provider. Assuming the user had run gh auth login in advance to configure credentials for the specified repo, DotSlash would be able to fetch the artifact using gh release download.
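The fallback behavior can be sketched in Python as a loop over the providers list. This is an illustration, not DotSlash’s code: the function name and the fetchers mapping are hypothetical, treating a missing "type" as the http provider follows the default mentioned earlier, and stopping on a verification failure (rather than trying the next provider) is an assumption made for this sketch.

```python
def fetch_with_fallback(providers, fetchers, verify):
    """Try each provider in order; return the first successfully fetched artifact."""
    errors = []
    for provider in providers:
        kind = provider.get("type", "http")  # entries without "type" use http
        fetch = fetchers.get(kind)
        if fetch is None:
            errors.append(f"{kind}: no such provider")
            continue
        try:
            data = fetch(provider)
        except Exception as exc:  # any fetch failure moves on to the next provider
            errors.append(f"{kind}: {exc}")
            continue
        # Verification applies no matter which provider produced the bytes.
        verify(data)
        return data
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

In a real tool, the http fetcher might wrap curl --location and the github-release fetcher might wrap gh release download; here they are left as injected callables so the control flow stays easy to follow.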

By publishing DotSlash files as part of GitHub releases, users can copy them to their own repositories to “vendor in” a specific version of your tool with minimal effect on their repository size, regardless of how large your releases might be.

Try DotSlash Today

Visit the DotSlash site for more in-depth documentation and technical details. The site includes instructions on Installing DotSlash so you can start playing with it firsthand.

We also encourage you to check out the DotSlash source code and provide feedback via GitHub issues. We look forward to hearing from you!