@jasonwbarnett jasonwbarnett commented Jul 27, 2025

Description

This PR adds support for Git partial clones and sparse checkouts to the Buildkite Agent, enabling users to clone only specific directories from large repositories (monorepos).

Problem being solved:

  • Large monorepos can take significant time to clone, consuming bandwidth and disk space
  • Teams often only need specific directories for their builds
  • Full clones of repositories with extensive history are inefficient when only recent commits are needed

Solution:

  • Implemented Git partial clone support (--filter) and shallow clone support (--depth)
  • Added sparse checkout functionality to check out only specified directories
  • Configuration available via environment variables or CLI flags
  • Supports multiple directory paths (comma-separated)

Alternatives considered:

  • Using git submodules (requires repository restructuring)
  • External scripts in hooks (less integrated, harder to maintain)
  • Post-clone cleanup (still requires full initial clone)

Context

This addresses a common request from users with large monorepos who want to reduce their build times. Other CI/CD platforms offer similar features.

Changes

New Environment Variables:

  • BUILDKITE_GIT_SPARSE_CHECKOUT - Enable sparse checkout (boolean)
  • BUILDKITE_GIT_SPARSE_CHECKOUT_PATHS - Comma-separated paths to include
  • BUILDKITE_GIT_CLONE_DEPTH - Clone depth for shallow clones
  • BUILDKITE_GIT_CLONE_FILTER - Filter specification (e.g., "tree:0")
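
To illustrate how these four variables might be interpreted, here is a minimal sketch. It is not the code from this PR: the `sparseConfig` type and the `parseSparsePaths` and `configFromEnv` helpers are hypothetical names, and the choices to trim whitespace, drop empty path entries, and reject negative depths are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sparseConfig is a hypothetical container for the four settings above.
type sparseConfig struct {
	SparseCheckout bool
	Paths          []string
	CloneDepth     int    // 0 means "do not pass --depth"
	CloneFilter    string // empty means "do not pass --filter"
}

// parseSparsePaths splits a comma-separated list, trimming whitespace
// around each entry and dropping empty entries (an assumed policy).
func parseSparsePaths(raw string) []string {
	var paths []string
	for _, p := range strings.Split(raw, ",") {
		if p = strings.TrimSpace(p); p != "" {
			paths = append(paths, p)
		}
	}
	return paths
}

// configFromEnv builds a sparseConfig using the supplied lookup function,
// so that callers can pass os.Getenv and tests can pass a map lookup.
func configFromEnv(getenv func(string) string) (sparseConfig, error) {
	cfg := sparseConfig{
		Paths:       parseSparsePaths(getenv("BUILDKITE_GIT_SPARSE_CHECKOUT_PATHS")),
		CloneFilter: strings.TrimSpace(getenv("BUILDKITE_GIT_CLONE_FILTER")),
	}
	if v := getenv("BUILDKITE_GIT_SPARSE_CHECKOUT"); v != "" {
		enabled, err := strconv.ParseBool(v)
		if err != nil {
			return cfg, fmt.Errorf("invalid BUILDKITE_GIT_SPARSE_CHECKOUT %q: %w", v, err)
		}
		cfg.SparseCheckout = enabled
	}
	if v := getenv("BUILDKITE_GIT_CLONE_DEPTH"); v != "" {
		depth, err := strconv.Atoi(v)
		if err != nil || depth < 0 {
			return cfg, fmt.Errorf("invalid BUILDKITE_GIT_CLONE_DEPTH %q", v)
		}
		cfg.CloneDepth = depth
	}
	return cfg, nil
}
```

In a real program this would be called as `configFromEnv(os.Getenv)`. Taking the lookup function as a parameter is only a design convenience that keeps the parsing testable without touching the process environment.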

New CLI Flags:

  --git-sparse-checkout               Enable sparse checkout for partial clones
  --git-sparse-checkout-paths value   Paths to include in sparse checkout (comma-separated)
  --git-clone-depth value             Clone depth for shallow clones (e.g., "200")
  --git-clone-filter value            Filter specification for partial clones (e.g., "tree:0")

Code Changes:

  • Added gitSparseCheckoutInit() and gitSparseCheckoutSet() functions
  • Modified clone process to support --no-checkout when sparse checkout is enabled
  • Integrated partial clone flags into existing git operations
  • Added support for both new clones and existing repositories
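
The flag-building logic described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `cloneOptions`, `buildCloneArgs`, and `buildSparseCheckoutArgs` are hypothetical names, and the use of cone mode for `git sparse-checkout init` is an assumption. The git subcommands and flags themselves (`clone --depth`, `--filter`, `--no-checkout`, `sparse-checkout init`, `sparse-checkout set`) are standard Git.

```go
package main

import "strconv"

// cloneOptions is a hypothetical bundle of the settings this PR introduces.
type cloneOptions struct {
	Depth          int    // 0 means a full-history clone
	Filter         string // e.g. "tree:0"; empty means no partial clone filter
	SparseCheckout bool
}

// buildCloneArgs returns the arguments for a `git clone` invocation.
// When sparse checkout is enabled, --no-checkout defers populating the
// working tree until the sparse paths have been configured.
func buildCloneArgs(opts cloneOptions, repo, dir string) []string {
	args := []string{"clone"}
	if opts.Depth > 0 {
		args = append(args, "--depth", strconv.Itoa(opts.Depth))
	}
	if opts.Filter != "" {
		args = append(args, "--filter="+opts.Filter)
	}
	if opts.SparseCheckout {
		args = append(args, "--no-checkout")
	}
	return append(args, "--", repo, dir)
}

// buildSparseCheckoutArgs returns the two git invocations that would run
// inside the cloned directory before the commit is checked out.
// Cone mode is assumed here.
func buildSparseCheckoutArgs(paths []string) [][]string {
	set := append([]string{"sparse-checkout", "set"}, paths...)
	return [][]string{
		{"sparse-checkout", "init", "--cone"},
		set,
	}
}
```

Each returned slice would be passed to `git` (for example via `exec.Command("git", args...)`), followed by a normal checkout of the build's commit.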

Example Usage:

env:
  BUILDKITE_GIT_CLONE_DEPTH: "200"
  BUILDKITE_GIT_CLONE_FILTER: "tree:0"
  BUILDKITE_GIT_SPARSE_CHECKOUT: "true"
  BUILDKITE_GIT_SPARSE_CHECKOUT_PATHS: ".buildkite,services/api"

Testing

  • Tests have run locally (with go test ./...). Buildkite employees may check this if the pipeline has run automatically.
  • Code is formatted (with go fmt ./...)

Additional Testing:

  • Added comprehensive unit tests for new sparse checkout functions
  • Added tests for flag building logic
  • Verified existing git tests still pass
  • Manually tested with local agent against a test repository

@jasonwbarnett jasonwbarnett force-pushed the jwb/add-sparse-checkouts branch from a58094b to 25b9bc2 Compare July 27, 2025 12:50
@jasonwbarnett jasonwbarnett force-pushed the jwb/add-sparse-checkouts branch from 25b9bc2 to 298092e Compare July 28, 2025 01:20

jasonwbarnett commented Jul 28, 2025

For the record, I'm aware of https://github.com/buildkite-plugins/sparse-checkout-buildkite-plugin. One of its downsides is that it forces you to specify the plugin for each step, which in a larger organization is messy and prone to error. You also miss out on the advantages of using git mirrors on your agents when you use a Buildkite plugin for checkouts (we already embed a mirror of our monorepo in our AMIs to avoid reaching out to the git server as much as possible). The checkout functionality within the agent is quite robust, and it would be nice to leverage all of its functionality without redoing it all in a plugin.

If this feature were merged, steps that simply upload a pipeline via buildkite-agent pipeline upload could be tied to a particular queue served by an agent preconfigured with all of these values, e.g.:

buildkite-agent start \
    --tags "queue=bk-sparse-checkout" \
    --git-sparse-checkout \
    --git-sparse-checkout-paths ".buildkite,src/python/buildkite_pipelines" \
    --git-clone-depth 1 \
    --git-clone-filter "tree:0"

This allows you to have hundreds of pipeline steps like:

steps:
  - label: ":pipeline: Pipeline upload"
    agents:
      queue: bk-sparse-checkout
    command: buildkite-agent pipeline upload .buildkite/pipelines/<...>/pipeline.yml

If something changes in the future, e.g. you need to add a new sparse checkout path, just update your buildkite-agent start command.
