
jasonwbarnett

Description

Problem: Buildkite agents poll for jobs every 10-20 seconds (server interval + jitter), which causes significant delays for performance-sensitive workloads such as dynamic pipelines, where steps generate other steps. This can add up to 10-20 seconds of latency to each step, and the delay compounds when steps run one after another.

Solution: Add a --ping-interval CLI flag and a BUILDKITE_AGENT_PING_INTERVAL environment variable that override the server-specified ping interval, enabling faster job pickup (e.g., 5-10 seconds instead of 10-20 seconds).

Alternatives considered:

  • Server-side configuration (would require Buildkite platform changes)
  • WebSocket-based job delivery (a major architectural change)

A client-side override was chosen because it is backward compatible and gives users immediate control.

Context

This addresses performance concerns with dynamic pipelines, where job pickup latency significantly affects overall pipeline execution time. The feature lets users tune polling for their own workloads, while a 2-second minimum interval limits the extra load on Buildkite's servers.

Changes

  • Core functionality: Add ping interval override with 2-second minimum safeguard
  • Configuration: New PingInterval field in AgentStartConfig and AgentConfiguration
  • Validation: Automatic clamping of values below 2 seconds with warning logs
  • Testing: Comprehensive unit tests for validation logic and CLI configuration
  • Documentation: Updated CLI help, README, technical docs, and CHANGELOG

CLI Help Output:

--ping-interval value    Override the server-specified ping interval in seconds (integer values only). 
                         The default of 0 uses the server-provided interval. Minimum value is 2 seconds 
                         (default: 0) [$BUILDKITE_AGENT_PING_INTERVAL]

Usage Examples:

# Faster job pickup (5-10 seconds instead of 10-20 seconds)
buildkite-agent start --ping-interval=5

# Environment variable
export BUILDKITE_AGENT_PING_INTERVAL=3
buildkite-agent start

# Invalid values are handled gracefully
buildkite-agent start --ping-interval=1    # Clamped to 2s with warning
buildkite-agent start --ping-interval=2.5  # Error: integer values only
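The flag and environment-variable behavior in these examples can be approximated with Go's standard flag package. This is a stand-in under stated assumptions, not the agent's real CLI wiring (which uses its own CLI library): the environment variable supplies the default, an explicit flag wins over it, and both must parse as integers. `parsePingInterval` is a hypothetical name, and clamping is deliberately left to a later step, so --ping-interval=1 parses as 1 here.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
	"strconv"
)

// parsePingInterval (hypothetical) reads BUILDKITE_AGENT_PING_INTERVAL as the
// default, then lets --ping-interval override it. Non-integers such as "2.5"
// are rejected by both paths.
func parsePingInterval(args []string, getenv func(string) string) (int, error) {
	def := 0
	if v := getenv("BUILDKITE_AGENT_PING_INTERVAL"); v != "" {
		n, err := strconv.Atoi(v)
		if err != nil {
			return 0, fmt.Errorf("invalid BUILDKITE_AGENT_PING_INTERVAL %q: integer values only", v)
		}
		def = n
	}
	fs := flag.NewFlagSet("start", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // suppress the usage text flag prints on errors
	interval := fs.Int("ping-interval", def, "override the server-specified ping interval in seconds")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return *interval, nil
}

func main() {
	n, err := parsePingInterval(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println("ping interval override:", n)
}
```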

Testing

  • Tests have run locally (with go test ./...). Buildkite employees may check this if the pipeline has run automatically.
  • Code is formatted (with go fmt ./...)

Test Coverage:

  • TestAgentWorker_PingIntervalValidation: Validation logic, clamping, warning messages
  • TestAgentStartConfig_PingInterval: CLI configuration mapping
  • Manual testing: CLI flag parsing, environment variables, error handling

Disclosures / Credits

Claude Code implemented the majority of this feature including:

  • Core ping interval validation and override logic
  • Comprehensive unit test suite with edge cases
  • Documentation updates across README, technical docs, and CHANGELOG
  • Safeguard implementation (2-second minimum with warnings)
  • Code refactoring to make validation logic testable

I provided the initial requirement, reviewed the implementation approach, and requested the 2-second minimum safeguard to protect server infrastructure. The solution design and technical implementation were done by Claude Code following Buildkite's coding conventions and patterns.

@petetomasik requested a review from a team on September 9, 2025 16:38
@jasonwbarnett force-pushed the feat/set-polling-interval branch from 8e6f15c to db2b780 on September 13, 2025 12:19
Add --ping-interval CLI flag and BUILDKITE_AGENT_PING_INTERVAL environment
variable to override the server-specified ping interval. This enables faster
job pickup for performance-sensitive workloads like dynamic pipelines.

By default, agents poll every 10-20 seconds (server interval + jitter). With
this feature, users can reduce latency to 5-10 seconds or other custom intervals.

Includes safeguards and comprehensive testing:
- Minimum 2-second interval to prevent server overload (values below 2s are clamped with a warning)
- Only integer values supported (floats like 2.5 are rejected with a clear error)
- Comprehensive unit tests for validation logic and CLI configuration
- Clear documentation of constraints and behavior

Backward compatible: when ping-interval is 0 or unspecified, the agent uses
the server-provided interval as before.

Changes:
- Add PingInterval field to AgentStartConfig and AgentConfiguration
- Add --ping-interval CLI flag with BUILDKITE_AGENT_PING_INTERVAL env var
- Extract determinePingInterval() method for testable validation logic
- Add comprehensive unit tests (TestAgentWorker_PingIntervalValidation, TestAgentStartConfig_PingInterval)
- Add minimum 2-second safeguard with warning for lower values
- Change ping interval logging from debug to info level for visibility
- Update documentation to clarify integer-only constraint and minimum value
- Add comprehensive documentation in CHANGELOG, README, and docs/

Co-Authored-By: Claude <[email protected]>
@jasonwbarnett force-pushed the feat/set-polling-interval branch from db2b780 to 0bd4d13 on October 8, 2025 18:58