feat: Add gRPC v1alpha1 streaming support, client SDK, and benchmarks #5377

alvidofaisal · 2025-05-31T14:23:43Z

This commit introduces gRPC streaming capabilities to BentoML using a new v1alpha1 protocol version.

Key changes include:

gRPC Service Definition (.proto):
- Added src/bentoml/grpc/v1alpha1/bentoml_service_v1alpha1.proto defining a BentoService with a server-streaming CallStream RPC method.
Server Implementation:
- Implemented the BentoService in src/bentoml/grpc/v1alpha1/server.py.
- Integrated the v1alpha1 server logic into the existing gRPC server infrastructure by modifying src/bentoml/_internal/service/service.py and src/bentoml/_internal/server/grpc_app.py to handle the new protocol version.
- Generated necessary gRPC stubs.
Client SDK:
- Created src/bentoml/grpc/v1alpha1/client.py with BentoMlGrpcClient for easy interaction with the CallStream method. The client supports asynchronous streaming.
CLI Enhancements:
- Verified that the existing bentoml serve-grpc command supports the v1alpha1 protocol via the --protocol-version flag.
- Added a new command bentoml call-grpc-stream (implemented in src/bentoml_cli/call_grpc_stream.py) to invoke the streaming service from the CLI.
Benchmarking:
- Introduced tests/benchmark/benchmark_streaming.py to compare the performance of gRPC streaming (v1alpha1) against a conceptual REST streaming equivalent. The script allows for configurable payload sizes and stream lengths.
Documentation and Examples:
- Added a new documentation page docs/source/guides/grpc_streaming.md covering the definition, implementation, and usage of gRPC streaming.
- Created a new example project examples/grpc_streaming/ demonstrating how to build and use a custom gRPC streaming service with BentoML, including its own .proto file, service implementation, and client example.
- Updated docs/source/index.rst to include the new documentation.

This feature lets you use gRPC for efficient streaming communication with your BentoML services, providing an alternative to traditional REST APIs for scenarios requiring low-latency, high-throughput streaming. Note that the CallStream RPC added here is server-streaming: the client sends one request and receives a stream of responses.
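To make the server-streaming shape concrete, here is a minimal sketch in plain asyncio. It does not use gRPC or the code from this PR: `Request`, `Response`, `call_stream`, and `collect` are hypothetical names chosen for illustration, and the real message fields are whatever the .proto file defines. It only shows the pattern of one request producing many responses that the client consumes as they arrive.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class Request:
    # Hypothetical request message; not the schema from the PR's .proto file.
    data: str
    chunks: int


@dataclass
class Response:
    # Hypothetical response message carrying one piece of the output.
    index: int
    data: str


async def call_stream(request: Request) -> AsyncIterator[Response]:
    """Server-streaming handler: one request in, many responses out."""
    for i in range(request.chunks):
        # Yield control to the event loop, as a real handler would while computing.
        await asyncio.sleep(0)
        yield Response(index=i, data=f"{request.data}-{i}")


async def collect(request: Request) -> List[Response]:
    """Client side: consume the stream message by message."""
    return [response async for response in call_stream(request)]
```

A caller would run `asyncio.run(collect(Request(data="token", chunks=3)))` and receive three responses in order. With a real gRPC async client the consuming loop has the same `async for` form, but the transport, serialization, and error handling come from the generated stubs.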

What does this PR address?

This PR introduces comprehensive gRPC streaming capabilities to BentoML, addressing the need for high-performance, low-latency streaming communication in machine learning services. The implementation provides:

- Server-side streaming support through a new v1alpha1 protocol version
- Complete client SDK for easy integration with streaming services
- CLI tools for testing and interacting with streaming endpoints
- Performance benchmarking tools to compare gRPC streaming vs REST alternatives
- Comprehensive documentation and examples to guide users in implementing streaming services
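The benchmarking item above can be illustrated with a small sketch that times a stream for a configurable payload size and stream length. This is not the code in tests/benchmark/benchmark_streaming.py: `fake_stream` and `measure` are hypothetical names, and the stream is an in-process stand-in with no network involved, so the numbers only demonstrate how time to first message and total elapsed time might be collected.

```python
import asyncio
import time
from typing import AsyncIterator, Dict


async def fake_stream(payload_size: int, stream_length: int) -> AsyncIterator[bytes]:
    """Stand-in for a streaming call: yields `stream_length` payloads of `payload_size` bytes."""
    payload = b"x" * payload_size
    for _ in range(stream_length):
        await asyncio.sleep(0)  # give the event loop a chance to run other tasks
        yield payload


async def measure(payload_size: int, stream_length: int) -> Dict[str, float]:
    """Consume one stream and record message count, bytes, and timings."""
    start = time.perf_counter()
    time_to_first = None
    messages = 0
    total_bytes = 0
    async for chunk in fake_stream(payload_size, stream_length):
        if time_to_first is None:
            # Latency until the first message arrives.
            time_to_first = time.perf_counter() - start
        messages += 1
        total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    return {
        "messages": messages,
        "bytes": total_bytes,
        # For an empty stream there is no first message; fall back to total time.
        "time_to_first_s": elapsed if time_to_first is None else time_to_first,
        "elapsed_s": elapsed,
    }
```

Swapping `fake_stream` for a real gRPC or REST streaming call, and repeating the measurement over several payload sizes and stream lengths, would give a like-for-like comparison, provided both sides run against the same service and hardware.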

The feature is particularly valuable for use cases requiring:

- Real-time inference with streaming inputs/outputs
- Low-latency communication for interactive applications
- High-throughput scenarios where gRPC's efficiency provides significant performance benefits
- Bi-directional communication patterns not easily achievable with REST APIs

Before submitting:

- Does the Pull Request follow Conventional Commits specification naming? Here is GitHub's guide on how to create a pull request.
- Does the code follow BentoML's code style, and has the pre-commit run -a script passed (instructions)?
- Did you read through the contribution guidelines and follow the development guidelines?
- Did your changes require updates to the documentation? Have you updated those accordingly? Here are the documentation guidelines and tips on writing docs.
- Did you write tests to cover your changes?


hyperlint-ai · 2025-05-31T14:23:58Z

PR Change Summary

Introduced gRPC v1alpha1 streaming support in BentoML, enhancing communication capabilities for machine learning services.

- Added gRPC service definition for server-streaming support.
- Implemented server-side logic and integrated with existing infrastructure.
- Created a client SDK for easy interaction with streaming services.
- Introduced CLI commands for testing and interacting with gRPC streaming.

Modified Files

docs/source/index.rst

Added Files

docs/source/guides/grpc_streaming.md

How can I customize these reviews?

Check out the Hyperlint AI Reviewer docs for more information on how to customize the review.

If you just want to ignore it on this PR, you can add the hyperlint-ignore label to the PR. Future changes won't trigger a Hyperlint review.

Note specifically for link checks, we only check the first 30 links in a file and we cache the results for several hours (for instance, if you just added a page, you might experience this). Our recommendation is to add hyperlint-ignore to the PR to ignore the link check for this PR.

alvidofaisal added 2 commits May 31, 2025 21:00

fix: Apply code formatting and remove duplicate proto file

1e36caa

alvidofaisal requested a review from a team as a code owner May 31, 2025 14:23

alvidofaisal requested review from parano and removed request for a team May 31, 2025 14:23
