feat: Add gRPC v1alpha1 streaming support, client SDK, and benchmarks #5377
This commit introduces gRPC streaming capabilities to BentoML using a new `v1alpha1` protocol version.

Key changes include:
- gRPC Service Definition (`.proto`):
  - `src/bentoml/grpc/v1alpha1/bentoml_service_v1alpha1.proto`, defining a `BentoService` with a server-streaming `CallStream` RPC method.
- Server Implementation:
  - `BentoService` in `src/bentoml/grpc/v1alpha1/server.py`.
  - `v1alpha1` server logic integrated into the existing gRPC server infrastructure by modifying `src/bentoml/_internal/service/service.py` and `src/bentoml/_internal/server/grpc_app.py` to handle the new protocol version.
- Client SDK:
  - `src/bentoml/grpc/v1alpha1/client.py` with `BentoMlGrpcClient` for easy interaction with the `CallStream` method. The client supports asynchronous streaming.
- CLI Enhancements:
  - The `bentoml serve-grpc` command supports the `v1alpha1` protocol via the `--protocol-version` flag.
  - `bentoml call-grpc-stream` (implemented in `src/bentoml_cli/call_grpc_stream.py`) to invoke the streaming service from the CLI.
- Benchmarking:
  - `tests/benchmark/benchmark_streaming.py` to compare the performance of gRPC streaming (`v1alpha1`) against a conceptual REST streaming equivalent. The script allows for configurable payload sizes and stream lengths.
- Documentation and Examples:
  - `docs/source/guides/grpc_streaming.md` covering the definition, implementation, and usage of gRPC streaming.
  - `examples/grpc_streaming/` demonstrating how to build and use a custom gRPC streaming service with BentoML, including its own `.proto` file, service implementation, and client example.
  - `docs/source/index.rst`, updated to include the new documentation.

This feature allows you to leverage gRPC for efficient streaming communication with your BentoML services, providing an alternative to traditional REST APIs for scenarios requiring low-latency, high-throughput streaming.
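To make the server-streaming shape of `CallStream` concrete (one request in, a sequence of messages out, consumed asynchronously), here is a minimal standard-library sketch. It does not use gRPC or any BentoML API: `call_stream`, `consume`, and `run` are hypothetical names, and the `stream_length` and `payload_size` parameters are assumptions chosen to mirror the configurable stream length and payload size mentioned for the benchmark script.

```python
import asyncio
import time
from typing import AsyncIterator, Dict


async def call_stream(
    prompt: str, stream_length: int, payload_size: int
) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for a server-streaming handler.

    One request comes in; `stream_length` messages go out.
    """
    for i in range(stream_length):
        # Tag each message with its sequence number, then pad to payload_size.
        header = f"{prompt}:{i}:".encode()
        yield header + b"x" * max(0, payload_size - len(header))
        # Hand control back to the event loop between messages.
        await asyncio.sleep(0)


async def consume(stream_length: int, payload_size: int) -> Dict[str, float]:
    """Drain the stream with `async for` and record simple counters."""
    start = time.perf_counter()
    messages = 0
    total_bytes = 0
    async for chunk in call_stream("demo", stream_length, payload_size):
        messages += 1
        total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    return {"messages": messages, "bytes": total_bytes, "seconds": elapsed}


def run(stream_length: int = 5, payload_size: int = 64) -> Dict[str, float]:
    """Synchronous entry point, convenient for scripts and quick checks."""
    return asyncio.run(consume(stream_length, payload_size))


if __name__ == "__main__":
    print(run())
```

A real client would replace `call_stream` with the stub call generated from the `.proto` file. The `async for` consumption loop and the per-message counters would stay much the same.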
What does this PR address?
This PR introduces comprehensive gRPC streaming capabilities to BentoML, addressing the need for high-performance, low-latency streaming communication in machine learning services. The implementation provides:
- `v1alpha1` protocol version

The feature is particularly valuable for use cases requiring:
Before submitting:
- `pre-commit run -a` script has passed (instructions)?