Disabling dynamic batching and enabling TRTLLM's continuous batching #570


dhruvmullick asked this question in Q&A on Aug 14, 2024.

This question is about the file https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt

To enable TRTLLM's continuous (in-flight) batching and disable Triton's dynamic batching, do I only need to do the following?

1. Remove the dynamic_batching block (https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt)
2. Set gpt_model_type to inflight_fused_batching
3. Use any value for max_batch_size (https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt#L29)
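The three proposed edits can be sketched as a small Python helper that rewrites the text of a config.pbtxt. This is a hypothetical illustration, not part of tensorrtllm_backend. `strip_block` and `apply_proposed_edits` are made-up names, and the regexes assume the layout of the linked template: a top-level `dynamic_batching { ... }` block, a `max_batch_size:` line, and a `parameters` entry keyed `"gpt_model_type"` with a `string_value`. It only automates the list above and does not confirm that these edits are sufficient. The value 64 in the test is arbitrary.

```python
import re


def strip_block(text: str, name: str) -> str:
    """Remove a top-level `name { ... }` block, tracking nested braces."""
    match = re.search(rf"^\s*{re.escape(name)}\s*\{{", text, flags=re.MULTILINE)
    if match is None:
        return text  # block not present: nothing to remove
    depth = 0
    # Start scanning at the opening brace of the block.
    for i in range(match.end() - 1, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[: match.start()] + text[i + 1 :]
    raise ValueError(f"unbalanced braces in {name} block")


def apply_proposed_edits(config: str, max_batch_size: int) -> str:
    """Hypothetical helper applying the three edits listed in the question."""
    # Step 1: drop Triton's dynamic batcher block.
    config = strip_block(config, "dynamic_batching")
    # Step 2: set the gpt_model_type parameter to inflight_fused_batching
    # (replaces whatever string_value is there, e.g. a ${...} placeholder).
    config = re.sub(
        r'(key:\s*"gpt_model_type"\s*value:\s*\{\s*string_value:\s*)"[^"]*"',
        r'\1"inflight_fused_batching"',
        config,
    )
    # Step 3: set max_batch_size to the chosen value.
    config = re.sub(
        r"^max_batch_size:\s*\S+",
        f"max_batch_size: {max_batch_size}",
        config,
        flags=re.MULTILINE,
    )
    return config


SAMPLE = '''name: "tensorrt_llm"
backend: "tensorrtllm"
max_batch_size: ${triton_max_batch_size}

dynamic_batching {
    preferred_batch_size: [ 8 ]
    max_queue_delay_microseconds: 100
    default_queue_policy: { max_queue_size: 16 }
}

parameters: {
  key: "gpt_model_type"
  value: {
    string_value: "${batching_strategy}"
  }
}
'''
```

`SAMPLE` is a trimmed, hand-written stand-in for the template, not a copy of it. Brace counting is used instead of a single regex so that nested blocks such as `default_queue_policy: { ... }` are removed together with `dynamic_batching`.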

