Description
Describe the bug
The streaming implementation in `lite_llm.py` contains an async/await usage error that causes a "fake streaming" phenomenon: the current implementation waits for the entire response to be returned before chunking it into Events, rather than streaming each chunk in real time.
To Reproduce
Steps to reproduce the behavior:
- Install `google-adk==1.2.1`
- Use LiteLLM model with streaming calls
- Observe the response behavior
- Notice that streaming is not real-time but rather batch processing disguised as streaming
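One way to make the difference observable without any ADK-specific code is to time the gap before each chunk arrives. The sketch below simulates both behaviors (the stream contents and delays are made up for illustration): fake streaming shows one long initial gap followed by back-to-back chunks, while true streaming shows a steady gap per chunk.

```python
import asyncio
import time

async def chunk_arrival_gaps(stream):
    """Record the gap in seconds before each chunk arrives."""
    gaps, last = [], time.monotonic()
    async for _ in stream:
        now = time.monotonic()
        gaps.append(now - last)
        last = now
    return gaps

async def fake_stream():
    # All content is generated up front behind one blocking wait...
    await asyncio.sleep(0.3)
    for chunk in ["a", "b", "c"]:
        yield chunk  # ...then chunks arrive back-to-back

async def true_stream():
    # Each chunk arrives as soon as it is generated
    for chunk in ["a", "b", "c"]:
        await asyncio.sleep(0.1)
        yield chunk

fake_gaps = asyncio.run(chunk_arrival_gaps(fake_stream()))
true_gaps = asyncio.run(chunk_arrival_gaps(true_stream()))
# fake_gaps: one large gap, then near-zero gaps
# true_gaps: every gap is roughly the per-chunk generation time
```

Applying the same timing measurement to an ADK LiteLLM streaming call shows the `fake_gaps` pattern on 1.2.1.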
Expected behavior
Should implement true real-time streaming where each chunk is returned immediately as it's generated, rather than waiting for all content to complete before batch processing.
Root Cause Analysis
The issue occurs in `lite_llm.py#L682-L683`.

Current (incorrect) implementation:

```python
for part in self.llm_client.completion(**completion_args):
    for chunk, finish_reason in model_response_to_chunk(part):
```

Correct implementation:

```python
async for part in await self.llm_client.acompletion(**completion_args):
    for chunk, finish_reason in model_response_to_chunk(part):
```
Technical Explanation:
- Synchronous issue: the current `completion()` call is synchronous, so in streaming mode it blocks the event loop until all data has been received before iteration can proceed.
- Fake streaming phenomenon: because of this blocking, all chunks get buffered and processed at once, creating an illusion of streaming when it is actually batch processing.
- Proper async implementation: using `acompletion()` with `async for` enables true non-blocking streaming.
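The blocking effect can be reproduced in isolation. In this sketch (the generators and the `heartbeat` task are stand-ins, not ADK code), a synchronous `for` loop over a blocking generator starves a concurrent task until all chunks are consumed, whereas `async for` lets the two interleave:

```python
import asyncio
import time

def blocking_chunks():
    """Synchronous generator standing in for completion(stream=True)."""
    for i in range(3):
        time.sleep(0.1)           # blocking read: the event loop is stuck here
        yield i

async def async_chunks():
    """Async generator standing in for acompletion(stream=True)."""
    for i in range(3):
        await asyncio.sleep(0.1)  # non-blocking read: the loop can do other work
        yield i

async def heartbeat(log):
    """A concurrent task that should keep firing while chunks stream in."""
    for _ in range(4):
        await asyncio.sleep(0.05)
        log.append("beat")

async def consume_sync(log):
    for i in blocking_chunks():       # never yields control to the event loop
        log.append(f"chunk{i}")

async def consume_async(log):
    async for i in async_chunks():    # yields control at every await
        log.append(f"chunk{i}")

async def run_both(consumer, log):
    await asyncio.gather(consumer(log), heartbeat(log))

sync_log, async_log = [], []
asyncio.run(run_both(consume_sync, sync_log))
asyncio.run(run_both(consume_async, async_log))
# sync_log: all three chunks appear before any heartbeat fires
# async_log: heartbeats are interleaved with the chunks
```

The `heartbeat` here plays the role of any other coroutine in the application (for example, one flushing Events to the client); with the synchronous loop it cannot run until streaming has finished.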
Python Async Principles:
- `async for` allows execution to pause at each iteration, giving other coroutines a chance to run.
- `await` waits for an async function to finish without blocking the event loop.
- True streaming requires a non-blocking iterator, which in asyncio means iterating with `async for`.
- A synchronous `for` loop never yields control back to the event loop, so no other coroutine can run until the entire iterator is exhausted, losing the real-time nature of streaming.
Desktop (please complete the following information):
- OS: macOS
- Python version: 3.13
- ADK version: 1.2.1
Additional context
Reference to LiteLLM's official documentation for correct async streaming implementation:
https://docs.litellm.ai/docs/completion/stream#async-streaming
Video demonstration: https://youtu.be/nF1G4leSbSI
This issue affects all applications that rely on true streaming responses, particularly real-time interactive chat applications.
Proposed Fix
The fix requires changing the synchronous iteration to asynchronous iteration and ensuring that the method containing this code is marked `async` and awaited appropriately throughout the call stack.
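A minimal sketch of the shape of the fix (the class, `MockClient`, and the simplified `model_response_to_chunk` below are mocks for illustration; only the two iteration lines mirror the actual code in `lite_llm.py`):

```python
import asyncio

class MockClient:
    """Stand-in for the LiteLLM client used by ADK."""
    async def acompletion(self, **kwargs):
        async def stream():
            for part in ["Hel", "lo", "!"]:
                await asyncio.sleep(0)  # simulated non-blocking network read
                yield part
        return stream()  # acompletion returns an async iterable of parts

def model_response_to_chunk(part):
    """Simplified stand-in: one (chunk, finish_reason) pair per part."""
    yield part, None

class LiteLlm:
    def __init__(self):
        self.llm_client = MockClient()

    async def generate(self, **completion_args):
        # The proposed fix: async iteration, so each chunk is yielded to the
        # caller as soon as it arrives instead of after the full response.
        async for part in await self.llm_client.acompletion(**completion_args):
            for chunk, finish_reason in model_response_to_chunk(part):
                yield chunk

async def main():
    return [c async for c in LiteLlm().generate(stream=True)]

chunks = asyncio.run(main())
# chunks == ["Hel", "lo", "!"]
```

Because `generate` is itself an async generator, every caller up the stack must consume it with `async for` (or otherwise await it), which is why the change has to propagate through the call stack rather than stopping at these two lines.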