
LiteLLM streaming not truly asynchronous - returns batch processed chunks #1306

@DennySORA

Description


Describe the bug
The streaming implementation in lite_llm.py contains an async/await usage error that causes a "fake streaming" phenomenon. The current implementation waits for all content to be fully returned before chunking it into Events, rather than providing true real-time streaming.

To Reproduce
Steps to reproduce the behavior:

  1. Install google-adk==1.2.1
  2. Use LiteLLM model with streaming calls
  3. Observe the response behavior
  4. Notice that streaming is not real-time but rather batch processing disguised as streaming

Expected behavior
Should implement true real-time streaming where each chunk is returned immediately as it's generated, rather than waiting for all content to complete before batch processing.

Root Cause Analysis
The issue occurs in lite_llm.py#L682-L683:

Current incorrect implementation:

for part in self.llm_client.completion(**completion_args):
    for chunk, finish_reason in model_response_to_chunk(part):
        ...  # chunks are only processed after the blocking call completes

Correct implementation should be:

async for part in await self.llm_client.acompletion(**completion_args):
    for chunk, finish_reason in model_response_to_chunk(part):
        ...  # each chunk is handled as soon as it arrives

Technical Explanation:

  1. Synchronous Issue: The completion() call is synchronous; iterating it from async code blocks the event loop in streaming mode, so iteration cannot begin until the data has been received
  2. Fake Streaming Phenomenon: Because of this blocking, the chunks are buffered and processed in one burst, creating the illusion of streaming over what is effectively batch processing
  3. Proper Async Implementation: Using acompletion() with async for enables true non-blocking streaming
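The difference between the two call patterns can be sketched with a stand-in client. This is a minimal illustration, not ADK's actual code: FakeLLMClient and its method bodies are hypothetical, modeled loosely on litellm's completion()/acompletion() pair.

```python
import asyncio


class FakeLLMClient:
    """Hypothetical stand-in for the litellm client; names are illustrative."""

    def completion(self, **kwargs):
        # Synchronous path: the full response is materialized before the
        # caller can iterate it -- the "fake streaming" behavior.
        return [f"part{i}" for i in range(3)]

    async def acompletion(self, **kwargs):
        # Async path: returns an async iterator; each chunk becomes
        # available only after an await, without blocking the event loop.
        async def gen():
            for i in range(3):
                await asyncio.sleep(0)  # chunk arrives asynchronously
                yield f"part{i}"
        return gen()


async def stream(client):
    parts = []
    # The proposed pattern: await acompletion(), then async-iterate it.
    async for part in await client.acompletion(stream=True):
        parts.append(part)
    return parts


parts = asyncio.run(stream(FakeLLMClient()))
```

With the synchronous completion() path, every "chunk" already exists in memory before the first loop iteration runs; with acompletion(), the consumer sees each chunk at the moment it is produced.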

Python Async Principles:

  • async for allows execution to pause at each iteration, giving other coroutines a chance to execute
  • await ensures proper waiting for async functions without blocking the event loop
  • True streaming requires non-blocking iterators, which can only be achieved through async for
  • A synchronous for loop never yields control to the event loop, so concurrent coroutines stall until it finishes, losing the real-time nature of streaming
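The principles above can be demonstrated without any LLM at all. The sketch below (all names are invented for illustration) runs a simulated token stream next to a concurrent "heartbeat" coroutine; the heartbeat can only make progress because async for returns control to the event loop between chunks.

```python
import asyncio


async def fake_token_stream(n, delay=0.01):
    # Simulated provider stream: awaits between tokens, like a real socket.
    for i in range(n):
        await asyncio.sleep(delay)
        yield f"tok{i}"


async def heartbeat(ticks):
    # A concurrent coroutine; it can only run if the stream consumer
    # yields control back to the event loop between chunks.
    for _ in range(5):
        await asyncio.sleep(0.005)
        ticks.append("tick")


async def main():
    ticks = []
    hb = asyncio.create_task(heartbeat(ticks))
    received = []
    async for tok in fake_token_stream(5):
        # Record how far the heartbeat has progressed at each chunk:
        # a nonzero count proves the consuming loop was not blocking.
        received.append((tok, len(ticks)))
    await hb
    return received, ticks


received, ticks = asyncio.run(main())
```

If the consuming loop were a blocking synchronous iteration instead, the heartbeat counts recorded alongside each token would all be zero, because the event loop would never get a chance to run the other task until the stream finished.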

Desktop (please complete the following information):

  • OS: macOS
  • Python version: 3.13
  • ADK version: 1.2.1

Additional context
Reference to LiteLLM's official documentation for correct async streaming implementation:
https://docs.litellm.ai/docs/completion/stream#async-streaming

Video demonstration: https://youtu.be/nF1G4leSbSI

This issue affects all applications that rely on true streaming responses, particularly real-time interactive chat applications.

Proposed Fix
The fix requires changing the synchronous iteration to asynchronous iteration, and every method up the call stack that invokes this code must itself be marked async and awaited (or async-iterated) appropriately.
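The call-stack requirement can be sketched as follows. This is not ADK's real internals: provider_stream() is invented, and generate_content_async is used here only as an illustrative name for an intermediate frame.

```python
import asyncio
from typing import AsyncIterator


async def provider_stream() -> AsyncIterator[str]:
    # Hypothetical provider-side stream standing in for acompletion().
    for word in ["hello", "streaming", "world"]:
        await asyncio.sleep(0)
        yield word


async def generate_content_async() -> AsyncIterator[str]:
    # Every frame between the provider call and the end consumer must be
    # an async generator (or an awaited coroutine). A single synchronous
    # frame in the stack re-buffers the stream and reintroduces batching.
    async for chunk in provider_stream():
        yield chunk  # forward each chunk immediately, no buffering


async def consume() -> list:
    return [chunk async for chunk in generate_content_async()]


result = asyncio.run(consume())
```

The key design point is that the intermediate frame forwards each chunk with yield inside async for rather than collecting results into a list first; collecting first would be exactly the batch-then-emit behavior this issue describes.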
