
[Bug]: Overlapping Audio Causes Misinterpreted User Input in WebSocket Demo App #1817


Open
1 task done
Programmer-RD-AI opened this issue Mar 12, 2025 · 0 comments
Programmer-RD-AI commented Mar 12, 2025

File Name

gemini/multimodal-live-api/websocket-demo-app

What happened?

In the WebSocket Demo App from the Gemini Multimodal Live API samples, the app's audio response overlaps with the user's speech and is captured along with it. As a result, the app mistakenly treats parts of its own audio response as user input, which disrupts the natural flow of the conversation.

Steps to Reproduce:

  1. Launch the WebSocket Demo App from the generative-ai repository.
  2. Start a conversation by speaking into the microphone.
  3. During the conversation, observe that while the user is speaking, audio from the system's responses is captured at the same time.
  4. Notice that the audio captured during the overlap is interpreted as user input, so the conversation continues erroneously.
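
To make the expected behavior concrete, here is a minimal sketch of one possible mitigation: dropping microphone chunks that are captured while response audio is playing. It is not taken from the demo app. `MicGate`, `filter_mic_chunks`, the event tuples, and the 200 ms hold time are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MicGate:
    """Hypothetical helper: blocks mic chunks while response audio is playing."""

    hold_ms: int = 200  # assumed tail to cover speaker echo after playback ends
    playback_ends_at_ms: int = 0

    def on_playback(self, now_ms: int, duration_ms: int) -> None:
        # Extend the muted window to cover this response chunk plus the tail.
        end = now_ms + duration_ms + self.hold_ms
        self.playback_ends_at_ms = max(self.playback_ends_at_ms, end)

    def should_forward(self, now_ms: int) -> bool:
        # Forward mic audio only once the muted window has passed.
        return now_ms >= self.playback_ends_at_ms


def filter_mic_chunks(gate: MicGate, events: List[Tuple[str, int, object]]) -> list:
    """Replay a timeline of ("play", t_ms, duration_ms) and ("mic", t_ms, chunk)
    events and return the mic chunks that would be sent as user input."""
    forwarded = []
    for kind, t_ms, value in events:
        if kind == "play":
            gate.on_playback(t_ms, value)
        elif kind == "mic" and gate.should_forward(t_ms):
            forwarded.append(value)
    return forwarded


# Illustrative timeline: chunks "b" and "c" overlap a 500 ms response
# that starts at t=100 ms, so only "a" and "d" are forwarded.
timeline = [
    ("mic", 0, b"a"),
    ("play", 100, 500),
    ("mic", 300, b"b"),
    ("mic", 700, b"c"),
    ("mic", 800, b"d"),
]
result = filter_mic_chunks(MicGate(), timeline)
```

Note that gating like this would also suppress genuine interruptions by the user during playback, so echo cancellation on the capture side may be a better fit if interruptions need to keep working.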

Please let me know if further details are needed to help diagnose this issue.

Code of Conduct

  • I agree to follow this project's Code of Conduct