This project implements a low-latency, real-time voice conversation system with a web client. It combines specialized services to create a responsive AI assistant that can understand speech, respond intelligently, and be interrupted naturally during conversation.
See it in action: https://www.youtube.com/watch?v=iPqDASo2gsQ
- Real-time voice conversations with GPT-4o
- Low-latency responses through WebSocket streaming
- Natural interruption handling: speak while the AI is talking to interrupt it
- Multi-service architecture optimizing each part of the conversation pipeline:
  - Deepgram for speech-to-text
  - OpenAI GPT-4o for language processing
  - Cartesia TTS for high-quality voice output
- Speed: Optimized for reduced latency compared to single-provider solutions
- Voice Quality: Uses Cartesia's "British Reading Lady" voice for natural speech
- Interruption: Supports natural conversation flow with immediate response to interruptions
- Customizable: Each component can be swapped with alternatives
```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp env.example .env  # and add your API credentials
```
Add the following to your `.env` file:

- `OPENAI_API_KEY` - for the GPT-4o language model
- `DEEPGRAM_API_KEY` - for speech recognition
- `CARTESIA_API_KEY` - for text-to-speech
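A minimal `.env` might look like this (the values shown are placeholders, not real credentials):

```
OPENAI_API_KEY=sk-your-openai-key
DEEPGRAM_API_KEY=your-deepgram-key
CARTESIA_API_KEY=your-cartesia-key
```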
```bash
# Start the bot server
python bot.py

# In a separate terminal, serve the web client
python -m http.server
```

Then visit http://localhost:8000 in your browser to start a conversation.
```bash
# Run all Python tests
pytest

# Run with coverage report
pytest --cov=. --cov-report=html

# Run only unit tests
pytest tests/unit/

# Run only integration tests
pytest tests/integration/
```
```bash
# Run all JavaScript tests
npm test

# Run with coverage
npm run test:coverage

# Run tests in watch mode (for development)
npm run test:watch
```
For convenience, you can run all tests (both backend and frontend) with:
```bash
./run_tests.sh
```
The system uses a pipeline architecture:
- Web client captures audio and streams to server via WebSockets
- Speech is converted to text using Deepgram
- Text is processed by GPT-4o
- Responses are converted to speech using Cartesia TTS
- Audio is streamed back to client for playback
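To make the flow concrete, here is a minimal sketch of the server side of that pipeline. It is illustrative only: the `transcribe_stream`, `generate_reply`, and `synthesize_speech` wrappers are hypothetical stand-ins for the real Deepgram, OpenAI, and Cartesia client calls, and the actual bot.py may be structured differently.

```python
import asyncio
import websockets  # pip install websockets

# Hypothetical service wrappers -- stand-ins for the real
# Deepgram, OpenAI, and Cartesia clients.
async def transcribe_stream(audio_chunk: bytes) -> str | None:
    """Feed audio to the STT service; return text once an utterance completes."""
    ...

async def generate_reply(text: str) -> str:
    """Ask the LLM for a response to the transcribed utterance."""
    ...

async def synthesize_speech(text: str) -> bytes:
    """Convert the LLM response to audio with the TTS service."""
    ...

async def handle_client(websocket):
    # Each binary WebSocket message from the web client is a chunk of mic audio.
    async for audio_chunk in websocket:
        transcript = await transcribe_stream(audio_chunk)
        if transcript:  # a complete utterance was recognized
            reply = await generate_reply(transcript)
            audio = await synthesize_speech(reply)
            await websocket.send(audio)  # stream speech back for playback

async def main():
    async with websockets.serve(handle_client, "localhost", 8765):
        await asyncio.Future()  # run until cancelled

if __name__ == "__main__":
    asyncio.run(main())
```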
Voice detection monitors audio levels and triggers interruption handling when the user starts speaking during AI responses.
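For example, a simple energy-based check over incoming PCM frames could drive the interruption logic. This is a sketch under stated assumptions (16-bit little-endian mono audio, a hand-tuned threshold, and a hypothetical `cancel_playback` callback), not the project's actual detector:

```python
import array
import math

VOICE_RMS_THRESHOLD = 500  # assumed value; tune for your microphone and gain

def is_voice(frame: bytes) -> bool:
    """Return True if a 16-bit little-endian PCM frame exceeds the energy threshold."""
    samples = array.array("h", frame)  # signed 16-bit samples
    if not samples:
        return False
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms > VOICE_RMS_THRESHOLD

def on_audio_frame(frame: bytes, ai_is_speaking: bool, cancel_playback) -> None:
    """If the user speaks while the AI response is playing, interrupt it."""
    if ai_is_speaking and is_voice(frame):
        cancel_playback()  # hypothetical callback that halts TTS playback
```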
The project has comprehensive test coverage for both backend and frontend components.
The Python backend uses pytest for testing. Tests are organized into:
- Unit Tests: Test individual components in isolation
- Integration Tests: Test interactions between components
The backend test suite includes:
- Bot initialization and configuration
- Pipeline setup and component connections
- Text processing and transformation
- Session timeout handling
- Event handling
To write new Python tests, add them to the appropriate directory under `tests/`.
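For instance, a new unit test could follow this pattern. The `normalize_text` function below is a stand-in defined inline for illustration; in a real test you would import the project function under test instead:

```python
# tests/unit/test_example.py -- illustrative only
import pytest

def normalize_text(text: str) -> str:
    """Stand-in for a project function; import the real one in practice."""
    return " ".join(text.split())

def test_normalize_text_collapses_whitespace():
    assert normalize_text("  hello   world ") == "hello world"

@pytest.mark.parametrize("text", ["", "   "])
def test_normalize_text_handles_blank_input(text):
    assert normalize_text(text) == ""
```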
The JavaScript frontend uses Jest for testing. Tests are organized by component:
- Unit Tests: Test individual JS modules
- UI Tests: Test DOM interactions and UI updates
The frontend test suite includes:
- Configuration validation
- UI state management
- Audio processing
- WebSocket communication
- Event handling
To write new JavaScript tests, add them to the `js/__tests__/` directory.