-
-
Notifications
You must be signed in to change notification settings - Fork 390
feat(voice): add Gemini 2.5 Native TTS Provider #185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- add GeminiVoiceProvider with 30 high-quality voices - Support single-speaker and multi-speaker TTS (up to 2 speakers) - Natural language style control for speech characteristics - 24 language support with automatic detection - Professional audio quality (24kHz, 16-bit, mono PCM) - Complete example with usage demonstrations - Updated documentation with Gemini provider info Features: ✅ 30 Gemini voices with style metadata (Zephyr, Puck, Kore, etc.) ✅ Multi-speaker conversations with voice assignment ✅ Style prompts (e.g., 'Say with great excitement') ✅ TypeScript support with comprehensive types ✅ Error handling and event emission ✅ Consistent with existing VoltAgent voice architecture Breaking Changes: None Dependencies: Added @google/genai ^1.0.0
|
- Add missing lockfile entries for @google/genai ^1.3.0 - Resolve CI build failures due to frozen lockfile mismatch - Ensure all dependencies are properly locked
- Remove unused GEMINI_TTS_MODELS import - Remove invalid baseURL option from GoogleGenAI constructor - Fix unused parameter warning with underscore prefix - Remove unused baseURL property and type definition - Ensure clean TypeScript compilation
- Remove ToolResult from xsai imports as it's not exported by the package - Update convertTools return type to use any[] instead of ToolResult[] - Fixes TypeScript build error in @voltagent/xsai package
| }; | ||
|
|
||
| convertTools = async (tools: BaseTool[]): Promise<ToolResult[] | undefined> => { | ||
| convertTools = async (tools: BaseTool[]): Promise<any[] | undefined> => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, Thank you for this PR. Quick question: Is there any particular reason we updated this line?
🎤 Add Gemini 2.5 Native Voice TTS Provider
This PR implements Google's Gemini 2.5 native text-to-speech capabilities as an official voice provider in VoltAgent.
✨ Features
📁 Files Added/Modified
Core Implementation
packages/voice/src/providers/gemini/index.ts- Main GeminiVoiceProvider classpackages/voice/src/providers/gemini/types.ts- TypeScript definitions and voice metadatapackages/voice/src/index.ts- Export new providerpackages/voice/package.json- Added @google/genai dependencyDocumentation & Examples
examples/with-voice-gemini/- Complete example with usage demonstrationswebsite/docs/agents/voice.md- Updated documentation🚀 Usage Examples
Single-Speaker TTS
Multi-Speaker Conversations
Style-Controlled Speech
🎯 Benefits
🔧 Technical Details
@google/genai ^1.3.0✅ Testing
📚 Documentation
🎉 Ready for Review
This implementation adds Gemini 2.5 native TTS as a first-class voice provider in VoltAgent, giving users access to Google's cutting-edge voice synthesis technology with the same consistent API they're used to.
The provider is production-ready and follows all existing VoltAgent patterns and conventions.
Implementation Highlights:
Example Use Cases:
This contribution expands VoltAgent's voice capabilities significantly while maintaining the high standards of code quality and user experience that VoltAgent is known for.