Skip to content

build-ai-applications/voice-cloning

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

7 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

🎀 Voice Cloning Model Evaluation

✨ Overview

This repository provides a comprehensive evaluation of voice cloning models based on objective speech quality metrics. Our goal is to assess the effectiveness of these models in generating high-quality, intelligible, and natural-sounding voices.

πŸ“ˆ Models Evaluated

Model PESQ STOI MCD Pitch Corr Spec Conv Energy Ratio SNR (dB)
OpenVoice 1.165 0.136 37.988 -0.027 3.475 12.305 -11.193
CoquiTTS 1.727 0.143 203.193 0.012 6.675 45.896 -16.717
F5-TTS 1.782 0.171 174.265 0.060 6.082 39.209 -16.065
E2-TTS 2.281 0.165 158.578 -0.051 5.760 34.939 -15.551

πŸ”¬ Evaluation Metrics

  • PESQ (Perceptual Evaluation of Speech Quality): Measures speech quality, with values ranging from -0.5 to 4.5 (higher is better).
  • STOI (Short-Time Objective Intelligibility): Assesses how well the synthesized voice is understood (range: 0 to 1, higher is better).
  • MCD (Mel Cepstral Distortion): Lower values indicate more accurate voice cloning.
  • Pitch Correlation: Measures how closely the pitch matches the original speaker (closer to 1 is better).
  • Spectral Convergence (Spec Conv): Evaluates how well spectral features align (lower is better).
  • Energy Ratio: Assesses energy distribution in frequency bands.
  • SNR (Signal-to-Noise Ratio in dB): Higher values indicate cleaner, more natural output.

🌐 Test the Models

You can experiment with each model using the provided links:

🌁 Summary & Recommendations

  • Best for natural voice quality: E2-TTS (highest PESQ, lowest MCD).
  • Best for intelligibility: F5-TTS (highest STOI score).
  • Moderate performance: CoquiTTS (balanced results but high spectral distortion).
  • Least recommended: OpenVoice (low PESQ, less realistic output).

πŸ”— References

πŸ“š License

This project is licensed under the Apache 2.0 - see the LICENSE file for details.

About

🎀 Voice Cloning Model Evaluation

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 2

  •  
  •