Eval-STT is an open-source evaluation framework for benchmarking Speech-to-Text (STT) models. It provides a structured approach to compare transcription models based on key performance metrics, ensuring fair and transparent evaluations.
- 🎯 Word Error Rate (WER) – Measures transcription accuracy.
- ⚡ Real-Time Factor (RTF) – Evaluates processing speed.
- ⏳ Latency – Assesses transcription delay.
- 💻 CPU Utilization – Tracks computational efficiency on CPU.
- 🎮 GPU Utilization – Measures GPU resource usage.
- 🚀 Speed – Compares performance across different hardware configurations.
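The two headline metrics are straightforward to compute. A minimal sketch in plain Python (assuming whitespace tokenization for WER; the framework may use a library such as `jiwer` instead):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)


def rtf(processing_seconds: float, audio_seconds: float) -> float:
    """Real-Time Factor: processing time / audio duration (< 1.0 means faster than real time)."""
    return processing_seconds / audio_seconds
```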
Eval-STT supports a range of open-source STT models across different architectures:
- `whisper-tiny`
- `whisper-base`
- `whisper-small`
- `whisper-medium`
- `whisper-large-v2`
- `wav2vec2-base`
- `wav2vec2-large`
- `wav2vec2-english`
- `wav2vec2-xlsr-en`
- `hubert-large`
```bash
git clone https://github.com/build-ai-applications/Eval-STT
cd Eval-STT
```
Ensure you have the required dependencies installed:
```bash
pip install -r requirements.txt
```
The evaluation is automated via a Jupyter Notebook:
```bash
jupyter notebook evaluate_stt.ipynb
```
Modify the `STTEvaluator` class inside the notebook to include additional models. Example:
```python
import torch

class STTEvaluator:
    def __init__(self, device='cuda' if torch.cuda.is_available() else 'cpu'):
        self.device = device
        self.models = {
            'whisper-large-v2': ('openai/whisper-large-v2', self._load_whisper, self._transcribe_whisper),
            'wav2vec2-base': ('facebook/wav2vec2-base-960h', self._load_wav2vec2, self._transcribe_wav2vec2)
        }
        self.results = {}
```
Extend the dictionary with more models as needed! 🚀
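The pattern behind that dictionary can be illustrated with a self-contained sketch that needs no model downloads: each entry maps a model name to a `(checkpoint, loader, transcriber)` triple, and evaluation times the transcription call. The `register` helper and `STTEvaluatorSketch` name are hypothetical; in the notebook you extend `self.models` directly with real loader/transcriber methods.

```python
import time

class STTEvaluatorSketch:
    """Minimal registry sketch: model name -> (checkpoint, loader, transcriber)."""
    def __init__(self):
        self.models = {}
        self.results = {}

    def register(self, name, checkpoint, loader, transcriber):
        # Hypothetical helper; the notebook uses a literal dict instead.
        self.models[name] = (checkpoint, loader, transcriber)

    def evaluate(self, name, audio):
        checkpoint, loader, transcriber = self.models[name]
        model = loader(checkpoint)
        start = time.perf_counter()
        text = transcriber(model, audio)
        self.results[name] = {
            'transcript': text,
            'latency_s': time.perf_counter() - start,
        }
        return self.results[name]

# Dummy loader/transcriber standing in for e.g. _load_whisper / _transcribe_whisper.
evaluator = STTEvaluatorSketch()
evaluator.register('dummy', 'fake/checkpoint',
                   loader=lambda ckpt: ckpt.upper(),
                   transcriber=lambda model, audio: f'transcribed {audio}')
result = evaluator.evaluate('dummy', 'clip.wav')
```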
Our framework provides detailed performance insights:
- ✅ Accuracy (WER) – Lower is better.
- ⚡ Speed (RTF, Latency) – Lower is better.
- 💡 Resource Efficiency (CPU/GPU Utilization) – Lower means more cost-effective deployment.
Results are logged and visualized automatically in the notebook! 📊🚀
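CPU utilization over a timed region can be approximated with the standard library alone, as the ratio of process CPU time to wall-clock time (1.0 is roughly one fully busy core). This is a sketch, not necessarily how the notebook measures it; tools like `psutil` or `nvidia-smi` are common alternatives.

```python
import time

class CPUUtilizationTimer:
    """Context manager: approximate CPU utilization of this process
    as process CPU time / wall-clock time over the `with` block."""
    def __enter__(self):
        self.wall0 = time.perf_counter()
        self.cpu0 = time.process_time()
        return self

    def __exit__(self, *exc):
        wall = time.perf_counter() - self.wall0
        cpu = time.process_time() - self.cpu0
        self.utilization = cpu / wall if wall > 0 else 0.0
        return False

# Usage: a sleeping process burns almost no CPU, so utilization stays near zero.
with CPUUtilizationTimer() as timer:
    time.sleep(0.2)
```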
Eval-STT is open-source, and we welcome contributions from the community! 🎉
- Fork the repository.
- Implement model loading and transcription functions.
- Update the evaluation notebook.
- Submit a pull request (PR) with a description of changes.
Your contributions help make Eval-STT the best open-source Speech-to-Text evaluation framework! 🚀💡
- Artificial Analysis: Speech-to-Text
- NVIDIA NeMo ASR Metrics
- IEEE STT Evaluation Paper
- German STT Evaluation GitHub
- Coqui-AI STT Models
- ESPnet STT Recipe
Eval-STT is open-source under the Apache 2.0 License. Use it freely and contribute to make it better! 🚀
For any questions, suggestions, or feature requests, open an issue or reach out to the maintainers! 💡