This repository contains the Jupyter Notebook and resources for fine-tuning the Wav2Vec2 model for Tamil speech recognition using the Hugging Face Transformers library.
Wav2Vec2 is a state-of-the-art model for automatic speech recognition (ASR). This project aims to adapt Wav2Vec2 for the Tamil language, leveraging available datasets to improve performance in recognizing spoken Tamil.
To run this project, ensure you have the following installed:
- Python 3.7 or higher
- Jupyter Notebook
- PyTorch
- Transformers
- Datasets
- Librosa
- Soundfile
- CUDA (for GPU training)
You can install the required packages using the following command:
pip install -r requirements.txt
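Before opening the notebook, it can help to confirm that the listed packages are importable in the active environment. The sketch below is one way to do that with the standard library only. The import names in `REQUIRED_MODULES` are assumptions based on the package list above, not taken from requirements.txt, and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed import names for the packages listed above (not read from requirements.txt).
REQUIRED_MODULES = ["torch", "transformers", "datasets", "librosa", "soundfile"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

This only checks that each module can be located. It does not verify versions or whether PyTorch can see a GPU.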
We use the Tamil Speech Dataset to fine-tune the model. The dataset consists of Tamil audio files along with their transcriptions. Download the dataset and place it in an accessible directory. Refer to datapreprocessing.py for data preprocessing.
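A quick sanity check after downloading is to confirm that every audio file has a transcription. The sketch below assumes a hypothetical layout in which each `.wav` file sits next to a `.txt` file with the same base name. The actual dataset may be organized differently (for example, with a single metadata file), so treat `pair_audio_with_transcripts` as an illustration, not as what datapreprocessing.py does.

```python
from pathlib import Path


def pair_audio_with_transcripts(data_dir):
    """Return (audio_path, transcript) pairs plus audio files that lack a transcript.

    Assumes a hypothetical layout: clip.wav is accompanied by clip.txt.
    """
    pairs, unmatched = [], []
    for wav_path in sorted(Path(data_dir).rglob("*.wav")):
        txt_path = wav_path.with_suffix(".txt")
        if txt_path.is_file():
            # Tamil text is read as UTF-8; surrounding whitespace is dropped.
            text = txt_path.read_text(encoding="utf-8").strip()
            pairs.append((wav_path, text))
        else:
            unmatched.append(wav_path)
    return pairs, unmatched
```

If `unmatched` is non-empty, those clips would need to be fixed or excluded before training.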
To fine-tune the Wav2Vec2 model, open the Jupyter Notebook and follow its instructions to run the training process.
After training, you can run inference using the code snippets provided in the Jupyter Notebook. Be sure to replace the paths with the paths to your own audio files.
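For orientation, inference with a fine-tuned Wav2Vec2 CTC checkpoint in Transformers usually follows the pattern sketched below. This is not the notebook's code. It assumes the processor and the model were both saved to the same directory, that the audio should be resampled to 16 kHz, and that greedy decoding is sufficient. `transcribe`, `audio_path`, and `model_dir` are hypothetical names.

```python
def transcribe(audio_path, model_dir):
    """Transcribe one audio file with a fine-tuned Wav2Vec2 CTC checkpoint (sketch)."""
    # Heavy dependencies are imported lazily so that defining the function stays cheap.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    # Assumption: model_dir holds both the processor files and the model weights.
    processor = Wav2Vec2Processor.from_pretrained(model_dir)
    model = Wav2Vec2ForCTC.from_pretrained(model_dir)
    model.eval()

    # Assumption: the checkpoint expects 16 kHz mono audio; librosa resamples on load.
    speech, _ = librosa.load(audio_path, sr=16000)
    inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy decoding: take the most likely token per frame, then let the
    # processor collapse repeated tokens and remove CTC blanks.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

A call would look like `transcribe("path/to/audio.wav", "path/to/checkpoint")`, where both paths are placeholders for your own files.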
The performance of the model can be evaluated using standard metrics such as Word Error Rate (WER). The notebook contains sections on evaluating the model's performance.
pip install jiwer
import jiwer
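original_transcript = "God is great"  # Example reference; replace with your ground-truth transcription
output_transcription = "good is great"  # Example hypothesis; replace with your model's output
# Compute WER between the reference and the hypothesis
wer = jiwer.wer(original_transcript, output_transcription)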
print(f"Word Error Rate (WER): {wer:.2f}")
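To make the metric itself concrete: WER is the word-level edit distance (substitutions, deletions, and insertions) between the reference and the hypothesis, divided by the number of words in the reference. The sketch below computes it in plain Python. It is an illustration, not a replacement for jiwer: `word_error_rate` is a hypothetical helper that splits on whitespace and applies no text normalization.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref_words)
```

For the pair "God is great" and "good is great", one of the three reference words is substituted, so the function returns 1/3. The comparison is case-sensitive, which is why "God" and "good" would not match even if spelled alike.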
For further reference please visit: Fairseq Wav2Vec2