
This repo provides a step-by-step process, from scratch, for fine-tuning Facebook's wav2vec2-large model using Transformers.


sugarcane-mk/finetuning_wav2vec2


Fine-tuning Wav2Vec2 for Tamil Speech Recognition

This repository contains the Jupyter Notebook and resources for fine-tuning the Wav2Vec2 model for Tamil speech recognition using the Hugging Face Transformers library.

Table of Contents

  • Introduction
  • Requirements
  • Dataset
  • Training
  • Inference
  • Results
  • Acknowledgments

Introduction

Wav2Vec2 is a state-of-the-art model for automatic speech recognition (ASR). This project aims to adapt Wav2Vec2 for the Tamil language, leveraging available datasets to improve performance in recognizing spoken Tamil.

Requirements

To run this project, ensure you have the following installed:

  • Python 3.7 or higher
  • Jupyter Notebook
  • PyTorch
  • Transformers
  • Datasets
  • Librosa
  • Soundfile
  • CUDA (optional, for GPU-accelerated training)

You can install the required packages using the following command:

pip install -r requirements.txt
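If you are assembling the environment yourself rather than using the repo's file, a minimal requirements.txt covering the list above might look like this (package names only; pin versions to match your CUDA/PyTorch setup):

```
torch
transformers
datasets
librosa
soundfile
jupyter
jiwer
```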

Dataset

We use a Tamil speech dataset for fine-tuning the model. The dataset consists of audio files in Tamil along with their transcriptions. Please ensure you download the dataset and place it in an accessible directory. Refer to datapreprocessing.py for the preprocessing steps.
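The exact preprocessing lives in datapreprocessing.py; as a rough sketch of the pairing step, assuming a hypothetical tab-separated transcript file keyed by utterance id (adjust to your dataset's actual layout), a training manifest could be built like this:

```python
import csv
from pathlib import Path

def build_manifest(audio_dir, transcript_file, out_csv):
    """Pair each .wav file with its transcription line.

    Assumes transcript_file has lines of the form:
        <utterance_id>\t<transcription>
    (a hypothetical layout -- adapt to your dataset's format).
    """
    # Read the utterance-id -> text mapping
    transcripts = {}
    with open(transcript_file, encoding="utf-8") as f:
        for line in f:
            utt_id, text = line.rstrip("\n").split("\t", 1)
            transcripts[utt_id] = text

    # Keep only audio files that have a matching transcription
    rows = []
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        if wav.stem in transcripts:
            rows.append({"path": str(wav), "text": transcripts[wav.stem]})

    # Write a CSV manifest that the training code can load
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["path", "text"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Remember that wav2vec2 expects 16 kHz mono audio, so resample your files (e.g. with Librosa or Soundfile) before or during loading.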

Training

To fine-tune the Wav2Vec2 model, open the Jupyter Notebook and follow the instructions provided within the notebook to execute the training process.
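One step of the standard Hugging Face fine-tuning recipe worth understanding before running the notebook is building the CTC character vocabulary from the training transcriptions, which the tokenizer is then created from. A minimal sketch (the special-token names follow the common Wav2Vec2CTCTokenizer convention; adapt to the notebook's actual code):

```python
import json

def build_vocab(transcriptions, out_path="vocab.json"):
    """Build a character-level CTC vocabulary from training text."""
    # Collect every character seen in the training transcriptions
    chars = set()
    for text in transcriptions:
        chars.update(text)
    vocab = {c: i for i, c in enumerate(sorted(chars))}

    # Wav2Vec2's CTC tokenizer conventionally uses "|" as the word delimiter
    if " " in vocab:
        vocab["|"] = vocab.pop(" ")

    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)  # also serves as the CTC blank token

    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(vocab, f, ensure_ascii=False)
    return vocab
```

The resulting vocab.json is what you pass when constructing the tokenizer for a new language like Tamil, since the pretrained English vocabulary will not cover the Tamil script.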

Inference

After training, you can perform inference using the code snippets provided in the Jupyter Notebook. Be sure to replace the paths with those of your own audio files.
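The notebook's inference cells decode the argmax of the model's logits with the processor; conceptually, that CTC decoding step collapses consecutive repeated tokens and drops blanks. A self-contained sketch of the idea (the id-to-character mapping here is purely illustrative):

```python
def ctc_greedy_decode(ids, id_to_char, blank_id=0):
    """Greedy CTC decoding: collapse consecutive repeats, drop blanks."""
    out = []
    prev = None
    for i in ids:
        # Emit a character only when it differs from the previous frame
        # and is not the blank token
        if i != blank_id and i != prev:
            out.append(id_to_char[i])
        prev = i
    # "|" is wav2vec2's conventional word delimiter
    return "".join(out).replace("|", " ")
```

In practice `processor.batch_decode(predicted_ids)` from Transformers does this for you; the sketch just shows why repeated frames in the model output do not produce repeated letters.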

Results

The performance of the model can be evaluated using standard metrics such as Word Error Rate (WER). The notebook contains sections on evaluating the model's performance.

pip install jiwer

import jiwer

# Example strings -- replace with your ground-truth and predicted transcriptions
reference = "God is great"
hypothesis = "good is great"

# Compute WER
wer = jiwer.wer(reference, hypothesis)
print(f"Word Error Rate (WER): {wer:.2f}")

Acknowledgments

For further reference, please visit the original implementation: Fairseq Wav2Vec2
