Voice Separation Tool

A speaker diarization and voice separation tool that automatically detects and isolates different speakers in audio files. It is designed to recognize the same speaker across varied speaking styles, emotional states, and voice modulations.

Features

- Advanced Voice Activity Detection: Identifies speech segments using neural and traditional methods
- Speaker Embedding Extraction: Uses state-of-the-art d-vectors and Wav2Vec2 embeddings
- Automatic Speaker Detection: Determines the optimal number of speakers without manual input
- Voice Variation Handling: Recognizes the same speaker across different voice characteristics
- Acoustic Analysis: Analyzes pitch, formants, and spectral properties for accurate speaker identification
- Visualization: Generates visual timelines of speaker activity

Installation

Prerequisites

- Python 3.7+
- ffmpeg (for audio format conversion)

Setup

Clone the repository:

```
git clone https://github.com/subhashdasyam/voice-separation.git
cd voice-separation
```

Create a virtual environment and activate it:

```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install the required packages:
```
pip install -r requirements.txt
```

Usage

Basic Usage

```
python main.py input_audio.mp3 --output-format mp3
```

This will:

- Detect and separate different speakers in the audio file
- Save each speaker's voice to separate files (voice1.mp3, voice2.mp3, etc.)
- Generate a visualization of the diarization results

Advanced Options

```
python main.py input_audio.mp3 --output-format mp3 --device cuda --skip-diarization
```

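HF Token needed

Set the HF_TOKEN environment variable to a read-only Hugging Face access token before running the tool:

```
export HF_TOKEN=<your read-only Hugging Face token>
```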
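
If you call the tool from your own Python script, it can help to fail early when the token is missing. The following is a minimal sketch; `get_hf_token` is a hypothetical helper and is not part of this repository.

```python
import os


def get_hf_token(env=None):
    """Return the Hugging Face token from the environment, or raise if absent.

    Hypothetical helper for illustration; the tool itself may read the
    variable differently.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export a read-only Hugging Face token first."
        )
    return token


# Example with an explicit mapping instead of the real environment
print(get_hf_token({"HF_TOKEN": "hf_example"}))
```

Passing a mapping explicitly keeps the check easy to test without touching the real environment.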

Command-Line Arguments

- input_file: Path to input audio file (mp3 or wav)
- --output-dir: Directory to save separated voice files (default: 'output')
- --output-format: Output file format, either 'wav' or 'mp3' (default: 'wav')
- --device: Device to run models on ('cuda' or 'cpu')
- --model-dir: Directory to save/load models (default: 'models')
- --no-visualize: Disable visualization generation
- --diarization-timeout: Timeout for neural diarization in seconds (default: 300)
- --skip-diarization: Skip neural diarization and use clustering directly
- --disable-refinement: Disable cluster refinement to preserve initial speaker count
- --min-speakers: Minimum number of speakers to consider in clustering (default: 2)
- --max-speakers: Maximum number of speakers to consider in clustering (default: 8)
- --debug: Enable debug logging
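
To see how these arguments fit together, here is a sketch of an `argparse` parser that mirrors the flags and defaults listed above. It is a reconstruction from this README, not the actual contents of main.py, and `build_parser` is a hypothetical name. The default for `--device` is not documented, so the sketch leaves it unset.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Separate speakers in an audio file.")
    parser.add_argument("input_file", help="Path to input audio file (mp3 or wav)")
    parser.add_argument("--output-dir", default="output")
    parser.add_argument("--output-format", choices=["wav", "mp3"], default="wav")
    # Default device is not stated in the README, so it stays None here.
    parser.add_argument("--device", choices=["cuda", "cpu"], default=None)
    parser.add_argument("--model-dir", default="models")
    parser.add_argument("--no-visualize", action="store_true")
    parser.add_argument("--diarization-timeout", type=int, default=300)
    parser.add_argument("--skip-diarization", action="store_true")
    parser.add_argument("--disable-refinement", action="store_true")
    parser.add_argument("--min-speakers", type=int, default=2)
    parser.add_argument("--max-speakers", type=int, default=8)
    parser.add_argument("--debug", action="store_true")
    return parser


# Parse the "Advanced Options" command line from an explicit list
args = build_parser().parse_args(
    ["input_audio.mp3", "--output-format", "mp3", "--device", "cuda", "--skip-diarization"]
)
print(args.output_format, args.device, args.skip_diarization)
```

Note that argparse converts dashes to underscores, so `--skip-diarization` becomes `args.skip_diarization`.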

How It Works

Voice Activity Detection
- Uses PyAnnote neural VAD model with WebRTC VAD as fallback
- Identifies all speech segments in the audio
Feature Extraction
- Extracts d-vector embeddings (voice prints) using Resemblyzer
- Falls back to Wav2Vec2 or MFCC features if needed
Speaker Diarization
- Uses either pretrained diarization model or custom clustering
- Automatically determines optimal number of speakers
Cluster Refinement
- Analyzes acoustic properties (pitch, formants) to validate speaker identity
- Merges clusters that likely belong to the same speaker with different voice characteristics
Audio Separation
- Separates and saves each speaker's segments as individual files
- Adds appropriate silence between segments for natural listening
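
The final separation step can be illustrated with a small pure-Python sketch. It assumes diarization has already produced `(start_seconds, end_seconds, speaker)` tuples and that the audio is a flat list of samples. The function name `build_speaker_tracks`, the segment format, and the 0.3 second default gap are all assumptions made for illustration; the tool's own implementation and gap length may differ, and this sketch does not write any files.

```python
from collections import defaultdict


def build_speaker_tracks(samples, sample_rate, segments, gap_s=0.3):
    """Concatenate each speaker's segments, inserting silence between them.

    samples:  sequence of audio samples (mono)
    segments: iterable of (start_seconds, end_seconds, speaker_label)
    gap_s:    silence inserted between consecutive segments of one speaker
    """
    gap = [0.0] * int(round(gap_s * sample_rate))
    tracks = defaultdict(list)
    for start_s, end_s, speaker in sorted(segments):
        start = int(round(start_s * sample_rate))
        end = int(round(end_s * sample_rate))
        if tracks[speaker]:
            # Not the first segment for this speaker: add the silence gap
            tracks[speaker].extend(gap)
        tracks[speaker].extend(samples[start:end])
    return dict(tracks)


# Toy example: 10 samples per second, two speakers
samples = list(range(100))
segments = [(0.0, 0.5, "voice1"), (0.5, 1.0, "voice2"), (2.0, 2.3, "voice1")]
tracks = build_speaker_tracks(samples, 10, segments, gap_s=0.2)
print(len(tracks["voice1"]), len(tracks["voice2"]))
```

Each resulting track could then be written to its own file (voice1, voice2, and so on) in the chosen output format.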

Troubleshooting

Common Issues

Too Few Speakers Detected:
- Use --disable-refinement to prevent merging speaker clusters
- Specify minimum speakers with --min-speakers 4 if you know there are at least 4 speakers
- Try --skip-diarization to use the custom clustering approach
Diarization Takes Too Long:
- Use --skip-diarization to bypass neural diarization
- Adjust timeout with --diarization-timeout 180 (3 minutes)
Out of Memory Errors:
- Use --device cpu to process on CPU instead of GPU
Spectral Clustering Errors:
- Update spectralcluster library: pip install spectralcluster --upgrade
- The code includes fallbacks if spectral clustering fails
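
The idea behind a diarization timeout with a fallback can be pictured with the standard library alone. The sketch below is an assumption about the general pattern, not the tool's actual code: `run_with_fallback` and both stand-in functions are hypothetical, and whether the tool falls back to clustering automatically after a timeout is not stated in this README.

```python
import concurrent.futures
import time


def run_with_fallback(primary, fallback, timeout_s):
    """Run primary(); if it exceeds timeout_s seconds, return fallback() instead.

    The timed-out worker thread is not killed; it keeps running in the
    background until it finishes on its own.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(primary)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return fallback()
    finally:
        # Do not block waiting for a slow primary task
        executor.shutdown(wait=False)


def slow_neural_diarization():
    time.sleep(0.5)  # stand-in for a long-running model
    return "neural"


def clustering_fallback():
    return "clustering"


result = run_with_fallback(slow_neural_diarization, clustering_fallback, timeout_s=0.05)
print(result)  # prints "clustering"
```

Because threads cannot be force-stopped in Python, a real implementation that needs to reclaim GPU memory after a timeout would more likely run the model in a separate process.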

License

This project is licensed under the Apache License - see the LICENSE file for details.
