Identify Attendees Based on Voice #56

Open
tyrwinn opened this issue Mar 21, 2025 · 3 comments
tyrwinn commented Mar 21, 2025

Hi,

Thanks so much for all the hard work on this project!

I would like to suggest a feature to identify attendees based on their voice from audio recordings.

The idea is to:
• Detect who is speaking
• Match voices to known attendees
• Generate labeled transcripts based on attendee

Possible Approach

The pyannote-whisper project looks like a promising starting point to build on, as it combines:
• whisper for transcription
• pyannote-audio for speaker diarization

This could be adapted to recognize specific attendees and tag their speech in recordings.
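To illustrate the core step such an adaptation needs, here is a minimal sketch of aligning transcription with diarization: given transcript segments (as whisper produces, with start/end times) and speaker turns (as pyannote-audio produces), assign each segment the speaker whose turn overlaps it most. The dict shapes are simplified assumptions, not the exact output format of either library:

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length (in seconds) of the overlap between two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def label_segments(transcript_segments, speaker_turns):
    """Attach a speaker label to each transcript segment.

    transcript_segments: [{"start": float, "end": float, "text": str}, ...]
    speaker_turns:       [{"start": float, "end": float, "speaker": str}, ...]

    Each segment gets the speaker whose turn overlaps it the most in time,
    or "UNKNOWN" if no turn overlaps it at all.
    """
    labeled = []
    for seg in transcript_segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn in speaker_turns:
            ov = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if ov > best_overlap:
                best_speaker, best_overlap = turn["speaker"], ov
        labeled.append({**seg, "speaker": best_speaker})
    return labeled
```

Matching real attendees (rather than anonymous SPEAKER_00-style labels) would then be a separate step on top, e.g. comparing voice embeddings against enrolled attendee profiles.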

@sujithatzackriya
Collaborator

Will add speaker identification once:

  1. The ollama generation issue is fixed
  2. The basic DB and related logic are improved


ilyamochalov commented Apr 1, 2025

@sujithatzackriya I can work on speaker identification. Do you have any approaches in mind other than the pyannote-whisper mentioned in this issue's description?

@sujithatzackriya
Collaborator

@ilyamochalov

Observations and solutions

I was thinking of using WhisperX, which builds on the faster-whisper backend and adds enhancements such as VAD pre-processing, word-level timestamps, and speaker diarization.

https://github.com/m-bain/whisperX

I identified this during initial research some time back. The advantages and disadvantages I see are:

Advantages

  1. Good community support
  2. Uses the faster-whisper backend, which has strong benchmark scores

Disadvantages

  1. Written in Python
  2. Uses more RAM

We chose whisper.cpp for the efficiency of C++; moving to Python might introduce memory issues.

Alternative solutions

Moving the backend completely to Rust is also worth exploring, since libraries such as mistral.rs, pyannote-rs, and whisper-rs are available, but this needs careful research and experimentation.

My thoughts

Since we are already using LLMs, identifying speakers from the transcript content itself is possible. Personally, I have never missed having speaker diarization, so it would be really helpful to understand the use cases where this feature matters.
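To make the transcript-based idea concrete, here is a toy, non-LLM stand-in: it maps diarized labels (e.g. SPEAKER_00) to attendee names by scanning each speaker's text for a self-introduction. An LLM prompt over the transcript could do the same far more robustly; the regex, names, and segment shape below are all hypothetical illustrations:

```python
import re

# Toy self-introduction patterns ("I'm Alice", "This is Bob").
# An LLM would generalize well beyond these fixed phrasings.
INTRO_RE = re.compile(r"\b(?:I'm|I am|[Tt]his is)\s+([A-Z][a-z]+)")

def name_speakers(labeled_segments):
    """Map diarization labels to names found in self-introductions.

    labeled_segments: [{"speaker": str, "text": str}, ...]

    Returns a dict like {"SPEAKER_00": "Alice"}; labels whose text
    never contains an introduction are left out of the mapping.
    """
    mapping = {}
    for seg in labeled_segments:
        if seg["speaker"] in mapping:
            continue  # first introduction wins for each label
        m = INTRO_RE.search(seg["text"])
        if m:
            mapping[seg["speaker"]] = m.group(1)
    return mapping
```

This only attaches names when someone introduces themselves, which is exactly the gap that voice-profile matching (or an LLM with meeting context, e.g. the attendee list) would fill.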
