[FEATURE] Add support for known speakers in SpeakerDiarizationConfig #44

@nsepehr

Which SDK is this feature request for?

  • speechmatics-rt (Real-Time SDK)
  • speechmatics-batch (Batch SDK)
  • Both SDKs
  • General/Repository

Feature Request: Add support for known speakers in SpeakerDiarizationConfig

Summary

Request to add support for known speaker identification/enrollment in the Python SDK's SpeakerDiarizationConfig to enable persistent speaker identification across sessions.

Context

Currently, the SpeakerDiarizationConfig only supports:

  • max_speakers
  • speaker_sensitivity
  • prefer_current_speaker

However, the API documentation mentions SpeakersResult as a preview feature, and the SDK code contains references to GET_SPEAKERS and SPEAKERS_RESULT message types (marked as "Internal, Speechmatics only").

Use Case

We're building voice AI applications where identifying specific speakers across sessions is critical, such as:

  • Meeting transcription: identifying recurring participants without having to re-match their voices to speaker labels in every session

Currently, every time the same speakers join a meeting in our system, we have to match their identities to the diarization speaker labels again. Repeating this for every session is tedious and makes for a bad user experience. Being able to identify them beforehand would be incredibly useful.
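To illustrate the manual step described above, here is a minimal sketch of the per-session relabelling we currently have to repeat. The segment shape, the `S1`/`S2`/`S3` labels, and the `relabel` helper are assumptions for illustration only, not SDK types.

```python
from typing import Dict, List, Tuple

# Hypothetical segment shape: (diarization label, text).
Segment = Tuple[str, str]


def relabel(segments: List[Segment], names: Dict[str, str]) -> List[Segment]:
    """Replace diarization labels with known names; keep unmapped labels as-is."""
    return [(names.get(label, label), text) for label, text in segments]


# This mapping has to be rebuilt by hand for every new session,
# since the labels are assigned per session.
session_names = {"S1": "John", "S2": "Jane"}
segments = [("S1", "Hello"), ("S2", "Hi John"), ("S3", "Sorry I'm late")]
relabelled = relabel(segments, session_names)
```

With known speaker support, the `session_names` mapping would no longer need to be recreated for each meeting.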

Proposed Solution

Add a speakers field to SpeakerDiarizationConfig to support known speaker enrollment:

from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SpeakerDiarizationConfig:
    max_speakers: Optional[int] = None
    speaker_sensitivity: Optional[float] = None
    prefer_current_speaker: Optional[bool] = None
    speakers: Optional[Dict[str, List[str]]] = None  # New field for known speakers


# Usage example:
config = SpeakerDiarizationConfig(
    max_speakers=2,
    speaker_sensitivity=0.5,
    speakers={
        "John": ["speaker_id_john_123"],  # Speaker name -> identifiers
        "Jane": ["speaker_id_jane_456"],
    },
)
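One way the proposed field could be serialized without changing existing payloads is to drop unset values. This is only a sketch: the dataclass below is a local copy of the proposed shape rather than the SDK class, and `to_payload` is a hypothetical helper name.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SpeakerDiarizationConfig:
    # Local copy of the proposed shape, not the SDK class.
    max_speakers: Optional[int] = None
    speaker_sensitivity: Optional[float] = None
    prefer_current_speaker: Optional[bool] = None
    speakers: Optional[Dict[str, List[str]]] = None  # proposed field


def to_payload(config: SpeakerDiarizationConfig) -> Dict[str, Any]:
    """Drop unset (None) fields so configs without speakers serialize as before."""
    return {k: v for k, v in asdict(config).items() if v is not None}


payload = to_payload(
    SpeakerDiarizationConfig(
        max_speakers=2,
        speakers={"John": ["speaker_id_john_123"]},
    )
)
```

Because `speakers` defaults to `None`, callers who never set it would send the same payload as today.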

Current Workaround

We tested whether the API would accept a speakers field even though it's not in the SDK:

config.speaker_diarization_config = {
    "speakers": {
        "John": ["speaker_id_john_123"],
        "Jane": ["speaker_id_jane_456"],
    }
}

But the API rejects it with:

Error: Additional property speakers is not allowed
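Until the field is accepted, we guard against this rejection on our side. A minimal sketch, assuming the error means unknown keys are refused outright; `SUPPORTED_KEYS` and `strip_unsupported` are our own hypothetical names, not part of the SDK.

```python
from typing import Any, Dict

# The three fields this issue lists as currently supported.
SUPPORTED_KEYS = {"max_speakers", "speaker_sensitivity", "prefer_current_speaker"}


def strip_unsupported(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy containing only the keys listed as supported today."""
    return {k: v for k, v in config.items() if k in SUPPORTED_KEYS}


raw = {"max_speakers": 2, "speakers": {"John": ["speaker_id_john_123"]}}
safe = strip_unsupported(raw)  # the "speakers" entry is dropped; raw is untouched
```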

Questions

  1. Is the SpeakersResult preview feature available for early access?
  2. Is there a timeline for when known speaker support will be added to the public API?
  3. Would you accept a PR to add this functionality to the SDK once the API supports it?

Environment

  • speechmatics-rt version: 0.4.0
  • Python version: 3.11
  • Use case: Real-time transcription with speaker identification

Would love to hear if this is on the roadmap or if there's an alternative approach we should consider!

Priority/Impact
How important is this feature to you?

  • Critical - blocking current work
  • High - would significantly improve workflow
  • Medium - nice to have improvement
  • Low - minor enhancement

Labels: enhancement