Performance degradation and excessive punctuation in real-time transcription after multiple calls #29
Replies: 2 comments 1 reply
Hi, sorry to hear you're experiencing these issues, and thanks for providing all the details. I'm not super familiar with your specific stack, but in other browser-based projects I've seen similar quality problems on session restarts when the audio stream from the first session wasn't cleaned up. In other words, my suspicion is that the same audio is being "duplicated" into the second session, possibly because recording is started again without having been stopped the first time. This is just a hunch, though, so I have a couple of clarifying questions to help diagnose the issue:
Let me know if you have any other ideas as well!
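To make the cleanup idea concrete, here's a minimal sketch. The `Stoppable` shape and the helper name are my own, not part of the real-time client; in the browser, `MediaStreamTrack` satisfies `Stoppable` structurally, so you could call `stopLiveTracks(micStream.getTracks())` before starting the next session:

```typescript
// A track-like object: MediaStreamTrack has this shape (readyState + stop()).
interface Stoppable {
  readyState: string;
  stop(): void;
}

// Stop every track that is still live, so the first session's capture cannot
// keep feeding audio into the next session. Returns how many were stopped.
function stopLiveTracks(tracks: Stoppable[]): number {
  let stopped = 0;
  for (const track of tracks) {
    if (track.readyState === "live") {
      track.stop();
      stopped++;
    }
  }
  return stopped;
}
```

If this returns a nonzero count right before a restart, it would confirm that tracks from the previous session were still alive.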
Hello @mnemitz ,
Hi Speechmatics team,
In my Angular application (version 19), I'm using the @speechmatics/real-time-client library for real-time transcription.
My setup:
I run two simultaneous transcription connections per call:
Agent audio: from microphone input
Customer audio: from my own AudioCaptureService (remote audio stream)
Each call:
I click Start Listening → the startListening() method starts both connections
After the call, I click Stop Listening → the stopRecognition() method stops both connections
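The invariant I'm trying to maintain in this flow is "the previous session is fully torn down before a new one begins". Reduced to a sketch (entirely my own illustration, not part of the library), it looks like a small guard that collects cleanup callbacks at start time and forcibly runs them if a second start arrives early:

```typescript
type Cleanup = () => void;

// Tracks whether a session is active and what needs tearing down when it ends.
class SessionGuard {
  private cleanups: Cleanup[] = [];
  private active = false;

  begin(): void {
    if (this.active) {
      // Defensive: finish tearing down the previous session first, so a
      // stale audio pipeline cannot feed the new connection.
      this.end();
    }
    this.active = true;
  }

  // Register a resource release (e.g. stopping a mic stream) for session end.
  onEnd(cleanup: Cleanup): void {
    this.cleanups.push(cleanup);
  }

  end(): void {
    // Run cleanups in reverse registration order, then reset state.
    for (const cleanup of this.cleanups.reverse()) {
      cleanup();
    }
    this.cleanups = [];
    this.active = false;
  }
}
```

In this pattern, startListening() would call guard.begin() and register each resource (mic stream, AudioContext, worklet nodes) via guard.onEnd(), and stopRecognition() would call guard.end().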
The issue:
After multiple calls (start → stop → start again), transcription performance drops noticeably:
Customer voice starts producing many small single-word chunks instead of full sentences
A lot of unnecessary commas and periods are added (even mid-sentence)
Overall transcription quality decreases compared to the first call
It feels like either the audio pipeline or the transcription session isn't resetting cleanly between calls.
Questions:
Is there a recommended way to fully reset the RealtimeClient between calls to avoid degraded performance?
Is there any configuration tweak (e.g., enable_partials, end_of_utterance_silence_trigger, punctuation settings) that can reduce the excessive punctuation and tiny text chunks for the customer audio?
Could my double-connection setup be affecting transcription quality?
Code sample (simplified):
async startListening(connectionString: string, language: string) {
  const transcriptionConfig: RealtimeTranscriptionConfig = {
    transcription_config: {
      language,
      enable_partials: false,
      enable_entities: true,
      punctuation_overrides: {
        permitted_marks: [".", ",", "?", "!"],
        sensitivity: 0.4
      },
      operating_point: 'enhanced',
      max_delay: 0.7,
      conversation_config: {
        end_of_utterance_silence_trigger: 0.5
      }
    },
    audio_format: {
      type: 'raw',
      encoding: 'pcm_s16le',
      sample_rate: 16000,
    }
  };

  await Promise.all([
    this.agentTranscriber.start(connectionString, transcriptionConfig),
    this.customerTranscriber.start(connectionString, transcriptionConfig)
  ]);
}

async stopRecognition() {
  await Promise.all([
    this.agentTranscriber.stopRecognition(),
    this.customerTranscriber.stopRecognition()
  ]);
}
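For illustration of the "many small single-word chunks" symptom: one client-side mitigation is stitching the final fragments back into readable text, collapsing the space that would otherwise precede a punctuation mark. This is a hedged sketch — the plain string-array input is my simplification, not the library's actual event shape:

```typescript
// Join transcript fragments into one string: punctuation marks attach to the
// previous word, while ordinary words get a separating space.
function joinFragments(fragments: string[]): string {
  let out = "";
  for (const fragment of fragments) {
    const f = fragment.trim();
    if (f === "") continue;
    if (/^[.,?!]/.test(f) || out === "") {
      out += f;
    } else {
      out += " " + f;
    }
  }
  return out;
}
```

This only smooths presentation, of course; it doesn't address whatever is degrading the session itself.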
Any advice on resetting sessions cleanly or improving transcription stability between calls would be appreciated.