Performance degradation and excessive punctuation in real-time transcription after multiple calls #29
Replies: 2 comments 1 reply
Hi, sorry to hear you're experiencing these issues, and thanks for providing all the details. I'm not super familiar with your specific stack, but in other browser-based projects I've seen similar quality problems on session restarts when the audio stream from the first session wasn't cleaned up. In other words, my suspicion is that the same audio is being "duplicated" into the second session, possibly because recording is started again without having been stopped the first time. This is just a hunch, though, so I have a couple of clarifying questions to help diagnose the issue:
Let me know if you have any other ideas as well!
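To make the cleanup idea concrete, here's a minimal sketch. The `Stoppable` shape and the helper name are my own, not part of the real-time client; in the browser, `MediaStreamTrack` satisfies `Stoppable` structurally, so you could call `stopLiveTracks(micStream.getTracks())` before starting the next session:

```typescript
// A track-like object: MediaStreamTrack has this shape (readyState + stop()).
interface Stoppable {
  readyState: string;
  stop(): void;
}

// Stop every track that is still live, so the first session's capture cannot
// keep feeding audio into the next session. Returns how many were stopped.
function stopLiveTracks(tracks: Stoppable[]): number {
  let stopped = 0;
  for (const track of tracks) {
    if (track.readyState === "live") {
      track.stop();
      stopped++;
    }
  }
  return stopped;
}
```

If this returns a nonzero count right before a restart, it would confirm that tracks from the previous session were still alive.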
Hello @mnemitz ,
Hi Speechmatics team,
In my Angular application (version 19), I'm using the @speechmatics/real-time-client library for real-time transcription.
My setup:
I run two simultaneous transcription connections per call:
Agent audio: from microphone input
Customer audio: from my own AudioCaptureService (remote audio stream)
Each call:
I click Start Listening → the startListening() method starts both connections
After the call, I click Stop Listening → the stopRecognition() method stops both connections
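The invariant I'm trying to maintain in this flow is "the previous session is fully torn down before a new one begins". Reduced to a sketch (entirely my own illustration, not part of the library), it looks like a small guard that collects cleanup callbacks at start time and forcibly runs them if a second start arrives early:

```typescript
type Cleanup = () => void;

// Tracks whether a session is active and what needs tearing down when it ends.
class SessionGuard {
  private cleanups: Cleanup[] = [];
  private active = false;

  begin(): void {
    if (this.active) {
      // Defensive: finish tearing down the previous session first, so a
      // stale audio pipeline cannot feed the new connection.
      this.end();
    }
    this.active = true;
  }

  // Register a resource release (e.g. stopping a mic stream) for session end.
  onEnd(cleanup: Cleanup): void {
    this.cleanups.push(cleanup);
  }

  end(): void {
    // Run cleanups in reverse registration order, then reset state.
    for (const cleanup of this.cleanups.reverse()) {
      cleanup();
    }
    this.cleanups = [];
    this.active = false;
  }
}
```

In this pattern, startListening() would call guard.begin() and register each resource (mic stream, AudioContext, worklet nodes) via guard.onEnd(), and stopRecognition() would call guard.end().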
The issue:
After multiple calls (start → stop → start again), transcription performance drops noticeably:
Customer voice starts producing many small single-word chunks instead of full sentences
A lot of unnecessary commas and periods are added (even mid-sentence)
Overall transcription quality decreases compared to the first call
It feels like either the audio pipeline or the transcription session isn't resetting cleanly between calls.
Questions:
Is there a recommended way to fully reset the RealtimeClient between calls to avoid degraded performance?
Is there any configuration tweak (e.g., enable_partials, end_of_utterance_silence_trigger, punctuation settings) that can reduce the excessive punctuation and tiny text chunks for the customer audio?
Could my double-connection setup be affecting transcription quality?
Code sample (simplified):
async startListening(connectionString: string, language: string) {
  const transcriptionConfig: RealtimeTranscriptionConfig = {
    transcription_config: {
      language,
      enable_partials: false,
      enable_entities: true,
      punctuation_overrides: {
        permitted_marks: [".", ",", "?", "!"],
        sensitivity: 0.4
      },
      operating_point: 'enhanced',
      max_delay: 0.7,
      conversation_config: {
        end_of_utterance_silence_trigger: 0.5
      }
    },
    audio_format: {
      type: 'raw',
      encoding: 'pcm_s16le',
      sample_rate: 16000,
    }
  };

  await Promise.all([
    this.agentTranscriber.start(connectionString, transcriptionConfig),
    this.customerTranscriber.start(connectionString, transcriptionConfig)
  ]);
}

async stopRecognition() {
  await Promise.all([
    this.agentTranscriber.stopRecognition(),
    this.customerTranscriber.stopRecognition()
  ]);
}
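For illustration of the "many small single-word chunks" symptom: one client-side mitigation is stitching the final fragments back into readable text, collapsing the space that would otherwise precede a punctuation mark. This is a hedged sketch — the plain string-array input is my simplification, not the library's actual event shape:

```typescript
// Join transcript fragments into one string: punctuation marks attach to the
// previous word, while ordinary words get a separating space.
function joinFragments(fragments: string[]): string {
  let out = "";
  for (const fragment of fragments) {
    const f = fragment.trim();
    if (f === "") continue;
    if (/^[.,?!]/.test(f) || out === "") {
      out += f;
    } else {
      out += " " + f;
    }
  }
  return out;
}
```

This only smooths presentation, of course; it doesn't address whatever is degrading the session itself.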
Any advice on resetting sessions cleanly or improving transcription stability between calls would be appreciated.