[BUG] OpenAI Transcription response for verbose JSON disregards words timestamp granularity #518

@jemelyah

Basic checks

  • I searched existing issues - this hasn't been reported
  • I can reproduce this consistently
  • This is a RubyLLM bug, not my application code

What's broken?

When using OpenAI's word-level timestamp granularity (timestamp_granularities: ['word']), the words field in the API response is silently ignored. Users cannot access word-level timing data even though OpenAI's API returns it.

How to reproduce

  1. Call transcribe with timestamp_granularities:

     transcription = RubyLLM.transcribe(
       'audio.wav',
       model: 'whisper-1',
       provider: :openai,
       timestamp_granularities: ['word'],
       response_format: 'verbose_json'
     )

  2. The OpenAI API returns a response with a 'words' array containing:

     [
       {"word": "Hello", "start": 0.0, "end": 0.5},
       {"word": "world", "start": 0.6, "end": 1.0}
     ]

  3. Try to access the words data:

     transcription.words
     # raises NoMethodError: undefined method `words' for an instance of RubyLLM::Transcription

The Transcription class is missing:

  • words attribute reader
  • @words instance variable assignment in initializer
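
For illustration, the two missing pieces could look roughly like the sketch below. SketchTranscription is a hypothetical stand-in, not the real RubyLLM::Transcription; the actual constructor signature and the other attributes it carries are assumptions here and may differ.

```ruby
# Hypothetical stand-in for RubyLLM::Transcription. The real class's
# constructor arguments and attributes may differ; only the `words`
# reader and the @words assignment are the point of this sketch.
class SketchTranscription
  attr_reader :text, :model, :words

  def initialize(text:, model: nil, words: nil)
    @text = text
    @model = model
    @words = words # stays nil when word-level timestamps were not requested
  end
end

transcription = SketchTranscription.new(
  text: 'Hello world',
  model: 'whisper-1',
  words: [
    { 'word' => 'Hello', 'start' => 0.0, 'end' => 0.5 },
    { 'word' => 'world', 'start' => 0.6, 'end' => 1.0 }
  ]
)

puts transcription.words.length
```

Defaulting words to nil keeps callers that never request word-level timestamps unaffected.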

Expected behavior

When calling RubyLLM.transcribe with timestamp_granularities: ['word'], the returned Transcription object should provide access to word-level timing data via the words attribute:

  transcription = RubyLLM.transcribe(
    'audio.wav',
    model: 'whisper-1',
    provider: :openai,
    timestamp_granularities: ['word'],
    response_format: 'verbose_json'
  )

  transcription.words
  # => [
  #   {"word" => "Hello", "start" => 0.0, "end" => 0.5},
  #   {"word" => "world", "start" => 0.6, "end" => 1.0}
  # ]

What actually happened

The words data is silently dropped and inaccessible:

  transcription.words
  # raises NoMethodError: undefined method `words' for an instance of RubyLLM::Transcription

The OpenAI API successfully returns the words array in the response, but:

  1. The Transcription class has no words attribute reader
  2. The @words instance variable is never assigned
  3. The data is lost even though the provider attempts to pass it
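
To show that the data is present at the response level, here is a minimal sketch of pulling the words array out of a verbose_json body. The sample body is trimmed to two fields and extract_words is a hypothetical helper; how RubyLLM's OpenAI provider actually parses the response is not shown in this report.

```ruby
require 'json'

# Sample verbose_json body, trimmed to the fields relevant here.
RAW_BODY = <<~JSON
  {
    "text": "Hello world",
    "words": [
      {"word": "Hello", "start": 0.0, "end": 0.5},
      {"word": "world", "start": 0.6, "end": 1.0}
    ]
  }
JSON

# Hypothetical helper, not part of RubyLLM: returns the word-level
# entries, or an empty array when the response carries none.
def extract_words(raw_body)
  JSON.parse(raw_body).fetch('words', [])
end

words = extract_words(RAW_BODY)
puts words.map { |w| w['word'] }.join(' ')
```

The parsed entries are plain hashes with string keys, which matches the shape in the expected behavior section.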

Environment
