Interview Transcriber 🎙️

A powerful Node.js tool that downloads audio from YouTube, Spotify, or podcast URLs, transcribes it using Google Gemini AI, and generates comprehensive summaries with speaker identification and tone analysis.

Features

  • 📥 Download audio from YouTube videos, Spotify episodes/podcasts, or direct MP3 URLs
  • ✂️ Smart chunking - Splits long audio into manageable 10-minute segments (see the sketch after this list)
  • 🎯 Advanced transcription using Google Gemini AI with:
    • Speaker identification
    • Tone/emotion analysis
    • Timestamp preservation
  • 🔀 Intelligent merging of transcription chunks
  • Automatic extraction of:
    • Key highlights and themes
    • Comprehensive summary
    • Speaker statistics
  • 📊 Multiple output formats:
    • Structured JSON
    • Formatted text transcript
    • Detailed metadata report
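
The chunking step itself is done with ffmpeg (see src/modules/audioChunker.ts in the project structure below). As a rough sketch of the idea, ffmpeg's segment muxer can split an audio file into 600-second pieces like this; the exact command the tool runs may differ:

# Split input.mp3 into 10-minute chunks without re-encoding (illustrative only)
ffmpeg -i input.mp3 -f segment -segment_time 600 -c copy chunk_%03d.mp3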

Prerequisites

  1. Node.js (v16 or higher)

  2. pnpm - Fast, disk space efficient package manager

    npm install -g pnpm
  3. yt-dlp - For YouTube downloads

    # macOS
    brew install yt-dlp
    
    # Ubuntu/Debian
    sudo apt install yt-dlp
    
    # Windows
    # Download from https://github.com/yt-dlp/yt-dlp/releases

    Note: Spotify support is handled automatically through the integrated spotify-dl package - no additional installation required.

  4. ffmpeg - For audio processing

    # macOS
    brew install ffmpeg
    
    # Ubuntu/Debian
    sudo apt install ffmpeg
    
    # Windows
    # Download from https://ffmpeg.org/download.html
  5. Google Gemini API Key (available from Google AI Studio)

Installation

# Clone the repository
git clone <repository-url>
cd audio-transcriber

# Install dependencies
pnpm install

# Copy environment file and add your API key
cp .env.example .env
# Edit .env and add your GEMINI_API_KEY
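
The resulting .env only needs the Gemini key; the value below is a placeholder:

# .env
GEMINI_API_KEY=your_actual_api_key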

Supported Platforms

YouTube

  • Videos, playlists, channels
  • Automatic audio extraction
  • Metadata preservation

Spotify

  • Episodes and podcasts
  • Automatically finds matching content on YouTube
  • Preserves original metadata and structure
  • No authentication required for most content

Direct MP3 URLs

  • Any publicly accessible MP3 file
  • Direct download without conversion

Usage

Basic Usage

# Transcribe a YouTube video
pnpm dev "https://www.youtube.com/watch?v=VIDEO_ID"

# Transcribe a Spotify episode/podcast
pnpm dev "https://open.spotify.com/episode/EPISODE_ID"

# Transcribe a direct MP3 URL
pnpm dev "https://example.com/podcast.mp3"

# With custom output path
pnpm dev "https://www.youtube.com/watch?v=VIDEO_ID" -o ./my-transcript.json

Command Line Options

audio-transcriber <url> [options]

Options:
  -o, --output <path>           Output file path (default: ./output/transcript_[timestamp].json)
  -t, --temp-dir <path>         Temporary directory for processing (default: ./temp)
  -c, --chunk-duration <secs>   Duration of each chunk in seconds (default: 600)
  --concurrency <number>        Number of chunks to process in parallel during transcription (default: 5)
  -k, --keep-chunks             Keep temporary audio chunks after processing
  -s, --save-temp-files         Keep all temporary files including raw audio, downsampled audio, chunks, and intermediate files
  --no-text                     Skip generating text transcript file
  --no-report                   Skip generating metadata report file
  -h, --help                    Display help for command
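
Options can be combined. An illustrative run that uses 5-minute chunks, reduced concurrency, keeps the chunks, and skips the metadata report:

pnpm dev "https://www.youtube.com/watch?v=VIDEO_ID" \
  -o ./output/interview.json \
  -c 300 \
  --concurrency 3 \
  -k \
  --no-report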

Additional Commands

# Display dependency information
pnpm dev info

# Run a test transcription
pnpm dev test

# Build the project
pnpm build

# Clean temporary files
pnpm clean

Output Files

The tool generates three types of output files:

1. JSON Output (transcript_*.json)

{
  "title": "Video/Audio Title",
  "source_url": "https://...",
  "full_transcript": [
    {
      "start": "00:00:00",
      "end": "00:00:45",
      "speaker": "Speaker 1",
      "tone": "Excited",
      "text": "Transcribed text..."
    }
  ],
  "highlights": ["Key point 1", "Key point 2"],
  "summary": "Comprehensive summary..."
}
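
Because the JSON output is plain structured data, it is easy to post-process. A minimal TypeScript sketch (the file path is a placeholder) that counts transcript segments per speaker:

import { readFileSync } from "node:fs";

// Load a transcript produced by the tool (path is a placeholder)
const transcript = JSON.parse(
  readFileSync("./output/transcript_example.json", "utf8")
);

// Count how many segments each speaker has
const segmentsPerSpeaker: Record<string, number> = {};
for (const segment of transcript.full_transcript) {
  segmentsPerSpeaker[segment.speaker] = (segmentsPerSpeaker[segment.speaker] ?? 0) + 1;
}

console.log(transcript.title);
console.log(segmentsPerSpeaker);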

2. Text Transcript (transcript_*.txt)

A formatted, readable transcript with timestamps, speakers, and tone information.

3. Metadata Report (transcript_*_report.txt)

A summary report containing:

  • Title and source information
  • Executive summary
  • Key highlights
  • Speaker statistics
  • Tone distribution analysis

Docker Usage

You can run Interview Transcriber in a Docker container for easy, reproducible usage.

Build the Docker image

docker build -t interview-transcriber .

Run the container

# YouTube video
docker run --rm \
  -e GEMINI_API_KEY=your_actual_api_key \
  -v $(pwd)/output:/output \
  interview-transcriber "https://www.youtube.com/watch?v=VIDEO_ID"

# Spotify episode
docker run --rm \
  -e GEMINI_API_KEY=your_actual_api_key \
  -v $(pwd)/output:/output \
  interview-transcriber "https://open.spotify.com/episode/EPISODE_ID"

  • This will save the transcript in your local output/ directory.
  • You can also specify a custom output file:

docker run --rm \
  -e GEMINI_API_KEY=your_actual_api_key \
  -v $(pwd)/output:/output \
  interview-transcriber "https://www.youtube.com/watch?v=VIDEO_ID" /output/my-transcript.json

Using a .env file for environment variables

Instead of specifying the API key directly, you can store it in a .env file:

# .env
GEMINI_API_KEY=your_actual_api_key

Then run the container with:

# YouTube video
docker run --rm \
  --env-file .env \
  -v $(pwd)/output:/output \
  interview-transcriber "https://www.youtube.com/watch?v=VIDEO_ID"

# Spotify episode
docker run --rm \
  --env-file .env \
  -v $(pwd)/output:/output \
  interview-transcriber "https://open.spotify.com/episode/EPISODE_ID"

Notes

  • The GEMINI_API_KEY environment variable is required for Google Gemini transcription.
  • The /output directory inside the container should be mounted to a local directory to access results.
  • All other CLI options are supported as in the native usage; see the example below.
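
For example, an illustrative invocation that lowers concurrency and shortens the chunk duration while writing to the mounted output directory:

docker run --rm \
  --env-file .env \
  -v $(pwd)/output:/output \
  interview-transcriber "https://www.youtube.com/watch?v=VIDEO_ID" \
  --concurrency 3 -c 300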

API Usage

You can also use the modules programmatically:

import { AudioProcessor } from "audio-transcriber";

const processor = new AudioProcessor();

const options = {
  url: "https://www.youtube.com/watch?v=VIDEO_ID",
  outputPath: "./output/my-transcript.json",
  chunkDuration: 600, // 10 minutes
  concurrency: 5, // Process 5 chunks in parallel
};

const result = await processor.processAudio(options);
console.log(result);

Project Structure

audio-transcriber/
├── src/
│   ├── modules/
│   │   ├── audioDownloader.ts    # YouTube/MP3 download logic
│   │   ├── spotifyDownloader.ts  # Spotify download logic
│   │   ├── audioChunker.ts       # Audio splitting with ffmpeg
│   │   ├── transcriber.ts        # Gemini AI transcription
│   │   ├── merger.ts             # Chunk merging logic
│   │   ├── highlights.ts         # Highlight extraction
│   │   └── outputBuilder.ts      # Output file generation
│   ├── utils/
│   │   ├── timeUtils.ts          # Timestamp utilities
│   │   └── fileUtils.ts          # File system utilities
│   ├── types/
│   │   └── index.ts              # TypeScript interfaces
│   ├── processor.ts              # Main orchestrator
│   ├── cli.ts                    # CLI interface
│   └── index.ts                  # Module exports
├── tests/                        # Test files
├── temp/                         # Temporary processing files
├── output/                       # Default output directory
└── package.json

Error Handling

The tool includes comprehensive error handling for:

  • Network failures (with retry logic; see the sketch below)
  • Invalid URLs
  • API rate limiting
  • File system errors
  • Corrupted audio files
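
The retry logic is not documented in detail here; as a generic illustration, retries for transient failures usually take the shape of exponential backoff. A TypeScript sketch of that pattern (not the tool's actual implementation):

// Generic retry helper with exponential backoff (illustrative only)
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Wait 1s, 2s, 4s, ... before the next attempt
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
      }
    }
  }
  throw lastError;
}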

Performance Considerations

  • Chunk Duration: Default is 10 minutes. Shorter chunks = more API calls but better accuracy
  • API Rate Limiting: The tool includes delays between API calls to avoid rate limiting
  • Parallel Processing: Chunks are processed in parallel with configurable concurrency (default: 5). Higher concurrency = faster processing but may hit API rate limits
  • Concurrency Control: Use the --concurrency option to adjust parallel processing. Start with 5 for most use cases
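
For example, with the defaults a 60-minute recording is split into six 10-minute chunks, which are transcribed in two waves at a concurrency of 5 (five chunks, then one). Halving the chunk duration to 300 seconds doubles the number of chunks and therefore the number of Gemini API calls.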

Troubleshooting

Common Issues

  1. "GEMINI_API_KEY not found"

    • Make sure you've created a .env file with your API key
  2. "yt-dlp not found"

    • Install yt-dlp using the instructions above
  3. "ffmpeg not found"

    • Install ffmpeg using the instructions above
  4. Transcription fails

    • Check your Gemini API quota
    • Try reducing chunk duration
    • Ensure audio quality is sufficient

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT

Acknowledgments

  • Google Gemini AI for transcription capabilities
  • yt-dlp for YouTube download functionality
  • spotify-dl by SwapnilSoni1999 for Spotify support
  • ffmpeg for audio processing
  • The open-source community for various dependencies
