|
11 | 11 | "# Groq Whisper Instagram Reel Subtitler\n", |
12 | 12 | "This guide will walk you through creating an automated subtitle generator for Instagram Reels using Groq Whisper. The script extracts audio from a video, transcribes it using Groq's Whisper API, and overlays word by word subtitles onto the video.\n", |
13 | 13 | "\n", |
14 | | - "Example video output:\n", |
| 14 | + "Example video output: [example_video_output.mp4](example_video_output.mp4)\n", |
15 | 15 | "\n", |
16 | | - "<video controls width=\"300\" height=\"auto\" src=\"final.mp4\" title=\"Example final video\"></video>\n", |
17 | | - "\n", |
18 | | - "## How It Works\n", |
19 | | - "\n", |
20 | | - "Technologies Used\n", |
21 | | - "- Groq Whisper: AI-powered speech-to-text transcription.\n", |
| 16 | + "## Technologies Used\n", |
| 17 | + "- [Groq Whisper Large V3 Turbo:](https://console.groq.com/docs/speech-to-text) AI-powered speech-to-text transcription with word level time stamps.\n", |
22 | 18 | "- MoviePy: Handles video and subtitle overlaying.\n", |
23 | 19 | "- Python OS Module: Manages file paths.\n", |
24 | 20 | "\n", |
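|  | + "A minimal sketch of the setup these pieces imply (this assumes your Groq API key lives in a `.env` file as `GROQ_API_KEY`, that `moviepy`, `groq`, and `python-dotenv` are installed, and uses `input.mp4` / `audio.mp3` as placeholder paths):\n", |
|  | + "```python\n", |
|  | + "import os\n", |
|  | + "from dotenv import load_dotenv\n", |
|  | + "from groq import Groq\n", |
|  | + "from moviepy.editor import VideoFileClip, TextClip, CompositeVideoClip\n", |
|  | + "\n", |
|  | + "load_dotenv()  # reads GROQ_API_KEY from .env\n", |
|  | + "client = Groq(api_key=os.environ[\"GROQ_API_KEY\"])\n", |
|  | + "\n", |
|  | + "# Stage 1: pull the audio track out of the video for transcription\n", |
|  | + "video = VideoFileClip(\"input.mp4\")\n", |
|  | + "video.audio.write_audiofile(\"audio.mp3\")\n", |
|  | + "```\n", |
|  | + "\n", |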
|
124 | 120 | " with open(mp3_file, \"rb\") as file:\n", |
125 | 121 | " transcription = client.audio.transcriptions.create(\n", |
126 | 122 | " file=(mp3_file, file.read()),\n", |
127 | | - " model=\"whisper-large-v3-turbo\",\n", |
| 123 | + " model=\"whisper-large-v3-turbo\", # Alternatively, use \"distil-whisper-large-v3-en\" for a faster and lower cost (English-only)\n", |
128 | 124 | "            timestamp_granularities=[\"word\"], # Word-level timestamps\n", |
129 | 125 | "            response_format=\"verbose_json\", # Required for timestamp_granularities to take effect\n", |
130 | 126 | " language=\"en\",\n", |
|
145 | 141 | "metadata": {}, |
146 | 142 | "source": [ |
147 | 143 | "# Step 5: Overlay Subtitle Clips\n", |
148 | | - "From the previous function, we'll recieve a JSON file that contains timestamped segments of words. With these word segments, we'll loop through them and create TextClips to be put into the video at the correct time." |
| 144 | + "From the previous function, we'll recieve a JSON that contains timestamped segments of words. With these word segments, we'll loop through them and create TextClips to be put into the video at the correct time.\n", |
| 145 | + "\n", |
| 146 | + "Example of the JSON you would recieve that we'll iterate through:\n", |
| 147 | + "```\n", |
| 148 | + "[\n", |
| 149 | + " {'word': 'This', 'start': 0.1, 'end': 0.28},\n", |
| 150 | + " {'word': 'month', 'start': 0.28, 'end': 0.56},\n", |
| 151 | + " {'word': 'I', 'start': 0.56, 'end': 0.78},\n", |
| 152 | + " {'word': 'traveled', 'start': 0.78, 'end': 1.12},\n", |
| 153 | + " {'word': 'to', 'start': 1.12, 'end': 1.38}\n", |
| 154 | + "...\n", |
| 155 | + "```" |
149 | 156 | ] |
150 | 157 | }, |
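|  | + { |
|  | + "cell_type": "markdown", |
|  | + "metadata": {}, |
|  | + "source": [ |
|  | + "A minimal sketch of that loop (assuming MoviePy 1.x, with `video` as the loaded clip and `word_segments` as the list above; both are placeholder names, and the imports are those from the setup sketch earlier):\n", |
|  | + "```python\n", |
|  | + "subtitle_clips = []\n", |
|  | + "for seg in word_segments:\n", |
|  | + "    txt = (TextClip(seg['word'], fontsize=70, font='Arial', color='white',\n", |
|  | + "                    stroke_color='black', stroke_width=2)\n", |
|  | + "           .set_start(seg['start'])  # show the word as it is spoken\n", |
|  | + "           .set_end(seg['end'])      # hide it when it ends\n", |
|  | + "           .set_position(('center', 0.75), relative=True))\n", |
|  | + "    subtitle_clips.append(txt)\n", |
|  | + "\n", |
|  | + "# Layer the word clips over the original video\n", |
|  | + "final_clip = CompositeVideoClip([video] + subtitle_clips)\n", |
|  | + "```" |
|  | + ] |
|  | + }, |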
151 | 158 | { |
|
231 | 238 | "```\n", |
232 | 239 | "\n", |
233 | 240 | "## Troubleshooting errors:\n", |
234 | | - "- Make sure to have a video file ready before running the script\n", |
235 | | - "- Make sure the path to the file is correct\n", |
236 | | - "- Make sure you have a Groq API key in your .env file" |
| 241 | + "- On MacOS, playing audio within VSCode versus opening up the video in Finder uses different audio encoding outputs. Adding `audio_codec=\"aac\"` to the output line `final_clip.write_videofile(\"final.mp4\", codec=\"libx264\", audio_codec=\"aac\")` will allow you to hear audio on playback in MacOS Finder. But without it, you will only be able to hear the audio file from within VSCode and not from the Finder." |
237 | 242 | ] |
| 243 | + }, |
| 244 | + { |
| 245 | + "cell_type": "markdown", |
| 246 | + "metadata": {}, |
| 247 | + "source": [] |
238 | 248 | } |
239 | 249 | ], |
240 | 250 | "metadata": { |
|