Commit 8dcd054

Updated Instagram Reel Captioner Cookbook (groq#69)
1 parent 2bbf2e0 commit 8dcd054

File tree

5 files changed: +110 −12 lines
Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
+import os
+from groq import Groq
+import datetime
+from moviepy import *
+from moviepy.video.tools.subtitles import SubtitlesClip
+from moviepy.video.io.VideoFileClip import VideoFileClip
+from dotenv import load_dotenv
+load_dotenv()
+
+GROQ_API_KEY = os.environ["GROQ_API_KEY"]
+client = Groq(api_key=GROQ_API_KEY)
+
+
+def convert_mp4_to_mp3(mp4_filepath, mp3_file):
+    """
+    Converts an MP4 file to MP3.
+
+    Args:
+        mp4_filepath: Path to the input MP4 file.
+        mp3_file: Path to save the output MP3 file.
+    """
+    video_clip = VideoFileClip(mp4_filepath)
+
+    # Extract audio from video
+    video_clip.audio.write_audiofile(mp3_file)
+    print(f"Extracted audio to {mp3_file}")
+    video_clip.close()
+
+# Step 1: Transcribe Audio
+def transcribe_audio(mp3_file):
+
+    # Open the audio file
+    with open(mp3_file, "rb") as file:
+        # Create a transcription of the audio file
+        transcription = client.audio.transcriptions.create(
+            file=(mp3_file, file.read()),  # Required audio file
+            model="whisper-large-v3-turbo",  # Required model to use for transcription
+            timestamp_granularities=["word"],
+            response_format="verbose_json",  # Optional
+            language="en",  # Optional
+            temperature=0.0  # Optional
+        )
+        # Print the transcribed word segments
+        print(transcription.words)
+        return transcription.words
+
+def add_subtitles(verbose_json, width, fontsize):
+    text_clips = []
+
+    for segment in verbose_json:
+        text_clips.append(
+            TextClip(text=segment["word"],
+                     font_size=fontsize,
+                     stroke_width=5,
+                     stroke_color="black",
+                     font="./Roboto-Condensed-Bold.otf",
+                     color="white",
+                     size=(width, None),
+                     method="caption",
+                     text_align="center",
+                     margin=(30, 0)
+                     )
+            .with_start(segment["start"])
+            .with_end(segment["end"])
+            .with_position("center")
+        )
+    return text_clips
+
+# Run the Process
+video_file = "../input.mp4"
+output_file = "output_with_subtitles.mp4"
+
+# Load the video as a VideoFileClip
+original_clip = VideoFileClip(video_file)
+width = original_clip.w
+print(width)
+
+mp3_file = "../output.mp3"
+convert_mp4_to_mp3(video_file, mp3_file)
+segments = transcribe_audio(mp3_file)
+text_clip_list = add_subtitles(segments, width, fontsize=40)
+
+# Create a CompositeVideoClip that we write to a file
+final_clip = CompositeVideoClip([original_clip] + text_clip_list)
+
+final_clip.write_videofile(output_file, codec="libx264")  # Mac users may want to add audio_codec="aac" here
+print("Subtitled video saved as:", output_file)
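The timing logic in `add_subtitles` operates on plain dicts, so it can be sketched and checked without MoviePy or a Groq API key. A minimal sketch — `subtitle_plan` is a hypothetical helper, and the sample words are taken from the example JSON in the accompanying notebook:

```python
def subtitle_plan(word_segments):
    """Map word-level segments to the (text, start, end) triples that
    add_subtitles turns into positioned TextClips."""
    return [(seg["word"], seg["start"], seg["end"]) for seg in word_segments]

# Sample word segments, copied from the notebook's example JSON
words = [
    {"word": "This", "start": 0.1, "end": 0.28},
    {"word": "month", "start": 0.28, "end": 0.56},
    {"word": "I", "start": 0.56, "end": 0.78},
]
plan = subtitle_plan(words)
print(plan[0])  # ('This', 0.1, 0.28)
```

Each triple corresponds to one `TextClip` shown from `start` to `end`, which is what produces the word-by-word caption effect.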
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+GROQ_API_KEY="groq-api-key-value"
5.52 MB
Binary file not shown.
-12.3 MB
Binary file not shown.

tutorials/instagram-reel-subtitler/subtitler-tutorial.ipynb

Lines changed: 22 additions & 12 deletions
@@ -11,14 +11,10 @@
 "# Groq Whisper Instagram Reel Subtitler\n",
 "This guide will walk you through creating an automated subtitle generator for Instagram Reels using Groq Whisper. The script extracts audio from a video, transcribes it using Groq's Whisper API, and overlays word-by-word subtitles onto the video.\n",
 "\n",
-"Example video output:\n",
+"Example video output: [example_video_output.mp4](example_video_output.mp4)\n",
 "\n",
-"<video controls width=\"300\" height=\"auto\" src=\"final.mp4\" title=\"Example final video\"></video>\n",
-"\n",
-"## How It Works\n",
-"\n",
-"Technologies Used\n",
-"- Groq Whisper: AI-powered speech-to-text transcription.\n",
+"## Technologies Used\n",
+"- [Groq Whisper Large V3 Turbo](https://console.groq.com/docs/speech-to-text): AI-powered speech-to-text transcription with word-level timestamps.\n",
 "- MoviePy: Handles video and subtitle overlaying.\n",
 "- Python OS Module: Manages file paths.\n",
 "\n",
@@ -124,7 +120,7 @@
 "    with open(mp3_file, \"rb\") as file:\n",
 "        transcription = client.audio.transcriptions.create(\n",
 "            file=(mp3_file, file.read()),\n",
-"            model=\"whisper-large-v3-turbo\",\n",
+"            model=\"whisper-large-v3-turbo\", # Alternatively, use \"distil-whisper-large-v3-en\" for a faster, lower-cost English-only option\n",
 "            timestamp_granularities=[\"word\"], # Word-level timestamps\n",
 "            response_format=\"verbose_json\",\n",
 "            language=\"en\",\n",
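The hunk above only swaps a comment on the `model` argument, but the choice between the two models is easy to isolate: the keyword arguments passed to `client.audio.transcriptions.create` are an ordinary dict. A hedged sketch — `transcription_kwargs` is a hypothetical helper (not part of the cookbook), assembling the same parameters the notebook uses so they can be inspected without an API key:

```python
def transcription_kwargs(mp3_file, audio_bytes, english_only=False):
    """Build the keyword arguments for a Groq transcription request,
    choosing between the two models mentioned in the notebook."""
    model = "distil-whisper-large-v3-en" if english_only else "whisper-large-v3-turbo"
    return dict(
        file=(mp3_file, audio_bytes),       # (filename, raw bytes) tuple
        model=model,
        timestamp_granularities=["word"],   # word-level timestamps
        response_format="verbose_json",
        language="en",
        temperature=0.0,
    )

kwargs = transcription_kwargs("clip.mp3", b"", english_only=True)
print(kwargs["model"])  # distil-whisper-large-v3-en
```

The real call is then `client.audio.transcriptions.create(**kwargs)` with the actual audio bytes in place of the empty placeholder.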
@@ -145,7 +141,18 @@
 "metadata": {},
 "source": [
 "# Step 5: Overlay Subtitle Clips\n",
-"From the previous function, we'll recieve a JSON file that contains timestamped segments of words. With these word segments, we'll loop through them and create TextClips to be put into the video at the correct time."
+"From the previous function, we'll receive a JSON object that contains timestamped word segments. We'll loop through these segments and create TextClips placed into the video at the correct times.\n",
+"\n",
+"Example of the JSON we'll iterate through:\n",
+"```\n",
+"[\n",
+"    {'word': 'This', 'start': 0.1, 'end': 0.28},\n",
+"    {'word': 'month', 'start': 0.28, 'end': 0.56},\n",
+"    {'word': 'I', 'start': 0.56, 'end': 0.78},\n",
+"    {'word': 'traveled', 'start': 0.78, 'end': 1.12},\n",
+"    {'word': 'to', 'start': 1.12, 'end': 1.38},\n",
+"...\n",
+"```"
 ]
 },
 {
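The word segments shown in that example JSON are ordinary Python dicts, so they can be sanity-checked before any TextClips are built. A minimal sketch, using the sample values from the notebook cell:

```python
# Sample word segments, copied from the notebook's example JSON
words = [
    {"word": "This", "start": 0.1, "end": 0.28},
    {"word": "month", "start": 0.28, "end": 0.56},
    {"word": "I", "start": 0.56, "end": 0.78},
    {"word": "traveled", "start": 0.78, "end": 1.12},
    {"word": "to", "start": 1.12, "end": 1.38},
]
# Each word should end after it starts...
assert all(w["start"] < w["end"] for w in words)
# ...and consecutive words should not overlap on screen.
assert all(a["end"] <= b["start"] for a, b in zip(words, words[1:]))
sentence = " ".join(w["word"] for w in words)
print(sentence)  # This month I traveled to
```

Because each segment's `end` meets the next segment's `start`, exactly one word is on screen at any moment, which is what the per-word `with_start`/`with_end` calls rely on.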
@@ -231,10 +238,13 @@
 "```\n",
 "\n",
 "## Troubleshooting errors:\n",
-"- Make sure to have a video file ready before running the script\n",
-"- Make sure the path to the file is correct\n",
-"- Make sure you have a Groq API key in your .env file"
+"- On macOS, playing audio within VS Code versus opening the video in Finder involves different audio encodings. Adding `audio_codec=\"aac\"` to the output line, `final_clip.write_videofile(\"final.mp4\", codec=\"libx264\", audio_codec=\"aac\")`, lets you hear the audio on playback in Finder; without it, the audio is only audible from within VS Code."
 ]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": []
 }
 ],
 "metadata": {
