Created by Alperen Sümeroğlu. An AI-native video engine that turns long-form content into short, share-ready clips.
ai-clips-maker is a modular Python tool built for creators, educators, and developers. It transcribes speech, detects speakers, analyzes scenes, and crops around the key moments, producing vertical clips for TikTok, Reels, and Shorts without manual editing.
- 📦 Features
- 🛠 Installation
- 🚀 Quickstart
- 🔍 How It Works
- ⚙️ Tech Stack
- 🎯 Use Cases
- 🧪 Tests
- 🗺 Roadmap
- 🤝 Contribute
- 👤 Author
- 🎧 Weekly Rewind Podcast
- 📄 License
- 🎞️ Auto-segment videos based on speech & scene shifts
- 🧠 Word-level transcription using WhisperX
- 🗣️ Speaker diarization (who spoke when) via Pyannote
- 🪄 Face/body-aware cropping focused on active speaker
- 📐 Output formats: 9:16 (vertical), 1:1 (square), 16:9 (wide)
- 🔌 Modular and easily extensible pipeline
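To make the output formats and speaker-focused cropping above concrete, here is a minimal sketch of how a crop window for a target aspect ratio could be computed. The function `crop_window` and its parameters are hypothetical and not part of the ai-clips-maker API; the package's own cropping logic may differ. The sketch takes the largest region with the requested ratio, centers it horizontally on a given x position (for example, a detected face), and clamps it to the frame.

```python
def crop_window(frame_w, frame_h, aspect, center_x=None):
    """Return (x, y, width, height) of the largest crop with the given aspect ratio.

    Hypothetical helper for illustration only.
    aspect is a (width, height) ratio such as (9, 16).
    center_x is the horizontal position to center on; defaults to the frame center.
    """
    ratio_w, ratio_h = aspect

    # Compare frame_w / frame_h with ratio_w / ratio_h using integer math.
    if frame_w * ratio_h > frame_h * ratio_w:
        # Frame is wider than the target: keep full height, trim the width.
        crop_h = frame_h
        crop_w = frame_h * ratio_w // ratio_h
    else:
        # Frame is narrower than (or equal to) the target: keep full width.
        crop_w = frame_w
        crop_h = frame_w * ratio_h // ratio_w

    # Round down to even dimensions, which common 4:2:0 encoders require.
    crop_w -= crop_w % 2
    crop_h -= crop_h % 2

    if center_x is None:
        center_x = frame_w // 2

    # Center horizontally on the subject, then clamp inside the frame.
    x = center_x - crop_w // 2
    x = max(0, min(x, frame_w - crop_w))
    # Center vertically.
    y = (frame_h - crop_h) // 2
    return x, y, crop_w, crop_h


print(crop_window(1920, 1080, (9, 16)))                # (657, 0, 606, 1080)
print(crop_window(1920, 1080, (9, 16), center_x=100))  # (0, 0, 606, 1080)
```

Clamping keeps the window inside the frame when the speaker stands near an edge, at the cost of the speaker no longer being exactly centered.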
# Install main package
pip install ai-clips-maker
# Install WhisperX from source
pip install git+https://github.com/m-bain/whisperx.git
# Install dependencies
# macOS
brew install libmagic ffmpeg
# Ubuntu/Debian
sudo apt install libmagic1 ffmpeg
from ai_clips_maker import Transcriber, ClipFinder, resize
# Step 1: Transcription
transcriber = Transcriber()
transcription = transcriber.transcribe(audio_file_path="/path/to/video.mp4")
# Step 2: Clip detection
clip_finder = ClipFinder()
clips = clip_finder.find_clips(transcription=transcription)
print(clips[0].start_time, clips[0].end_time)
# Step 3: Cropping & resizing
crops = resize(
    video_file_path="/path/to/video.mp4",
    pyannote_auth_token="your_huggingface_token",
    aspect_ratio=(9, 16)
)
print(crops.segments)
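The quickstart prints each clip's `start_time` and `end_time` but does not show how to cut the clip out of the source file. One possible way, sketched below, is to build an ffmpeg command from those two values. `build_trim_command` is a hypothetical helper, not part of ai-clips-maker, and the sketch assumes the times are in seconds.

```python
def build_trim_command(src, dst, start_time, end_time):
    """Build an ffmpeg argument list that cuts [start_time, end_time] from src.

    Hypothetical helper for illustration; assumes both times are in seconds.
    """
    if end_time <= start_time:
        raise ValueError("end_time must be greater than start_time")

    duration = end_time - start_time
    return [
        "ffmpeg",
        "-y",                          # overwrite dst if it already exists
        "-ss", f"{start_time:.3f}",    # seek the input to the clip start
        "-i", src,
        "-t", f"{duration:.3f}",       # write this many seconds of output
        "-c:v", "libx264",             # re-encode so the cut is frame-accurate
        "-c:a", "aac",
        dst,
    ]


cmd = build_trim_command("/path/to/video.mp4", "clip_0.mp4", 12.5, 42.0)
print(" ".join(cmd))
```

To run the command, pass the list to `subprocess.run(cmd, check=True)`. Re-encoding is slower than stream copy (`-c copy`), but stream copy can only cut on keyframes, so clip boundaries may drift.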
- 🎧 Extracts audio from video
- ✍️ Transcribes speech using WhisperX
- 🧍 Identifies speakers with Pyannote
- 🎬 Detects scene changes & speaker shifts
- 🎯 Crops video around active speaker’s position
- 📤 Exports clips in desired format
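The transcription and diarization steps above produce two separate timelines that have to be merged before speaker shifts can be detected. The sketch below shows one simple way to do that: give each word the speaker whose turn overlaps it the most, then record the times where the speaker changes. The data shapes (word dicts with `start`, `end`, `text`, and turns as `(start, end, speaker)` tuples) and the function names are assumptions for illustration, not the library's internal representation.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words, turns):
    """Label each word with the speaker whose turn overlaps it the most.

    words: list of dicts with "start", "end", "text" (assumed shape).
    turns: list of (start, end, speaker) tuples (assumed shape).
    Words that overlap no turn get speaker None.
    """
    labeled = []
    for word in words:
        best_speaker, best_overlap = None, 0.0
        for t_start, t_end, speaker in turns:
            shared = overlap(word["start"], word["end"], t_start, t_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labeled.append({**word, "speaker": best_speaker})
    return labeled


def speaker_shifts(labeled_words):
    """Return the start times of words where the speaker changes."""
    shifts = []
    previous = None
    for word in labeled_words:
        speaker = word["speaker"]
        if speaker is None:
            continue  # skip words that no diarization turn covers
        if previous is not None and speaker != previous:
            shifts.append(word["start"])
        previous = speaker
    return shifts


words = [
    {"text": "hi", "start": 0.0, "end": 0.4},
    {"text": "there", "start": 0.5, "end": 0.9},
    {"text": "hello", "start": 1.2, "end": 1.6},
]
turns = [(0.0, 1.0, "SPEAKER_00"), (1.0, 2.0, "SPEAKER_01")]

labeled = assign_speakers(words, turns)
print(speaker_shifts(labeled))  # [1.2]
```

The nested loop is quadratic in the worst case, which is fine for a sketch; for long recordings, sorting both lists and sweeping them together would scale better.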
🔧 Module | 🧠 Technology | 💡 Purpose |
---|---|---|
Transcription | WhisperX | Word-level speech-to-text with timestamps |
Diarization | Pyannote.audio | Speaker segmentation (who spoke when) |
Video Processing | OpenCV, PyAV | Frame-by-frame video control |
Scene Detection | Scenedetect | Detects shot boundaries |
ML Inference | PyTorch | Powering WhisperX & Pyannote models |
Data Handling | NumPy, Pandas | Transcription & clip structuring |
Media Utilities | ffmpeg, libmagic | Media decoding + type detection |
Testing Framework | pytest | End-to-end and unit testing support |
All tools were selected for speed, flexibility, and production-grade stability.
- 🎙 Podcasters clipping episodes into shareable highlights
- 📚 Teachers summarizing lecture content
- 📱 Social media teams repurposing YouTube for Reels
- 🧠 Developers automating video workflows
- 🚀 Startups building AI-based content tools
# Run test suite
pytest tests/
The test suite covers every component: transcriber, diarizer, clip detector, and resizer.
Status | Feature | Note |
---|---|---|
✅ | Core pipeline: Transcribe → Diarize → Detect | Implemented in v1.0 |
✅ | Speaker-aware video cropping | Production ready |
🚧 | Multi-language subtitle generation | Planned for Q2 2025 |
📌 | Auto-caption overlay | In design phase |
🧪 | Web UI (upload + preview clips) | Prototype in progress |
🧠 | HuggingFace or Streamlit live demo | On backlog |
We welcome pull requests, ideas, and feedback.
# Fork the repo on GitHub, then clone it (use your fork's URL so you can push)
git clone https://github.com/alperensumeroglu/ai-clips-maker.git
cd ai-clips-maker
# Create feature branch
git checkout -b feat/your-feature
# Make changes, commit, and push
git commit -am "Add feature"
git push origin feat/your-feature
Before contributing, please review the open issues and the coding style guide.
Alperen Sümeroğlu
Computer Engineer • Entrepreneur • World Explorer 🌍
15+ European countries explored
“Let your code tell your story — clean, powerful, and useful.”
🎤 Weekly insights on AI, tech, and building globally — by Alperen Sümeroğlu.
🚀 What does it take to grow as a Computer Engineering student, build projects, and explore global innovation?
This project is part of a bigger journey I share in Weekly Rewind, my real-time documentary podcast series, where I reflect weekly on coding breakthroughs, innovation insights, startup stories, and lessons from around the world.
A behind-the-scenes look at real-world experiences, global insights, and hands-on learning. Each episode includes:
- 🔹 Inside My Coding & Engineering Projects
- 🔹 Startup Ideas & Entrepreneurial Lessons
- 🔹 Trends in Tech & AI
- 🔹 Innovation from 15+ Countries
- 🔹 Guest Conversations with Builders & Engineers
- 🔹 Productivity, Learning & Growth Strategies
“True learning isn’t in tutorials — it’s in building, exploring, and reflecting.”
MIT License — Free for commercial and personal use.
© 2024 Alperen Sümeroğlu