A comprehensive learning path for building, compressing, evaluating, and deploying efficient AI models. From fundamentals to advanced techniques, this course combines theoretical knowledge with practical exercises. Perfect for students, engineers, and researchers looking to master efficient AI development.
- 📚 Lectures - Comprehensive slides and materials
- 💻 Exercises - Hands-on coding practice
- ⚙️ Setup - Environment configuration
- 🤝 Community - Connect with other learners
The lecture content is based on multiple sources (incl. papers, books, and lectures). You can find the main sources in the Awesome AI efficiency repository. If you find it helpful, please ⭐ star the repository!
| Topic | Description | Slides |
|---|---|---|
| Introduction | Introduction to efficient AI | slides |
| Architectures for LLMs | Model design and optimization | slides |
| Evaluation for LLMs | Performance metrics and analysis | slides |
| Compression for LLMs | Model size reduction techniques | slides |
| Quantization for LLMs | Precision optimization | slides |
| Finetuning for LLMs | Model adaptation strategies | slides |
💡 Tip: Access the most recent version of the lecture materials through this URL.
Located in the `exercises/` and `solutions/` directories, our hands-on modules include:
| Exercise | Description | Exercise Notebook | Solution Notebook |
|---|---|---|---|
| **Core Exercises** | | | |
| 🔍 Analyze LLM architectures | Study model design patterns and optimization techniques | notebook | solution |
| 📊 Measure LLM efficiency | Evaluate model performance and resource usage | notebook | solution |
| ⚖️ Run LLM on CPU vs GPU | Compare usage of CPU and GPU for LLM inference | notebook | solution |
| 🔢 Benchmark LLM quantization methods | Analyze the impact of different quantization methods | notebook | solution |
| **Advanced Topics** | | | |
| 🚀 Benchmark LLM bit precision | Analyze the impact of different bit precisions | notebook | solution |
| 📈 Use data during quantization | Leverage calibration data for better quantization | notebook | solution |
| 🎯 Finetune compressed models | Adapt quantized models for specific tasks | notebook | solution |
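The quantization exercises above revolve around a simple memory tradeoff. As a back-of-the-envelope illustration (not part of the exercise notebooks themselves), the weight memory of a model at a given bit precision can be estimated as:

```python
def estimated_weight_memory_mb(num_params: int, bits_per_param: int) -> float:
    """Estimate the memory footprint of model weights alone.

    Ignores activations, the KV cache, and quantization overhead
    (scales/zero-points), so real usage is somewhat higher.
    """
    return num_params * bits_per_param / 8 / (1024 ** 2)

# A 7B-parameter model at different precisions:
for bits in (16, 8, 4):
    size = estimated_weight_memory_mb(7_000_000_000, bits)
    print(f"{bits}-bit: {size:,.0f} MB")
```

Halving the bit width halves the weight memory, which is why 4-bit quantization lets 7B-scale models fit on the modest GPUs listed in the hardware requirements below.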
You can easily set up your coding environment using UV, a modern Python package manager. Most exercises are based on the `pruna` package for productive exploration of efficient AI topics. Further, some exercises require the `pruna_pro` package to address more advanced topics.
```bash
bash setup_exercises.sh
```

Alternatively, run the steps manually:

```bash
# Install UV if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.cargo/env

# Set up the project
uv python install 3.10
uv sync
uv add pruna_pro==0.2.2.post1 --index-url https://prunaai.pythonanywhere.com/simple/

# Activate the environment
source .venv/bin/activate
```
- **Hugging Face Integration**: Set your Hugging Face access token as an environment variable so you can download models and datasets:

  ```bash
  export HF_TOKEN=your_huggingface_token
  ```

  You can find or create your token at https://huggingface.co/settings/tokens.
- **Pruna Token (optional)**: If you want to use advanced features from the `pruna_pro` package, set your Pruna token as an environment variable:

  ```bash
  export PRUNA_TOKEN=your_pruna_token
  ```

  You can obtain a token by signing up at https://pruna.ai.
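As a small illustrative helper (not part of the repository), you can fail fast in a notebook when a required token is missing, rather than hitting an opaque download error later:

```python
import os

def require_token(name: str) -> str:
    """Return the value of an environment variable, failing fast if unset."""
    token = os.environ.get(name)
    if not token:
        raise RuntimeError(
            f"{name} is not set; export it before running the notebooks."
        )
    return token

# HF_TOKEN is needed for gated models/datasets; PRUNA_TOKEN only for pruna_pro.
# hf_token = require_token("HF_TOKEN")
```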
- Minimum: Modest GPU (1080Ti, 2080Ti)
- Ideal: High-end GPU (V100, A100)
- Note: Exercises are optimized for accessibility, with 20+ selected small models that work on modest setups.
All notebooks include Google Colab buttons for free GPU access. Click the "Open in Colab" button on any notebook to get started.
- **Free Tier**: Tesla T4/K80/P100 GPUs, 12GB RAM, limited hours/day
- **Colab Pro** ($9.99/month): Priority GPU access, longer runtime, 32GB RAM
- **Colab Pro+** ($49.99/month): A100 GPUs, maximum runtime, 52GB RAM
💡 Tip: Use Runtime → Change runtime type → GPU for best performance
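Inside a notebook, one quick way to confirm the GPU runtime is active is to ask PyTorch (hedged with a fallback in case `torch` is not installed):

```python
def colab_gpu_available() -> bool:
    """Check whether a CUDA device is visible to PyTorch (e.g. on Colab)."""
    try:
        import torch
    except ImportError:
        return False  # torch not installed in this environment
    return torch.cuda.is_available()

print("GPU runtime active" if colab_gpu_available()
      else "CPU only — switch the runtime type")
```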
Connect with us across platforms:
⭐ Support the Project: If you find these resources valuable, please star this repository and the Awesome AI efficiency collection!