Welcome to the Open-Source LLM Zoomcamp, where we'll explore how to build, tune, and deploy large language models together. We'll be using AMD's MI300X GPUs (hosted on Saturn Cloud) to get hands-on experience with open-source LLMs.
Join Slack • #course-open-source-llm Channel on Slack • Telegram Announcements Channel • FAQ • Tweet about the Course
This course might be a good fit if you:
- Are an ML practitioner wanting to dive deeper into open-source LLM stacks
- Have a software engineering background and want to get hands-on with LLMs
- Are a researcher or open-source enthusiast interested in reproducible ML
- Work in MLOps and want to explore AMD's ROCm ecosystem
- Module 1: Open-source LLMs
  - Course overview
  - Overview of the open-source AI ecosystem
  - Intro to Large Language Models (LLMs)
  - Hugging Face and different LLMs
  - Environment setup (see the sanity-check sketch after this syllabus)
    - Introduction to ROCm and AMD GPUs
    - ROCm vs CUDA
    - Setting up Saturn Cloud for ROCm + MI300X
  - Running DeepSeek R1 (tutorial)
  - Build a simple Streamlit chat app (see the sketch after this syllabus)
  - Serving LLMs with vLLM
  - Homework: Run and serve an LLM on Saturn Cloud
- Module 2: Fine-tuning
  - Fine-tuning concepts
  - Llama Factory workflow
  - Using Llama Factory for fine-tuning
  - Preparing a dataset
  - Fine-tuning DeepSeek R1 (tutorial)
  - Improving the chatbot from module 1
  - Bonus: text-to-image models
  - Homework: Fine-tune a model
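Since module 1 covers environment setup and ROCm vs CUDA, a quick sanity check of the GPU setup can save time later. The ROCm build of PyTorch reuses the `torch.cuda` API, so a minimal check might look like the sketch below (the file name is just an example, not a course file):

```python
# check_rocm.py - a minimal sanity check, assuming the ROCm build of PyTorch is
# installed. The ROCm build reuses the torch.cuda namespace, so no code changes
# are needed compared to a CUDA machine.
import torch

print("PyTorch version:", torch.__version__)
print("GPU available:", torch.cuda.is_available())   # True on a working ROCm setup
if torch.cuda.is_available():
    print("Device name:", torch.cuda.get_device_name(0))  # e.g. an AMD Instinct MI300X
    print("HIP runtime:", torch.version.hip)               # set on ROCm builds, None on CUDA builds
```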
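For the chat app and vLLM topics above, here is a rough sketch of how the two pieces can fit together: vLLM exposes an OpenAI-compatible API, and the Streamlit app sends the chat history to it. The model ID, port, and file name below are placeholders, not course materials.

```python
# chat_app.py - a minimal sketch, assuming a vLLM server with the OpenAI-compatible
# API is already running locally, e.g. started with:
#   vllm serve <model-id> --port 8000
import streamlit as st
from openai import OpenAI

# Point the OpenAI client at the local vLLM server (no real API key needed)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"  # placeholder; use whatever model you served

st.title("Open-source LLM chat")

# Keep the conversation in session state so it survives Streamlit reruns
if "messages" not in st.session_state:
    st.session_state.messages = []

# Re-render the conversation so far
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

# Read a new user prompt, send the whole history to the model, show the reply
if prompt := st.chat_input("Ask the model something"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    response = client.chat.completions.create(
        model=MODEL_ID,
        messages=st.session_state.messages,
    )
    answer = response.choices[0].message.content
    st.session_state.messages.append({"role": "assistant", "content": answer})
    with st.chat_message("assistant"):
        st.markdown(answer)
```

To try it, start the vLLM server first, then run `streamlit run chat_app.py`.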
For the project, you'll pick a dataset you're interested in, fine-tune an open-source LLM for that specific domain (e.g. legal documents, medical data, or technical documentation), and deploy it so others can use it.
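As a hedged sketch of the dataset-preparation step, Llama Factory typically consumes instruction-style records (for example the Alpaca format with `instruction`/`input`/`output` fields); converting a domain dataset into that shape could look roughly like this. The dataset name and column names are placeholders for whatever data you choose.

```python
# prepare_dataset.py - a rough sketch of shaping a domain dataset into the
# instruction/input/output ("Alpaca"-style) records commonly used when
# fine-tuning with Llama Factory. The dataset name and column names below
# are placeholders, not course-provided values.
import json
from datasets import load_dataset

raw = load_dataset("your-username/your-domain-dataset", split="train")  # placeholder

records = []
for row in raw:
    records.append({
        "instruction": "Answer the question using the domain context.",
        "input": row["question"],   # placeholder column name
        "output": row["answer"],    # placeholder column name
    })

# Write plain JSON that can then be registered in Llama Factory's dataset configuration
with open("my_domain_dataset.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)

print(f"Wrote {len(records)} training examples")
```

The resulting JSON file can then be registered in Llama Factory's dataset configuration before launching a fine-tuning run.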
We're starting in 2025! Sign up here to join us.
- Course Channel on DTC Slack
- Telegram Channel with Announcements
- Pre-launch Q&A Stream
- Launch Stream with Course Overview
- Course Google Calendar
- FAQ
- Course Playlist
DataTalks.Club is a community of data enthusiasts learning and growing together. We're all about sharing knowledge, helping each other out, and making data science more accessible.
Join us: • Website • Slack Community • Newsletter • Events • Calendar • YouTube • GitHub • LinkedIn • Twitter