CSE 585: Advanced Scalable Systems for Generative AI (F'25)

Administrivia

  • Catalog Number: 31041
  • Lectures/Discussion: 2150 DOW, TTh: 10:30 AM – 12:00 PM
  • Projects/Makeup: 1500 EECS, F 1:30 PM – 2:30 PM
  • Counts as: Software Breadth and Depth (PhD); Technical Elective and 500-Level (MS/E)

Team

Member (uniqname)              Role     Office Hours
Mosharaf Chowdhury (mosharaf)  Faculty  4820 BBB. By appointment only.
Jae-Won Chung (jwnchung)       GSI      TBA

Communication

ALL communication regarding this course must be via Ed. This includes questions, discussions, announcements, and private messages.

Presentation slides and paper summaries should be emailed to [email protected].

Course Description

This iteration of CSE585 will introduce you to the key concepts and the state-of-the-art in practical, scalable, and fault-tolerant systems for Generative AI (GenAI), and encourage you to think about building new tools or applying existing ones.

Since datacenters and cloud computing form the backbone of modern computing, we will start with an overview of the two. We will then take a deep dive into systems for the Generative AI landscape, focusing on different types of problems. Our topics will include: basics of generative models from a systems perspective; systems for the GenAI lifecycle, including pre-training, post-training, and inference serving; agentic AI; etc. We will cover GenAI topics primarily from top conferences that take a systems view of the relevant challenges.

Note that this course is NOT focused on AI methods. Instead, we will focus on how one can build systems so that existing AI methods can be used in practice and new AI methods can emerge.

Prerequisites

Students are expected to have good programming skills and must have taken at least one undergraduate-level systems-related course (from operating systems/EECS482, databases/EECS484, distributed systems/EECS491, and networking/EECS489). An undergraduate ML/AI course may be helpful but is not required.

Textbook

This course has no textbooks. We will read recent papers from top venues to understand trends in scalable GenAI and agentic systems, and their applications.

Tentative Schedule and Reading List

This is an evolving list and subject to change due to the breakneck pace of GenAI innovations.

Aug 26: Introduction (Presenter: Mosharaf)
  • How to Read a Paper (Required)
  • How to Give a Bad Talk (Required)
  • The Datacenter as a Computer (Chapters 1 and 2)
  • Machine Learning Fleet Efficiency: Analyzing and Optimizing Large-Scale Google TPU Systems with ML Productivity Goodput

Aug 28: GenAI Basics (Presenter: Jae-Won)
  • The Illustrated Transformer (Required)
  • The Illustrated Stable Diffusion (Required)

Sep 2: No Lecture: Find Project Groups
  • Hints and Principles for Computer System Design (Required)

Sep 4: Training Basics (Presenter: Jae-Won)
  • Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM (Required)

Sep 9: No Lecture: Work on Project Proposals
  • Writing Reviews for Systems Conferences (Required)
  • Worse is Better (Required)

Pre-Training

Sep 11:
  • Pipeline Parallelism with Controllable Memory (Required)
  • Zero Bubble (Almost) Pipeline Parallelism
  • WLB-LLM: Workload-Balanced 4D Parallelism for Large Language Model Training (Required)

Sep 16:
  • Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning (Required)
  • PartIR: Composing SPMD Partitioning Strategies for Machine Learning (Required)

Sep 18:
  • Understanding Stragglers in Large Model Training Using What-if Analysis
  • SuperBench: Improving Cloud AI Infrastructure Reliability with Proactive Validation (Required)
  • Training with Confidence: Catching Silent Errors in Deep Learning Training with Automated Proactive Checks (Required)

Sep 23:
  • Oobleck: Resilient Distributed Training of Large Models Using Pipeline Templates (Required)
  • Tenplex: Dynamic Parallelism for Deep Learning using Parallelizable Tensor Collections (Required)

Post-Training

Sep 25:
  • Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Required)
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (Required)

Sep 30:
  • HybridFlow: A Flexible and Efficient RLHF Framework (Required)
  • AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning (Required)

Inference

Oct 2: Inference Basics (Presenter: Jae-Won)
  • Orca: A Distributed Serving System for Transformer-Based Generative Models (Required)
  • Efficient Memory Management for Large Language Model Serving with PagedAttention (Required)

Oct 7:
  • DistServe: Disaggregating Prefill and Decoding for Goodput-Optimized Large Language Model Serving (Required)
  • Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve (Required)

Oct 9:
  • NanoFlow: Towards Optimal Large Language Model Serving Throughput (Required)
  • Mooncake: Trading More Storage for Less Computation — A KVCache-centric Architecture for Serving LLM Chatbot (Required)

Oct 14: Fall Study Break

Oct 16:
  • LoongServe: Efficiently Serving Long-Context Large Language Models with Elastic Sequence Parallelism (Required)
  • Cornserve (Required)

Oct 21: Mid-Semester Presentations

Oct 23: Mid-Semester Presentations

Oct 28: No Lecture: Work on Projects

Oct 30:
  • PowerInfer: Fast Large Language Model Serving with a Consumer-Grade GPU (Required)
  • FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU (Required)

Nov 4:
  • Andes: Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services (Required)
  • On Evaluating Performance of LLM Inference Serving Systems (Required)
  • Tempo: Application-aware LLM Serving with Mixed SLO Requirements

Agentic Systems

Nov 6:
  • Parrot: Efficient Serving of LLM-based Applications with Semantic Variable (Required)
  • Towards End-to-End Optimization of LLM-based Applications with Ayo (Required)

Nov 11:
  • METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation (Required)
  • Fast Vector Query Processing for Large Datasets Beyond GPU Memory with Reordered Pipelining (Required)

Hardware / Infrastructure

Nov 13:
  • WaferLLM: Large Language Model Inference at Wafer Scale (Required)
  • Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures (Required)
  • Meta's Second Generation AI Chip: Model-Chip Co-Design and Productionization Experiences
  • Ironwood: The first Google TPU for the age of inference

Power and Energy Management

Nov 18:
  • Reducing Energy Bloat in Large Model Training (Required)
  • TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms (Required)
  • Kareus

Nov 20:
  • AI Load Dynamics: A Power Electronics Perspective (Required)
  • AI Training Load Fluctuations at Gigawatt-scale – Risk of Power Grid Blackout? (Required)

Ethical Considerations

Nov 25:
  • On the Dangers of Stochastic Parrots: Can Language Models be too Big?🦜 (Required)
  • We Need a New Ethics for a World of AI Agents (Required)

Nov 27: No Lecture: Thanksgiving Recess

Dec 2: Wrap Up (Presenter: Mosharaf)
  • How to Write a Great Research Paper (Required)

Dec 4: Final Poster Presentations (Presenter: TBA; poster Template provided)

Policies

Honor Code

The Engineering Honor Code applies to all activities related to this course.

Groups

All activities of this course will be performed in groups of 3-4 students.

Required Reading

Each lecture will have two required readings that everyone must read.
Each lecture may also have one or more related readings that the presenter(s) should be familiar with; these are optional for the rest of the class.

Student Lectures

The course will be conducted as a seminar. Only one group will present in each class. Each group will be assigned at least one lecture over the course of the semester. Presentations should last at most 40 minutes without interruption. However, presenters should expect questions and interruptions throughout.

In the presentation, you should:

  • Provide necessary background and motivate the problem.
  • Present the high-level idea, approach, and/or insight (using examples, whenever appropriate) in the required reading as well as the additional reading.
  • Discuss technical details so that one can understand the key points without carefully reading the paper.
  • Explain the differences between related works.
  • Identify strengths and weaknesses of the required reading and propose directions of future research.

The slides for a presentation must be emailed to the instructor team at least 24 hours prior to the corresponding class. Use Google Slides to enable in-line comments and suggestions.

Lecture Summaries

Each group will also be assigned to write summaries for at least one lecture. The summary assigned to a group will not be for the lecture that group presented.

A paper summary must address the following five questions in sufficient detail (2-3 pages):

  • What is the problem addressed in the lecture, and why is this problem important?
  • What is the state of related works in this topic?
  • What is the proposed solution, and what key insight guides their solution?
  • What is one (or more) drawback or limitation of the proposal?
  • What are potential directions for future research?

The summary of a paper must be emailed to the instructor team within 24 hours after its presentation. Late summaries will not be counted. You should use this format for writing your summary. Use Google Docs to enable in-line comments and suggestions.

Allocate enough time for your reading, discuss as a group, write the summary carefully, and finally, include key observations from the class discussion.

Post-Presentation Panel Discussion

To foster a deeper understanding of the papers and encourage critical thinking, each lecture will be followed by a panel discussion. This discussion will involve three distinct roles played by different student groups, simulating an interactive and dynamic scholarly exchange.

Roles and Responsibilities

  1. The Authors
  • Group Assignment: The group that presents the paper and the group that writes the summary will play the role of the paper's authors.
  • Responsibility: As authors, you are expected to defend your paper against critiques, answer questions, and discuss how you might improve or extend your research in the future, akin to writing a rebuttal during the peer-review process.
  2. The Reviewers
  • Group Assignment: Each group will be assigned to one slot to play the role of reviewers.
  • Responsibility: Reviewers critically assess the paper, posing challenging questions and highlighting potential weaknesses or areas for further investigation. Your goal is to engage in a constructive critique of the paper, simulating a peer review scenario.
  3. Rest of the Class
  • Responsibility:
    • You are required to submit one insightful question for each presented paper before each class.
    • During the panel discussions, feel free to actively ask questions and engage in the dialogue.

Participation

Given the discussion-based nature of this course, participation is required both for your own understanding and to improve the overall quality of the course. You are expected to attend all lectures (you may skip up to 2 lectures for legitimate reasons) and, more importantly, participate in class discussions.

A key part of participation will be in the form of discussion in Ed. The group in charge of the summary should initiate the discussion, and the rest should participate. Not everyone must add something every day, but everyone is expected to have something to say over the semester.

Project

You will have to complete substantive work on an instructor-approved problem and make an original contribution. Surveys are not permitted as projects; instead, each project must contain a survey of background and related work.

You must meet the following milestones (unless otherwise specified in future announcements) to ensure a high-quality project at the end of the semester:

  • Form a group of 3-4 members and declare your group's membership and paper preferences by September 4. After this date, we will form groups from the remaining students.
  • Turn in a 2-page draft proposal (including references) by September 18. Remember to include the names and Michigan email addresses of the group members.
  • Each group must present mid-semester progress during class hours on October 21 and October 23.
  • Each group must turn in an 8-page final report and your code via email on or before 1:00PM EST on December 15. The report must be submitted as a PDF file, with formatting similar to that of the papers you've read in the class. It should point to a git repository with all the code along with a README file with a step-by-step guide on how to compile and run the code.
  • You can find how to access GPU resources here.

Tentative Grading

Component              Weight
Paper Presentation 15%
Paper Summary 15%
Participation 10%
Project Report 40%
Project Presentations 20%
