👋 Welcome! ZO2 is an innovative framework designed to enhance the fine-tuning of large language models (LLMs) using zeroth-order (ZO) optimization techniques and advanced offloading technologies. It is tailored for setups with limited GPU memory (e.g., fine-tuning OPT-175B with just 18 GB of GPU memory), enabling the fine-tuning of models that were previously out of reach due to hardware constraints.
- The table below shows the GPU memory usage for various OPT model sizes when fine-tuned with the ZO2 framework:

OPT Models | 1.3B | 2.7B | 6.7B | 13B | 30B | 66B | 175B
---|---|---|---|---|---|---|---
GPU memory (GB) | 3.75 | 4.14 | 4.99 | 6.18 | 8.86 | 12.07 | 18.04
- Install the package and run the following test to see the memory usage:

```bash
bash test/mezo_sgd/hf_opt/record_zo2_memory.sh
```
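If you want to record peak GPU memory for your own runs, a minimal sketch using standard PyTorch memory counters (independent of the script above) looks like this:

```python
import torch

# Reset PyTorch's peak-memory counter before the run.
torch.cuda.reset_peak_memory_stats()

# ... run your ZO2 fine-tuning steps here ...

# Report the peak GPU memory allocated during the run, in GB.
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU memory: {peak_gb:.2f} GB")
```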
- 06/03/2025: We have open-sourced ZO2!
- Optimized ZO CPU Offloading: ZO2 leverages zeroth-order (ZO) methods to use CPU offloading efficiently, avoiding redundant data transfers and significantly reducing GPU memory demands. This allows large-scale models to be handled on hardware with limited GPU resources (see the sketch below this list).
- Dynamic Scheduling: Incorporates a high-performance scheduler to optimize the computation-communication overlap, enhancing GPU utilization and preventing training delays.
- Capability for Very Large Models: Enables the fine-tuning of extraordinarily large models, such as those with over 175 billion parameters, on a single GPU with as little as 18 GB of memory, previously impossible with traditional methods.
- Empirical Validation: ZO2 has demonstrated through rigorous testing that it can efficiently fine-tune massive models without extra time costs or accuracy losses, confirming its effectiveness for large-scale model training.
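For intuition, the ZO method that ZO2 builds on (MeZO-SGD) estimates gradients from two forward passes under a shared random seed, so the perturbation noise can be regenerated on the fly instead of stored. Below is a minimal, illustrative sketch of one such step; it is not ZO2's internal implementation, and `loss_fn` and `batch` are hypothetical placeholders:

```python
import torch

def mezo_sgd_step(model, loss_fn, batch, lr=1e-5, eps=1e-3):
    """One MeZO-SGD step: an illustrative sketch of the zeroth-order
    update that ZO2 builds on (not ZO2's actual implementation)."""
    seed = torch.randint(0, 2**31, (1,)).item()

    def perturb(scale):
        # Regenerate the same Gaussian noise from the seed
        # instead of storing a parameter-sized noise tensor.
        torch.manual_seed(seed)
        for p in model.parameters():
            z = torch.randn_like(p)
            p.data.add_(scale * eps * z)

    with torch.no_grad():
        perturb(+1)                      # theta + eps * z
        loss_pos = loss_fn(model, batch)
        perturb(-2)                      # theta - eps * z
        loss_neg = loss_fn(model, batch)
        perturb(+1)                      # restore theta

        # Projected gradient estimate from the two forward passes.
        grad_proj = (loss_pos - loss_neg) / (2 * eps)

        # Update: theta <- theta - lr * grad_proj * z (noise regenerated).
        torch.manual_seed(seed)
        for p in model.parameters():
            z = torch.randn_like(p)
            p.data.add_(-lr * grad_proj * z)
    return loss_pos
```

Because only forward passes and regenerable noise are needed, each block of weights can live on the CPU and be streamed to the GPU just for its forward computation, which is the computation-communication overlap that ZO2's scheduler optimizes.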
```bash
git clone https://github.com/liangyuwang/zo2.git
cd zo2/
conda env create -f env.yaml
conda activate zo2
```
We use OPT models and MeZO-SGD as examples. For additional information, please refer to the section on Supported Models and ZO Methods.
1. Using MeZO-Runner to Evaluate Fine-tuning Tasks
```bash
cd example/mezo_runner/
export CUDA_VISIBLE_DEVICES=0
MODEL=facebook/opt-2.7b TASK=SST2 MODE=ft LR=1e-7 EPS=1e-3 STEPS=20000 EVAL_STEPS=4000 bash mezo.sh
```
2. Supervised Fine-Tuning HF Models with ZOTrainer / ZOSFTTrainer [Trainer]
```python
from zo2 import ZOConfig, zo_hf_init
from zo2.hf_trl import ZOTrainer, ZOSFTTrainer
from transformers import TrainingArguments

# Model and optimizer init
zo_config = ZOConfig(method="mezo-sgd", zo2=True, offloading_device='cpu', working_device='cuda', lr=1e-5)
with zo_hf_init(zo_config):
    from transformers import OPTForCausalLM
    model = OPTForCausalLM.from_pretrained("facebook/opt-125m")
    model.zo_init(zo_config)

training_args = TrainingArguments("test-trainer")
trainer = ZOSFTTrainer(     # or ZOTrainer
    model,
    args=training_args,
    train_dataset=...,      # get training dataset
    eval_dataset=...,       # get eval dataset
    data_collator=...,      # get data collator
    tokenizer=...,          # use a suitable tokenizer
    compute_metrics=...,    # define compute_metrics func
    # ... other Trainer kwargs
)
trainer.train()
```
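The `...` placeholders above can be filled with standard Hugging Face components. As one hypothetical example (assuming the SST-2 task from GLUE; any tokenized dataset works), the inputs might be prepared like this:

```python
from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorWithPadding

# SST-2 is only an example task; substitute any dataset with text fields.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

def tokenize(examples):
    return tokenizer(examples["sentence"], truncation=True)

raw = load_dataset("glue", "sst2")
tokenized = raw.map(tokenize, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
```

With these, `tokenized["train"]` and `tokenized["validation"]` would be passed as `train_dataset` and `eval_dataset`.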
3. Train HF Models with Custom Training Loop [demo]
```python
from zo2 import ZOConfig, zo_hf_init

# Model and optimizer init
zo_config = ZOConfig(method="mezo-sgd", zo2=True, offloading_device='cpu', working_device='cuda', lr=1e-5)
with zo_hf_init(zo_config):
    from transformers import OPTForCausalLM
    model = OPTForCausalLM.from_pretrained("facebook/opt-125m")
    model.zo_init(zo_config)

# Training loop
for i in range(max_training_step):
    # Train
    training_input_ids, training_labels = ...   # get training data batch
    model.zo_train()
    loss = model(input_ids=training_input_ids, labels=training_labels)

    # Evaluate
    eval_input_ids, eval_labels = ...   # get eval data batch
    model.zo_eval()
    output = model(input_ids=eval_input_ids, labels=eval_labels)

# Final training update
model.opt.zo_update(model)
```
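To try the loop end to end, the data-batch placeholders can be produced with a tokenizer. A minimal sketch with a toy batch (for causal-LM fine-tuning, labels are conventionally a copy of the input ids) might look like:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

# Toy batch on the working device; labels mirror the input ids
# so the model computes a standard causal-LM loss.
batch = tokenizer(["ZO2 fine-tunes large models on small GPUs."],
                  return_tensors="pt").to("cuda")
training_input_ids = batch["input_ids"]
training_labels = batch["input_ids"].clone()
```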
Please refer to the tutorial.
- Models:
  - NanoGPT (mainly for idea evaluation)
  - Transformers: OPT
- ZO methods:
  - MeZO-SGD
- Tasks: Please refer to MeZO-Runner
Please refer to the test folder.
- Support more models like LLaMA
- Support more ZO methods
- Support more offloading strategies (e.g., disk offloading)
Feel free to submit issues and pull requests to improve the project!
- Liangyu Wang: [email protected]