ZO2 (Zeroth-Order Offloading): Full Parameter Fine-Tuning 175B LLMs with 18GB GPU Memory

👋 Welcome! ZO2 is a framework designed to enable full-parameter fine-tuning of large language models (LLMs) by combining zeroth-order (ZO) optimization with CPU offloading. It is tailored for setups with limited GPU memory (e.g., fine-tuning OPT-175B with just 18 GB of GPU memory), making it possible to fine-tune models that were previously out of reach due to hardware constraints.

  • The table below shows the GPU memory usage for various OPT model sizes when fine-tuned with the ZO2 framework:

    OPT model          1.3B   2.7B   6.7B   13B    30B    66B    175B
    GPU memory (GB)    3.75   4.14   4.99   6.18   8.86   12.07  18.04
  • Install the package and run the following test to measure the memory usage (a Python-based way to record peak GPU memory is sketched right after this list):
bash test/mezo_sgd/hf_opt/record_zo2_memory.sh
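
If you prefer to record peak memory from Python rather than from the shell script, here is a minimal sketch using standard PyTorch memory counters (torch.cuda.reset_peak_memory_stats and torch.cuda.max_memory_allocated are plain PyTorch APIs, not part of ZO2; wrap them around the training code from the Usage section below):

import torch

# Reset the counters before the run you want to measure.
torch.cuda.reset_peak_memory_stats()

# ... build the ZO2 model and run training steps as shown in the Usage section below ...

peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU memory: {peak_gb:.2f} GB")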

📰 News

  • 06/03/2025: We have open-sourced ZO2!

💡 Key Features

  • Optimized ZO CPU Offloading: ZO2 leverages zeroth-order (ZO) methods to use CPU offloading efficiently, avoiding redundant data transfers and significantly reducing GPU memory demands. This makes it possible to handle large-scale models on hardware with limited GPU resources (a minimal illustrative sketch of the ZO + offloading idea follows this list).
  • Dynamic Scheduling: Incorporates a high-performance scheduler to optimize the computation-communication overlap, enhancing GPU utilization and preventing training delays.
  • Capability for Very Large Models: Enables the fine-tuning of extraordinarily large models, such as those with over 175 billion parameters, on single GPUs with as little as 18GB of memory, previously impossible with traditional methods.
  • Empirical Validation: ZO2 has demonstrated through rigorous testing that it can efficiently fine-tune massive models without extra time costs or accuracy losses, confirming its effectiveness for large-scale model training.
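
To make the first two features concrete, here is a minimal, self-contained sketch of the idea behind MeZO-style zeroth-order SGD combined with per-block CPU offloading. Everything in it (perturb_, streamed_forward, zo_sgd_step, treating the model as a plain list of callable blocks) is made up for illustration and is not ZO2's actual API; in particular, ZO2's scheduler additionally overlaps the CPU-GPU transfers with computation, which this sketch omits:

import torch

@torch.no_grad()
def perturb_(module, scale, seed):
    # Regenerate the same Gaussian noise z from `seed` and add scale * z to each parameter in place.
    gen = torch.Generator(device="cuda").manual_seed(seed)
    for p in module.parameters():
        z = torch.randn(p.shape, generator=gen, device=p.device, dtype=p.dtype)
        p.add_(z, alpha=scale)

@torch.no_grad()
def streamed_forward(blocks, x, eps, seed):
    # Keep every block on the CPU; upload one block at a time, perturb it by eps * z,
    # run it, undo the perturbation, and offload it again.
    for i, block in enumerate(blocks):
        block.to("cuda")
        perturb_(block, +eps, seed + i)
        x = block(x)
        perturb_(block, -eps, seed + i)
        block.to("cpu")
    return x

@torch.no_grad()
def zo_sgd_step(blocks, loss_fn, x, y, lr=1e-5, eps=1e-3, seed=0):
    # Two perturbed forward passes give a finite-difference estimate of the
    # directional derivative of the loss along the random direction z.
    loss_pos = loss_fn(streamed_forward(blocks, x, +eps, seed), y)
    loss_neg = loss_fn(streamed_forward(blocks, x, -eps, seed), y)
    proj_grad = (loss_pos - loss_neg) / (2 * eps)
    # SGD update theta <- theta - lr * proj_grad * z, regenerating z from the seed,
    # so no gradients or optimizer states ever have to be stored.
    for i, block in enumerate(blocks):
        block.to("cuda")
        perturb_(block, -lr * float(proj_grad), seed + i)
        block.to("cpu")
    return float((loss_pos + loss_neg) / 2)

The key point is that a ZO step only needs two scalar losses and a reusable random seed, so neither gradients nor optimizer states are ever materialized, and parameters can stay on the CPU except for the single block currently being computed.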

⚙️ Installation

git clone https://github.com/liangyuwang/zo2.git
cd zo2/
conda env create -f env.yaml
conda activate zo2
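
As a quick sanity check that the environment is set up, the following one-liner should run without errors (the imports are the same ones used in the Usage examples below):

python -c "from zo2 import ZOConfig, zo_hf_init; print('ZO2 import OK')"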

🛠️ Usage

We use the OPT models and MeZO-SGD as examples. For additional information, please refer to the section on Supported Models, ZO methods, and Tasks.

1. Using MeZO-Runner to Evaluate Fine-tuning Tasks

cd example/mezo_runner/
export CUDA_VISIBLE_DEVICES=0
MODEL=facebook/opt-2.7b TASK=SST2 MODE=ft LR=1e-7 EPS=1e-3 STEPS=20000 EVAL_STEPS=4000 bash mezo.sh

2. Supervised Fine-Tuning HF Models with ZOTrainer / ZOSFTTrainer [Trainer]

from zo2 import ZOConfig, zo_hf_init
from zo2.hf_trl import ZOTrainer, ZOSFTTrainer
from transformers import TrainingArguments

# Model and optimizer init
zo_config = ZOConfig(method="mezo-sgd", zo2=True, offloading_device='cpu', working_device='cuda', lr=1e-5)
with zo_hf_init(zo_config):
    from transformers import OPTForCausalLM
    model = OPTForCausalLM.from_pretrained("facebook/opt-125m")
    model.zo_init(zo_config)

training_args = TrainingArguments("test-trainer")

trainer = ZOSFTTrainer(  # or ZOTrainer
    model,
    args=training_args,
    train_dataset=...,   # training dataset
    eval_dataset=...,    # eval dataset
    data_collator=...,   # data collator
    tokenizer=...,       # a suitable tokenizer
    compute_metrics=..., # compute_metrics function
    # ... any other Trainer arguments
)

trainer.train()
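
The dataset, collator, and tokenizer arguments above are left as placeholders. As one possible way to fill them in, here is a hedged sketch built on the Hugging Face datasets library; the choice of SST-2 and the preprocessing are illustrative assumptions, not something ZO2 prescribes:

from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# Illustrative data pipeline; any text dataset prepared for causal-LM training works.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, max_length=512)

raw = load_dataset("glue", "sst2")
tokenized = raw.map(tokenize, batched=True, remove_columns=raw["train"].column_names)

# For causal-LM fine-tuning the labels are the input ids, so mlm=False.
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# Pass tokenized["train"], tokenized["validation"], data_collator, and tokenizer
# to ZOSFTTrainer (or ZOTrainer) exactly as in the template above.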

3. Train HF Models with Custom Training Loop [demo]

from zo2 import ZOConfig, zo_hf_init

# Model and optimizer init
zo_config = ZOConfig(method="mezo-sgd", zo2=True, offloading_device='cpu', working_device='cuda', lr=1e-5)
with zo_hf_init(zo_config):
    from transformers import OPTForCausalLM
    model = OPTForCausalLM.from_pretrained("facebook/opt-125m")
    model.zo_init(zo_config)

# Training loop
for i in range(max_training_step):
    # Train
    training_input_ids, training_labels = ...   # get training data batch
    model.zo_train()
    loss = model(input_ids=training_input_ids, labels=training_labels)
    # Evaluate
    eval_input_ids, eval_labels = ...   # get eval data batch
    model.zo_eval()     
    output = model(input_ids=eval_input_ids, labels=eval_labels)

# Final training update
model.opt.zo_update(model)
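
The data-loading lines above are placeholders. Below is a hedged sketch of one way to drive the loop with real batches; only model(...), model.zo_train(), model.zo_eval(), and model.opt.zo_update(model) come from ZO2, while the tiny hard-coded corpus and the tokenization are illustrative assumptions:

import torch
from transformers import AutoTokenizer

# `model` is the ZO2-initialized OPT model from the snippet above.
# Illustrative: a tiny hard-coded corpus stands in for a real dataset.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
corpus = [
    "ZO2 fine-tunes large models with zeroth-order optimization.",
    "CPU offloading keeps GPU memory usage low.",
]

def get_batch(texts):
    enc = tokenizer(texts, return_tensors="pt", padding=True)
    input_ids = enc["input_ids"].to("cuda")
    return input_ids, input_ids.clone()  # for causal-LM loss the labels are the input ids

max_training_step = 100
for i in range(max_training_step):
    # Train
    training_input_ids, training_labels = get_batch(corpus)
    model.zo_train()
    loss = model(input_ids=training_input_ids, labels=training_labels)
    # Evaluate
    eval_input_ids, eval_labels = get_batch(corpus)
    model.zo_eval()
    output = model(input_ids=eval_input_ids, labels=eval_labels)

model.opt.zo_update(model)  # final update, as in the snippet above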

✨ Tutorial

Please refer to the tutorial directory.

🤖 Supported Models, ZO methods, and Tasks

🧪 Test

Please refer to the test directory.

🧭 Roadmap

  • Support more models like LLaMA
  • Support more ZO methods
  • Support more offloading strategies (e.g., disk offloading)

🚶 Contributing

Feel free to submit issues and pull requests to improve the project!

📲 Contact
