This repository contains the code for the paper "Scaling Inference-Efficient Language Models" (ICML'25). Our code is based on OpenLM and DCLM-Morph.
Here we'll go over a basic example where we start from a fresh install, download and preprocess some training data, and train a model.
We use the DCLM dataset to train the models. The detailed data preprocessing steps are provided in DCLM-Morph.
```bash
conda create -n llm python=3.9
conda activate llm
cd open-lm-shape
pip install -r requirements.txt
pip install --editable .
```
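To confirm the editable install worked, a quick sanity check (assuming the package is importable as `open_lm`, as in upstream OpenLM) is:

```python
# Minimal install check. Assumes the package is importable as `open_lm`
# (the upstream OpenLM package name); adjust if this repo uses another name.
import torch
import open_lm

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("open_lm installed at:", open_lm.__file__)
```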
Model choices are listed in the following table, where, for instance, 80m indicates an 80-million-parameter model and 1b a 1-billion-parameter model.
Since we study models with different shapes, the same parameter count can correspond to several configurations; for example, scale_open_lm_80m_v1 and scale_open_lm_80m_v2 share a parameter count but differ in shape. The specific configurations are provided in the paper.
| Model Name |
|---|
| scale_open_lm_80m_v{1,2,...,7} |
| scale_open_lm_116m_v{1,2,...,6} |
| scale_open_lm_164m_v{1,2,...,8} |
| scale_open_lm_237m_v{1,2,...,6} |
| scale_open_lm_313m_v{1,2,...,6} |
| scale_open_lm_1b_v{1,2,...,8} |
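Each model name corresponds to a JSON config that fixes its shape. The sketch below shows one way to compare width and depth across configs; the open_lm/model_configs/ path and the hidden_dim/n_layers field names follow upstream OpenLM conventions and are assumptions here, so adjust them to this repository's layout.

```python
# Hypothetical sketch: compare the width/depth of the scale_open_lm_* configs.
# Assumes OpenLM-style JSON configs under open_lm/model_configs/ with
# "hidden_dim" and "n_layers" fields; adjust the glob and keys as needed.
import glob
import json

for path in sorted(glob.glob("open_lm/model_configs/scale_open_lm_*.json")):
    with open(path) as f:
        cfg = json.load(f)
    print(f"{path}: hidden_dim={cfg.get('hidden_dim')}, n_layers={cfg.get('n_layers')}")
```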
An example training run can be launched as follows:
```bash
export CUDA_VISIBLE_DEVICES=0,1
torchrun --nproc-per-node=2 --master_port=8100 -m open_lm.main \
  --model scale_open_lm_116m_v6 \
  --dataset-manifest /mnt/data/dclm_output_192B/manifest.jsonl \
  --train-num-samples 2315255808 \
  --workers 1 \
  --precision amp_bfloat16 \
  --global-batch-size 128 \
  --accum-freq 16 \
  --grad-checkpointing \
  --log-every-n-steps 100 \
  --grad-clip-norm 1 \
  --data-key json.gz \
  --lr 3e-3 \
  --warmup 2000 \
  --wd 0.033 \
  --beta2 0.95 \
  --epochs 2 \
  --z-loss-coefficient 1e-4 \
  --name open_lm_ex_$RANDOM \
  --resume latest \
  --lr-cooldown-end 3e-5
```
Checkpoints and final model weights will be saved to the logs directory (set via the --logs flag).
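For a rough sense of the token budget, the per-step arithmetic below is a sketch based on the flags above; the 2048-token sequence length and the effective batch of global-batch-size × accum-freq sequences per optimizer step are assumptions, not values taken from the repo.

```python
# Back-of-the-envelope token accounting for the example run (a sketch).
# Assumptions: 2048-token sequences, and an effective batch of
# global_batch_size * accum_freq sequences per optimizer step.
seq_len = 2048            # assumed context length
global_batch_size = 128   # --global-batch-size
accum_freq = 16           # --accum-freq

tokens_per_step = global_batch_size * accum_freq * seq_len
print(f"tokens per optimizer step: {tokens_per_step:,}")  # 4,194,304 (~4.2M)
```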
To set up the evaluation environment, follow these steps:
```bash
git clone https://github.com/Waterpine/dclm-morph.git
cd dclm-morph
apt install cmake build-essential
apt install g++-9
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-9 90
pip install -r requirements.txt
```
```bash
conda create -n eval python=3.9
conda activate eval
cd open-lm-morph
pip install --editable .
```
We present an example evaluating Morph-1B on the COPA dataset.
```bash
cd eval
export CUDA_VISIBLE_DEVICES=0
MODEL_NAME=scale_open_lm_1b_v8
CHECKPOINT_PATH=xxx/logs/checkpoints/morph_1b.pt
CATEGORY=commonsense_reasoning
python eval_openlm_ckpt.py \
  --eval-yaml local_yaml_1b/copa.yaml \
  --model $MODEL_NAME \
  --checkpoint $CHECKPOINT_PATH \
  --category $CATEGORY
```
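Before launching the evaluation, it can help to sanity-check the checkpoint file. The snippet below is a minimal sketch assuming an OpenLM-style checkpoint dict that stores the weights under a "state_dict" key; adjust the key (and the placeholder path) to match your setup.

```python
# Hypothetical checkpoint sanity check before evaluation.
# Assumes an OpenLM-style checkpoint dict with a "state_dict" entry; falls
# back to treating the whole file as a raw state dict otherwise.
import torch

ckpt = torch.load("xxx/logs/checkpoints/morph_1b.pt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt

num_params = sum(v.numel() for v in state_dict.values() if torch.is_tensor(v))
print(f"tensors: {len(state_dict)} | total parameters: {num_params:,}")
```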
We evaluate 1B models on the following datasets: arc_challenge, arc_easy, boolq, copa, hellaswag, jeopardy, lambada_openai, MMLU, piqa, winograd, and winogrande.
| Models | d_model | n_layers | Avg. Accuracy | Latency (s) |
|---|---|---|---|---|
| Open-LM-1B | 2048 | 24 | 0.49 | 3.61 |
| OPT-1.3B | 2048 | 24 | 0.50 | 2.55 |
| Pythia-1.3B | 2048 | 22 | 0.49 | 3.28 |
| Neox-1.3B | 2048 | 24 | 0.49 | 3.99 |
| OPT-IML-1.3B | 2048 | 24 | 0.54 | 2.54 |
| Morph-1B | 3072 | 12 | 0.52 | 1.96 |
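Morph-1B (wider and shallower: d_model 3072, 12 layers) has the lowest latency in the table while keeping accuracy competitive. The sketch below shows one common way to take such a wall-clock measurement; generate_fn is a placeholder for whatever generation entry point you use, not an API from this repository.

```python
# Hypothetical latency-measurement sketch. `generate_fn` stands in for your
# generation entry point (e.g., a wrapper around the model's decode loop).
import time
import torch

def measure_latency(generate_fn, n_warmup=3, n_trials=10):
    """Return mean wall-clock seconds per call to generate_fn()."""
    for _ in range(n_warmup):          # warm up kernels and caches
        generate_fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()       # finish warmup work before timing
    start = time.perf_counter()
    for _ in range(n_trials):
        generate_fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()       # wait for all queued GPU work
    return (time.perf_counter() - start) / n_trials
```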
Authors: Song Bian*, Minghao Yan*, Shivaram Venkataraman
Affiliation: University of Wisconsin-Madison
If you use our models or code in your work, please use the following BibTeX citation:
```bibtex
@article{bian2025scaling,
  title={Scaling Inference-Efficient Language Models},
  author={Bian, Song and Yan, Minghao and Venkataraman, Shivaram},
  journal={arXiv preprint arXiv:2501.18107},
  year={2025}
}
```