Junke Wang1, Zhi Tian2, Xun Wang2, Xinyu Zhang2, Weilin Huang2, Zuxuan Wu1, Yu-Gang Jiang1
1Fudan University, 2ByteDance Seed
This paper presents SimpleAR, a vanilla autoregressive visual generation model that achieves state-of-the-art text-to-image generation performance. For the first time, we demonstrate that:
- 🏆 with only 0.5B parameters, an AR model can generate 1024 x 1024 images with high fidelity and achieve competitive results on challenging T2I benchmarks, e.g., 0.59 on GenEval and 79.66 on DPG;
- 🚀 both supervised fine-tuning (SFT) and Group Relative Policy Optimization (GRPO) training can lead to significant improvements in image aesthetics and prompt alignment;
- ⚡️ when deployed with vLLM, our model can generate a 1024 x 1024 image in about 14 seconds, making high-resolution generation practical for real-world applications.
We open-source all training and inference code, hoping to demonstrate the potential of autoregressive visual generation and encourage more participation in this research field.
- [2025/04/20] Installation instructions and model zoo are updated! Thanks to syjmelody, wusize, and micky-li-hd for raising issues.
- [2025/04/21] Stronger models with better generation quality and more functionality, e.g., editing and controllable generation, will be released in this repo. Please stay tuned!
- [2025/04/22] We provide demo code to play with our released models.
For basic usage (pretraining, SFT, inference without vLLM), you can install the dependencies with:
python3 -m venv env
source env/bin/activate
pip install -e ".[train]"
For advanced usage, please refer to TRAIN.md (GRPO training) and EVAL.md (inference with vLLM) to set up the respective environments.
We provide both SFT and RL checkpoints:
name | GenEval | DPG | HF weights 🤗 |
---|---|---|---|
SimpleAR-0.5B-SFT | 0.53 | 79.34 | simplear-0.5B-sft |
SimpleAR-0.5B-RL | 0.59 | 79.66 | simplear-0.5B-grpo |
SimpleAR-1.5B-SFT | 0.61 | 80.11 | simplear-1.5B-sft |
SimpleAR-1.5B-RL | 0.63 | 81.31 | simplear-1.5B-grpo |
We use Cosmos as our visual tokenizer. You can download it and place it under ./checkpoints/:
cd checkpoints
git lfs install
git clone https://huggingface.co/nvidia/Cosmos-1.0-Tokenizer-DV8x16x16
You can now load SimpleAR directly with from_pretrained 🤗! We provide the demo code in PLAY.md; a hedged sketch is shown below.
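The following is a minimal sketch of what loading might look like, assuming the released checkpoints follow the standard Hugging Face transformers causal-LM interface; the checkpoint path, prompt, generation arguments, and token budget are illustrative assumptions, and PLAY.md remains the authoritative reference.

```python
# Hedged sketch: assumes SimpleAR checkpoints load through the standard
# transformers Auto* interface (see PLAY.md for the official demo).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "path/to/simplear-1.5B-grpo"  # placeholder: use one of the HF weights listed in the model zoo above
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.bfloat16).to("cuda")

prompt = "a red panda napping on a mossy tree branch, golden hour lighting"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Sample visual tokens autoregressively; 4096 new tokens would correspond to a
# 64 x 64 latent grid (1024 x 1024 pixels with 16x spatial downsampling).
with torch.no_grad():
    visual_tokens = model.generate(**inputs, do_sample=True, max_new_tokens=4096)

# The generated token ids still need to be decoded into pixels with the
# Cosmos tokenizer downloaded above; PLAY.md shows the full pipeline.
```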
Please find the instructions on data preparation and training here.
We provide scripts to evaluate our released checkpoints on GenEval and DPG-Bench. Please see EVAL.md for more details.
You can also generate images with SimpleAR using generate.py. We implement several acceleration approaches, e.g., vLLM and speculative Jacobi decoding; please refer to EVAL.md for details.
1024 x 1024 generation results by SimpleAR.
If you find this repository helpful, please consider citing:
@article{wang2025simplear,
title={SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL},
author={Wang, Junke and Tian, Zhi and Wang, Xun and Zhang, Xinyu and Huang, Weilin and Wu, Zuxuan and Jiang, Yu-Gang},
journal={arXiv preprint arXiv:2504.11455},
year={2025}
}
We thank Peize Sun, Rui Tian, Feng Li, and Teng Yao for their valuable discussions.