
ShapeLLM-Omni: A Native Multimodal LLM for 3D Generation and Understanding

Junliang Ye1,2*, Zhengyi Wang1,2*, Ruowen Zhao1*, Shenghao Xie3, Jun Zhu1,2†
*Equal contribution.
†Corresponding author.
1Tsinghua University, 2ShengShu, 3Peking University


Demo video: demo.mp4

Release

  • [6/03] 🔥🔥 We released the pretrained weights for both ShapeLLM-Omni (7B) and 3DVQVAE.
  • [6/03] 🔥🔥 We released 50k high-quality 3D-editing data pairs.
  • [6/07] 🔥🔥 We released an online demo for everyone to try out.

Installation

Please set up the Python environment following TRELLIS and Qwen2.5-VL, or create it with:

pip install -r requirements.txt
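
After installation, a quick sanity check can confirm that the core dependencies import cleanly and that a GPU is visible. The snippet below is only a sketch: it assumes torch, transformers, and gradio are among the packages pulled in by requirements.txt; adjust the imports if your dependency list differs.

```python
# Minimal environment sanity check (a sketch; assumes torch, transformers,
# and gradio are installed by requirements.txt).
import torch
import transformers
import gradio

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
print("gradio:", gradio.__version__)
```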

Inference

We suggest using the Gradio UI to visualize inference results:

python app.py

Demo video: open_video5.mp4

For the prompt templates used for different tasks, please refer to templates.txt.
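
Besides the Gradio UI, the model can in principle be queried programmatically. The sketch below is a hedged example rather than the repository's official API: it assumes the released 7B checkpoint loads through the standard Qwen2.5-VL classes in transformers (consistent with the environment setup above), and the model path and prompt are placeholders. Please refer to app.py and templates.txt for the actual entry point and task templates.

```python
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# Placeholder: point this at the downloaded ShapeLLM-Omni (7B) checkpoint.
model_id = "path/to/ShapeLLM-Omni-7B"

# Assumption: the checkpoint is compatible with the Qwen2.5-VL model class.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Hypothetical text-to-3D prompt; the real task templates live in templates.txt.
messages = [
    {"role": "user",
     "content": [{"type": "text", "text": "Generate a 3D asset of a wooden chair."}]}
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=1024)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```

Decoding the generated 3D tokens back into a mesh goes through the released 3DVQVAE; that step is repository-specific, so app.py remains the authoritative reference.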

Qualitative results

Demo videos: text.mp4, image2.mp4

Todo

  • Release of the entire 3D-Alpaca dataset.
  • Release of training code.
  • Release of model weights featuring multi-turn dialogue and 3D editing capabilities.

Acknowledgement

Our code builds on these wonderful repos, including TRELLIS and Qwen2.5-VL.

✍️ Citation

@article{ye2025shapellm,
  title={ShapeLLM-Omni: A Native Multimodal LLM for 3D Generation and Understanding},
  author={Ye, Junliang and Wang, Zhengyi and Zhao, Ruowen and Xie, Shenghao and Zhu, Jun},
  journal={arXiv preprint arXiv:2506.01853},
  year={2025}
}
