Struct2D

Official code release for Struct2D: A Perception-Guided Framework for Spatial Reasoning in Large Multimodal Models.

Fangrui Zhu*, Hanhui Wang*, Yiming Xie, Jing Gu, Tianye Ding, Jianwei Yang, Huaizu Jiang
*Equal Contribution

📑 Paper (arXiv)   Hugging Face Dataset and Models

Highlights

  • We propose a perception-guided 2D prompting strategy, Struct2D Prompting, and conduct a detailed zero-shot analysis that reveals LMMs’ ability to perform 3D spatial reasoning from structured 2D inputs alone.
  • We introduce Struct2D-Set, a large-scale instruction-tuning dataset with automatically generated, fine-grained QA pairs covering eight spatial reasoning categories grounded in 3D scenes.
  • We fine-tune an open-source LMM to achieve competitive performance across several spatial reasoning benchmarks, validating the real-world applicability of our framework.

📁 Contents

  1. Zero-shot Analysis
  2. Data Processing
  3. Training and Evaluation

Installation

conda create -n struct2d python=3.10 -y
conda activate struct2d
git clone git@github.com:neu-vi/struct2d.git
cd struct2d
pip install -e ".[torch,metrics]" --no-build-isolation
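
After the commands above finish, a quick sanity check can confirm that the environment matches the installation steps. The snippet below is a minimal sketch, not part of the Struct2D codebase: the helper name `check_environment` is hypothetical, and the assumption that the `torch` extra makes a module named `torch` importable is ours, not something the repository states.

```python
import importlib.util
import sys


def check_environment(required=("torch",), python=(3, 10)):
    """Report whether the interpreter and packages match the install steps.

    `required` lists top-level module names to look for. The default
    ("torch",) is an assumption based on the `[torch,metrics]` extras.
    """
    report = {"python_ok": sys.version_info[:2] == tuple(python)}
    for name in required:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'check failed'}")
```

Run it inside the activated `struct2d` conda environment. A failed `python_ok` check means the interpreter is not Python 3.10, the version requested by the `conda create` command.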

📖 Citation

If you find Struct2D helpful in your research, please consider citing:

@article{zhu2025struct2d,
  title={Struct2D: A Perception-Guided Framework for Spatial Reasoning in Large Multimodal Models},
  author={Zhu, Fangrui and Wang, Hanhui and Xie, Yiming and Gu, Jing and Ding, Tianye and Yang, Jianwei and Jiang, Huaizu},
  journal={arXiv preprint arXiv:2506.04220},
  year={2025}
}

🙏 Acknowledgement

We thank the authors of GPT4Scene and LLaMA-Factory for inspiring discussions and for open-sourcing their codebases.
