This repository provides the implementation of PRISM, an alignment framework that integrates safety into vision-language models through principled, structured, multi-step reasoning.
conda create -n PRISM python=3.10
conda activate PRISM
pip install 'ms-swift[all]' -U
pip install vllm

We open-source the training datasets on Hugging Face:
- PRISM-CoT: https://huggingface.co/datasets/andyc03/PRISM-CoT
- PRISM-DPO: https://huggingface.co/datasets/andyc03/PRISM-DPO
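For a quick look at the data before training, the snippet below loads a few samples with the Hugging Face `datasets` library. This is only a convenience sketch: it assumes the datasets can be loaded directly from the Hub with `load_dataset` and that a `train` split exists, which may differ from the loading path used by the training scripts.

```python
# Quick inspection of the released datasets.
# Assumes they load via `datasets.load_dataset` and expose a "train" split (assumption).
from datasets import load_dataset

cot = load_dataset("andyc03/PRISM-CoT", split="train")
print(cot)        # dataset size and column names
print(cot[0])     # one raw training sample

dpo = load_dataset("andyc03/PRISM-DPO", split="train")
print(dpo[0])     # one preference pair
```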
First, prepare the data. We have released the PRISM-CoT and PRISM-DPO datasets. Convert your dataset to a Swift-compatible format by providing the absolute path to your data folder:
python utils/formatting.py --folder /your_path_here/PRISM_COT
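For orientation, a converted sample is typically a single JSON line holding a conversation and an image path. The sketch below is illustrative only: the field names (`messages`, `images`) are an assumption based on common ms-swift custom-dataset formats, so check utils/formatting.py for the schema it actually emits.

```python
# Illustrative only: one converted record in a Swift-style JSONL file.
# Field names ("messages", "images") are assumptions; see utils/formatting.py for the real schema.
import json

sample = {
    "messages": [
        {"role": "user", "content": "<image>\nIs this request safe to answer? Explain step by step."},
        {"role": "assistant", "content": "Structured safety reasoning, followed by the final answer."},
    ],
    "images": ["/your_path_here/PRISM_COT/images/000001.jpg"],
}

with open("prism_cot_swift.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```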
Then add the special tokens for your model using utils/add_tokens.py:

python utils/add_tokens.py --model_path /your_model_path_here
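As a rough sketch of what this step involves, adding special tokens with `transformers` usually means extending the tokenizer and resizing the model's embeddings. The token strings below are placeholders, not the tokens PRISM actually uses; utils/add_tokens.py is the authoritative implementation.

```python
# Sketch of adding special tokens, assuming a standard transformers tokenizer/model layout.
# The token strings are placeholders; see utils/add_tokens.py for the tokens PRISM actually adds.
from transformers import AutoTokenizer, Qwen2VLForConditionalGeneration

model_path = "/your_model_path_here"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = Qwen2VLForConditionalGeneration.from_pretrained(model_path)

new_tokens = ["<think>", "</think>"]  # placeholder special tokens
tokenizer.add_special_tokens({"additional_special_tokens": new_tokens})
model.resize_token_embeddings(len(tokenizer))  # make room for the new token ids

tokenizer.save_pretrained(model_path)
model.save_pretrained(model_path)
```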
Now you can train your PRISM model. Update the JSON and model path in training_scripts/qwen2_vl.sh, for example:

cd training_scripts
# For Qwen2-VL with full-parameter SFT
bash qwen2_vl.sh

We provide the model weights used in our experiments on Hugging Face:
- Qwen2-VL-PRISM-SFT: https://huggingface.co/andyc03/Qwen2-VL-PRISM-SFT
- Qwen2-VL-PRISM-DPO: https://huggingface.co/andyc03/Qwen2-VL-PRISM-DPO
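To try a released checkpoint directly, a minimal loading sketch is shown below. It assumes the weights follow the standard Qwen2-VL layout in `transformers`; the prompt, image, and generation settings are illustrative.

```python
# Minimal inference sketch, assuming the released weights load like a standard Qwen2-VL checkpoint.
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from PIL import Image

model_id = "andyc03/Qwen2-VL-PRISM-DPO"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2VLForConditionalGeneration.from_pretrained(model_id, device_map="auto")

image = Image.open("example.jpg")  # placeholder image path
messages = [{"role": "user", "content": [{"type": "image"},
                                         {"type": "text", "text": "Describe this image."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
# Strip the prompt tokens and decode only the newly generated part.
print(processor.batch_decode(output_ids[:, inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)[0])
```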
If you want to generate preference data using Monte Carlo Tree Search (MCTS), we provide scripts to help you do so:
cd PRISM_DPO_data

First, point the model path in scripts/activate_vllm.sh to your downloaded PRISM-CoT model, then launch it:
bash scripts/activate_vllm.sh

Next, configure your model path and data in config/qwen_tree_generate.yaml, then run MCTS data generation:
# Then run MCTS data generation
bash scripts/generate_MCT.sh

Configuration parameters:
- actor_model_dir: Path to your model
- train_prompt_path: Input prompts for data generation
- iterations: Number of MCTS iterations (default: 200)
- c: UCB exploration parameter (default: 1.5)
- max_depth: Maximum reasoning depth (default: 5)
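To make the role of `c` concrete, the sketch below shows how a UCB-style score typically trades off a child node's average value against how rarely it has been visited during MCTS selection. This is a generic illustration of the formula, not the exact selection rule implemented in scripts/generate_MCT.sh.

```python
# Generic UCB selection sketch: a larger `c` favors exploring rarely visited children.
# Illustrates the role of the `c` parameter; not the repository's exact implementation.
import math

def ucb_score(child_value_sum: float, child_visits: int, parent_visits: int, c: float = 1.5) -> float:
    if child_visits == 0:
        return float("inf")  # always try unvisited children first
    exploitation = child_value_sum / child_visits                       # average reward so far
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)  # visit-count bonus
    return exploitation + exploration

# Example: two children of a node visited 20 times.
print(ucb_score(child_value_sum=6.0, child_visits=10, parent_visits=20))  # well-explored child
print(ucb_score(child_value_sum=1.5, child_visits=2, parent_visits=20))   # rarely visited child
```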
Please refer to TTS/TTS.md for running details.
This project is licensed under the MIT License — see the LICENSE file for details.
If you use PRISM in your research, please consider citing our paper:
@misc{li2025prismrobustvlmalignment,
title={PRISM: Robust VLM Alignment with Principled Reasoning for Integrated Safety in Multimodality},
author={Nanxi Li and Zhengyue Zhao and Chaowei Xiao},
year={2025},
eprint={2508.18649},
archivePrefix={arXiv},
primaryClass={cs.CR},
url={https://arxiv.org/abs/2508.18649},
}

Built on top of excellent open-source projects including ms-swift, vLLM, and STAIR.
For questions, issues, or discussions, please open an issue in this repository or contact the author at [email protected].