[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant

Code-kunkun/LamRA

LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant

This repository is the official implementation of LamRA.

🏡 Project Page | 📄 Paper | 🤗 LamRA-Ret-Pretrained | 🤗 LamRA-Ret | 🤗 LamRA-Rank | 🤗 Dataset

Installation

conda create -n lamra python=3.10 -y
conda activate lamra 

pip install --upgrade pip  # enable PEP 660 support 
pip install -r requirements.txt

pip install ninja
pip install flash-attn --no-build-isolation

New Version

A version based on Qwen2.5-VL is available in the qwen2.5vl branch.

Quickstart

Please refer to demo.py.

Data Preparation

Download Qwen2-VL-7B and place it in ./checkpoints/hf_models/Qwen2-VL-7B-Instruct
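
One possible way to fetch the checkpoint is through the `huggingface_hub` package. The sketch below is only an illustration: the repo id `Qwen/Qwen2-VL-7B-Instruct` is an assumption inferred from the target directory name, and the helper name `download_base_model` is hypothetical and not part of this repository.

```python
# Assumption: the checkpoint is hosted on the Hugging Face Hub under this repo id,
# inferred from the target directory name; the README does not state it.
REPO_ID = "Qwen/Qwen2-VL-7B-Instruct"
# Target directory as given in the README.
LOCAL_DIR = "./checkpoints/hf_models/Qwen2-VL-7B-Instruct"


def download_base_model(repo_id=REPO_ID, local_dir=LOCAL_DIR):
    """Download a model snapshot into local_dir and return the resulting path."""
    # Imported lazily so this file loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `download_base_model()` starts the download, which is large, so it is left to be invoked explicitly.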

For the pre-training dataset, please refer to link

For the multimodal instruction tuning dataset, please refer to M-BEIR

For evaluation data related to LamRA, please refer to LamRA_Eval

After downloading everything, organize the data under ./data as follows:

├── M-BEIR
├── nli_for_simcse.csv
├── rerank_data_for_training
├── flickr
├── coco
├── sharegpt4v
├── Urban1K
├── circo
├── genecis
├── vist
├── visdial
├── ccneg
├── sugar-crepe
├── MSVD
└── msrvtt
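
Before launching training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of this repository: the entry names are copied from the tree above, while the function name `missing_entries` and its return format are illustrative assumptions.

```python
from pathlib import Path

# Entry names copied from the layout listed above; everything else in this
# snippet (function name, return shape) is illustrative.
EXPECTED_ENTRIES = [
    "M-BEIR", "nli_for_simcse.csv", "rerank_data_for_training", "flickr",
    "coco", "sharegpt4v", "Urban1K", "circo", "genecis", "vist", "visdial",
    "ccneg", "sugar-crepe", "MSVD", "msrvtt",
]


def missing_entries(data_root="./data"):
    """Return the expected entries that do not exist under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing under ./data:", ", ".join(missing))
    else:
        print("All expected entries found.")
```

The check only tests that each top-level entry exists; it does not validate the contents of any dataset.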

Training & Evaluation for LamRA-Ret

Pre-training

sh scripts/lamra_ret/pretrain.sh
# Evaluation 
sh scripts/eval/eval_pretrained.sh
# Merge LoRA for multimodal instruction tuning stage
sh scripts/merge_lora.sh 

Multimodal instruction tuning

sh scripts/lamra_ret/finetune.sh
# Evaluation 
sh scripts/eval/eval_mbeir.sh   # eval under local pool setting

sh scripts/eval/eval_mbeir_global.sh   # eval under global pool setting

Training & Evaluation for LamRA-Rank

You can use the data we provide, or run the following commands to generate the data for reranking training.

# Collecting data for reranking training
sh scripts/lamra_rank/get_train_data.sh

sh scripts/lamra_rank/merge_train_data.sh
# training for reranking
sh scripts/lamra_rank/train_rerank.sh
# pointwise reranking
sh scripts/eval/eval_rerank_mbeir_pointwise.sh

# listwise reranking
sh scripts/eval/eval_rerank_mbeir_listwise.sh
# Get the reranking results on M-BEIR
sh scripts/eval/get_rerank_results_mbeir.sh
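
Since the steps above are meant to run in order, a small wrapper can chain them and stop at the first failure. This is a hedged sketch rather than a script shipped with the repository: the script paths come from the commands above (with the `scripts/` directory spelled consistently), while `run_stages` and its `dry_run` flag are hypothetical names introduced for illustration.

```python
import subprocess

# Script paths are taken from the commands above; the runner itself is a
# hypothetical convenience wrapper, not part of the repository.
RERANK_STAGES = [
    "scripts/lamra_rank/get_train_data.sh",
    "scripts/lamra_rank/merge_train_data.sh",
    "scripts/lamra_rank/train_rerank.sh",
    "scripts/eval/eval_rerank_mbeir_pointwise.sh",
    "scripts/eval/eval_rerank_mbeir_listwise.sh",
    "scripts/eval/get_rerank_results_mbeir.sh",
]


def run_stages(stages, dry_run=True):
    """Run each script with `sh` in order, stopping at the first failure."""
    commands = [["sh", script] for script in stages]
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # check=True raises CalledProcessError if the script exits non-zero.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_stages(RERANK_STAGES, dry_run=True)
```

With `dry_run=True` the wrapper only prints the commands. If you use the provided reranking data, pass `RERANK_STAGES[2:]` to skip the two data-collection scripts.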

Evaluation on other benchmarks

# evaluation results on zeroshot datasets
sh scripts/eval/eval_zeroshot.sh

# reranking the results on zeroshot datasets
sh scripts/eval/eval_rerank_zeroshot.sh

# get the final results
sh scripts/eval/get_rerank_results_zeroshot.sh

🫡 Acknowledgements

Many thanks to the code bases from lmms-finetune and E5-V.

Citation

If you use this code for your research or project, please cite:

@article{liu2024lamra,
  title={LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant},
  author={Yikun Liu and Pingan Chen and Jiayin Cai and Xiaolong Jiang and Yao Hu and Jiangchao Yao and Yanfeng Wang and Weidi Xie},
  journal={arXiv preprint arXiv:2412.01720},
  year={2024}
}
