This repository is the official implementation of LamRA.
🏡 Project Page | 📄 Paper | 🤗 LamRA-Ret-Pretrained | 🤗 LamRA-Ret | 🤗 LamRA-Rank | 🤗 Dataset
conda create -n lamra python=3.10 -y
conda activate lamra
pip install --upgrade pip # enable PEP 660 support
pip install -r requirements.txt
pip install ninja
pip install flash-attn --no-build-isolation
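Optionally, you can verify that the key dependencies import correctly (a quick sanity check, assuming the packages installed above):
# Both imports should succeed and print version numbers
python -c "import torch, flash_attn; print(torch.__version__, flash_attn.__version__)"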
An updated version based on Qwen2.5-VL is available in the qwen2.5vl branch.
Please refer to demo.py for a quick demo.
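If you want to use the Qwen2.5-VL version, switch to that branch after cloning (standard git usage, shown for convenience):
# Switch to the Qwen2.5-VL branch of this repository
git checkout qwen2.5vl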
Download Qwen2-VL-7B-Instruct and place it in ./checkpoints/hf_models/Qwen2-VL-7B-Instruct.
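For example, the weights can be fetched with the Hugging Face CLI (one option; downloading them manually from the model page and placing them in the same directory works just as well):
# Optional: download Qwen2-VL-7B-Instruct into the expected directory
huggingface-cli download Qwen/Qwen2-VL-7B-Instruct --local-dir ./checkpoints/hf_models/Qwen2-VL-7B-Instruct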
For the pre-training dataset, please refer to this link.
For the multimodal instruction tuning dataset, please refer to M-BEIR.
For the evaluation data related to LamRA, please refer to LamRA_Eval.
After downloading all of them, organize the data in ./data as follows (a quick layout check is sketched after the listing):
├── M-BEIR
├── nli_for_simcse.csv
├── rerank_data_for_training
├── flickr
├── coco
├── sharegpt4v
├── Urban1K
├── circo
├── genecis
├── vist
├── visdial
├── ccneg
├── sugar-crepe
├── MSVD
└── msrvtt
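If you want to confirm the layout, a short shell check over the directory names above (optional, nothing LamRA-specific) is:
# Report any expected entry that is missing under ./data
for d in M-BEIR nli_for_simcse.csv rerank_data_for_training flickr coco sharegpt4v Urban1K circo genecis vist visdial ccneg sugar-crepe MSVD msrvtt; do
  [ -e "data/$d" ] || echo "missing: data/$d"
done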
sh scripts/lamra_ret/pretrain.sh
# Evaluation
sh scripts/eval/eval_pretrained.sh
# Merge LoRA for multimodal instruction tuning stage
sh scripts/merge_lora.sh
sh scripts/lamra_ret/finetune.sh
# Evaluation
sh scripts/eval/eval_mbeir.sh # eval under local pool setting
sh scripts/eval/eval_mbeir_global.sh # eval under global pool setting
You can use the data we provide, or run the following commands to generate the training data for reranking.
# Collecting data for reranking training
sh scripts/lamra_rank/get_train_data.sh
sh scripts/lamra_rank/merge_train_data.sh
# Training for reranking
sh scripts/lamra_rank/train_rerank.sh
# Pointwise reranking
sh scripts/eval/eval_rerank_mbeir_pointwise.sh
# Listwise reranking
sh scripts/eval/eval_rerank_mbeir_listwise.sh
# Get the reranking results on M-BEIR
sh scripts/eval/get_rerank_results_mbeir.sh
# Evaluation results on zero-shot datasets
sh scripts/eval/eval_zeroshot.sh
# Reranking the results on zero-shot datasets
sh scripts/eval/eval_rerank_zeroshot.sh
# Get the final results
sh scripts/eval/get_rerank_results_zeroshot.sh
Many thanks to the codebases of lmms-finetune and E5-V.
If you use this code for your research or project, please cite:
@article{liu2024lamra,
  title={LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant},
  author={Yikun Liu and Pingan Chen and Jiayin Cai and Xiaolong Jiang and Yao Hu and Jiangchao Yao and Yanfeng Wang and Weidi Xie},
  journal={arXiv preprint arXiv:2412.01720},
  year={2024}
}