This repository contains the code and resources for benchmarking vision-language models (VLMs) on scene understanding in challenging social navigation scenarios. For more information, please see the paper (link to be added).
## Install Dependencies

```bash
pip install -r requirements.txt
```
## Download the Dataset

Please download our dataset from HuggingFace by running the `download_dataset.sh` script:

```bash
./download_dataset.sh
```
## Benchmark a VLM

Make a config file that specifies the VLM under the `baseline_model` parameter, along with parameters for the experiments (such as the prompt representation). API models require an environment variable containing an API key (`GOOGLE_API_KEY` or `OPENAI_API_KEY`). Then run:

```bash
python socialnavsub/evaluate_vlm.py --cfg_path <cfg_path>
```
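A minimal config sketch is below. Only `baseline_model`, `evaluation_folder`, and `postprocessed_results_csv` are entry names stated in this README; the other keys and all values are illustrative assumptions, so check the repository's example configs for the real schema.

```yaml
# Hypothetical config sketch -- keys other than baseline_model,
# evaluation_folder, and postprocessed_results_csv are illustrative.
baseline_model: gpt4o                 # VLM to benchmark (e.g. gemini, llava, gpt4o)
prompt_representation: text           # assumed name for the prompt-representation setting
evaluation_folder: results/gpt4o      # directory where evaluation outputs are saved
postprocessed_results_csv: postprocessed_results.csv
```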
## View Results

Results will be saved in the directory specified in the config file under the `evaluation_folder` entry. To postprocess the results, please run:

```bash
python socialnavsub/postprocess_results.py --cfg_path <cfg_path>
```
The results will be viewable in the CSV whose filepath is specified in the `postprocessed_results_csv` entry of the config file (by default, `postprocessed_results.csv`).
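For quick inspection, the postprocessed CSV can be loaded with Python's standard `csv` module. The column names below are purely illustrative stand-ins (the real columns depend on the benchmark), so this sketch writes its own tiny example file first to stay self-contained:

```python
import csv

# Write a tiny example CSV standing in for postprocessed_results.csv.
# The column names here are illustrative; check the real header row.
with open("postprocessed_results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["model", "question_id", "score"])
    writer.writerow(["gpt4o", "0001", "0.75"])

# Load the postprocessed results and compute a simple mean score.
with open("postprocessed_results.csv", newline="") as f:
    rows = list(csv.DictReader(f))

mean_score = sum(float(r["score"]) for r in rows) / len(rows)
print(f"{len(rows)} rows, mean score {mean_score:.2f}")
```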
## Contributing

Contributions are welcome! Please open issues or pull requests for bug fixes, new features, or improvements.
To add a new baseline, you can make a new class file containing a subclass of `APIBaseline` (`api_baseline.py`). For examples, please see `gemini.py`, `llava.py`, or `gpt4o.py`.
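The subclassing step might look like the sketch below. The real `APIBaseline` lives in `api_baseline.py`; the stub here only stands in for it so the example runs on its own, and the constructor and method names are assumptions, not the repository's actual API (mirror `gemini.py` or `gpt4o.py` for the true interface):

```python
class APIBaseline:
    """Stand-in stub for the repository's APIBaseline (api_baseline.py)."""

    def __init__(self, model_name):
        self.model_name = model_name

    def query(self, prompt):
        raise NotImplementedError

class MyVLMBaseline(APIBaseline):
    """Hypothetical new baseline; method names are assumed, not the repo's API."""

    def __init__(self):
        super().__init__(model_name="my-vlm")

    def query(self, prompt):
        # Call your model's API here and return its text response.
        return f"[my-vlm response to: {prompt}]"

baseline = MyVLMBaseline()
print(baseline.query("Describe the scene."))
```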
## Contact

For questions or support, please open an issue or email [email protected].