This is the official implementation for our ECCV 2024 paper.
News and upcoming updates
- We released the source code and diverse text prompt sets!
TL;DR: In this paper, we propose an OpenKD model with several intriguing features: 1) it supports both visual and textual prompts; 2) it has the potential to handle unseen and diverse texts; and 3) it maintains strong generality and performance on zero-shot keypoint detection (ZSKD) and few-shot keypoint detection (FSKD). We show that an LLM can serve as a reasoner for text interpolation and as a good language parser for parsing diverse texts. We also contribute four diverse text prompt sets, for the popular Animal pose, AwA, CUB, and NABird datasets, for fair evaluation. To the best of our knowledge, we are the first to open up the semantics and language diversity of text prompts for keypoint detection.
We list the major Python packages involved in requirements.txt. The code relies on minimal external packages and should be easy to run after installing the following:
- pytorch
- opencv, yacs, etc.
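As a quick sanity check after installation, a short Python snippet can report which packages are still missing (the module names below are assumptions based on the package list above):

```python
import importlib.util

def missing_packages(mods):
    """Return the subset of module names that cannot be imported."""
    return [m for m in mods if importlib.util.find_spec(m) is None]

# Module names are assumptions based on the package list above:
# pytorch imports as "torch", opencv as "cv2".
print(missing_packages(["torch", "cv2", "yacs"]))
```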
Animal pose dataset
To help you bootstrap your own experiments and reproduce our results, we have uploaded the entire Animal pose dataset here. After downloading it, the folders used by our OpenKD source code are:
```
|--Animal_Dataset_Combined
    |--gt_coco
    |--images
    |--saliency_maps
    |--readme-coco.txt
```

The `gt_coco` folder gives the keypoint annotations in COCO format; the `saliency_maps` folder provides the saliency maps corresponding to the RGB images in the `images` folder. Note that all saliency maps were extracted with the off-the-shelf saliency detector SCRN; they are used to prune invalid auxiliary keypoints generated by visual interpolation. Please modify the paths of IMAGE_ROOT, JSON_ROOT, and SALIENCY_MAPS_ROOT under DATASET.ANIMAL_POSE in the config file openkd.yaml, using your local paths of `images`, `gt_coco`, and `saliency_maps`.
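For illustration, the DATASET.ANIMAL_POSE entries in openkd.yaml might look like the sketch below (the exact key layout in the shipped config may differ; the paths are placeholders to replace with your own):

```yaml
DATASET:
  ANIMAL_POSE:
    IMAGE_ROOT: /your/local/path/Animal_Dataset_Combined/images
    JSON_ROOT: /your/local/path/Animal_Dataset_Combined/gt_coco
    SALIENCY_MAPS_ROOT: /your/local/path/Animal_Dataset_Combined/saliency_maps
```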
(Optional) AwA pose dataset
AwA pose is another animal pose dataset; its images can be downloaded here. We provide the curated COCO annotations for AwA pose here. The saliency maps can be extracted with the off-the-shelf saliency detector SCRN.
Please modify the paths under DATASET.AWA in openkd.yaml accordingly.
(Optional) CUB and NABird datasets
Please download these well-known bird datasets online. You can extract their saliency maps in the same way as for the Animal pose and AwA pose datasets. We also provide their curated COCO annotations here.
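The curated annotations follow the standard COCO keypoint format. The snippet below is a self-contained illustration of that format (the category and keypoint names are made up, not taken from the actual files), showing how the flat keypoints list decodes into (x, y, visibility) triplets:

```python
# Illustrative-only sketch of the COCO keypoint annotation format;
# names and values are hypothetical, not taken from the released files.
coco = {
    "images": [{"id": 1, "file_name": "bird_001.jpg", "width": 640, "height": 480}],
    "annotations": [{
        "id": 1, "image_id": 1, "category_id": 1,
        # COCO stores keypoints as a flat [x, y, visibility, ...] list.
        "keypoints": [320, 200, 2, 330, 210, 2],
        "num_keypoints": 2,
    }],
    "categories": [{"id": 1, "name": "bird", "keypoints": ["left_eye", "right_eye"]}],
}

ann = coco["annotations"][0]
# Decode the flat list into (x, y, visibility) triplets.
kps = [ann["keypoints"][i:i + 3] for i in range(0, len(ann["keypoints"]), 3)]
print(kps)  # → [[320, 200, 2], [330, 210, 2]]
```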
Remember to modify the corresponding paths in config file openkd.yaml accordingly.
Download the pre-trained weights of CLIP and place them into a local folder, e.g., pretrained_models/clip_weights. Afterwards, please modify the path of CLIP.WEIGHTS_ROOT in the config file openkd.yaml.
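For example, the CLIP entry in openkd.yaml might look like this sketch (the key layout is assumed; the path is a placeholder):

```yaml
CLIP:
  WEIGHTS_ROOT: pretrained_models/clip_weights
```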
Train the OpenKD model with the novel text interpolation:

```shell
python3 train_openkd.py --cfg_file experiments/configs/openkd.yaml
```
Train the OpenKD model without text interpolation:

```shell
python3 train_openkd.py --cfg_file experiments/configs/openkd.yaml \
    DATASET.GENERATE_INTERPOLATED_TEXTS False
```
You can modify TEST.NUM_TEST_SHOT and TEST.TEXT_PROMPT_SETTING.NUM_TEXT to perform 0-shot, 1-shot, or 1-shot-with-text testing as follows:
0-shot testing:

```shell
python3 eval_openkd.py --cfg_file experiments/configs/openkd.yaml
```
1-shot testing:

```shell
python3 eval_openkd.py --cfg_file experiments/configs/openkd.yaml \
    TEST.NUM_TEST_SHOT 1 \
    TEST.TEXT_PROMPT_SETTING.NUM_TEXT 0
```
1-shot with text testing:

```shell
python3 eval_openkd.py --cfg_file experiments/configs/openkd.yaml \
    TEST.NUM_TEST_SHOT 1 \
    TEST.TEXT_PROMPT_SETTING.NUM_TEXT 1
```
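The trailing `KEY VALUE` pairs in the commands above override entries of the YAML config at launch time. Below is a minimal stdlib sketch of that merging behavior (a simplification of what yacs-style configs do; the dictionary only mimics the relevant keys, and values stay as strings here whereas the real config system coerces types):

```python
# Simplified sketch of yacs-style "KEY VALUE" command-line overrides.
def apply_overrides(cfg, opts):
    """opts is a flat list like ["TEST.NUM_TEST_SHOT", "1", ...]."""
    for key, value in zip(opts[::2], opts[1::2]):
        *path, leaf = key.split(".")
        node = cfg
        for part in path:
            node = node.setdefault(part, {})
        # Note: values are kept as strings in this simplified sketch.
        node[leaf] = value
    return cfg

cfg = {"TEST": {"NUM_TEST_SHOT": 0, "TEXT_PROMPT_SETTING": {"NUM_TEXT": 1}}}
apply_overrides(cfg, ["TEST.NUM_TEST_SHOT", "1",
                      "TEST.TEXT_PROMPT_SETTING.NUM_TEXT", "0"])
print(cfg["TEST"])  # → {'NUM_TEST_SHOT': '1', 'TEXT_PROMPT_SETTING': {'NUM_TEXT': '0'}}
```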
We provide 1000 diverse text prompts for each of the Animal pose, AwA pose, CUB, and NABird datasets in datasets/text_prompts, to support research on diverse text prompting. The prompts are stored in JSON files.
As demonstrated in our paper, an LLM is a good parser for extracting keypoint texts from diverse texts, so we provide two types of parsed results, produced by GPT-3.5 and Vicuna, respectively. Both kinds of JSON files store the same 1000 diverse text prompts, but with different parsed results.
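To illustrate the idea (the entry below is entirely hypothetical; the actual JSON schema in datasets/text_prompts may differ), a diverse prompt is paired with the keypoint texts an LLM parser extracts from it:

```python
import json

# Hypothetical entry: the real schema in datasets/text_prompts may differ.
entry = {
    "diverse_text": "Could you find the left eye and the nose of this bird?",
    # Keypoint texts extracted by an LLM parser (e.g., GPT-3.5 or Vicuna).
    "parsed_keypoints": ["left eye", "nose"],
}

# Round-trip through JSON, since the prompt sets are stored as JSON files.
restored = json.loads(json.dumps(entry))
print(restored["parsed_keypoints"])  # → ['left eye', 'nose']
```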
Diverse text prompting can be run via:

```shell
python3 eval_diverse_prompts_noparsing.py --cfg_file experiments/configs/openkd.yaml
```

If you find this code useful for your research, please cite our work.
```bibtex
@InProceedings{openKD_eccv24,
  author    = {Changsheng Lu and Zheyuan Liu and Piotr Koniusz},
  title     = {OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2024},
}
```

Please raise a new GitHub issue or contact me at [email protected] if you have any questions.
