Opening prompt diversity for zero-shot keypoint detection, few-shot keypoint detection, or few-shot keypoint detection with text


OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection

This is the official implementation for our ECCV 2024 paper.

Changsheng Lu, Zheyuan Liu, Piotr Koniusz

License: MIT

News and upcoming updates

  • We released the source code and diverse text prompt sets!

1. Introduction

TL;DR: In this paper, we propose OpenKD, a model with several intriguing features: 1) it supports both visual and textual prompts; 2) it has the potential to handle unseen and diverse texts; and 3) it maintains strong generality and performance on zero-shot keypoint detection (ZSKD) and few-shot keypoint detection (FSKD). We show that an LLM can serve as a reasoner for text interpolation and as a good language parser for parsing diverse texts. We also contribute four diverse text prompt sets, for the popular Animal pose, AwA, CUB, and NABird datasets, for fair evaluation. To the best of our knowledge, we are the first to open the semantics and language diversity of text prompts for keypoint detection.

2. Requirements

We list the major Python packages involved in requirements.txt. The code relies on minimal external packages and should be easy to run after installing:

  • pytorch
  • opencv, yacs, etc.

3. Dataset Preparation

  • Animal pose dataset

    To make it easy to bootstrap your own experiments and reproduce our results, we have uploaded the entire Animal pose dataset here. After downloading it, the folders used by the OpenKD source code are:

    |--Animal_Dataset_Combined  
      |--gt_coco
      |--images
      |--saliency_maps
      |--readme-coco.txt
    

    The gt_coco folder contains the keypoint annotations in COCO format; the saliency_maps folder provides the saliency maps corresponding to the RGB images in the images folder. All saliency maps were extracted with the off-the-shelf saliency detector SCRN, and they are used to prune invalid auxiliary keypoints generated by visual interpolation.
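    For intuition, saliency-based pruning can be sketched as follows. This is an illustrative helper, not part of the released code; the function name `prune_by_saliency` and the 0.5 threshold are assumptions for the sketch.

```python
import numpy as np

def prune_by_saliency(keypoints, saliency_map, threshold=0.5):
    """Keep only keypoints that fall on salient (foreground) pixels.

    keypoints: (N, 2) array of (x, y) pixel coordinates.
    saliency_map: (H, W) array with values in [0, 1].
    Returns the subset of keypoints whose saliency exceeds `threshold`.
    """
    h, w = saliency_map.shape
    # Round coordinates to pixel indices, clipped to the image bounds.
    xs = np.clip(keypoints[:, 0].astype(int), 0, w - 1)
    ys = np.clip(keypoints[:, 1].astype(int), 0, h - 1)
    keep = saliency_map[ys, xs] > threshold
    return keypoints[keep]
```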

    Please modify the paths IMAGE_ROOT, JSON_ROOT, and SALIENCY_MAPS_ROOT under DATASET.ANIMAL_POSE in the config file openkd.yaml to your local paths of images, gt_coco, and saliency_maps.
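    For reference, the relevant block in openkd.yaml might look like the following; the key names come from the instructions above, and the paths are placeholders for your own.

```yaml
DATASET:
  ANIMAL_POSE:
    IMAGE_ROOT: /your/local/path/Animal_Dataset_Combined/images
    JSON_ROOT: /your/local/path/Animal_Dataset_Combined/gt_coco
    SALIENCY_MAPS_ROOT: /your/local/path/Animal_Dataset_Combined/saliency_maps
```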

  • (Optional) AwA pose dataset

    AwA pose is another animal pose dataset, whose images can be downloaded here. We provide curated COCO annotations for AwA pose here. The saliency maps can be extracted with the off-the-shelf saliency detector SCRN.

    Please modify the paths of DATASET.AWA in openkd.yaml accordingly.

  • (Optional) CUB and NABird datasets

    Please download these well-known bird datasets online. You can extract their saliency maps in the same way as for the Animal pose and AwA pose datasets. We also provide their curated COCO annotations here.

    Remember to modify the corresponding paths in config file openkd.yaml accordingly.

4. Model Training

  • Download the pre-trained weights of CLIP and place them in a local folder, e.g., pretrained_models/clip_weights. Afterwards, modify the path CLIP.WEIGHTS_ROOT in the config file openkd.yaml.

  • Train the OpenKD model with the novel text interpolation

    python3 train_openkd.py --cfg_file experiments/configs/openkd.yaml
  • Train the OpenKD model without text interpolation

    python3 train_openkd.py --cfg_file experiments/configs/openkd.yaml \
    DATASET.GENERATE_INTERPOLATED_TEXTS False
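The trailing KEY VALUE pairs in the command above are yacs-style config overrides. As a rough, pure-Python sketch of how such a merge behaves (this mimics yacs' merge_from_list on a plain dict; it is not the repo's code):

```python
import ast

def merge_from_list(cfg: dict, opts: list) -> dict:
    """Merge flat ["KEY.SUBKEY", value, ...] overrides into a nested dict,
    mimicking the behaviour of yacs' CfgNode.merge_from_list."""
    assert len(opts) % 2 == 0, "overrides must come in KEY VALUE pairs"
    for key, value in zip(opts[0::2], opts[1::2]):
        node = cfg
        *parents, leaf = key.split(".")
        for p in parents:
            node = node[p]
        if isinstance(value, str):
            try:
                # Command-line strings like "False" or "1" become Python values.
                value = ast.literal_eval(value)
            except (ValueError, SyntaxError):
                pass  # keep plain strings as-is
        node[leaf] = value
    return cfg

cfg = {"DATASET": {"GENERATE_INTERPOLATED_TEXTS": True}}
merge_from_list(cfg, ["DATASET.GENERATE_INTERPOLATED_TEXTS", "False"])
print(cfg["DATASET"]["GENERATE_INTERPOLATED_TEXTS"])  # → False
```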

5. Zero-shot and few-shot testing

You can modify TEST.NUM_TEST_SHOT and TEST.TEXT_PROMPT_SETTING.NUM_TEXT to perform 0-shot, 1-shot, or 1-shot-with-text testing as follows:

  • 0-shot testing

    python3 eval_openkd.py --cfg_file experiments/configs/openkd.yaml
  • 1-shot testing

    python3 eval_openkd.py --cfg_file experiments/configs/openkd.yaml \
    TEST.NUM_TEST_SHOT 1 \
    TEST.TEXT_PROMPT_SETTING.NUM_TEXT 0
  • 1-shot with text testing

    python3 eval_openkd.py --cfg_file experiments/configs/openkd.yaml \
    TEST.NUM_TEST_SHOT 1 \
    TEST.TEXT_PROMPT_SETTING.NUM_TEXT 1

6. Testing with 1000 diverse text prompts

We provide the 1000 diverse text prompts for the Animal pose, AwA pose, CUB, and NABird datasets in datasets/text_prompts, to support research on diverse text prompting. The prompts are stored in json files.

As demonstrated in our paper, an LLM is a good parser for extracting keypoint texts from diverse texts, so we provide two sets of parsed results, produced by GPT-3.5 and Vicuna, respectively. Both kinds of json files store the same 1000 diverse text prompts, but with different parsed results.
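A minimal sketch of loading such a prompt file is shown below. The field names `prompt` and `parsed_keypoints` are assumptions made for illustration; please check the actual json schema in datasets/text_prompts.

```python
import json

def load_prompts(path):
    """Load diverse text prompts and their LLM-parsed keypoint texts.

    Assumes each entry holds a raw `prompt` string and a list of
    `parsed_keypoints` extracted by the LLM (illustrative schema only).
    """
    with open(path, "r", encoding="utf-8") as f:
        entries = json.load(f)
    return [(e["prompt"], e["parsed_keypoints"]) for e in entries]
```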

Diverse text prompt testing can be run via

python3 eval_diverse_prompts_noparsing.py --cfg_file experiments/configs/openkd.yaml

7. Citation

If you find this code useful for your research, please cite our work.

@InProceedings{openKD_eccv24,
  author={Changsheng Lu and Zheyuan Liu and Piotr Koniusz},
  title={OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2024},
}

8. Contact
