Sample Results Without Fine-Tuning
Figure: From left to right (input prompt): Dog and Person; Dog, Person, and Glass; Dog, Person, and Chair; Dog and Bag.
Figure: From left to right (input prompt): Car; Person.
- Base: GroundingDINO
- Fine-tuned on: KITTI dataset
- Detection Loss: Hungarian matcher + classification + L1/IoU
- GPU: NVIDIA A100
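The detection loss listed above combines a one-to-one matching step with per-pair losses. The following is a minimal, framework-free sketch of that idea, not this repository's code. Assumptions: boxes are `(x1, y1, x2, y2)` with positive area; classification uses plain per-class probabilities, whereas Grounding DINO scores text tokens; the "IoU" term is taken to be generalized IoU; the weights 2/5/2 are illustrative, not read from this repo's config. A brute-force search over assignments stands in for the Hungarian algorithm. Both return a minimum-cost assignment, but brute force only scales to toy sizes. All function names are hypothetical.

```python
import math
from itertools import permutations

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed positive area


def l1(a: Box, b: Box) -> float:
    """Sum of absolute coordinate differences."""
    return sum(abs(p - q) for p, q in zip(a, b))


def giou(a: Box, b: Box) -> float:
    """Generalized IoU of two axis-aligned boxes, in [-1, 1]."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    # Area of the smallest box enclosing both inputs.
    hull = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (hull - union) / hull


def match(probs, boxes, labels, targets, w_cls=2.0, w_l1=5.0, w_giou=2.0):
    """Return (pred_idx, target_idx) pairs with minimum total matching cost."""
    def cost(i: int, j: int) -> float:
        # High class probability, small L1 distance and high GIoU all lower the cost.
        return (-w_cls * probs[i][labels[j]]
                + w_l1 * l1(boxes[i], targets[j])
                - w_giou * giou(boxes[i], targets[j]))

    # perm[j] is the prediction assigned to target j (needs len(boxes) >= len(targets)).
    best = min(permutations(range(len(boxes)), len(targets)),
               key=lambda perm: sum(cost(i, j) for j, i in enumerate(perm)))
    return [(i, j) for j, i in enumerate(best)]


def detection_loss(probs, boxes, labels, targets, w_cls=2.0, w_l1=5.0, w_giou=2.0):
    """Weighted classification + L1 + GIoU loss over matched pairs only."""
    pairs = match(probs, boxes, labels, targets, w_cls, w_l1, w_giou)
    if not pairs:
        return 0.0
    n = len(pairs)
    cls = sum(-math.log(probs[i][labels[j]]) for i, j in pairs) / n
    box_l1 = sum(l1(boxes[i], targets[j]) for i, j in pairs) / n
    box_giou = sum(1.0 - giou(boxes[i], targets[j]) for i, j in pairs) / n
    return w_cls * cls + w_l1 * box_l1 + w_giou * box_giou


probs = [[0.9, 0.1], [0.6, 0.4]]       # two predictions, two classes
boxes = [(0, 0, 1, 1), (2, 2, 3, 3)]
labels = [0]                            # one ground-truth object of class 0
targets = [(2, 2, 3, 3)]
print(match(probs, boxes, labels, targets))           # [(1, 0)]
print(detection_loss(probs, boxes, labels, targets))  # ≈ 1.0217
```

Prediction 0 has the higher class score but lies far from the target, so the box terms dominate and prediction 1 is matched. For brevity this sketch omits the "no object" classification term for unmatched predictions, which a full DETR-style loss includes.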
The code is built upon the official implementation of the ECCV 2024 paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection".