Project 10- Fine-tuning Vision Language Models (VLMs) for Object Detection and Hierarchical Classification using the OpenVINO Ecosystem #29877

PraroopChanda · 2025-04-01T22:39:20Z

Hi @rajeshgangireddy @adrianboguszewski @mlukasze

Hope you're doing well!

I am Praroop, currently a master's student at Texas A&M, focused on computer vision, multimodal learning, and generative AI.
I am highly interested in Project 10 (Fine-tuning Vision Language Models (VLMs) for Object Detection and Hierarchical Classification using the OpenVINO Ecosystem).

I did some preliminary research and settled on Grounding DINO. I set up the code base and ran a small fine-tune on the KITTI dataset, training only the decoder layer.
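
For readers who want to see what "training only the decoder" can look like in PyTorch, here is a minimal sketch. It is not the code from the linked repo: `ToyDetector`, `freeze_all_but`, and the assumption that decoder parameters have "decoder" in their names are all hypothetical stand-ins, and the real Grounding DINO module names may differ.

```python
import torch
from torch import nn


class ToyDetector(nn.Module):
    """Hypothetical stand-in for a detector with backbone, encoder, and decoder."""

    def __init__(self, dim: int = 32, num_outputs: int = 4):
        super().__init__()
        self.backbone = nn.Linear(dim, dim)
        self.encoder = nn.Linear(dim, dim)
        self.decoder = nn.Linear(dim, num_outputs)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        hidden = torch.relu(self.backbone(x))
        hidden = torch.relu(self.encoder(hidden))
        return self.decoder(hidden)


def freeze_all_but(model: nn.Module, keyword: str) -> list:
    """Freeze every parameter whose name lacks `keyword`; return trainable names."""
    trainable = []
    for name, param in model.named_parameters():
        # Assumption: the parameter name identifies the sub-module it belongs to.
        param.requires_grad_(keyword in name)
        if param.requires_grad:
            trainable.append(name)
    return trainable


model = ToyDetector()
trainable_names = freeze_all_but(model, "decoder")

# Hand only the trainable parameters to the optimizer.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)

# One dummy step on random data, just to show gradients reach the decoder only.
inputs = torch.randn(6, 32)
loss = model(inputs).pow(2).mean()
loss.backward()
optimizer.step()
```

Frozen parameters receive no gradients and keep no optimizer state, which is one reason this kind of partial fine-tune tends to need less memory than full fine-tuning.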

I used an NVIDIA A100 GPU. With a small batch size of 6, training used about 9-10 GB of VRAM.
You can find the GitHub repo with the setup and initial detection results here: https://github.com/PraroopChanda/GroundDINO_FineTune

Going forward, I am planning to:

Integrate LoRA and QLoRA for fine-tuning
Try fine-tuning on other datasets
Explore other models such as OWL-ViT and DETR for fine-tuning and classification
Investigate the use of hyperbolic embeddings for hierarchical classification
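
As a rough illustration of the hyperbolic-embedding idea, here is a small sketch of the distance function in the Poincaré ball, one common model of hyperbolic space. Which hyperbolic model would actually be used is an assumption on my part, and the 2-D class embeddings and labels below are made-up values for illustration only.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


# Hypothetical embeddings: a general class at the origin and two
# specific classes close to the boundary of the ball.
root = (0.0, 0.0)    # e.g. "vehicle"
car = (0.9, 0.0)     # e.g. "car"
truck = (0.0, 0.9)   # e.g. "truck"

d_root_car = poincare_distance(root, car)
d_car_truck = poincare_distance(car, truck)
euclidean_car_truck = math.dist(car, truck)
```

Distances grow quickly near the boundary, so two leaf classes end up much farther apart than their Euclidean distance suggests while each stays comparatively close to the parent near the origin. That tree-like geometry is why hyperbolic space is attractive for class hierarchies.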

I would really love to hear your thoughts on this.
Attaching a couple of preliminary visual results below:

Best,
Praroop Chanda
https://praroopchanda.github.io/

PraroopChanda · 2025-04-03T08:07:05Z

Hi @rajeshgangireddy,

I sent my initial proposal draft to your email address.
If possible, please have a look. I would really love to hear your thoughts and feedback.

Thank you,
Praroop Chanda

2 replies

rajeshgangireddy Apr 3, 2025

Hi @PraroopChanda,
I will review it.

Thanks!

rajeshgangireddy Apr 7, 2025

Hi Praroop,
It's great to see that you have already set up a repo and were able to fine-tune a VLM for object detection.
I have replied to your email with a few comments on your draft proposal.

Let us know if you have any further questions on the project.
