Project 10- Fine-tuning Vision Language Models (VLMs) for Object Detection and Hierarchical Classification using the OpenVINO Ecosystem #29877
PraroopChanda
started this conversation in
Google Summer of Code
-
I have sent my initial proposal draft to your email address. Thank you!
-
Hi @rajeshgangireddy @adrianboguszewski @mlukasze
Hope you're doing well!
I am Praroop, currently a master's student at Texas A&M, focused on computer vision, multimodal learning, and generative AI.
I am highly interested in Project 10 (Fine-tuning Vision Language Models (VLMs) for Object Detection and Hierarchical Classification using the OpenVINO Ecosystem).
I did some preliminary research and settled on Grounding DINO. I set up the codebase and ran a small fine-tuning run on the KITTI dataset, training only the decoder layer.
I used an NVIDIA A100 GPU and kept the batch size small at 6; training used about 9 to 10 GB of VRAM.
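For context, here is a minimal sketch of what training only the decoder can look like in PyTorch. The `ToyDetector` model and the `freeze_all_but` helper are hypothetical stand-ins for illustration; they are not taken from the linked repo or from the Grounding DINO code, and the learning rate is an arbitrary assumption.

```python
import torch
from torch import nn


class ToyDetector(nn.Module):
    """Hypothetical stand-in for a detector; not the Grounding DINO architecture."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 8)
        self.encoder = nn.Linear(8, 8)
        self.decoder = nn.Linear(8, 4)

    def forward(self, x):
        return self.decoder(self.encoder(self.backbone(x)))


def freeze_all_but(model: nn.Module, keyword: str) -> list[str]:
    """Freeze every parameter whose name lacks `keyword`; return trainable names."""
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = keyword in name
        if param.requires_grad:
            trainable.append(name)
    return trainable


model = ToyDetector()
trainable_names = freeze_all_but(model, "decoder")

# Only parameters that still require gradients are handed to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

# One training step on random data with a batch size of 6.
x = torch.randn(6, 8)
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
```

Freezing the backbone and encoder means no gradients or optimizer state are kept for them, which is one reason a decoder-only run can fit in a modest amount of VRAM.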
You can find the GitHub repo with the setup and initial detection results here: https://github.com/PraroopChanda/GroundDINO_FineTune
Further, I am planning to:
I would really love to know your thoughts on this.
Attaching a couple of preliminary visual results below:
Best,
Praroop Chanda
https://praroopchanda.github.io/