@hp6318 hp6318 commented Sep 3, 2025

Object Detection on Images/Videos with DINOv3

This notebook demonstrates object detection with DINOv3, using a frozen DINOv3 backbone and
a lightweight DETR-style decoder trained on the COCO dataset (as described in the DINOv3 paper). We use
the pre-trained weights provided by the authors.

Given:

  • RGB video frame(s)

We will extract bounding boxes with their scores and labels from each frame.
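
As a rough illustration of that last step, here is a minimal sketch of turning DETR-style decoder outputs into per-frame detections. It is not taken from the notebook: the function names (`cxcywh_to_xyxy`, `postprocess`), the 0.5 threshold, and the assumed output layout (per-query class scores already in [0, 1], boxes as normalized center-x, center-y, width, height) are illustrative assumptions based on the common DETR convention.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def cxcywh_to_xyxy(box: Box, img_w: int, img_h: int) -> Box:
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    )


def postprocess(
    class_scores: Sequence[Sequence[float]],
    boxes: Sequence[Box],
    img_w: int,
    img_h: int,
    threshold: float = 0.5,  # assumed value, tune per use case
) -> List[Dict]:
    """Keep the best class per query and drop low-confidence queries.

    Assumes one row of class scores and one box per decoder query.
    """
    detections = []
    for scores, box in zip(class_scores, boxes):
        # Pick the highest-scoring class for this query.
        label = max(range(len(scores)), key=lambda i: scores[i])
        score = scores[label]
        if score >= threshold:
            detections.append(
                {
                    "label": label,
                    "score": score,
                    "box": cxcywh_to_xyxy(box, img_w, img_h),
                }
            )
    # Highest-confidence detections first.
    detections.sort(key=lambda d: d["score"], reverse=True)
    return detections
```

For a video, the same function would be called once per frame, using that frame's width and height to scale the boxes.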

@meta-cla meta-cla bot added the CLA Signed label Sep 3, 2025
@lartpang

Thank you for your contribution. It's truly very valuable.
It would be even better if the official team could provide inference and evaluation scripts for all tasks covered in the paper, such as segmentation, classification, detection, depth estimation, and so on.

@foxyglue

This is great work, thank you so much for sharing this object detection notebook!

I'm looking to fine-tune this on a custom dataset and had a quick question. My dataset is in YOLO format. Would you recommend converting it to COCO format to work with this model, or is there a more direct approach you might suggest?

Thanks again for the excellent contribution!
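
For context on the conversion asked about above, here is a minimal sketch of mapping one YOLO label line to a COCO-style annotation. It assumes the usual conventions: a YOLO line is `class cx cy w h` with coordinates normalized to [0, 1], and a COCO `bbox` is `[x_min, y_min, width, height]` in pixels. The helper name `yolo_line_to_coco` is hypothetical, and whether this notebook's training path needs COCO format at all is not confirmed in this thread.

```python
from typing import Dict


def yolo_line_to_coco(line: str, img_w: int, img_h: int) -> Dict:
    """Convert one YOLO label line to a COCO-style annotation dict.

    YOLO: "class cx cy w h", all box values normalized to [0, 1].
    COCO: bbox = [x_min, y_min, width, height] in pixels.
    """
    parts = line.split()
    class_id = int(parts[0])
    cx, cy, w, h = (float(v) for v in parts[1:5])

    box_w = w * img_w
    box_h = h * img_h
    x_min = cx * img_w - box_w / 2
    y_min = cy * img_h - box_h / 2

    return {
        # Assumption: the YOLO class index is reused as-is. A real dataset
        # may need an explicit mapping to its own COCO category ids.
        "category_id": class_id,
        "bbox": [x_min, y_min, box_w, box_h],
        "area": box_w * box_h,
        "iscrowd": 0,
    }
```

A full COCO file also needs `images` and `categories` lists plus unique `id` and `image_id` fields on each annotation, which this sketch leaves out.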
