rishik18/CLIP_embedding_based_retrieval

Introduction

• This project implements scalable, Ray-based enrichment of datasets with CLIP embeddings, enabling retrieval with image and text queries, cluster analysis, and deduplication.

• The notebook provides code to read in images, generate embeddings, and publish the dataset via Hugging Face.
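To make the retrieval and deduplication use cases concrete, here is a minimal sketch of how embeddings can be compared once they exist. It does not reproduce the notebook's Ray or CLIP code: the toy 3-dimensional vectors stand in for real CLIP embeddings, the function names (`cosine_similarity`, `top_k`, `near_duplicates`) are hypothetical, and the 0.98 threshold is an arbitrary illustrative value. It is kept dependency-free for clarity; on a real dataset one would likely vectorize this with NumPy or use an approximate nearest-neighbor index.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, embeddings, k=3):
    """Return (index, score) pairs for the k embeddings most similar to query."""
    scored = [(i, cosine_similarity(query, e)) for i, e in enumerate(embeddings)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def near_duplicates(embeddings, threshold=0.98):
    """Return index pairs (i, j) with i < j whose similarity is >= threshold."""
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine_similarity(embeddings[i], embeddings[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Toy stand-ins for CLIP embeddings (assumption: real ones are much longer).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.99, 0.05, 0.0],
    [0.0, 0.0, 1.0],
]
# Stand-in for an embedded image or text query.
toy_query = [1.0, 0.1, 0.0]

print(top_k(toy_query, toy_embeddings, k=2))
print(near_duplicates(toy_embeddings))
```

Because CLIP maps images and text into a shared embedding space, the same similarity search can serve both image and text queries; only the way the query vector is produced differs.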


Sample search results

[Images: sample_1, sample_2, sample_3]

Dataset

Source

All images were sourced from Francesco/insects-mytwu.

Link

https://huggingface.co/datasets/hkanade/insect_image_retrieval/

Sample usage

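from datasets import load_dataset

# Download the published dataset from the Hugging Face Hub
ds_new = load_dataset("hkanade/insect_image_retrieval")
# Access the image of the first example in the train split
ds_new["train"][0]["image"]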

About

Python Jupyter notebook for adding CLIP embeddings to datasets