ActiveVision

Title: ActiveVision: a data and model portal for the study of goal-directed vision

Project Description: Salience map research in computer vision has extensively examined where human observers look in images and videos during free viewing. Although cognitive psychology has recognized the role of behavioral goals for over 50 years, task dependence has only recently been integrated into quantitative models and large open datasets. This project aims to create an open portal that consolidates existing machine learning/AI models and eye-tracking datasets related to goal-directed vision (e.g., visual search) and provides tools for model testing and validation. A key focus will be multimodal AI, particularly language-vision integration. The platform will also serve as a prototype for similar data+model initiatives on public hardware platforms.
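As a rough illustration of what a "model testing and validation" tool could look like, the sketch below scores a predicted priority map against recorded fixations using Normalized Scanpath Saliency (NSS), a commonly used saliency metric. The project does not prescribe any metric or API; the function name `nss`, the list-of-lists map format, and the toy data are all assumptions made for this example.

```python
from statistics import fmean, pstdev


def nss(saliency, fixations):
    """Normalized Scanpath Saliency (illustrative helper, not a project API).

    saliency:  2-D list of floats, one inner list per image row.
    fixations: list of (row, col) pixel coordinates from eye tracking.
    Returns the mean z-scored map value at the fixated pixels.
    """
    values = [v for row in saliency for v in row]
    mean, std = fmean(values), pstdev(values)
    if std == 0 or not fixations:
        raise ValueError("need a non-constant map and at least one fixation")
    return fmean((saliency[r][c] - mean) / std for r, c in fixations)


# Toy 2x2 map (hypothetical data): the model puts all priority bottom-right.
toy_map = [[0.0, 0.0],
           [0.0, 1.0]]

hit = nss(toy_map, [(1, 1)])   # fixation lands on the predicted peak
miss = nss(toy_map, [(0, 0)])  # fixation lands on a low-priority pixel
print(hit > 0, miss < 0)
```

A positive score means fixations fall on above-average regions of the map, and a score near zero means the map predicts fixations no better than chance. A portal would likely run such a metric over whole datasets and several models, which this sketch does not attempt.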

This is a new project that the GSoC contributor will start from scratch, with help and mentorship from us. We have had good success with this approach in the past: successful projects have continued into second and third years of development, and contributors from one year have joined as mentors the following year. The goal of this project is therefore fairly open-ended. We are looking at existing models of goal-directed vision and free viewing, as well as benchmarks. Examples include IRL, TCT, HAT, VISIONS, V*, Ctrl-O, GazeFormer, and IVSN. Improving these models or developing new ones is also possible. Additional ideas are welcome. A good proposal will answer the following questions: what do you propose to do, why, how will you do it, why should you be the person doing it, and is it feasible for you to do it? Additional questions are welcome via Neurostars.

Skill level: Advanced

Project website: https://github.com/m2b3/ActiveVision

Project size: 350 hours (Large)

Prerequisites: Familiarity with open-source vision and multimodal AI models. Fluency in Python and PyTorch. Familiarity with Slurm and experience working with clusters are preferred. Basic web-development skills, or an interest in learning them, will be useful.

Tech keywords: Python, PyTorch, Visual search, Saliency, Science portals, Vision AI, Vision-language models.

No planned longer absences
