ActiveVision

Title: ActiveVision: a data and model portal for the study of goal-directed vision

Project Description: Saliency map research in computer vision has extensively examined where human observers look in images and videos during free viewing. Although cognitive psychology has recognized the role of behavioral goals for over 50 years, integrating task dependence into quantitative models and large open datasets is a recent development. This project aims to create an open portal that consolidates existing machine learning/AI models and eye-tracking datasets related to goal-directed vision (e.g., visual search), while also providing tools for model testing and validation. A key focus will be on multimodal AI, particularly language-vision integration. The platform will also serve as a prototype for similar data-and-model initiatives on public hardware platforms.

This is a new project that the GSoC contributor will start from scratch, with help and mentorship from us. This approach has worked well for us in the past: successful projects have continued into second and third years of development, and contributors from one year have joined as mentors the following year. The goal of this project is therefore fairly open-ended. We are looking at existing models of goal-directed vision and free viewing, as well as benchmarks; examples include IRL, TCT, HAT, VISIONS, V*, Ctrl-O, GazeFormer and IVSN. Improving these models or developing new ones is also possible, and additional ideas are welcome. A good proposal will answer the following questions: What do you propose to do, and why? How will you do it? Why should you be the person doing it? Is it feasible for you to do it? Further questions are welcome via Neurostars.

Skill level: Advanced

Project website: https://github.com/m2b3/ActiveVision

Project size: 350 hours (Large)

Prerequisites: Familiarity with open-source vision and multimodal AI models. Fluency in Python and PyTorch. Familiarity with Slurm and with working on compute clusters is preferred. Basic web-development skills, or an interest in learning them, will be useful.

Tech keywords: Python, PyTorch, Visual search, Saliency, Science portals, Vision AI, Vision-language models.

No planned longer absences
