
A powerful multimodal AI assistant built with Python, Streamlit, and the Google Gemini 2.0 Flash model. The app lets users analyze uploaded or captured images and get intelligent responses based on their visual content.


Farhan-Feb/AI-Visual-Assistant-Multimodal


🧠 AI Visual Assistant with Gemini 2.0 & Streamlit

An interactive multimodal application that lets you upload an image or capture a screenshot and ask intelligent questions about it using Gemini 2.0 Flash.


🚀 Features

  • Upload an image or capture a screenshot of any open window
  • Intelligent image analysis using Google’s Gemini 2.0 Flash
  • Streamlit UI for easy interaction
  • Ask custom questions and get AI-generated insights
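The README does not show how window capture is implemented. Given the listed dependencies (PyGetWindow and PyAutoGUI), one plausible approach is sketched below, assuming Windows (where PyGetWindow's `getWindowsWithTitle` is supported) and capture limited to the primary monitor. `clamp_region` and `capture_window` are hypothetical helper names, not functions from the repository.

```python
from typing import Optional, Tuple

Region = Tuple[int, int, int, int]  # (left, top, width, height)


def clamp_region(left: int, top: int, width: int, height: int,
                 screen_w: int, screen_h: int) -> Optional[Region]:
    """Clip a window rectangle to the visible primary-screen area.

    Maximized windows on Windows can report slightly negative coordinates
    (e.g. -8), and partly off-screen windows extend past the edges, so the
    rectangle is intersected with the screen before capture.
    Returns None if no part of the window is on screen.
    """
    x0, y0 = max(left, 0), max(top, 0)
    x1, y1 = min(left + width, screen_w), min(top + height, screen_h)
    if x1 <= x0 or y1 <= y0:
        return None
    return (x0, y0, x1 - x0, y1 - y0)


def capture_window(title_substring: str):
    """Screenshot the first window whose title contains title_substring.

    Hypothetical helper; assumes Windows and the primary monitor.
    Returns a PIL.Image.Image, or None if no visible window matches.
    """
    # Imported lazily: both packages need a desktop session to work.
    import pyautogui
    import pygetwindow as gw

    matches = gw.getWindowsWithTitle(title_substring)
    if not matches:
        return None
    win = matches[0]
    screen_w, screen_h = pyautogui.size()
    region = clamp_region(win.left, win.top, win.width, win.height,
                          screen_w, screen_h)
    if region is None:
        return None
    return pyautogui.screenshot(region=region)
```

For example, `capture_window("Excel")` would return a Pillow image of the first window with "Excel" in its title, ready to be saved or passed on for analysis. Keeping the clipping logic in a pure function makes it easy to check without a display.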

🖼 Example Use Case

Upload a screenshot of a chart or window

Ask questions like:

  • "What does this chart represent?"
  • "Summarize the image contents"
  • "Identify key trends or values"
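A question like these is sent to Gemini together with the image. The sketch below shows one way to do that with the google-genai SDK, assuming the API key is stored in a `GOOGLE_API_KEY` environment variable. The prompt wrapper (`PROMPT_TEMPLATE`, `build_prompt`) and `ask_about_image` are hypothetical; the README does not document the app's actual prompt.

```python
import os

# Hypothetical prompt wrapper; not taken from the repository.
PROMPT_TEMPLATE = (
    "You are looking at a screenshot or uploaded image. "
    "Answer using only what is visible in it.\n\n"
    "Question: {question}"
)
DEFAULT_QUESTION = "Summarize the image contents."


def build_prompt(question: str) -> str:
    """Collapse stray whitespace and fall back to a default question."""
    cleaned = " ".join(question.split())
    return PROMPT_TEMPLATE.format(question=cleaned or DEFAULT_QUESTION)


def ask_about_image(image, question: str, model: str = "gemini-2.0-flash") -> str:
    """Send a PIL image plus the question to Gemini and return the text reply.

    Assumes the google-genai package is installed and GOOGLE_API_KEY is set.
    """
    # Imported lazily so build_prompt can be used without the SDK installed.
    from google import genai

    client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
    response = client.models.generate_content(
        model=model,
        contents=[image, build_prompt(question)],  # the SDK accepts PIL images here
    )
    return response.text
```

Usage might look like `ask_about_image(Image.open("chart.png"), "Identify key trends or values")`, where `chart.png` is a placeholder file name.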

🧠 Powered By

  • Google Gemini 2.0 Flash via the google-genai SDK
  • Streamlit
  • Pillow, PyAutoGUI, PyGetWindow
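To show how the Streamlit front end and the Gemini call might fit together, here is a minimal upload-and-ask page. It is a sketch under stated assumptions, not the repository's code: it accepts PNG and JPEG only, reads the key from a `GOOGLE_API_KEY` environment variable, and sends the raw upload bytes with `types.Part.from_bytes`. `MIME_BY_EXT`, `mime_for`, and `render_app` are hypothetical names.

```python
import os
from typing import Optional

# Hypothetical whitelist; the README does not say which formats the app accepts.
MIME_BY_EXT = {".png": "image/png", ".jpg": "image/jpeg", ".jpeg": "image/jpeg"}


def mime_for(filename: str) -> Optional[str]:
    """Map an uploaded file name to an image MIME type, or None if unsupported."""
    ext = os.path.splitext(filename)[1].lower()
    return MIME_BY_EXT.get(ext)


def render_app() -> None:
    """Streamlit page: upload an image, type a question, show Gemini's answer."""
    # Imported lazily so mime_for can be used without these packages.
    import streamlit as st
    from google import genai
    from google.genai import types

    st.title("AI Visual Assistant")
    uploaded = st.file_uploader("Upload an image", type=["png", "jpg", "jpeg"])
    question = st.text_input("Ask a question", "Summarize the image contents")

    if uploaded is None:
        return
    data = uploaded.getvalue()
    st.image(data)

    mime = mime_for(uploaded.name)
    if mime is None:
        st.error("Unsupported file type.")
        return

    if st.button("Analyze") and question.strip():
        client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
        image_part = types.Part.from_bytes(data=data, mime_type=mime)
        with st.spinner("Asking Gemini 2.0 Flash..."):
            response = client.models.generate_content(
                model="gemini-2.0-flash", contents=[image_part, question]
            )
        st.markdown(response.text)
```

To try it, save the code as `app.py`, add a `render_app()` call at the bottom, and start it with `streamlit run app.py`. The call is left out of the block so the helper can be imported on its own.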
