Visionary.AI is a next-gen visual assistant powered by Gemini Vision API. Upload an image, ask questions, and receive deep contextual insights in real-time. From document interpretation to object recognition, Visionary.AI helps you see, know, and act—smarter.
🔗 Click to Try Visionary.AI
🎥 Watch Demo Video
- 🧠 Multimodal Intelligence: Understands images and responds to custom queries using Gemini's Vision API.
- 📄 Context Extraction: Summarizes documents, posters, charts, or scenes.
- 🎤 Voice Input & Output: Ask questions via mic and get AI answers in speech.
- 🖼️ Smart Upload UI: Drag and drop images, annotate, or use webcam.
- 🧾 Response Types: Summaries, bullet points, key details, and follow-ups.
- 🌙 Dark Mode & Theming: Switch between light/dark gradients dynamically.
- 🕓 History Log: See past queries and revisit old results (local cache).
- ⚡ Responsive & Mobile-Ready: Works seamlessly across devices.
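The voice feature above builds on the browser's Web Speech API. A minimal sketch of the wiring (helper names are illustrative, not taken from the repo); both helpers degrade gracefully when the API is unavailable:

```javascript
// Illustrative Web Speech API helpers (names are not from the actual codebase).
// Returns a configured SpeechRecognition instance, or null when the browser
// (or a non-browser runtime) does not support speech recognition.
function createRecognizer(onTranscript) {
  const SR = globalThis.SpeechRecognition || globalThis.webkitSpeechRecognition;
  if (!SR) return null;
  const recognizer = new SR();
  recognizer.lang = "en-US";
  recognizer.interimResults = false;
  recognizer.onresult = (event) => {
    // Forward the final transcript of the first result to the caller.
    onTranscript(event.results[0][0].transcript);
  };
  return recognizer;
}

// Speaks the AI answer aloud; returns false when speech synthesis is missing.
function speak(text) {
  if (!globalThis.speechSynthesis) return false;
  globalThis.speechSynthesis.speak(new SpeechSynthesisUtterance(text));
  return true;
}
```

Calling `recognizer.start()` then prompts the user for microphone access; the transcript can be fed straight into the query box.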
| Layer | Tools |
| --- | --- |
| Frontend | React.js, Tailwind CSS, Redux Toolkit |
| AI APIs | Gemini Vision (Google AI Studio), Web Speech API |
| Animations | Framer Motion, Lottie |
| Deployment | Azure Static Web Apps / Firebase Hosting |
| Voice | Web Speech API (Speech-to-Text + TTS) |
| State | Redux or Recoil |
```
/src
  /components
    UploadImage.jsx
    OutputPanel.jsx
    VoiceInput.jsx
    Loader.jsx
  /api
    gemini.js
  /redux
    store.js
  /utils
    promptBuilder.js
  App.jsx
  index.js
```
- 📄 User uploads an image or uses webcam input
- ✍️ Optional: User types a query (e.g. “What’s in this prescription image?”)
- ⚙️ Frontend builds prompt and sends to Gemini Vision API
- 🧠 Gemini returns contextual result → displayed in animated output panel
- 🔀 User can ask follow-up or switch to voice interaction
```javascript
// Example API call
const response = await geminiVision.generate({
  image: uploadedImage,
  prompt: "Summarize the content of this image",
});
```
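Step 3 of the workflow ("frontend builds prompt") corresponds to `/utils/promptBuilder.js`. A possible sketch, assuming a default instruction is used whenever the user types no query (the repo's actual logic may differ):

```javascript
// utils/promptBuilder.js (sketch; actual repo logic may differ)
const DEFAULT_PROMPT = "Describe this image and summarize its key details.";

// Combine an optional user query with a default instruction so Gemini
// always receives a non-empty, trimmed prompt.
function buildPrompt(userQuery) {
  const query = (userQuery || "").trim();
  return query.length > 0 ? query : DEFAULT_PROMPT;
}
```

Keeping this in a small pure helper makes it easy to unit-test and to extend later (e.g. appending formatting hints like "answer in bullet points").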
Use an environment variable in `.env` for the API key:

```
VITE_GEMINI_API_KEY=your_key_here
```
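Vite exposes variables prefixed with `VITE_` to client code as `import.meta.env.VITE_GEMINI_API_KEY`. One way `/api/gemini.js` could assemble the JSON body for a Gemini Vision `generateContent` request (a sketch; the field names follow the public Generative Language REST API at time of writing, so verify against the current docs):

```javascript
// api/gemini.js (sketch): build the request body for a Gemini Vision call.
// The image is passed as a base64 string plus its MIME type.
function buildGeminiRequest(prompt, base64Image, mimeType) {
  return {
    contents: [
      {
        parts: [
          { text: prompt },
          { inline_data: { mime_type: mimeType, data: base64Image } },
        ],
      },
    ],
  };
}
```

The body is then POSTed with `fetch` to the `generateContent` endpoint, passing the key from `import.meta.env`. Note that shipping the key to the browser exposes it; a thin server-side proxy is safer for production.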
| Upload Image | Gemini Response |
| --- | --- |
| ![]() | ![]() |
```bash
git clone https://github.com/your-username/visionary-ai.git
cd visionary-ai
npm install
npm run dev
```
- ✏️ Students analyzing handwritten notes
- 🧾 Professionals summarizing documents/posters
- 👃 Accessibility for visually impaired users via spoken responses
- 🍭 Image-based product search & insight
- 🔍 OCR overlay highlights on image
- 🌍 Multilingual voice translation
- 🧠 Model comparison (Gemini vs GPT-4 Vision)
- 🧾 Export response to PDF/Notion
- 💡 Devanshi Awasthi – Project Lead, Frontend Dev, AI Integrator
Want to contribute? Feel free to fork, star, and PR! ⭐
MIT License © 2025 Devanshi Awasthi
If you like this project, leave a ⭐ on GitHub! For feature requests or bugs, open an issue!