OllamaVision

An AI-powered image analysis extension for SwarmUI that generates detailed image descriptions for your prompts. It can use a local LLM through Ollama or OobaBooga WebUI, and it also supports the OpenAI and OpenRouter APIs.

🌟 Table of Contents

Features
Prerequisites
Installation
Usage Guide
LLM Toys Guide
Example Outputs
- Color Palette Analysis
- Facial Features Analysis
Acknowledgments

🌟 Features

Multiple backend options:
- Local LLM with Ollama (including remote Ollama installations)
- OpenAI API integration
- OpenRouter API support
- OobaBooga WebUI support with vision models
Advanced model settings:
- Ollama: temperature, top_p, top_k, max_tokens, repeat_penalty, seed
- OpenAI: temperature, max_tokens, top_p, frequency_penalty, presence_penalty
- OpenRouter: temperature, max_tokens, top_p, frequency_penalty, presence_penalty, repetition_penalty, top_k, min_p, top_a, seed
- OobaBooga: Model settings are configured through OobaBooga's WebUI interface
Multiple preset User Prompts (Artistic Style, Facial Features, Color Palette, etc.)
Batch Captioning for Lora dataset preparation (generate captions for multiple images in a folder)
Creative LLM Toys:
- Image Fusion: Combine separate analyses of style, subject, and setting into cohesive prompts
- Object + Subject Fusion: Transform objects with character designs and create unique combinations
- Story Time: Transform images into detailed narratives with beginning, middle, and end
- Character Creator: Generate detailed character profiles for stories, games, or roleplay
System Prompt support for all backends to customize model behavior
Prompt Prepending for adding instructions to the front of all requests
Custom preset support with reordering capability
Direct-to-prompt generation
Zero impact on VRAM when not in use (with the unload model setting enabled)
Image paste/upload support
Image drag-and-drop support
Remote server connection support (Ollama and OobaBooga)
Image compression option to prevent memory issues
Analysis history with thumbnails and parameter reuse
Enhanced error handling with helpful troubleshooting suggestions

📋 Prerequisites

First and foremost:

Make sure you have SwarmUI installed and set up on your system.

For Ollama:

Ollama with a vision model installed
For remote connections:
- Ollama server must be accessible on your network
- Port 11434 must be open on the server
- Server must be properly configured for remote access
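Before connecting from SwarmUI, it can help to confirm that the remote Ollama server answers on port 11434. The sketch below is not part of OllamaVision; it only queries Ollama's model-listing endpoint (`GET /api/tags`). The function names and the example address are hypothetical.

```python
import json
import urllib.request


def ollama_tags_url(host, port=11434):
    """Build the URL of Ollama's model-listing endpoint (GET /api/tags)."""
    return f"http://{host}:{port}/api/tags"


def model_names(tags_response):
    """Extract model names from a decoded /api/tags response."""
    return [m["name"] for m in tags_response.get("models", [])]


def list_remote_models(host, port=11434, timeout=5.0):
    """Return the models the server reports; raises OSError if it is unreachable."""
    with urllib.request.urlopen(ollama_tags_url(host, port), timeout=timeout) as resp:
        return model_names(json.load(resp))


# Example (hypothetical address): print(list_remote_models("192.168.1.50"))
```

If the call fails with a connection error, the server is usually only listening on localhost. Ollama reads its bind address from the `OLLAMA_HOST` environment variable, so setting it on the server (for example to `0.0.0.0`) and opening the port in the firewall is typically what "configured for remote access" comes down to.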

For OpenAI:

Valid OpenAI API key with access to vision models. Sign up and create an API key here: OpenAI

For OpenRouter:

Valid OpenRouter API key. Sign up and create an API key here: OpenRouter
Optional: Custom site name for API requests

For OobaBooga:

OobaBooga Text Generation WebUI must be installed and set up
The OpenAI extension must be enabled in OobaBooga's WebUI
Vision models must be installed and configured in OobaBooga
Default port is 5000 (can be configured)
For remote connections:
- OobaBooga server must be accessible on your network
- Port 5000 (or your configured port) must be open
- Server must be properly configured for remote access
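A similar check works for OobaBooga. This sketch assumes the OpenAI-compatible API is enabled and exposes the usual `/v1/models` route on the configured port; the function names are hypothetical and the code is independent of OllamaVision itself.

```python
import json
import urllib.request


def models_url(base_url="http://localhost:5000"):
    """Build the model-listing URL of an OpenAI-compatible server."""
    return base_url.rstrip("/") + "/v1/models"


def model_ids(models_response):
    """Extract model ids from a decoded OpenAI-style model list."""
    return [entry["id"] for entry in models_response.get("data", [])]


def list_models(base_url="http://localhost:5000", timeout=5.0):
    """Return the model ids the server reports; raises OSError if it is unreachable."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        return model_ids(json.load(resp))


# Example: print(list_models())  # uses the default http://localhost:5000
```

An empty list or a connection error here suggests the OpenAI extension is not enabled or the port differs from the one entered in OllamaVision.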

🛠️ Installation

Follow the Prerequisites section for your chosen backend
Open SwarmUI
Click on "Server" at the top of the page
Click on "Extensions"
Find "OllamaVision" in the list of available extensions
Click the "Install" button
When the message appears, click "Restart Now"
SwarmUI will restart and OllamaVision will appear in the Utilities tab

💡 Usage Guide

🚀 Getting Started

Open SwarmUI and navigate to the "Utilities" tab
Click the "OllamaVision" tab
Click the settings button to configure your preferred backend:
- Ollama (local or remote)
- OpenAI
- OpenRouter
- OobaBooga WebUI
Click "Connect" to establish connection

🎯 Setup & Configuration

Select your preferred vision model from the dropdown list
Configure model settings (optional):
- For Ollama, OpenAI, and OpenRouter: Adjust settings in OllamaVision interface
- For OobaBooga: Configure model settings in OobaBooga's WebUI interface
Choose your User Prompt:
- Use the default preset
- Select from included presets
- Create and manage custom presets

📸 Image Analysis

Load your image:
- Quick Paste: Click the paste button, then press CTRL+V
- File Upload: Click the upload button to select a local file
- Drag and Drop: Drag and drop your image directly into the preview area
Image preview will appear
Click "Analyze Image" to begin processing

⚠️ Processing time varies based on your setup. If no error appears, analysis is in progress.
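For the Ollama backend, an analysis request of this kind can be sketched as a JSON payload for Ollama's `/api/generate` endpoint: the image is sent base64-encoded and the model settings go into `options`. How OllamaVision builds its requests internally is not documented here, so treat the helper below (a hypothetical `build_ollama_request`) as an illustration. In particular, mapping the `max_tokens` setting onto Ollama's `num_predict` option is an assumption.

```python
import base64


def build_ollama_request(model, prompt, image_bytes, settings=None, system=None):
    """Build a non-streaming /api/generate payload for a vision model."""
    options = dict(settings or {})
    # Assumption: the UI's max_tokens corresponds to Ollama's num_predict
    # option, where -1 means "no limit".
    if "max_tokens" in options:
        options["num_predict"] = options.pop("max_tokens")
    payload = {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as a list of base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
        "options": options,
    }
    if system:
        payload["system"] = system  # optional system prompt
    return payload
```

The resulting dictionary can be serialized with `json.dumps` and posted to `http://localhost:11434/api/generate`. The generated description comes back in the `response` field of the reply.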

🎨 Using the Results

Once analysis completes, click "Send to Prompt"
The AI-generated description will appear in the Generate tab
Use the description as-is or customize it for your needs directly inside OllamaVision

🔑 Quick Tips

If you're using a local LLM, make sure Ollama or OobaBooga is running BEFORE you try to connect
For OobaBooga:
- Make sure the OpenAI extension is enabled in OobaBooga's WebUI
- Model settings are managed through OobaBooga's interface
- Default URL is http://localhost:5000
- Models will be automatically listed when connecting
- Selected models will load automatically when chosen from the dropdown
Larger images may take longer to process; use compression if you run into memory errors
Custom presets are saved between sessions
You can edit descriptions before generating images directly in the Analysis Results text area
For best results in LLM Toys, keep MAXTOKENS at -1 (set by default)

🎮 LLM Toys Guide

📊 Batch Captioning

⚠️ Note: This feature is experimental

Create captions for multiple images at once - perfect for training Lora models:

Click the "Batch Caption" button in the LLM Toys section
Select a folder containing your images (supported formats: jpg, jpeg, png, webp, bmp)
Choose your caption style:
- Lora Type: Select between Style or Character Lora
- Caption Format: Choose Danbooru Tags or Natural Language descriptions
- For Style Loras, Natural Language typically works better
- For Character Loras, Danbooru Tags can be more effective
Optionally add a trigger word that will be included in all captions
Click "Start Captioning"
The tool will process all images and create a .txt caption file for each one, using the same file name as the image
Images that already have caption files will be skipped
View results in the table that shows success/error status for each image

This feature is designed to quickly prepare datasets for Lora training by leveraging vision models to automate the captioning process.
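The file handling described above (one `.txt` per image, existing captions skipped) can be sketched as follows. This is not the extension's code. The function names are hypothetical, and placing the trigger word at the start of the caption is an assumption, since the steps only say it is included.

```python
from pathlib import Path

# Formats listed as supported by Batch Captioning.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def images_needing_captions(folder):
    """Return image files in `folder` that have no matching .txt caption yet."""
    folder = Path(folder)
    return sorted(
        p for p in folder.iterdir()
        if p.is_file()
        and p.suffix.lower() in IMAGE_EXTENSIONS
        and not p.with_suffix(".txt").exists()
    )


def write_caption(image_path, caption, trigger_word=""):
    """Write the caption next to the image, e.g. cat.png -> cat.txt."""
    # Assumption: the trigger word is prepended, separated by a comma.
    text = f"{trigger_word}, {caption}" if trigger_word else caption
    caption_path = Path(image_path).with_suffix(".txt")
    caption_path.write_text(text, encoding="utf-8")
    return caption_path
```

To redo a caption under this scheme, delete its `.txt` file so the image is no longer skipped on the next run.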

🎨 Image Fusion

Load your images using paste, upload, or drag & drop
Analyze each image separately
Edit the descriptions to your liking
Click "Combine Analyses" to create a single prompt
Edit the prompt to your liking
Click "Send to Prompt" to generate an image
Perfect for creating rich, multi-layered image generation prompts

🔄 Object + Subject Fusion

Click the "Fusion" button
Select "Object + Subject" mode
Load your object image (paste, upload, or drag & drop)
Analyze and edit result as needed
Load your subject image
Analyze and edit result as needed
Click "Combine" to generate fusion prompt
Edit final prompt if desired
Click "Send to Prompt" to generate

Perfect for:

Creating custom designs on products (t-shirts, mugs, skateboards)
Transforming furniture into character-themed pieces
Designing custom figurines, sculptures, or plush toys

🎭 Character Creator

Create detailed characters with customizable attributes:

Name, Sex, Species, Setting, Alignment, Class/Role
Editable input fields for custom characters
Editable response field to edit character before saving
Smart controls with field locking and randomization
Detailed output including personality, physical description, abilities, and backstory
Multiple saving options:
- Save Character (Text): Simple text file of character description
- Save Character Image: PNG with embedded SillyTavern card data
- Export to SillyTavern: Direct export in JSON format
Creates an AI image prompt to create a profile picture for your character
An "Export Prompt" button that will extract the image prompt from the results and send it to the generation page for instant generation of your new character

Creating SillyTavern-Ready Character Cards:

Open Character Creator
Choose your desired options (Species, Alignment, Role, etc.)
Generate your character description
Send the prompt to the Generate tab
Generate your character's image
Return to Character Creator
Click "Save Character Image"
Load your generated image in the popup
Click Save
Your image will download with all character data embedded
Ready to import directly into SillyTavern!

NOTE: If your creations are getting cut off, make sure MAXTOKENS is set to -1 (set by default)
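To check what a saved card contains, the embedded data can be read back with a few lines of standard-library Python. This sketch assumes the widely used SillyTavern convention of storing the character as base64-encoded JSON in a PNG `tEXt` chunk with the keyword `chara`. Whether OllamaVision writes exactly this layout is an assumption, and the function names are hypothetical.

```python
import base64
import json
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def text_chunk(keyword, text):
    """Build a PNG tEXt chunk: length, type, keyword NUL text, CRC."""
    data = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    crc = zlib.crc32(b"tEXt" + data)
    return struct.pack(">I", len(data)) + b"tEXt" + data + struct.pack(">I", crc)


def read_text_chunks(png_bytes):
    """Return {keyword: text} for every tEXt chunk in a PNG file."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png_bytes):
        length, kind = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        data = png_bytes[pos + 8:pos + 8 + length]
        if kind == b"tEXt":
            keyword, _, text = data.partition(b"\x00")
            chunks[keyword.decode("latin-1")] = text.decode("latin-1")
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return chunks


def read_character_card(png_bytes):
    """Decode the character JSON stored under the assumed 'chara' keyword."""
    encoded = read_text_chunks(png_bytes)["chara"]
    return json.loads(base64.b64decode(encoded))


# Example: card = read_character_card(open("my_character.png", "rb").read())
```

A `KeyError` from `read_character_card` means the file has no `chara` chunk, for example because the image was re-saved by an editor that strips metadata.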

📚 Story Time

Load your image using paste, upload, or drag & drop
Click "Tell me a story"
Stories are displayed in a wide-format reading area for comfort
For best results, ensure MAXTOKENS is set to -1 (set by default)

🎯 Example Outputs

Here's a showcase of OllamaVision's capabilities using different presets. Each example includes the source image, AI-generated description, and the final generated output.

🌈 Color Palette Analysis

Source Image

AI-Generated Description

This image features a vibrant array of rainbow-colored umbrellas suspended in the air, creating a visually stunning display against the backdrop of a clear blue sky. The color palette consists of:
- Red
- Orange
- Yellow
- Green
- Blue
- Purple

Each umbrella is distinctly colored, with no discernible pattern. The umbrellas appear evenly spaced throughout the frame, creating a sense of harmony and balance within the composition.

Generated Result

👤 Facial Features Analysis

Source Image

AI-Generated Description

Facial Characteristics:
- Eyes: Brown
- Eyebrows: Thick and well-groomed
- Nose: Straight and moderately sized
- Mouth: Shaped like a smile with full teeth showing
- Chin: Rounded in shape
- Skin tone: Light brown
- Facial hair: Well-groomed beard
- Hair color: Brown
- Ears: Small, located just below the jawline

Distinguishing features:
- Numerous freckles across face and neck
- 2 small moles under eyes

Generated Result

🙏 Acknowledgments

mcmonkey for making OllamaVision official and for giving us SwarmUI
SouthbayJay for testing and feedback and all the late nights!
