An AI-powered image analysis extension for SwarmUI that generates detailed image descriptions for your prompts. Options to use local LLM with Ollama and OobaBooga WebUI. Also supports OpenAI API, OpenRouter API.
- Multiple backend options:
- Local LLM with Ollama (including remote Ollama installations)
- OpenAI API integration
- OpenRouter API support
- OobaBooga WebUI support with vision models
- Advanced model settings:
- Ollama: temperature, top_p, top_k, max_tokens, repeat_penalty, seed
- OpenAI: temperature, max_tokens, top_p, frequency_penalty, presence_penalty
- OpenRouter: temperature, max_tokens, top_p, frequency_penalty, presence_penalty, repetition_penalty, top_k, min_p, top_a, seed
- OobaBooga: Model settings are configured through OobaBooga's WebUI interface
- Multiple preset User Prompts (Artistic Style, Facial Features, Color Palette, etc.)
- Batch Captioning for Lora dataset preparation (generate captions for multiple images in a folder)
- Creative LLM Toys:
- Image Fusion: Combine separate analyses of style, subject, and setting into cohesive prompts
- Object + Subject Fusion: Transform objects with character designs and create unique combinations
- Story Time: Transform images into detailed narratives with beginning, middle, and end
- Character Creator: Generate detailed character profiles for stories, games, or roleplay
- System Prompt support for all backends to customize model behavior
- Prompt Prepending for adding instructions in the front of all requests
- Custom preset support with reordering capability
- Direct-to-prompt generation
- Zero impact on VRAM when not in use (when using unload model setting)
- Image paste/upload support
- Image Drag and drop support
- Remote server connection support (Ollama and OobaBooga)
- Image compression option to prevent memory issues
- Analysis history with thumbnails and parameter reuse
- Enhanced error handling with helpful troubleshooting suggestions
-
Make sure you have SwarmUI installed and setup on your system.
- Ollama with a vision model installed
- For remote connections:
- Ollama server must be accessible on your network
- Port 11434 must be open on the server
- Server must be properly configured for remote access
- Valid OpenAI API key with access to vision models. Sign up and create an API key here: OpenAI
- Valid OpenRouter API key. Sign up and create an API key here: OpenRouter
- Optional: Custom site name for API requests
- OobaBooga Text Generation WebUI must be installed and set up
- The OpenAI extension must be enabled in OobaBooga's WebUI
- Vision models must be installed and configured in OobaBooga
- Default port is 5000 (can be configured)
- For remote connections:
- OobaBooga server must be accessible on your network
- Port 5000 (or your configured port) must be open
- Server must be properly configured for remote access
- Follow the Prerequisites section for your chosen backend
- Open SwarmUI
- Click on "Server" at the top of the page
- Click on "Extensions"
- Find "OllamaVision" in the list of available extensions
- Click the "Install" button
- A message will appear and click on "Restart Now"
- SwarmUI will restart and OllamaVision will be installed into the Utilities tab
- Open SwarmUI and navigate to the "Utilities" tab
- Click the "OllamaVision" tab
- Click the settings button to configure your preferred backend:
- Ollama (local or remote)
- OpenAI
- OpenRouter
- OobaBooga WebUI
- Click "Connect" to establish connection
- Select your preferred vision model from the dropdown list
- Configure model settings (optional):
- For Ollama, OpenAI, and OpenRouter: Adjust settings in OllamaVision interface
- For OobaBooga: Configure model settings in OobaBooga's WebUI interface
- Choose your User Prompt:
- Use the default preset
- Select from included presets
- Create and manage custom presets
- Load your image:
- Quick Paste: Click paste button +
CTRL+V
- File Upload: Click upload button to select local file
- Drag and Drop: Drag and Drop your image directly into the preview area.
- Quick Paste: Click paste button +
- Image preview will appear
- Click "Analyze Image" to begin processing
โ ๏ธ Processing time varies based on your setup. If no error appears, analysis is in progress.
- Once analysis completes, click "Send to Prompt"
- The AI-generated description will appear in the Generate tab
- Use the description as-is or customize it for your needs directly inside OllamaVision
- If you're using local LLM ensure Ollama or OobaBooga is running BEFORE trying connect
- For OobaBooga:
- Make sure the OpenAI extension is enabled in OobaBooga's WebUI
- Model settings are managed through OobaBooga's interface
- Default URL is http://localhost:5000
- Models will be automatically listed when connecting
- Selected models will load automatically when chosen from the dropdown
- Larger images may take longer to process use compression if running into memory errors
- Custom presets are saved between sessions
- You can edit descriptions before generating images directly in the Analysis Results text area
- For best results in LLM toys keep MAXTOKENS at -1 (set by default)
โ ๏ธ Note: This feature is experimental
Create captions for multiple images at once - perfect for training Lora models:
- Click the "Batch Caption" button in the LLM Toys section
- Select a folder containing your images (supported formats: jpg, jpeg, png, webp, bmp)
- Choose your caption style:
- Lora Type: Select between Style or Character Lora
- Caption Format: Choose Danbooru Tags or Natural Language descriptions
- For Style Loras, Natural Language typically works better
- For Character Loras, Danbooru Tags can be more effective
- Optionally add a trigger word that will be included in all captions
- Click "Start Captioning"
- The tool will process all images and create a corresponding .txt file with the same name as each image
- Images that already have caption files will be skipped
- View results in the table that shows success/error status for each image
This feature is designed to quickly prepare datasets for Lora training by leveraging vision models to automate the captioning process.
- Load your images using paste, upload, or drag & drop
- Analyze each image separately
- Edit the descriptions to your liking
- Click "Combine Analyses" to create a single prompt
- Edit the prompt to your liking
- Click "Send to Prompt" to generate an image
- Perfect for creating rich, multi-layered image generation prompts
- Click the "Fusion" button
- Select "Object + Subject" mode
- Load your object image (paste, upload, or drag & drop)
- Analyze and edit result as needed
- Load your subject image
- Analyze and edit result as needed
- Click "Combine" to generate fusion prompt
- Edit final prompt if desired
- Click "Send to Prompt" to generate
Perfect for:
- Creating custom designs on products (t-shirts, mugs, skateboards)
- Transforming furniture into character-themed pieces
- Designing custom figurines, sculptures, or plush toys
Create detailed characters with customizable attributes:
- Name, Sex, Species, Setting, Alignment, Class/Role
- Editable input fields for custom characters
- Editable response field to edit character before saving
- Smart controls with field locking and randomization
- Detailed output including personality, physical description, abilities, and backstory
- Multiple saving options:
- Save Character (Text): Simple text file of character description
- Save Character Image: PNG with embedded SillyTavern card data
- Export to SillyTavern: Direct export in JSON format
- Creates an AI image prompt to create a profile picture for your character
- An "Export Prompt" button that will extract the image prompt from the results and send it to the generation page for instant generation of your new character
Creating SillyTavern-Ready Character Cards:
- Open Character Creator
- Choose your desired options (Species, Alignment, Role, etc.)
- Generate your character description
- Send the prompt to the Generate tab
- Generate your character's image
- Return to Character Creator
- Click "Save Character Image"
- Load your generated image in the popup
- Click Save
- Your image will download with all character data embedded
- Ready to import directly into SillyTavern!
NOTE: If your creations are getting cut off make sure MAXTOKENS is set to -1 (set by default)
- Load your image using paste, upload, or drag & drop
- Click "Tell me a story"
- Stories are displayed in a wide-format reading area for comfort
- For best results ensure MAXTOKENS is set to -1 (set by default)
Here's a showcase of OllamaVision's capabilities using different presets. Each example includes the source image, AI-generated description, and the final generated output.
View Example
This image features a vibrant array of rainbow-colored umbrellas suspended in the air, creating a visually stunning display against the backdrop of a clear blue sky. The color palette consists of:
- Red
- Orange
- Yellow
- Green
- Blue
- Purple
Each umbrella is distinctly colored, with no discernible pattern. The umbrellas appear evenly spaced throughout the frame, creating a sense of harmony and balance within the composition.
View Example
Facial Characteristics:
- Eyes: Brown
- Eyebrows: Thick and well-groomed
- Nose: Straight and moderately sized
- Mouth: Shaped like a smile with full teeth showing
- Chin: Rounded in shape
- Skin tone: Light brown
- Facial hair: Well-groomed beard
- Hair color: Brown
- Ears: Small, located just below the jawline
- Distinguishing features:
- Numerous freckles across face and neck
- 2 small moles under eyes
- mcmonkey for making OllamaVision official and for giving us SwarmUI
- SouthbayJay for testing and feedback and all the late nights!