Skip to content

An extension for SwarmUI that allows you to connect to Ollama, OpenAI, and OpenRouter to use vision models for image analysis to create image prompts.

License

Notifications You must be signed in to change notification settings

Urabewe/OllamaVision

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

OllamaVision

An AI-powered image analysis extension for SwarmUI that generates detailed image descriptions for your prompts. Options to use local LLM with Ollama and OobaBooga WebUI. Also supports OpenAI API, OpenRouter API.

logo

๐ŸŒŸ Table of Contents

๐ŸŒŸ Features

  • Multiple backend options:
    • Local LLM with Ollama (including remote Ollama installations)
    • OpenAI API integration
    • OpenRouter API support
    • OobaBooga WebUI support with vision models
  • Advanced model settings:
    • Ollama: temperature, top_p, top_k, max_tokens, repeat_penalty, seed
    • OpenAI: temperature, max_tokens, top_p, frequency_penalty, presence_penalty
    • OpenRouter: temperature, max_tokens, top_p, frequency_penalty, presence_penalty, repetition_penalty, top_k, min_p, top_a, seed
    • OobaBooga: Model settings are configured through OobaBooga's WebUI interface
  • Multiple preset User Prompts (Artistic Style, Facial Features, Color Palette, etc.)
  • Batch Captioning for Lora dataset preparation (generate captions for multiple images in a folder)
  • Creative LLM Toys:
    • Image Fusion: Combine separate analyses of style, subject, and setting into cohesive prompts
    • Object + Subject Fusion: Transform objects with character designs and create unique combinations
    • Story Time: Transform images into detailed narratives with beginning, middle, and end
    • Character Creator: Generate detailed character profiles for stories, games, or roleplay
  • System Prompt support for all backends to customize model behavior
  • Prompt Prepending for adding instructions in the front of all requests
  • Custom preset support with reordering capability
  • Direct-to-prompt generation
  • Zero impact on VRAM when not in use (when using unload model setting)
  • Image paste/upload support
  • Image Drag and drop support
  • Remote server connection support (Ollama and OobaBooga)
  • Image compression option to prevent memory issues
  • Analysis history with thumbnails and parameter reuse
  • Enhanced error handling with helpful troubleshooting suggestions

๐Ÿ“‹ Prerequisites

First and foremost:

  • Make sure you have SwarmUI installed and setup on your system.

For Ollama:

  • Ollama with a vision model installed
  • For remote connections:
    • Ollama server must be accessible on your network
    • Port 11434 must be open on the server
    • Server must be properly configured for remote access

For OpenAI:

  • Valid OpenAI API key with access to vision models. Sign up and create an API key here: OpenAI

For OpenRouter:

  • Valid OpenRouter API key. Sign up and create an API key here: OpenRouter
  • Optional: Custom site name for API requests

For OobaBooga:

  • OobaBooga Text Generation WebUI must be installed and set up
  • The OpenAI extension must be enabled in OobaBooga's WebUI
  • Vision models must be installed and configured in OobaBooga
  • Default port is 5000 (can be configured)
  • For remote connections:
    • OobaBooga server must be accessible on your network
    • Port 5000 (or your configured port) must be open
    • Server must be properly configured for remote access

๐Ÿ› ๏ธ Installation

  1. Follow the Prerequisites section for your chosen backend
  2. Open SwarmUI
  3. Click on "Server" at the top of the page
  4. Click on "Extensions"
  5. Find "OllamaVision" in the list of available extensions
  6. Click the "Install" button
  7. A message will appear and click on "Restart Now"
  8. SwarmUI will restart and OllamaVision will be installed into the Utilities tab

๐Ÿ’ก Usage Guide

๐Ÿš€ Getting Started

  1. Open SwarmUI and navigate to the "Utilities" tab
  2. Click the "OllamaVision" tab
  3. Click the settings button to configure your preferred backend:
    • Ollama (local or remote)
    • OpenAI
    • OpenRouter
    • OobaBooga WebUI
  4. Click "Connect" to establish connection

๐ŸŽฏ Setup & Configuration

  1. Select your preferred vision model from the dropdown list
  2. Configure model settings (optional):
    • For Ollama, OpenAI, and OpenRouter: Adjust settings in OllamaVision interface
    • For OobaBooga: Configure model settings in OobaBooga's WebUI interface
  3. Choose your User Prompt:
    • Use the default preset
    • Select from included presets
    • Create and manage custom presets

๐Ÿ“ธ Image Analysis

  1. Load your image:
    • Quick Paste: Click paste button + CTRL+V
    • File Upload: Click upload button to select local file
    • Drag and Drop: Drag and Drop your image directly into the preview area.
  2. Image preview will appear
  3. Click "Analyze Image" to begin processing

    โš ๏ธ Processing time varies based on your setup. If no error appears, analysis is in progress.

๐ŸŽจ Using the Results

  1. Once analysis completes, click "Send to Prompt"
  2. The AI-generated description will appear in the Generate tab
  3. Use the description as-is or customize it for your needs directly inside OllamaVision

๐Ÿ”‘ Quick Tips

  • If you're using local LLM ensure Ollama or OobaBooga is running BEFORE trying connect
  • For OobaBooga:
    • Make sure the OpenAI extension is enabled in OobaBooga's WebUI
    • Model settings are managed through OobaBooga's interface
    • Default URL is http://localhost:5000
    • Models will be automatically listed when connecting
    • Selected models will load automatically when chosen from the dropdown
  • Larger images may take longer to process use compression if running into memory errors
  • Custom presets are saved between sessions
  • You can edit descriptions before generating images directly in the Analysis Results text area
  • For best results in LLM toys keep MAXTOKENS at -1 (set by default)

๐ŸŽฎ LLM Toys Guide

๐Ÿ“Š Batch Captioning

โš ๏ธ Note: This feature is experimental

Create captions for multiple images at once - perfect for training Lora models:

  1. Click the "Batch Caption" button in the LLM Toys section
  2. Select a folder containing your images (supported formats: jpg, jpeg, png, webp, bmp)
  3. Choose your caption style:
    • Lora Type: Select between Style or Character Lora
    • Caption Format: Choose Danbooru Tags or Natural Language descriptions
    • For Style Loras, Natural Language typically works better
    • For Character Loras, Danbooru Tags can be more effective
  4. Optionally add a trigger word that will be included in all captions
  5. Click "Start Captioning"
  6. The tool will process all images and create a corresponding .txt file with the same name as each image
  7. Images that already have caption files will be skipped
  8. View results in the table that shows success/error status for each image

This feature is designed to quickly prepare datasets for Lora training by leveraging vision models to automate the captioning process.

๐ŸŽจ Image Fusion

  1. Load your images using paste, upload, or drag & drop
  2. Analyze each image separately
  3. Edit the descriptions to your liking
  4. Click "Combine Analyses" to create a single prompt
  5. Edit the prompt to your liking
  6. Click "Send to Prompt" to generate an image
  7. Perfect for creating rich, multi-layered image generation prompts

๐Ÿ”„ Object + Subject Fusion

  1. Click the "Fusion" button
  2. Select "Object + Subject" mode
  3. Load your object image (paste, upload, or drag & drop)
  4. Analyze and edit result as needed
  5. Load your subject image
  6. Analyze and edit result as needed
  7. Click "Combine" to generate fusion prompt
  8. Edit final prompt if desired
  9. Click "Send to Prompt" to generate

Perfect for:

  • Creating custom designs on products (t-shirts, mugs, skateboards)
  • Transforming furniture into character-themed pieces
  • Designing custom figurines, sculptures, or plush toys

๐ŸŽญ Character Creator

Create detailed characters with customizable attributes:

  • Name, Sex, Species, Setting, Alignment, Class/Role
  • Editable input fields for custom characters
  • Editable response field to edit character before saving
  • Smart controls with field locking and randomization
  • Detailed output including personality, physical description, abilities, and backstory
  • Multiple saving options:
    • Save Character (Text): Simple text file of character description
    • Save Character Image: PNG with embedded SillyTavern card data
    • Export to SillyTavern: Direct export in JSON format
  • Creates an AI image prompt to create a profile picture for your character
  • An "Export Prompt" button that will extract the image prompt from the results and send it to the generation page for instant generation of your new character

Creating SillyTavern-Ready Character Cards:

  1. Open Character Creator
  2. Choose your desired options (Species, Alignment, Role, etc.)
  3. Generate your character description
  4. Send the prompt to the Generate tab
  5. Generate your character's image
  6. Return to Character Creator
  7. Click "Save Character Image"
  8. Load your generated image in the popup
  9. Click Save
  10. Your image will download with all character data embedded
  11. Ready to import directly into SillyTavern!

NOTE: If your creations are getting cut off make sure MAXTOKENS is set to -1 (set by default)

๐Ÿ“š Story Time

  1. Load your image using paste, upload, or drag & drop
  2. Click "Tell me a story"
  3. Stories are displayed in a wide-format reading area for comfort
  4. For best results ensure MAXTOKENS is set to -1 (set by default)

๐ŸŽฏ Example Outputs

Here's a showcase of OllamaVision's capabilities using different presets. Each example includes the source image, AI-generated description, and the final generated output.

๐ŸŒˆ Color Palette Analysis

View Example

Source Image

Rainbow Umbrellas

AI-Generated Description

This image features a vibrant array of rainbow-colored umbrellas suspended in the air, creating a visually stunning display against the backdrop of a clear blue sky. The color palette consists of:

  • Red
  • Orange
  • Yellow
  • Green
  • Blue
  • Purple

Each umbrella is distinctly colored, with no discernible pattern. The umbrellas appear evenly spaced throughout the frame, creating a sense of harmony and balance within the composition.

Generated Result

Generated Umbrellas

๐Ÿ‘ค Facial Features Analysis

View Example

Source Image

Portrait

AI-Generated Description

Facial Characteristics:

  • Eyes: Brown
  • Eyebrows: Thick and well-groomed
  • Nose: Straight and moderately sized
  • Mouth: Shaped like a smile with full teeth showing
  • Chin: Rounded in shape
  • Skin tone: Light brown
  • Facial hair: Well-groomed beard
  • Hair color: Brown
  • Ears: Small, located just below the jawline
  • Distinguishing features:
    • Numerous freckles across face and neck
    • 2 small moles under eyes

Generated Result

Generated Portrait

๐Ÿ™ Acknowledgments

  • mcmonkey for making OllamaVision official and for giving us SwarmUI
  • SouthbayJay for testing and feedback and all the late nights!

About

An extension for SwarmUI that allows you to connect to Ollama, OpenAI, and OpenRouter to use vision models for image analysis to create image prompts.

Resources

License

Stars

Watchers

Forks

Sponsor this project

  •  

Packages

No packages published

Contributors 2

  •  
  •