
⚡ S3 On-Prem AI Assistant - Speed Optimized


A lightning-fast, fully offline-capable AI assistant for answering operational, admin, and troubleshooting questions about on-premises S3-compatible platforms. Advanced caching and optimization make it 15-60x faster than typical implementations.

☕ Support Me

If you find this project helpful, you can support me here:

Buy Me a Coffee

🏢 Supported Platforms

  • 📦 Cloudian HyperStore
  • 🧊 Huawei OceanStor
  • Pure FlashBlade
  • 🏢 IBM Cloud Object Storage
  • 🐳 MinIO
  • 🌐 Dell ECS
  • 📊 NetApp StorageGRID
  • And more S3-compatible storage systems

⚡ Key Features

  • 🚀 Lightning Fast: Subsecond responses with multi-tier caching
  • 🧠 Smart Search: Vector + pre-indexed + fallback search
  • 📱 Multi-Interface: CLI, Web UI, and REST API
  • 🔒 Fully Offline: No cloud dependency, runs entirely on-premises
  • 📊 Performance Monitoring: Built-in timing and metrics
  • 🎯 Intelligent Fallbacks: Progressive search strategy
  • 💾 Response Caching: Instant answers for repeated queries
  • 🔍 Document Types: PDF, TXT, JSON, MD support

🚀 Performance Improvements

| Query Type | Before | After | Improvement |
|---|---|---|---|
| Cached queries | 3-5s | 0.01s | 300-500x faster |
| Bucket dept queries | 5-10s | 0.1-0.5s | 20-100x faster |
| Vector search | 5-15s | 1-3s | 5-10x faster |
| Model loading | 5-10s | 2-5s | 2-3x faster |

Overall: 15-60x faster responses! 🚀


📁 Project Structure (Visual)

s3_onprem_ai_assistant/
├── 🔧 Core Files
│   ├── config.py               # Optimized settings
│   ├── model_cache.py          # Model caching system
│   ├── response_cache.py       # Response caching
│   ├── bucket_index.py         # Fast bucket search
│   └── utils.py                # Optimized utilities
├── 🏗️ Build & Processing
│   └── build_embeddings_all.py # Optimized embedding builder
├── 🖥️ User Interfaces
│   ├── s3ai_query.py           # Ultra-fast CLI
│   ├── api.py                  # Lightning-fast API
│   └── streamlit_ui.py         # Ultra-fast Web UI
├── 📄 Documentation
│   └── docs/                   # Place your files here
│       ├── *.pdf               # Admin guides
│       ├── *.json              # Metadata files
│       ├── *.txt               # Text documents
│       └── *.md                # Markdown files
└── 🗂️ Auto-generated
    ├── cache/                  # Response cache
    ├── s3_all_docs/           # Vector store
    └── s3_all_chunks.pkl      # Document chunks

🔧 Requirements

  • Python: 3.8+ (Recommended: 3.9-3.11)
  • RAM: Minimum 8GB (Recommended: 16GB+)
  • Storage: 5GB+ for models and cache
  • Ollama: For local LLM inference

🚀 Quick Start

1️⃣ Clone and Setup

git clone https://github.com/hndrwn-dk/s3-onprem-ai-assistant.git
cd s3-onprem-ai-assistant

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

2️⃣ Install Ollama + Fast Model

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull the optimized model (much faster than mistral)
ollama pull phi3:mini

3️⃣ Prepare Your Documents

# Place or upload your files in docs/
docs/
├── cloudian_admin_guide.pdf
├── minio_operations_manual.pdf
├── sample_bucket_metadata_converted.txt
└── other_documents...

4️⃣ Build Optimized Index

# Build vector embeddings with optimization
python build_embeddings_all.py

5️⃣ Start Querying!

# Ultra-fast CLI (recommended for speed)
python s3ai_query.py "show all buckets under dept: engineering"

# Web UI with progress indicators
streamlit run streamlit_ui.py

# REST API
python api.py
# Visit: http://localhost:8000/docs

🧠 Advanced Search Architecture

Multi-Tier Speed Strategy

Query → Cache Check → Quick Bucket Search → Vector Search → Text Fallback
             ↓                  ↓                  ↓               ↓
           0.01s            0.1-0.5s             1-3s        2-5s (last resort)

1️⃣ Cache Layer (Instant)

  • Response caching with TTL
  • Instant answers for repeated queries
  • Automatic cache management
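
As a rough illustration, a TTL response cache behind this layer can be only a few lines of Python (a minimal sketch with hypothetical names; response_cache.py is the real implementation):

import hashlib
import json
import time
from pathlib import Path

CACHE_DIR = Path("cache")   # hypothetical location of the response cache
CACHE_TTL_HOURS = 24        # mirrors CACHE_TTL_HOURS in config.py

def _cache_path(question: str) -> Path:
    # Key each question by a stable hash so repeated queries map to the same file
    key = hashlib.sha256(question.strip().lower().encode()).hexdigest()
    return CACHE_DIR / f"{key}.json"

def get_cached_answer(question: str):
    path = _cache_path(question)
    if not path.exists():
        return None
    entry = json.loads(path.read_text())
    if time.time() - entry["ts"] > CACHE_TTL_HOURS * 3600:
        path.unlink()       # expired entries are dropped automatically
        return None
    return entry["answer"]

def cache_answer(question: str, answer: str) -> None:
    CACHE_DIR.mkdir(exist_ok=True)
    _cache_path(question).write_text(json.dumps({"ts": time.time(), "answer": answer}))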

2️⃣ Quick Bucket Search (0.1-0.5s)

  • Pre-indexed department/label searches
  • Pattern matching for common queries
  • Optimized for bucket metadata
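
For illustration, department and label lookups can be served from a small pre-built index without touching the vector store (a hedged sketch with hypothetical field names; bucket_index.py contains the actual logic):

import json
import re
from collections import defaultdict

def build_bucket_index(metadata_path: str):
    # Index bucket records by department and label once, at startup
    by_dept, by_label = defaultdict(list), defaultdict(list)
    for record in json.load(open(metadata_path)):
        by_dept[record.get("dept", "").lower()].append(record["bucket"])
        for label in record.get("labels", []):
            by_label[label.lower()].append(record["bucket"])
    return by_dept, by_label

def quick_bucket_search(query: str, by_dept, by_label):
    # Pattern-match common phrasings like "dept: engineering" or "label: backup"
    if m := re.search(r"dept:\s*(\w+)", query, re.IGNORECASE):
        return by_dept.get(m.group(1).lower())
    if m := re.search(r"label:\s*(\w+)", query, re.IGNORECASE):
        return by_label.get(m.group(1).lower())
    return None   # fall through to vector search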

3️⃣ Vector Search (1-3s)

  • FAISS-powered semantic search
  • Optimized chunking strategy
  • Reduced search parameters
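
Under the hood, this tier boils down to loading the FAISS index produced by build_embeddings_all.py and running a bounded similarity search (a sketch that assumes LangChain's FAISS wrapper and a HuggingFace embedding model; adjust names to match your build script):

from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = FAISS.load_local(
    "s3_all_docs",                         # vector store directory from the project structure
    embeddings,
    allow_dangerous_deserialization=True,  # see the Security section below
)

# VECTOR_SEARCH_K = 3 keeps retrieval fast (see config.py)
docs = db.similarity_search("How to fix S3 error code 403?", k=3)
for doc in docs:
    print(doc.metadata.get("source"), doc.page_content[:120])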

4️⃣ Text Fallback (2-5s)

  • Direct text matching
  • Context-aware responses
  • Comprehensive coverage
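
The last tier can be as simple as scanning the pickled chunks for direct keyword hits (a rough sketch that assumes s3_all_chunks.pkl stores a list of LangChain Document objects; the project's own fallback logic may differ):

import pickle

def text_fallback(question: str, max_hits: int = 3):
    with open("s3_all_chunks.pkl", "rb") as f:
        chunks = pickle.load(f)
    terms = [t for t in question.lower().split() if len(t) > 3]
    # Rank chunks by how many query terms they contain
    scored = sorted(
        chunks,
        key=lambda doc: sum(term in doc.page_content.lower() for term in terms),
        reverse=True,
    )
    return [doc.page_content for doc in scored[:max_hits]]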

💬 Example Queries

| Category | Query | Expected Response Time |
|---|---|---|
| 🔍 Bucket Lookup | "show all buckets under dept: engineering" | 0.1-0.5s |
| 🏷️ Label Search | "find buckets with label: backup" | 0.1-0.5s |
| 🔧 Troubleshooting | "How to fix S3 error code 403?" | 1-3s |
| 📊 Metadata | "Show requestor_email for Bucket-001" | 0.1-0.5s |
| 🛠️ Admin Tasks | "How to purge versioned bucket in Cloudian?" | 1-3s |
| 🔄 Cached Queries | Any previously asked question | 0.01s |

🔌 API Usage

REST API Examples

# Health check
curl -X GET "http://localhost:8000/health"

# Query with performance metrics
curl -X POST "http://localhost:8000/ask" \
  -H "Content-Type: application/json" \
  -d '{"question": "show all buckets under dept: engineering"}'

# Response includes timing
{
  "answer": "Here are the buckets under dept: engineering...",
  "source": "quick_search",
  "response_time": 0.23
}
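
The same endpoint is easy to call from Python (a minimal example using the requests library; the X-API-Key header is only needed when an API key is configured, see the Security section below):

import requests

resp = requests.post(
    "http://localhost:8000/ask",
    json={"question": "show all buckets under dept: engineering"},
    headers={"X-API-Key": "change-me"},   # only needed when API_KEY is set
    timeout=60,
)
data = resp.json()
print(data["answer"], f"({data['source']}, {data['response_time']}s)")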

Postman Collection

{
  "info": {
    "name": "S3 On-Prem AI Assistant v2.3.0 - Security & Deployability",
    "description": "Lightning-fast queries with performance metrics"
  },
  "item": [
    {
      "name": "Fast Department Search",
      "request": {
        "method": "POST",
        "header": [{"key": "Content-Type", "value": "application/json"}],
        "body": {
          "mode": "raw",
          "raw": "{\"question\": \"show all buckets under dept: engineering\"}"
        },
        "url": "http://localhost:8000/ask"
      }
    },
    {
      "name": "Performance Health Check",
      "request": {
        "method": "GET",
        "url": "http://localhost:8000/health"
      }
    }
  ]
}

🔧 Configuration

Performance Settings (config.py)

# Speed optimizations
VECTOR_SEARCH_K = 3      # Reduced from 5 for speed
CHUNK_SIZE = 800         # Optimized chunk size
CHUNK_OVERLAP = 100      # Optimized overlap
CACHE_TTL_HOURS = 24     # Response cache TTL
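
For reference, these chunking values would typically be fed to a text splitter when the index is built (a sketch of likely usage, not a verbatim excerpt from build_embeddings_all.py):

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,     # CHUNK_SIZE: smaller chunks embed and retrieve faster
    chunk_overlap=100,  # CHUNK_OVERLAP: keeps context across chunk boundaries
)
chunks = splitter.split_text(open("docs/sample_bucket_metadata_converted.txt").read())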

Model Configuration

# Fast model settings
MODEL = "phi3:mini"      # Much faster than mistral
TEMPERATURE = 0.3        # Balanced creativity/speed
NUM_PREDICT = 512        # Limit response length
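
These settings map directly onto the local Ollama model when the chain is created (a sketch assuming LangChain's Ollama wrapper; the exact wiring in model_cache.py may differ):

from langchain_community.llms import Ollama

llm = Ollama(
    model="phi3:mini",   # MODEL
    temperature=0.3,     # TEMPERATURE
    num_predict=512,     # NUM_PREDICT: cap response length to keep answers fast
)
print(llm.invoke("In one sentence, what is S3 bucket versioning?"))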

📋 Troubleshooting

Common Issues

  1. Slow first query: the model is still loading (normal; it is cached afterward)
  2. No phi3:mini model: Run ollama pull phi3:mini
  3. Empty responses: Rebuild embeddings with python build_embeddings_all.py
  4. Memory issues: Reduce CHUNK_SIZE in config.py

Performance Optimization

# Clear cache for fresh start
rm -rf cache/

# Rebuild optimized embeddings
python build_embeddings_all.py

# Check model cache status
python -c "from model_cache import ModelCache; print(ModelCache.get_load_times())"

🧪 Tested Environment

  • Python 3.8-3.11
  • Ubuntu 20.04+ / CentOS 7+ / Windows 10+
  • LangChain 0.1+
  • Streamlit 1.29+
  • FastAPI 0.108+
  • Ollama (phi3:mini)

📘 License

This project is licensed under the MIT License - see the LICENSE file for details.


🤝 Contributing

We welcome contributions! Areas of interest:

  • 🏢 Additional S3 vendor support
  • Performance optimizations
  • 🧠 New AI models
  • 📊 Enhanced monitoring
  • 🔍 Better search algorithms

How to Contribute

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

🔒 Security

  • Set an API key by exporting API_KEY or by defining it in docker-compose.yml. Protected endpoints require the header X-API-Key: <value>.
  • Configure allowed CORS origins via CORS_ORIGINS (comma-separated); the default is *.
  • Note: FAISS index loading is gated by the configurable ALLOW_DANGEROUS_DESERIALIZATION flag (default: true). Keep indices in a trusted location.
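
A minimal sketch of how these settings can be enforced in a FastAPI app (illustrative only; api.py is the authoritative implementation):

import os
from typing import Optional

from fastapi import FastAPI, HTTPException, Security
from fastapi.middleware.cors import CORSMiddleware
from fastapi.security import APIKeyHeader

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=os.getenv("CORS_ORIGINS", "*").split(","),  # comma-separated origins
    allow_methods=["*"],
    allow_headers=["*"],
)

api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)

def require_api_key(key: Optional[str] = Security(api_key_header)) -> None:
    expected = os.getenv("API_KEY")
    # Reject requests whose X-API-Key header does not match API_KEY
    if expected and key != expected:
        raise HTTPException(status_code=401, detail="Invalid or missing API key")

@app.post("/ask", dependencies=[Security(require_api_key)])
def ask(payload: dict):
    # Placeholder body; the real endpoint runs the multi-tier search described above
    return {"answer": "...", "source": "cache", "response_time": 0.01}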

🐳 Docker Quickstart

# Build and run API + Ollama
docker compose up --build -d

# Pull model in Ollama container (first run)
docker exec -it $(docker ps -q -f name=ollama) ollama pull phi3:mini

# Call the API
curl -H "X-API-Key: $API_KEY" -X POST localhost:8000/ask \
  -H 'Content-Type: application/json' \
  -d '{"question": "show all buckets under dept: engineering"}'

⚡ Built for Speed • 🔒 Privacy-First • 🚀 Production-Ready

Made with ❤️ for the S3 storage community
