A lightning-fast, fully offline AI assistant for answering operational, administration, and troubleshooting questions about on-premises S3-compatible storage platforms. Advanced caching and optimization make it 15-60x faster than typical implementations.
If you find this project helpful, you can support me here:

Supported S3-compatible platforms:
- 📦 Cloudian HyperStore
- 🧊 Huawei OceanStor
- ⚡ Pure FlashBlade
- 🏢 IBM Cloud Object Storage
- 🐳 MinIO
- 🌐 Dell ECS
- 📊 NetApp StorageGRID
- And more S3-compatible storage systems
Key features:

- 🚀 Lightning Fast: Sub-second responses with multi-tier caching
- 🧠 Smart Search: Vector + pre-indexed + fallback search
- 📱 Multi-Interface: CLI, Web UI, and REST API
- 🔒 Fully Offline: No cloud dependency, runs entirely on-premises
- 📊 Performance Monitoring: Built-in timing and metrics
- 🎯 Intelligent Fallbacks: Progressive search strategy
- 💾 Response Caching: Instant answers for repeated queries
- 🔍 Document Types: PDF, TXT, JSON, MD support
Performance before and after optimization:

| Query Type | Before | After | Improvement |
|---|---|---|---|
| Cached queries | 3-5s | 0.01s | 300-500x faster |
| Bucket dept queries | 5-10s | 0.1-0.5s | 20-100x faster |
| Vector search | 5-15s | 1-3s | 5-10x faster |
| Model loading | 5-10s | 2-5s | 2-3x faster |
Overall: 15-60x faster responses! 🚀
Project layout:

```
s3_onprem_ai_assistant/
├── 🔧 Core Files
│ ├── config.py # Optimized settings
│ ├── model_cache.py # Model caching system
│ ├── response_cache.py # Response caching
│ ├── bucket_index.py # Fast bucket search
│ └── utils.py # Optimized utilities
├── 🏗️ Build & Processing
│ └── build_embeddings_all.py # Optimized embedding builder
├── 🖥️ User Interfaces
│ ├── s3ai_query.py # Ultra-fast CLI
│ ├── api.py # Lightning-fast API
│ └── streamlit_ui.py # Ultra-fast Web UI
├── 📄 Documentation
│ └── docs/ # Place your files here
│ ├── *.pdf # Admin guides
│ ├── *.json # Metadata files
│ ├── *.txt # Text documents
│ └── *.md # Markdown files
└── 🗂️ Auto-generated
├── cache/ # Response cache
├── s3_all_docs/ # Vector store
    └── s3_all_chunks.pkl      # Document chunks
```

Requirements:

- Python: 3.8+ (Recommended: 3.9-3.11)
- RAM: Minimum 8GB (Recommended: 16GB+)
- Storage: 5GB+ for models and cache
- Ollama: For local LLM inference
Installation:

```bash
git clone https://github.com/hndrwn-dk/s3-onprem-ai-assistant.git
cd s3-onprem-ai-assistant
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
```

Install Ollama and pull the model:

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull the optimized model (much faster than mistral)
ollama pull phi3:mini
```

Place or upload your files in docs/:

```
docs/
├── cloudian_admin_guide.pdf
├── minio_operations_manual.pdf
├── sample_bucket_metadata_converted.txt
└── other_documents...
```

Build the vector embeddings:

```bash
python build_embeddings_all.py
```
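For orientation, here is a minimal, hypothetical sketch of what an embedding build like this typically does, assuming LangChain's FAISS wrapper and a HuggingFace sentence-transformer model; the actual build_embeddings_all.py also handles PDF/JSON/MD loading and may differ in detail:

```python
# Hypothetical sketch of the embedding-build step (assumed libraries and model name;
# build_embeddings_all.py is the authoritative implementation).
import glob
import pickle

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

CHUNK_SIZE, CHUNK_OVERLAP = 800, 100  # values taken from config.py

# Read plain-text docs; PDF/JSON/MD loaders would be wired in the same way.
texts = [open(path, encoding="utf-8").read() for path in glob.glob("docs/*.txt")]

splitter = RecursiveCharacterTextSplitter(chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP)
chunks = splitter.create_documents(texts)

# Embed the chunks and persist the FAISS index used by the vector-search tier.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
FAISS.from_documents(chunks, embeddings).save_local("s3_all_docs")

# Keep the raw chunks around for the text-fallback tier.
with open("s3_all_chunks.pkl", "wb") as f:
    pickle.dump(chunks, f)
```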
Run queries from the CLI, Web UI, or REST API:

```bash
# Ultra-fast CLI (recommended for speed)
python s3ai_query.py "show all buckets under dept: engineering"
# Web UI with progress indicators
streamlit run streamlit_ui.py
# REST API
python api.py
# Visit: http://localhost:8000/docs
```

Every query follows a progressive search strategy, with each tier falling back to the next (sketched in code after the tier descriptions below):

```
Query → Cache Check → Quick Bucket Search → Vector Search → Text Fallback
           0.01s          0.1-0.5s             1-3s          2-5s (last resort)
```
Tier 1: Response cache (~0.01s)

- Response caching with TTL
- Instant answers for repeated queries
- Automatic cache management
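The idea behind this cache, as a minimal sketch (illustrative only; response_cache.py may be implemented differently):

```python
# Illustrative TTL response cache keyed by a hash of the question.
# Assumes the cache/ directory and the CACHE_TTL_HOURS setting from config.py.
import hashlib, json, time
from pathlib import Path

CACHE_DIR = Path("cache")
CACHE_TTL_HOURS = 24

def _cache_path(question: str) -> Path:
    return CACHE_DIR / (hashlib.sha256(question.encode()).hexdigest() + ".json")

def get_cached_answer(question: str):
    path = _cache_path(question)
    if not path.exists():
        return None
    entry = json.loads(path.read_text())
    # Discard entries older than the TTL.
    if time.time() - entry["timestamp"] > CACHE_TTL_HOURS * 3600:
        path.unlink(missing_ok=True)
        return None
    return entry["answer"]

def cache_answer(question: str, answer: str) -> None:
    CACHE_DIR.mkdir(exist_ok=True)
    _cache_path(question).write_text(json.dumps({"answer": answer, "timestamp": time.time()}))
```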
Tier 2: Quick bucket search (~0.1-0.5s)

- Pre-indexed department/label searches
- Pattern matching for common queries
- Optimized for bucket metadata
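A rough sketch of how such a pre-built index can turn department/label queries into dictionary lookups (the metadata file name and the dept/labels field names are assumptions, not the project's actual schema):

```python
# Illustrative pre-built index: map departments and labels to bucket records
# so common queries become dictionary lookups instead of LLM or vector calls.
import json
from collections import defaultdict

def build_bucket_index(metadata_path: str = "docs/sample_bucket_metadata_converted.txt"):
    by_dept, by_label = defaultdict(list), defaultdict(list)
    with open(metadata_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)  # assumes one JSON record per line
            by_dept[record.get("dept", "").lower()].append(record)
            for label in record.get("labels", []):
                by_label[label.lower()].append(record)
    return by_dept, by_label

# Usage: by_dept["engineering"] returns every bucket record for that department.
```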
Tier 3: Vector search (~1-3s)

- FAISS-powered semantic search
- Optimized chunking strategy
- Reduced search parameters
Tier 4: Text fallback (last resort)

- Direct text matching
- Context-aware responses
- Comprehensive coverage
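Putting the tiers together, the progressive strategy from the diagram above looks roughly like this; quick_bucket_search, vector_search, and text_fallback are hypothetical stand-ins for the project's modules, and the cache helpers are the ones sketched under Tier 1:

```python
# Progressive search: try the cheapest tier first, fall through to slower ones.
def answer(question: str) -> str:
    # Tier 1: response cache (~0.01s)
    cached = get_cached_answer(question)
    if cached is not None:
        return cached

    # Tier 2: pre-indexed bucket/department search (~0.1-0.5s)
    result = quick_bucket_search(question)

    # Tier 3: FAISS vector search + LLM (~1-3s)
    if result is None:
        result = vector_search(question)

    # Tier 4: direct text matching over raw chunks (last resort)
    if result is None:
        result = text_fallback(question)

    cache_answer(question, result)
    return result
```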
Example queries and expected response times:

| Category | Query | Expected Response Time |
|---|---|---|
| 🔍 Bucket Lookup | "show all buckets under dept: engineering" | 0.1-0.5s |
| 🏷️ Label Search | "find buckets with label: backup" | 0.1-0.5s |
| 🔧 Troubleshooting | "How to fix S3 error code 403?" | 1-3s |
| 📊 Metadata | "Show requestor_email for Bucket-001" | 0.1-0.5s |
| 🛠️ Admin Tasks | "How to purge versioned bucket in Cloudian?" | 1-3s |
| 🔄 Cached Queries | Any previously asked question | 0.01s |
Query the REST API with curl:

```bash
# Health check
curl -X GET "http://localhost:8000/health"

# Query with performance metrics
curl -X POST "http://localhost:8000/ask" \
  -H "Content-Type: application/json" \
  -d '{"question": "show all buckets under dept: engineering"}'
```

The response includes timing information:

```json
{
  "answer": "Here are the buckets under dept: engineering...",
  "source": "quick_search",
  "response_time": 0.23
}
```
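The same query from Python, using the requests library (the X-API-Key header is only needed when an API key is configured; the value shown is a placeholder):

```python
# Query the /ask endpoint and print the answer plus server-side timing.
import requests

resp = requests.post(
    "http://localhost:8000/ask",
    json={"question": "show all buckets under dept: engineering"},
    headers={"X-API-Key": "change-me"},  # omit if no API_KEY is configured
    timeout=30,
)
resp.raise_for_status()
data = resp.json()
print(data["answer"])
print(f"source={data['source']} response_time={data['response_time']}s")
```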
"info": {
"name": "S3 On-Prem AI Assistant v2.3.0 - Security & Deployability",
"description": "Lightning-fast queries with performance metrics"
},
"item": [
{
"name": "Fast Department Search",
"request": {
"method": "POST",
"header": [{"key": "Content-Type", "value": "application/json"}],
"body": {
"mode": "raw",
"raw": "{\"question\": \"show all buckets under dept: engineering\"}"
},
"url": "http://localhost:8000/ask"
}
},
{
"name": "Performance Health Check",
"request": {
"method": "GET",
"url": "http://localhost:8000/health"
}
}
]
}
```

Key speed settings in config.py:

```python
# Speed optimizations
VECTOR_SEARCH_K = 3 # Reduced from 5 for speed
CHUNK_SIZE = 800 # Optimized chunk size
CHUNK_OVERLAP = 100 # Optimized overlap
CACHE_TTL_HOURS = 24     # Response cache TTL

# Fast model settings
MODEL = "phi3:mini" # Much faster than mistral
TEMPERATURE = 0.3 # Balanced creativity/speed
NUM_PREDICT = 512        # Limit response length
```
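As a hypothetical illustration of how these settings map onto a LangChain Ollama model (the project's own code may construct the LLM differently):

```python
# Hypothetical sketch: wiring MODEL, TEMPERATURE, and NUM_PREDICT into a LangChain Ollama LLM.
from langchain_community.llms import Ollama

llm = Ollama(
    model="phi3:mini",   # MODEL
    temperature=0.3,     # TEMPERATURE
    num_predict=512,     # NUM_PREDICT: cap response length for speed
)

print(llm.invoke("In one sentence, what does S3 error code 403 mean?"))
```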
Troubleshooting:

- Slow first query: Models are loading (normal; they are cached afterward)
- No phi3:mini model: Run `ollama pull phi3:mini`
- Empty responses: Rebuild embeddings with `python build_embeddings_all.py`
- Memory issues: Reduce `CHUNK_SIZE` in config.py
Maintenance commands:

```bash
# Clear cache for fresh start
rm -rf cache/
# Rebuild optimized embeddings
python build_embeddings_all.py
# Check model cache status
python -c "from model_cache import ModelCache; print(ModelCache.get_load_times())"- ✅ Python 3.8-3.11
- ✅ Ubuntu 20.04+ / CentOS 7+ / Windows 10+
- ✅ LangChain 0.1+
- ✅ Streamlit 1.29+
- ✅ FastAPI 0.108+
- ✅ Ollama (PHI3-mini)
This project is licensed under the MIT License - see the LICENSE file for details.
We welcome contributions! Areas of interest:
- 🏢 Additional S3 vendor support
- ⚡ Performance optimizations
- 🧠 New AI models
- 📊 Enhanced monitoring
- 🔍 Better search algorithms
To contribute:

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
Security and deployment notes:

- Set an API key by exporting `API_KEY` or setting it in `docker-compose.yml`. Protected endpoints require the header `X-API-Key: <value>`.
- Configure allowed CORS origins via `CORS_ORIGINS` (comma-separated); the default is `*`.
- Note: FAISS loading uses a configurable flag, `ALLOW_DANGEROUS_DESERIALIZATION` (default: true). Keep indices in a trusted location.
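For reference, this is one common way to wire such an X-API-Key check as a FastAPI dependency; it is a sketch under assumptions, not necessarily how api.py implements it:

```python
# Hypothetical API-key guard as a FastAPI dependency; api.py's actual wiring may differ.
import os
from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()
API_KEY = os.getenv("API_KEY", "")

def require_api_key(x_api_key: str = Header(default="")):
    # Reject the request if an API key is configured and the header does not match.
    if API_KEY and x_api_key != API_KEY:
        raise HTTPException(status_code=401, detail="Invalid or missing X-API-Key")

@app.post("/ask", dependencies=[Depends(require_api_key)])
def ask(payload: dict):
    # ... run the multi-tier search and return answer, source, and timing ...
    return {"answer": "...", "source": "quick_search", "response_time": 0.0}
```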
Run with Docker:

```bash
# Build and run API + Ollama
docker compose up --build -d
# Pull model in Ollama container (first run)
docker exec -it $(docker ps -q -f name=ollama) ollama pull phi3:mini
# Call the API
curl -H "X-API-Key: $API_KEY" -X POST localhost:8000/ask \
-H 'Content-Type: application/json' \
-d '{"question": "show all buckets under dept: engineering"}'⚡ Built for Speed • 🔒 Privacy-First • 🚀 Production-Ready
Made with ❤️ for the S3 storage community