A lightning-fast, fully offline AI assistant for answering operational, administration, and troubleshooting questions about on-premises S3-compatible storage platforms. Advanced caching and optimization make it 15-60x faster than typical implementations.
If you find this project helpful, you can support me here:

Supported S3-compatible platforms:
- 📦 Cloudian HyperStore
- 🧊 Huawei OceanStor
- ⚡ Pure FlashBlade
- 🏢 IBM Cloud Object Storage
- 🐳 MinIO
- 🌐 Dell ECS
- 📊 NetApp StorageGRID
- And more S3-compatible storage systems
Key features:

- 🚀 Lightning Fast: Sub-second responses with multi-tier caching
- 🧠 Smart Search: Vector + pre-indexed + fallback search
- 📱 Multi-Interface: CLI, Web UI, and REST API
- 🔒 Fully Offline: No cloud dependency, runs entirely on-premises
- 📊 Performance Monitoring: Built-in timing and metrics
- 🎯 Intelligent Fallbacks: Progressive search strategy
- 💾 Response Caching: Instant answers for repeated queries
- 🔍 Document Types: PDF, TXT, JSON, MD support
Performance before and after optimization:

| Query Type | Before | After | Improvement |
|---|---|---|---|
| Cached queries | 3-5s | 0.01s | 300-500x faster |
| Bucket dept queries | 5-10s | 0.1-0.5s | 20-100x faster |
| Vector search | 5-15s | 1-3s | 5-10x faster |
| Model loading | 5-10s | 2-5s | 2-3x faster |
Overall: 15-60x faster responses! 🚀
Project layout:

```
s3_onprem_ai_assistant/
├── 🔧 Core Files
│ ├── config.py # Optimized settings
│ ├── model_cache.py # Model caching system
│ ├── response_cache.py # Response caching
│ ├── bucket_index.py # Fast bucket search
│ └── utils.py # Optimized utilities
├── 🏗️ Build & Processing
│ └── build_embeddings_all.py # Optimized embedding builder
├── 🖥️ User Interfaces
│ ├── s3ai_query.py # Ultra-fast CLI
│ ├── api.py # Lightning-fast API
│ └── streamlit_ui.py # Ultra-fast Web UI
├── 📄 Documentation
│ └── docs/ # Place your files here
│ ├── *.pdf # Admin guides
│ ├── *.json # Metadata files
│ ├── *.txt # Text documents
│ └── *.md # Markdown files
└── 🗂️ Auto-generated
├── cache/ # Response cache
├── s3_all_docs/ # Vector store
    └── s3_all_chunks.pkl      # Document chunks
```

Requirements:

- Python: 3.8+ (Recommended: 3.9-3.11)
- RAM: Minimum 8GB (Recommended: 16GB+)
- Storage: 5GB+ for models and cache
- Ollama: For local LLM inference
Installation:

```bash
git clone https://github.com/hndrwn-dk/s3-onprem-ai-assistant.git
cd s3-onprem-ai-assistant
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
```

Install Ollama and pull the model:

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull the optimized model (much faster than mistral)
ollama pull phi3:mini
```

Place or upload your files in docs/:

```
docs/
├── cloudian_admin_guide.pdf
├── minio_operations_manual.pdf
├── sample_bucket_metadata_converted.txt
└── other_documents...
```

Build the vector embeddings:

```bash
python build_embeddings_all.py
```
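For orientation, here is a minimal, hypothetical sketch of what an embedding build like this typically does, assuming LangChain's FAISS wrapper and a HuggingFace sentence-transformer model; the actual build_embeddings_all.py also handles PDF/JSON/MD loading and may differ in detail:

```python
# Hypothetical sketch of the embedding-build step (assumed libraries and model name;
# build_embeddings_all.py is the authoritative implementation).
import glob
import pickle

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

CHUNK_SIZE, CHUNK_OVERLAP = 800, 100  # values taken from config.py

# Read plain-text docs; PDF/JSON/MD loaders would be wired in the same way.
texts = [open(path, encoding="utf-8").read() for path in glob.glob("docs/*.txt")]

splitter = RecursiveCharacterTextSplitter(chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP)
chunks = splitter.create_documents(texts)

# Embed the chunks and persist the FAISS index used by the vector-search tier.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
FAISS.from_documents(chunks, embeddings).save_local("s3_all_docs")

# Keep the raw chunks around for the text-fallback tier.
with open("s3_all_chunks.pkl", "wb") as f:
    pickle.dump(chunks, f)
```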
Run queries from the CLI, Web UI, or REST API:

```bash
# Ultra-fast CLI (recommended for speed)
python s3ai_query.py "show all buckets under dept: engineering"
# Web UI with progress indicators
streamlit run streamlit_ui.py
# REST API
python api.py
# Visit: http://localhost:8000/docs
```

Every query follows a progressive search strategy, with each tier falling back to the next (sketched in code after the tier descriptions below):

```
Query → Cache Check → Quick Bucket Search → Vector Search → Text Fallback
           0.01s          0.1-0.5s             1-3s          2-5s (last resort)
```
Tier 1: Response cache (~0.01s)

- Response caching with TTL
- Instant answers for repeated queries
- Automatic cache management
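The idea behind this cache, as a minimal sketch (illustrative only; response_cache.py may be implemented differently):

```python
# Illustrative TTL response cache keyed by a hash of the question.
# Assumes the cache/ directory and the CACHE_TTL_HOURS setting from config.py.
import hashlib, json, time
from pathlib import Path

CACHE_DIR = Path("cache")
CACHE_TTL_HOURS = 24

def _cache_path(question: str) -> Path:
    return CACHE_DIR / (hashlib.sha256(question.encode()).hexdigest() + ".json")

def get_cached_answer(question: str):
    path = _cache_path(question)
    if not path.exists():
        return None
    entry = json.loads(path.read_text())
    # Discard entries older than the TTL.
    if time.time() - entry["timestamp"] > CACHE_TTL_HOURS * 3600:
        path.unlink(missing_ok=True)
        return None
    return entry["answer"]

def cache_answer(question: str, answer: str) -> None:
    CACHE_DIR.mkdir(exist_ok=True)
    _cache_path(question).write_text(json.dumps({"answer": answer, "timestamp": time.time()}))
```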
Tier 2: Quick bucket search (~0.1-0.5s)

- Pre-indexed department/label searches
- Pattern matching for common queries
- Optimized for bucket metadata
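A rough sketch of how such a pre-built index can turn department/label queries into dictionary lookups (the metadata file name and the dept/labels field names are assumptions, not the project's actual schema):

```python
# Illustrative pre-built index: map departments and labels to bucket records
# so common queries become dictionary lookups instead of LLM or vector calls.
import json
from collections import defaultdict

def build_bucket_index(metadata_path: str = "docs/sample_bucket_metadata_converted.txt"):
    by_dept, by_label = defaultdict(list), defaultdict(list)
    with open(metadata_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)  # assumes one JSON record per line
            by_dept[record.get("dept", "").lower()].append(record)
            for label in record.get("labels", []):
                by_label[label.lower()].append(record)
    return by_dept, by_label

# Usage: by_dept["engineering"] returns every bucket record for that department.
```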
Tier 3: Vector search (~1-3s)

- FAISS-powered semantic search
- Optimized chunking strategy
- Reduced search parameters
Tier 4: Text fallback (last resort)

- Direct text matching
- Context-aware responses
- Comprehensive coverage
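Putting the tiers together, the progressive strategy from the diagram above looks roughly like this; quick_bucket_search, vector_search, and text_fallback are hypothetical stand-ins for the project's modules, and the cache helpers are the ones sketched under Tier 1:

```python
# Progressive search: try the cheapest tier first, fall through to slower ones.
def answer(question: str) -> str:
    # Tier 1: response cache (~0.01s)
    cached = get_cached_answer(question)
    if cached is not None:
        return cached

    # Tier 2: pre-indexed bucket/department search (~0.1-0.5s)
    result = quick_bucket_search(question)

    # Tier 3: FAISS vector search + LLM (~1-3s)
    if result is None:
        result = vector_search(question)

    # Tier 4: direct text matching over raw chunks (last resort)
    if result is None:
        result = text_fallback(question)

    cache_answer(question, result)
    return result
```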
Example queries and expected response times:

| Category | Query | Expected Response Time |
|---|---|---|
| 🔍 Bucket Lookup | "show all buckets under dept: engineering" | 0.1-0.5s |
| 🏷️ Label Search | "find buckets with label: backup" | 0.1-0.5s |
| 🔧 Troubleshooting | "How to fix S3 error code 403?" | 1-3s |
| 📊 Metadata | "Show requestor_email for Bucket-001" | 0.1-0.5s |
| 🛠️ Admin Tasks | "How to purge versioned bucket in Cloudian?" | 1-3s |
| 🔄 Cached Queries | Any previously asked question | 0.01s |
Query the REST API with curl:

```bash
# Health check
curl -X GET "http://localhost:8000/health"

# Query with performance metrics
curl -X POST "http://localhost:8000/ask" \
  -H "Content-Type: application/json" \
  -d '{"question": "show all buckets under dept: engineering"}'
```

The response includes timing information:

```json
{
  "answer": "Here are the buckets under dept: engineering...",
  "source": "quick_search",
  "response_time": 0.23
}
```
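The same query from Python, using the requests library (the X-API-Key header is only needed when an API key is configured; the value shown is a placeholder):

```python
# Query the /ask endpoint and print the answer plus server-side timing.
import requests

resp = requests.post(
    "http://localhost:8000/ask",
    json={"question": "show all buckets under dept: engineering"},
    headers={"X-API-Key": "change-me"},  # omit if no API_KEY is configured
    timeout=30,
)
resp.raise_for_status()
data = resp.json()
print(data["answer"])
print(f"source={data['source']} response_time={data['response_time']}s")
```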
"info": {
"name": "S3 On-Prem AI Assistant v2.3.0 - Security & Deployability",
"description": "Lightning-fast queries with performance metrics"
},
"item": [
{
"name": "Fast Department Search",
"request": {
"method": "POST",
"header": [{"key": "Content-Type", "value": "application/json"}],
"body": {
"mode": "raw",
"raw": "{\"question\": \"show all buckets under dept: engineering\"}"
},
"url": "http://localhost:8000/ask"
}
},
{
"name": "Performance Health Check",
"request": {
"method": "GET",
"url": "http://localhost:8000/health"
}
}
]
}
```

Key speed settings in config.py:

```python
# Speed optimizations
VECTOR_SEARCH_K = 3 # Reduced from 5 for speed
CHUNK_SIZE = 800 # Optimized chunk size
CHUNK_OVERLAP = 100 # Optimized overlap
CACHE_TTL_HOURS = 24     # Response cache TTL

# Fast model settings
MODEL = "phi3:mini" # Much faster than mistral
TEMPERATURE = 0.3 # Balanced creativity/speed
NUM_PREDICT = 512        # Limit response length
```
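As a hypothetical illustration of how these settings map onto a LangChain Ollama model (the project's own code may construct the LLM differently):

```python
# Hypothetical sketch: wiring MODEL, TEMPERATURE, and NUM_PREDICT into a LangChain Ollama LLM.
from langchain_community.llms import Ollama

llm = Ollama(
    model="phi3:mini",   # MODEL
    temperature=0.3,     # TEMPERATURE
    num_predict=512,     # NUM_PREDICT: cap response length for speed
)

print(llm.invoke("In one sentence, what does S3 error code 403 mean?"))
```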
Troubleshooting:

- Slow first query: Models are loading (normal; they are cached afterward)
- No phi3:mini model: Run `ollama pull phi3:mini`
- Empty responses: Rebuild embeddings with `python build_embeddings_all.py`
- Memory issues: Reduce `CHUNK_SIZE` in config.py
Maintenance commands:

```bash
# Clear cache for fresh start
rm -rf cache/
# Rebuild optimized embeddings
python build_embeddings_all.py
# Check model cache status
python -c "from model_cache import ModelCache; print(ModelCache.get_load_times())"- ✅ Python 3.8-3.11
- ✅ Ubuntu 20.04+ / CentOS 7+ / Windows 10+
- ✅ LangChain 0.1+
- ✅ Streamlit 1.29+
- ✅ FastAPI 0.108+
- ✅ Ollama (PHI3-mini)
This project is licensed under the MIT License - see the LICENSE file for details.
We welcome contributions! Areas of interest:
- 🏢 Additional S3 vendor support
- ⚡ Performance optimizations
- 🧠 New AI models
- 📊 Enhanced monitoring
- 🔍 Better search algorithms
To contribute:

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
Security and deployment notes:

- Set an API key by exporting `API_KEY` or setting it in `docker-compose.yml`. Protected endpoints require the header `X-API-Key: <value>`.
- Configure allowed CORS origins via `CORS_ORIGINS` (comma-separated); the default is `*`.
- Note: FAISS loading uses a configurable flag, `ALLOW_DANGEROUS_DESERIALIZATION` (default: true). Keep indices in a trusted location.
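For reference, this is one common way to wire such an X-API-Key check as a FastAPI dependency; it is a sketch under assumptions, not necessarily how api.py implements it:

```python
# Hypothetical API-key guard as a FastAPI dependency; api.py's actual wiring may differ.
import os
from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()
API_KEY = os.getenv("API_KEY", "")

def require_api_key(x_api_key: str = Header(default="")):
    # Reject the request if an API key is configured and the header does not match.
    if API_KEY and x_api_key != API_KEY:
        raise HTTPException(status_code=401, detail="Invalid or missing X-API-Key")

@app.post("/ask", dependencies=[Depends(require_api_key)])
def ask(payload: dict):
    # ... run the multi-tier search and return answer, source, and timing ...
    return {"answer": "...", "source": "quick_search", "response_time": 0.0}
```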
Run with Docker:

```bash
# Build and run API + Ollama
docker compose up --build -d
# Pull model in Ollama container (first run)
docker exec -it $(docker ps -q -f name=ollama) ollama pull phi3:mini
# Call the API
curl -H "X-API-Key: $API_KEY" -X POST localhost:8000/ask \
-H 'Content-Type: application/json' \
-d '{"question": "show all buckets under dept: engineering"}'⚡ Built for Speed • 🔒 Privacy-First • 🚀 Production-Ready
Made with ❤️ for the S3 storage community