Search everything. Act on anything. Build smarter AI. A RAG stack that helps organizations unlock the value of their internal knowledge by turning search into a springboard for smarter automation—through chat, agent workflows, and direct access.
Visit our website at https://insightmesh.koderex.dev/ for more information about the project.
🧠 OpenWebUI -> The chat UI because why not!
🔄 LiteLLM Proxy -> Monitor, observe, and manage LLM operations centrally; uses LangFuse for prompt management
📊 Dagster -> ETL and scheduling engine for data ingestion and pipeline orchestration
📚 RAG Pipeline Python (your code) -> Custom RAG injection code loaded dynamically like a plugin (inject company data, run auth checks, and add guardrails to make it safe and production-ready; sketched below)
🔍 Elasticsearch & Neo4j -> Data and agents layer for building powerful search and retrieval systems
🛡️ Caddy -> Auth proxy that lets OpenWebUI and LiteLLM centralize authentication
🤖 Slack Bot -> AI assistant that connects to your Slack workspace, enabling users to query data and trigger agent processes directly from Slack
All you need to do is build the data pipelines to ingest and index your data.
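To make the plugin idea concrete, here is a minimal sketch of a RAG handler that retrieves company context from Elasticsearch and injects it before the LLM call. Everything in it is illustrative: the `RAGHandler` class, its method names, the `company-docs` index, and the `allowed_users` permission field are assumptions, not the project's actual plugin interface (that lives in `rag_pipeline`).

```python
# Hypothetical RAG-handler sketch -- names and interface are illustrative,
# not the project's actual plugin contract.
from elasticsearch import Elasticsearch


class RAGHandler:
    def __init__(self, es_url: str = "http://localhost:9200"):
        self.es = Elasticsearch(es_url)

    def retrieve(self, query: str, user_email: str, k: int = 3) -> list[str]:
        # Filter on a per-user permission field so retrieval respects access control.
        resp = self.es.search(
            index="company-docs",  # assumed index name
            query={
                "bool": {
                    "must": {"match": {"content": query}},
                    "filter": {"term": {"allowed_users": user_email}},
                }
            },
            size=k,
        )
        return [hit["_source"]["content"] for hit in resp["hits"]["hits"]]

    def inject(self, messages: list[dict], user_email: str) -> list[dict]:
        # Prepend the retrieved context as a system message before the LLM call.
        context = "\n---\n".join(self.retrieve(messages[-1]["content"], user_email))
        return [{"role": "system", "content": f"Company context:\n{context}"}] + messages
```

The design point worth copying is that retrieval is filtered per user, which is what makes injected context safe to expose through shared chat and Slack surfaces.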
To run the stack you'll need:

- Git
- Docker and Docker Compose
- OpenAI API key
- Clone the repository with submodules:

  ```bash
  git clone --recurse-submodules https://github.com/yourusername/insight-mesh.git
  cd insight-mesh
  ```

  If you've already cloned without submodules:

  ```bash
  git submodule init
  git submodule update
  ```
- Configure your environment variables:

  ```bash
  cp env.example .env
  ```

  Then edit the `.env` file to add your OpenAI API key, Google OAuth credentials, and other configuration settings. Alternatively, use the setup script to configure environment variables:

  ```bash
  chmod +x setup-env.sh
  ./setup-env.sh
  ```
- Set up Google authentication if needed:

  ```bash
  chmod +x setup-google-auth.sh
  ./setup-google-auth.sh
  ```
- Start the services:

  ```bash
  docker-compose up -d
  ```
- Access the UI at http://localhost:8080 (secured with Google authentication)
All configuration is managed through environment variables in the `.env` file. Key variables include:

- API Keys: `OPENAI_API_KEY`, `GOOGLE_API_KEY`, `SERPER_API_KEY`
- Database: `POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_DB`
- Neo4j: `NEO4J_USER`, `NEO4J_PASSWORD`
- OAuth: `GOOGLE_CLIENT_ID`, `GOOGLE_CLIENT_SECRET`
- Slack Integration: `SLACK_BOT_TOKEN`, `SLACK_APP_TOKEN`
- Model Configuration: `DEFAULT_MODEL`, `LLM_MODEL`

For a complete list of available configuration options, refer to the `env.example` file.
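For illustration, a filled-in `.env` might look roughly like this excerpt (all values are placeholders):

```
OPENAI_API_KEY=sk-your-openai-key
GOOGLE_CLIENT_ID=your-client-id.apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=your-client-secret
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DB=litellm
NEO4J_USER=neo4j
NEO4J_PASSWORD=password
SLACK_BOT_TOKEN=xoxb-your-bot-token
SLACK_APP_TOKEN=xapp-your-app-token
DEFAULT_MODEL=gpt-4o
LLM_MODEL=gpt-4o-mini
```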
- LiteLLM configuration is in `config/litellm_config.yaml`
- By default, the system uses GPT-4 and GPT-4o models
- Caddy configuration for Google authentication is in `config/caddy/Caddyfile`
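A LiteLLM proxy config typically declares its models in a `model_list`; the following is an illustrative sketch under that assumption, not the actual contents of `config/litellm_config.yaml`:

```yaml
# Illustrative LiteLLM proxy config sketch -- check config/litellm_config.yaml
# for what this repo actually ships.
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY
```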
To set up Google OAuth for authentication:
- Visit the Google Cloud Console to create OAuth credentials
- Create an OAuth 2.0 Client ID (Web application type)
- Add the following as an authorized redirect URI: `http://localhost:8080/auth/oauth2/google/authorization-code-callback`
- Copy your Client ID and Client Secret to the `.env` file or use the setup script
- By default, only users with email addresses from the specified domain (default: gmail.com) will be allowed access
To integrate with Slack:
- Create a Slack app at https://api.slack.com/apps
- Configure your app with the required permissions (detailed in `slack-bot/SETUP.md`):
  - Essential scopes: `app_mentions:read`, `chat:write`, `im:history`, `mpim:history`, `groups:history`, `channels:history`
  - Event subscriptions: `app_mention`, `message.im`, `message.mpim`, `message.groups`, `message.channels`, `message`
- Add your Slack credentials to the `.env` file:

  ```
  SLACK_BOT_TOKEN=xoxb-your-bot-token
  SLACK_APP_TOKEN=xapp-your-app-token
  ```

- Restart the Slack bot or entire stack after configuration changes:

  ```bash
  docker-compose restart slack-bot
  ```

- For complete setup instructions, see `slack-bot/SETUP.md`
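To clarify why both tokens are needed: the bot token (`xoxb-...`) authorizes Web API calls such as posting messages, while the app-level token (`xapp-...`) opens the Socket Mode connection that delivers events like `app_mention`. A minimal Bolt-for-Python sketch of that split (illustrative only; the actual bot lives in `slack-bot/`):

```python
# Minimal Slack Bolt sketch -- illustrative, not the actual slack-bot code.
import os

from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

# SLACK_BOT_TOKEN authorizes Web API calls (e.g. chat.postMessage).
app = App(token=os.environ["SLACK_BOT_TOKEN"])


@app.event("app_mention")
def handle_mention(event, say):
    # Echo back; the real bot would route the text through LiteLLM instead.
    say(f"You asked: {event['text']}")


if __name__ == "__main__":
    # SLACK_APP_TOKEN opens the Socket Mode event stream.
    SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start()
```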
| Variable | Description | Default |
|---|---|---|
| `OPENAI_API_KEY` | Your OpenAI API key | - |
| `GOOGLE_CLIENT_ID` | Google OAuth Client ID | - |
| `GOOGLE_CLIENT_SECRET` | Google OAuth Client Secret | - |
| `ALLOWED_EMAIL_DOMAIN` | Email domain allowed for authentication | * |
| `POSTGRES_USER` | PostgreSQL username | postgres |
| `POSTGRES_PASSWORD` | PostgreSQL password | postgres |
| `POSTGRES_DB` | PostgreSQL database name | litellm |
| `NEO4J_USER` | Neo4j username | neo4j |
| `NEO4J_PASSWORD` | Neo4j password | password |
| `SLACK_BOT_TOKEN` | Slack Bot User OAuth Token | - |
| `SLACK_APP_TOKEN` | Slack App-Level Token | - |
| `SLACK_BOT_ID` | Slack Bot User ID | - |
| `LLM_MODEL` | Default model for Slack bot | gpt-4o-mini |
| `DEFAULT_MODEL` | Default model for OpenWebUI | gpt-4o |
All components have test suites that ensure functionality and reliability. Tests are run automatically on GitHub Actions for every pull request and push to the main branch.
A unified test script is provided to run all component tests with the same configuration as the GitHub Actions workflow:
```bash
./run_all_tests.sh
```
This script:
- Creates Python 3.11 virtual environments for each component
- Installs required dependencies
- Runs tests with the same parameters as the CI pipeline
- Displays test results with coverage reports
You can also run tests for individual components:
- MCP Server:

  ```bash
  cd mcp-server
  python -m pytest test_mcp_mocked.py test_fastmcp.py tests/
  ```

- RAG Pipeline:

  ```bash
  cd rag_pipeline
  python -m pytest test_rag_handler.py
  ```

- Dagster:

  ```bash
  cd dagster_project
  python -m pytest test_assets.py test_web_assets.py
  ```

- Slack Bot:

  ```bash
  cd slack-bot
  # Set environment variables
  SLACK_BOT_TOKEN="xoxb-test-token" \
  SLACK_APP_TOKEN="xapp-test-token" \
  LLM_API_URL="http://localhost:8000" \
  LLM_API_KEY="test-key" \
  python -m pytest tests/ --cov=.
  ```
- Python 3.11
- pytest, pytest-asyncio, pytest-cov
- Component-specific dependencies (installed automatically by the test script)
To customize the RAG pipeline, you can:

- Configure Dagster assets for your ETL workflows and scheduling needs (see the sketch after this list)
- Modify the Elasticsearch and Neo4j configurations for your specific data sources
- Create custom retrieval pipelines in the `rag_pipeline` directory
- Connect your data sources to the system (currently supports Google Drive, with more integrations coming soon)
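As a starting point for the Dagster bullet above, a custom ingestion asset might look like this minimal sketch; the asset name, the stand-in document source, and the `company-docs` index are hypothetical:

```python
# Hypothetical Dagster asset sketch for a custom ingestion step.
from dagster import asset
from elasticsearch import Elasticsearch


@asset
def indexed_documents() -> int:
    """Fetch documents from a source and index them into Elasticsearch."""
    es = Elasticsearch("http://elasticsearch:9200")  # assumed service URL
    docs = [{"id": "doc-1", "content": "example document"}]  # stand-in for a real source
    for doc in docs:
        es.index(index="company-docs", id=doc["id"], document={"content": doc["content"]})
    return len(docs)
```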
The Google Drive integration:

- Supports indexing of Google Docs, Sheets, and Slides
- Maintains folder hierarchy in Neo4j (sketched below)
- Handles permissions and access control
- Automatic content updates through scheduled indexing
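As one hedged illustration of the folder-hierarchy point above, the official Neo4j Python driver could mirror Drive's structure like this; the `File`/`Folder` labels and `CONTAINS` relationship are assumptions, not the project's actual graph schema:

```python
# Hypothetical sketch of mirroring a Drive folder hierarchy in Neo4j.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))


def link_file_to_folder(file_id: str, folder_id: str) -> None:
    # MERGE keeps the graph idempotent across repeated indexing runs.
    with driver.session() as session:
        session.run(
            "MERGE (f:File {id: $file_id}) "
            "MERGE (d:Folder {id: $folder_id}) "
            "MERGE (d)-[:CONTAINS]->(f)",
            file_id=file_id,
            folder_id=folder_id,
        )
```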
MIT License - See LICENSE for details