An offline-ready, modular RAG (Retrieval-Augmented Generation) platform that enables clinicians to interact with a local GeNotes database using open-source LLMs (Mistral or LLaMA). Built with FastAPI, React.js, and LlamaIndex, and deployable in local or cloud environments.
Genomic notes for clinicians
Quick, concise information to help healthcare professionals make the right genomic decisions at each stage of a clinical pathway. GeNotes Website
GeNotes aims to improve the accessibility and usability of genomic clinical guidelines for clinicians. This chatbot allows healthcare professionals to efficiently retrieve relevant genomic clinical guidelines from GeNotes' dataset.
- Interactive Chat Interface: Natural language interface for querying genomic guidelines
- Document Management: Upload and manage clinical documents and guidelines
- Local LLM Processing: Uses Ollama for privacy-focused local language model processing (Mistral/LLaMA)
- Vector Store Integration: ChromaDB for efficient storage and retrieval of genomic knowledge
- RAG Architecture: Retrieval-Augmented Generation for accurate, context-aware responses
- Containerized Deployment: Easy setup with Docker and Docker Compose
- WebSocket Support: Real-time chat capabilities
- Session Management: In-memory session handling for chat conversations
- Docker 20.10.0+
- Docker Compose 2.0.0+
- 8GB+ RAM (16GB recommended for optimal LLM performance)
- 10GB+ free disk space (for vector store and models)
- Ollama installed locally (for local LLM processing)
- Python 3.11+ (for development)
1. Clone the repository:

   git clone https://github.com/yourusername/GeNotes_assistant.git
   cd GeNotes_assistant
2. Prepare the environment. Copy .env.example and edit as needed:

   cp .env.example .env
   # Edit the .env file with your configuration
3. Preprocess the data (see the ingestion sketch after these steps):

   # Preprocess the GeNotes CSV into the vector index
   docker run --rm -v $(pwd):/app -w /app/backend/scripts python:3.10 \
     bash -c "pip install -r ../requirements.txt && python preprocess.py"
4. Build and start the services:

   docker-compose up --build -d
5. Download the required language models. After the services are running, run:

   chmod +x download_models.sh   # Only needed once
   ./download_models.sh
   This will download the following models:

   - nomic-embed-text (for embeddings)
   - mxbai-embed-large (for embeddings)
   - llama3.2:3b (for chat completions)

   Note: The download may take 10-30 minutes depending on your internet connection. The models will be stored in a Docker volume for persistence.
6. Verify the services. Check that all containers are running:

   docker-compose ps

   You should see all services with a status of "healthy" or "up".
7. Access the application:

   - Frontend: http://localhost:3000
   - Backend API: http://localhost:8000
   - API Documentation: http://localhost:8000/docs
   - Ollama API: http://localhost:11434
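For orientation, the preprocessing step (step 3 above) is what turns the GeNotes CSV into the ChromaDB collection the chatbot queries. The real logic lives in backend/scripts/preprocess.py; the snippet below is only a hedged sketch of that kind of pipeline, and the CSV filename and column names (text, title, url) are assumptions rather than the project's actual schema.

```python
# Illustrative sketch only -- the real logic is in backend/scripts/preprocess.py.
# The CSV filename and column names (text, title, url) are assumptions.
import chromadb
import pandas as pd
from llama_index.core import Document, StorageContext, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

# Load the scraped GeNotes guidance into LlamaIndex documents
df = pd.read_csv("data/genotes.csv")  # hypothetical filename
docs = [
    Document(text=row["text"], metadata={"title": row["title"], "url": row["url"]})
    for _, row in df.iterrows()
]

# Persist embeddings in ChromaDB so the backend can retrieve them at query time
client = chromadb.PersistentClient(path="data/chromadb")
collection = client.get_or_create_collection("genomic_guidelines")
storage_context = StorageContext.from_defaults(
    vector_store=ChromaVectorStore(chroma_collection=collection)
)

VectorStoreIndex.from_documents(
    docs,
    storage_context=storage_context,
    embed_model=OllamaEmbedding(model_name="nomic-embed-text", base_url="http://localhost:11434"),
    transformations=[SentenceSplitter(chunk_size=512, chunk_overlap=64)],
)
```

Chunking before embedding keeps each retrieved passage small enough that several guideline excerpts fit into the model's context window.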
To see which models are currently available in the Ollama container:
docker-compose exec ollama ollama list
# Or using the API
curl http://localhost:11434/api/tags
To download additional models:
docker-compose exec ollama ollama pull <model-name>
- Embedding models: nomic-embed-text, mxbai-embed-large
- Chat models: llama3.2:3b, llama3:8b, mistral:7b
After downloading models, you can verify they're working with:
# For embedding models
curl http://localhost:11434/api/embeddings -H "Content-Type: application/json" -d '{"model": "nomic-embed-text", "prompt": "test"}'
# For chat models
curl http://localhost:11434/api/generate -H "Content-Type: application/json" -d '{"model": "llama3.2:3b", "prompt": "Hello"}'
If the download_models.sh script fails:
1. Check your internet connection
2. Verify the Ollama service is running:

   docker-compose ps | grep ollama

3. Check the Ollama logs:

   docker-compose logs ollama

4. Try downloading models manually using the commands in the "Managing Models" section
5. Directly check which models have been downloaded:

   docker exec genotes-ollama ollama list
Models can take up significant disk space (several GB each). To check disk usage:
docker system df
To clean up unused models and free space:
docker-compose exec ollama ollama rm <model-name>
If you make changes to the environment or need to restart:
docker-compose down
docker-compose up -d
# Check container status
docker-compose ps
# View logs
docker-compose logs -f
1. Set up the Python environment:

   python -m venv venv
   source venv/bin/activate   # On Windows: .\venv\Scripts\activate
   pip install -r requirements.txt

2. Run the backend server:

   cd backend
   uvicorn 3_chatbot:app --reload --host 0.0.0.0 --port 8000
3. Install the frontend dependencies:

   cd frontend
   npm install

4. Start the development server:

   npm start
GeNotes_assistant/
├── backend/              # FastAPI backend + RAG logic
├── frontend/             # React-based chatbot UI
├── nginx/                # Nginx reverse proxy configuration
│   └── conf.d/           # Nginx server configurations
├── data/                 # GeNotes CSV and processed data
├── scripts/              # Preprocessing and DB setup scripts
├── docs/                 # Deployment & architecture notes
└── docker-compose.yml
| Layer        | Technology                     |
|--------------|--------------------------------|
| Frontend     | React.js, TypeScript           |
| Backend      | FastAPI, LlamaIndex, LangChain |
| LLM          | Mistral / LLaMA (local)        |
| Vector Store | ChromaDB                       |
| Deployment   | Docker, Docker Compose         |
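At query time these layers are wired together along the following lines. This is a hedged sketch of the general LlamaIndex + Ollama + ChromaDB pattern, not the project's exact backend code; the path, collection name, and model tags are taken from the configuration section below.

```python
# Hedged sketch of the retrieval-augmented query path (not the project's exact code).
import chromadb
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.vector_stores.chroma import ChromaVectorStore

# Re-open the collection built by the preprocessing step
client = chromadb.PersistentClient(path="data/chromadb")
collection = client.get_or_create_collection("genomic_guidelines")
index = VectorStoreIndex.from_vector_store(
    ChromaVectorStore(chroma_collection=collection),
    embed_model=OllamaEmbedding(model_name="nomic-embed-text", base_url="http://localhost:11434"),
)

# Retrieve the most relevant guideline chunks, then let the local LLM answer with them as context
query_engine = index.as_query_engine(
    llm=Ollama(model="llama3.2:3b", base_url="http://localhost:11434", temperature=0.7, request_timeout=120.0),
    similarity_top_k=4,
)
print(query_engine.query("Which guidance covers genomic testing for familial hypercholesterolaemia?"))
```

Because both the embeddings and the chat completion run against local Ollama models, no clinical text leaves the machine.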
Create a .env file based on .env.example with the following variables:
# Backend
CHAT_MODEL=llama3
EMBEDDING_MODEL=nomic-embed-text
MODEL_PROVIDER=ollama
MODEL_TEMPERATURE=0.7
DATA_DIR=./data
COLLECTION_NAME=genomic_guidelines
# Ollama
OLLAMA_HOST=ollama:11434
# Application
FRONTEND_URL=http://localhost:3000
BACKEND_URL=http://localhost:8000
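How the backend actually consumes these variables is defined in its own settings code; purely as an illustration (the defaults mirror the block above, everything else is an assumption), they can be read with os.getenv so the same values work under Docker Compose and locally:

```python
# Illustration only: reading the variables defined in .env / docker-compose.
import os

CHAT_MODEL = os.getenv("CHAT_MODEL", "llama3")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "nomic-embed-text")
MODEL_TEMPERATURE = float(os.getenv("MODEL_TEMPERATURE", "0.7"))
DATA_DIR = os.getenv("DATA_DIR", "./data")
COLLECTION_NAME = os.getenv("COLLECTION_NAME", "genomic_guidelines")

# Inside Docker Compose, Ollama is reachable by its service name (ollama:11434);
# when running the backend directly on the host, set OLLAMA_HOST=localhost:11434.
OLLAMA_BASE_URL = "http://" + os.getenv("OLLAMA_HOST", "ollama:11434")
```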
Once the application is running, you can access the following endpoints:
- POST /api/chat - Send a chat message
- GET /api/chat/session/{session_id} - Get chat session history
- WS /ws/chat - WebSocket endpoint for real-time chat
- POST /api/scrape - Scrape and process a website
- POST /api/upload - Upload and process files
- GET /api/collections - List available collections
- GET / - Health check
- GET /status - System status and statistics
- GET /docs - Interactive Swagger UI
- GET /redoc - ReDoc documentation
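For a quick smoke test of the chat API from Python once the stack is running, something like the sketch below should work. The request and response field names are assumptions; check the Swagger UI at /docs for the actual schema.

```python
# Smoke test for the chat endpoint. Field names are assumptions -- see /docs for the real schema.
import requests

resp = requests.post(
    "http://localhost:8000/api/chat",
    json={"message": "Which GeNotes resources cover BRCA1 variant reporting?"},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())

# Real-time streaming is available over the WebSocket endpoint at ws://localhost:8000/ws/chat.
```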
# Run backend tests
cd backend
pytest
# Run linters
flake8 .
mypy .
black --check .
# View logs
docker-compose logs -f backend
# Access container shell
docker-compose exec backend /bin/bash
1. Set up the environment variables:

   cp .env.production .env
   # Update production-specific variables

2. Build and deploy:

   docker-compose -f docker-compose.prod.yml up --build -d
- Ollama connection issues
  - Verify Ollama is running:
    curl http://localhost:11434/api/version
  - Check if the model is downloaded:
    ollama list
- Port conflicts
  - Ensure ports 3000 (frontend), 8000 (backend), and 11434 (Ollama) are available
  - Check with lsof -i :<port> or netstat -tuln | grep <port>
- Docker resource limits
  - Increase Docker's memory allocation in Docker Desktop settings
  - Recommended: 8GB RAM, 4 CPU cores for LLM processing
- Container health issues
  - Check container logs:
    docker-compose logs -f <service>
  - Verify container health:
    docker ps --filter "health=unhealthy"
- Vector store issues
  - Clear the vector store directory if needed:
    rm -rf data/chromadb/*
  - Rebuild the vector store after clearing data (re-run the preprocessing step)
This project is licensed under the MIT License - see the LICENSE file for details.