A web application that analyzes potential correlations between genes and diseases using public life sciences data and LLM capabilities. Built as a demonstration project showcasing modern Python development practices and AI integration.
NB: Working online version: https://gene-disease-analysis-service-production.up.railway.app/
# Clone the repository
git clone <repository-url>
cd genestack_test
# Run with Docker (recommended)
docker-compose up
# Access the application
open http://localhost:8000The application will be fully functional after running docker-compose up.
- Backend: FastAPI (async Python web framework)
- Frontend: Vanilla JavaScript with modern CSS
- Database: SQLite (lightweight, sufficient for demo)
- LLM Integration: OpenAI and Anthropic APIs
- Data Sources: OpenTargets, Ensembl, Europe PMC, GWAS Catalog
- Deployment: Docker & Docker Compose
βββββββββββββββ ββββββββββββββββ βββββββββββββββββββ
β Frontend ββββββΆβ FastAPI ββββββΆβ Public APIs β
β (JS/HTML) βββββββ Backend βββββββ (OpenTargets, β
βββββββββββββββ ββββββββββββββββ β Ensembl, etc) β
β βββββββββββββββββββ
β
ββββββββββββββββ
β β
βΌ βΌ
ββββββββββββββββ βββββββββββββββββββ
β SQLite β β LLM APIs β
β Database β β (OpenAI/Claude) β
ββββββββββββββββ βββββββββββββββββββ
- User Management: Session-based authentication with secure API key storage
- Gene-Disease Analysis: Comprehensive correlation analysis using multiple data sources
- Real-time Updates: WebSocket support for live analysis progress
- History Tracking: Complete analysis history per user
- Multi-LLM Support: Works with both OpenAI and Anthropic models
- OpenTargets Platform: Disease-gene association scores and evidence
- Ensembl: Gene symbol resolution and normalization
- Europe PMC: Scientific literature evidence
- GWAS Catalog: Genetic association studies
- EBI OLS: Disease ontology mapping
- In-memory API key storage (never persisted to database)
- Session timeout after 20 minutes of inactivity
- Rate limiting on API endpoints
- Secure session management
- Python 3.10+
- Docker and Docker Compose (for containerized deployment)
- OpenAI or Anthropic API key
# Navigate to backend directory
cd backend
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run the application
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
# Access at http://localhost:8000cd backend
# Run all tests
pytest
# Run with coverage
pytest --cov=app --cov-report=term-missing
# Run specific test categories
pytest tests/unit/ # Unit tests only
pytest tests/integration/ # Integration tests onlyOnce the application is running, access the interactive API documentation:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
POST /api/v1/auth/login- User login with API keyPOST /api/v1/analyses- Create new gene-disease analysisGET /api/v1/analyses/{id}- Get analysis resultsGET /api/v1/analyses- List user's analysis historyWS /api/v1/ws/{analysis_id}- WebSocket for real-time updates
- Decision: Used async/await throughout the application
- Rationale: Efficient handling of multiple concurrent API calls to external services
- Trade-off: Slightly more complex code, but significantly better performance for I/O-bound operations
- Decision: Store API keys in memory, not in database
- Rationale: Enhanced security - API keys are never persisted
- Trade-off: Sessions lost on server restart, but acceptable for demo application
- Benefit: No risk of API key exposure through database access
- Decision: Integrate 4+ public APIs instead of just one
- Rationale: Comprehensive evidence gathering for better analysis quality
- Trade-off: Increased complexity and potential points of failure
- Mitigation: Graceful degradation - analysis continues even if some sources fail
- Decision: Use SQLite instead of PostgreSQL/MySQL
- Rationale: Simplicity for demo, no additional services needed
- Trade-off: Not suitable for production scale
- Note: Easy to migrate to PostgreSQL if needed (SQLAlchemy abstraction)
- Decision: No frontend framework (React/Vue/Angular)
- Rationale: Simplicity, no build process needed, focuses on backend skills
- Trade-off: Less maintainable for larger applications
- Benefit: Zero dependencies, instant loading, easy to understand
- Decision: Enforce JSON schema in LLM responses
- Rationale: Reliable parsing and consistent output format
- Trade-off: Occasional LLM failures when strict format not followed
- Mitigation: Fallback parsing strategies implemented
- Decision: Short session timeout based on inactivity
- Rationale: Security best practice for API key protection
- Trade-off: Users need to re-login more frequently
- Benefit: Reduced risk of session hijacking
This is a demonstration application built in limited time. For production use, consider:
- Database: Migrate to PostgreSQL/MySQL for better concurrency
- Caching: Add Redis for session storage and API response caching
- Authentication: Implement proper OAuth2/JWT authentication
- Monitoring: Add logging, metrics, and error tracking (Sentry, Datadog)
- API Keys: Use key management service (AWS KMS, HashiCorp Vault)
- Rate Limiting: Implement per-user rate limiting with Redis
- Frontend: Consider React/Vue for better state management
- Testing: Expand test coverage (currently ~80%)
- CI/CD: Add GitHub Actions for automated testing and deployment
- Documentation: Add API versioning and deprecation policies
Beyond the basic requirements, this implementation includes:
- WebSocket Support: Real-time analysis progress updates
- Advanced Data Integration: 4+ data sources vs required 1
- Comprehensive Error Handling: Graceful degradation for all external services
- Smart LLM Context: System prevents redundant data source recommendations
- Literature Evidence Display: Direct links to scientific papers
- Session Security: In-memory storage with automatic expiration
- Test Suite: 73 tests covering unit and integration scenarios
- CLI Tool: Standalone command-line interface included
- Concurrent Requests: Handles 100+ simultaneous analyses
- Response Time: <2s for cached data, 5-15s for full analysis
- Memory Usage: ~200MB baseline, +5MB per active session
- Database Size: ~10KB per analysis (efficient JSON storage)
This project is provided as-is for demonstration purposes.
- OpenTargets Platform for comprehensive gene-disease associations
- EMBL-EBI for providing public bioinformatics APIs
- OpenAI and Anthropic for LLM capabilities
- FastAPI for the excellent async web framework