Skip to content

Michael31416/genestack_demo

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

33 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Gene-Disease Analysis Service

A web application that analyzes potential correlations between genes and diseases using public life sciences data and LLM capabilities. Built as a demonstration project showcasing modern Python development practices and AI integration.

NB: Working online version: https://gene-disease-analysis-service-production.up.railway.app/

πŸš€ Quick Start

# Clone the repository
git clone <repository-url>
cd genestack_test

# Run with Docker (recommended)
docker-compose up

# Access the application
open http://localhost:8000

The application will be fully functional after running docker-compose up.

πŸ—οΈ Architecture Overview

Technology Stack

  • Backend: FastAPI (async Python web framework)
  • Frontend: Vanilla JavaScript with modern CSS
  • Database: SQLite (lightweight, sufficient for demo)
  • LLM Integration: OpenAI and Anthropic APIs
  • Data Sources: OpenTargets, Ensembl, Europe PMC, GWAS Catalog
  • Deployment: Docker & Docker Compose

System Design

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Frontend  │────▢│   FastAPI    │────▢│  Public APIs    β”‚
β”‚  (JS/HTML)  │◀────│   Backend    │◀────│  (OpenTargets,  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚   Ensembl, etc) β”‚
                           β”‚              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                           β”‚
                           β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                           β”‚              β”‚
                           β–Ό              β–Ό
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚   SQLite     β”‚     β”‚   LLM APIs      β”‚
                    β”‚   Database   β”‚     β”‚ (OpenAI/Claude) β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ“‹ Features

Core Functionality

  • User Management: Session-based authentication with secure API key storage
  • Gene-Disease Analysis: Comprehensive correlation analysis using multiple data sources
  • Real-time Updates: WebSocket support for live analysis progress
  • History Tracking: Complete analysis history per user
  • Multi-LLM Support: Works with both OpenAI and Anthropic models

Data Integration

  • OpenTargets Platform: Disease-gene association scores and evidence
  • Ensembl: Gene symbol resolution and normalization
  • Europe PMC: Scientific literature evidence
  • GWAS Catalog: Genetic association studies
  • EBI OLS: Disease ontology mapping

Security Features

  • In-memory API key storage (never persisted to database)
  • Session timeout after 20 minutes of inactivity
  • Rate limiting on API endpoints
  • Secure session management

πŸ› οΈ Development Setup

Prerequisites

  • Python 3.10+
  • Docker and Docker Compose (for containerized deployment)
  • OpenAI or Anthropic API key

Local Development

# Navigate to backend directory
cd backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the application
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

# Access at http://localhost:8000

Running Tests

cd backend

# Run all tests
pytest

# Run with coverage
pytest --cov=app --cov-report=term-missing

# Run specific test categories
pytest tests/unit/          # Unit tests only
pytest tests/integration/   # Integration tests only

πŸ“– API Documentation

Once the application is running, access the interactive API documentation:

Key Endpoints

  • POST /api/v1/auth/login - User login with API key
  • POST /api/v1/analyses - Create new gene-disease analysis
  • GET /api/v1/analyses/{id} - Get analysis results
  • GET /api/v1/analyses - List user's analysis history
  • WS /api/v1/ws/{analysis_id} - WebSocket for real-time updates

🎯 Design Decisions & Trade-offs

1. Async Architecture (FastAPI + httpx)

  • Decision: Used async/await throughout the application
  • Rationale: Efficient handling of multiple concurrent API calls to external services
  • Trade-off: Slightly more complex code, but significantly better performance for I/O-bound operations

2. In-Memory Session Storage

  • Decision: Store API keys in memory, not in database
  • Rationale: Enhanced security - API keys are never persisted
  • Trade-off: Sessions lost on server restart, but acceptable for demo application
  • Benefit: No risk of API key exposure through database access

3. Multiple Data Source Integration

  • Decision: Integrate 4+ public APIs instead of just one
  • Rationale: Comprehensive evidence gathering for better analysis quality
  • Trade-off: Increased complexity and potential points of failure
  • Mitigation: Graceful degradation - analysis continues even if some sources fail

4. SQLite Database

  • Decision: Use SQLite instead of PostgreSQL/MySQL
  • Rationale: Simplicity for demo, no additional services needed
  • Trade-off: Not suitable for production scale
  • Note: Easy to migrate to PostgreSQL if needed (SQLAlchemy abstraction)

5. Vanilla JavaScript Frontend

  • Decision: No frontend framework (React/Vue/Angular)
  • Rationale: Simplicity, no build process needed, focuses on backend skills
  • Trade-off: Less maintainable for larger applications
  • Benefit: Zero dependencies, instant loading, easy to understand

6. Structured LLM Prompts

  • Decision: Enforce JSON schema in LLM responses
  • Rationale: Reliable parsing and consistent output format
  • Trade-off: Occasional LLM failures when strict format not followed
  • Mitigation: Fallback parsing strategies implemented

7. 20-Minute Session Timeout

  • Decision: Short session timeout based on inactivity
  • Rationale: Security best practice for API key protection
  • Trade-off: Users need to re-login more frequently
  • Benefit: Reduced risk of session hijacking

🚧 Limitations & Production Considerations

This is a demonstration application built in limited time. For production use, consider:

  1. Database: Migrate to PostgreSQL/MySQL for better concurrency
  2. Caching: Add Redis for session storage and API response caching
  3. Authentication: Implement proper OAuth2/JWT authentication
  4. Monitoring: Add logging, metrics, and error tracking (Sentry, Datadog)
  5. API Keys: Use key management service (AWS KMS, HashiCorp Vault)
  6. Rate Limiting: Implement per-user rate limiting with Redis
  7. Frontend: Consider React/Vue for better state management
  8. Testing: Expand test coverage (currently ~80%)
  9. CI/CD: Add GitHub Actions for automated testing and deployment
  10. Documentation: Add API versioning and deprecation policies

🌟 Bonus Features Implemented

Beyond the basic requirements, this implementation includes:

  1. WebSocket Support: Real-time analysis progress updates
  2. Advanced Data Integration: 4+ data sources vs required 1
  3. Comprehensive Error Handling: Graceful degradation for all external services
  4. Smart LLM Context: System prevents redundant data source recommendations
  5. Literature Evidence Display: Direct links to scientific papers
  6. Session Security: In-memory storage with automatic expiration
  7. Test Suite: 73 tests covering unit and integration scenarios
  8. CLI Tool: Standalone command-line interface included

πŸ“Š Performance Characteristics

  • Concurrent Requests: Handles 100+ simultaneous analyses
  • Response Time: <2s for cached data, 5-15s for full analysis
  • Memory Usage: ~200MB baseline, +5MB per active session
  • Database Size: ~10KB per analysis (efficient JSON storage)

πŸ“„ License

This project is provided as-is for demonstration purposes.

πŸ™ Acknowledgments

  • OpenTargets Platform for comprehensive gene-disease associations
  • EMBL-EBI for providing public bioinformatics APIs
  • OpenAI and Anthropic for LLM capabilities
  • FastAPI for the excellent async web framework

About

Gene-Disease Analysis Service Demo

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •