1 change: 0 additions & 1 deletion .github/workflows/pre-commit.yaml
@@ -25,7 +25,6 @@ jobs:
ln -s $(which uv) ~/.local/bin/uv
- run: uv venv
- run: uv sync --dev
- run: uv add pytest==7.4.3
- uses: tox-dev/action-pre-commit-uv@v1
with:
extra_args: --all-files
231 changes: 182 additions & 49 deletions README.md
@@ -16,11 +16,11 @@ Transform medical data analysis with AI! Ask questions about MIMIC-IV data in pl

## Features

- **Natural Language Queries**: Ask questions about MIMIC-IV data in plain English
- **Local SQLite**: Fast queries on demo database (free, no setup)
- **BigQuery Support**: Access full MIMIC-IV dataset on Google Cloud
- **Enterprise Security**: OAuth2 authentication with JWT tokens and rate limiting
- **SQL Injection Protection**: Read-only queries with comprehensive validation
- 🔍 **Natural Language Queries**: Ask questions about MIMIC-IV data in plain English
- 🏠 **Local DuckDB + Parquet**: Fast local queries for demo and full dataset using Parquet files with DuckDB views
- ☁️ **BigQuery Support**: Access full MIMIC-IV dataset on Google Cloud
- 🔒 **Enterprise Security**: OAuth2 authentication with JWT tokens and rate limiting
- 🛡️ **SQL Injection Protection**: Read-only queries with comprehensive validation

## 🚀 Quick Start

@@ -47,7 +47,7 @@ uv --version

### BigQuery Setup (Optional - Full Dataset)

**Skip this if using SQLite demo database.**
**Skip this if using DuckDB demo database.**

1. **Install Google Cloud SDK:**
- macOS: `brew install google-cloud-sdk`
@@ -59,35 +59,32 @@ uv --version
```
*Opens your browser - choose the Google account with BigQuery access to MIMIC-IV.*

### MCP Client Configuration

Paste one of the following into your MCP client config, then restart your client.
### M3 Initialization

**Supported clients:** [Claude Desktop](https://www.claude.com/download), [Cursor](https://cursor.com/download), [Goose](https://block.github.io/goose/), and [more](https://github.com/punkpeye/awesome-mcp-clients).

<table>
<tr>
<td width="50%">

**SQLite (Demo Database)**
**DuckDB (Demo or Full Dataset)**

Free, local, no setup required.

```json
{
"mcpServers": {
"m3": {
"command": "uvx",
"args": ["m3-mcp"],
"env": {
"M3_BACKEND": "sqlite"
}
}
}
}
To create an `m3` directory and navigate into it, run:
```shell
mkdir m3 && cd m3
```
To use the full dataset, download it manually from [PhysioNet](https://physionet.org/content/mimiciv/3.1/) and place it in `m3/m3_data/raw`. For the demo dataset, skip the download and run:

```shell
uv init && uv add m3-mcp && \
uv run m3 init DATASET_NAME && uv run m3 config --quick
```
Replace `DATASET_NAME` with `mimic-iv-demo` or `mimic-iv-full`, then paste the output of this command into your client config JSON file.

*Demo database (136MB, 100 patients, 275 admissions) downloads automatically on first query.*
*Demo dataset (16MB raw download size) downloads automatically on first query.*

*Full dataset (10.6GB raw download size) needs to be downloaded manually.*

</td>
<td width="50%">
@@ -96,6 +93,8 @@ Free, local, no setup required.

Requires GCP credentials and PhysioNet access.

Paste this into your client config JSON file:

```json
{
"mcpServers": {
@@ -126,13 +125,13 @@ Requires GCP credentials and PhysioNet access.

## Backend Comparison

| Feature | SQLite (Demo) | BigQuery (Full) |
|---------|---------------|-----------------|
| **Cost** | Free | BigQuery usage fees |
| **Setup** | Zero config | GCP credentials required |
| **Data Size** | 100 patients, 275 admissions | 365k patients, 546k admissions |
| **Speed** | Fast (local) | Network latency |
| **Use Case** | Learning, development | Research, production |
| Feature | DuckDB (Demo) | DuckDB (Full) | BigQuery (Full) |
|---------|---------------|---------------|-----------------|
| **Cost** | Free | Free | BigQuery usage fees |
| **Setup** | Zero config | Manual download | GCP credentials required |
| **Data Size** | 100 patients, 275 admissions | 365k patients, 546k admissions | 365k patients, 546k admissions |
| **Speed** | Fast (local) | Fast (local) | Network latency |
| **Use Case** | Learning, development | Research (local) | Research, production |

---

@@ -146,7 +145,7 @@ Requires GCP credentials and PhysioNet access.
<tr>
<td width="50%">

**SQLite:**
**DuckDB (Local):**
```bash
git clone https://github.com/rafiattrach/m3.git && cd m3
docker build -t m3:lite --target lite .
@@ -205,7 +204,7 @@ pip install m3-mcp
"m3": {
"command": "m3-mcp-server",
"env": {
"M3_BACKEND": "sqlite"
"M3_BACKEND": "duckdb"
}
}
}
@@ -233,14 +232,146 @@ pre-commit install
"args": ["-m", "m3.mcp_server"],
"cwd": "/path/to/m3",
"env": {
"M3_BACKEND": "sqlite"
"M3_BACKEND": "duckdb"
}
}
}
}
```
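
The `mcpServers` entries in this README can also be merged into an existing client config file programmatically rather than pasted by hand. The following is a minimal sketch, assuming the `m3-mcp-server` command and `M3_BACKEND` variable from the pip-install example; `add_m3_server` and `config_path` are hypothetical names for illustration, and the real config location depends on your MCP client.

```python
import json
from pathlib import Path


def add_m3_server(config_path: Path, backend: str = "duckdb") -> dict:
    """Merge an `m3` entry into an MCP client config file and return the result.

    Hypothetical helper: it keeps any servers already present in the file.
    """
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["m3"] = {
        "command": "m3-mcp-server",
        "env": {"M3_BACKEND": backend},
    }
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

Restart the MCP client after the file changes so it picks up the new entry.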

## Advanced Configuration
#### Using `UV` (Recommended)
This assumes you have [UV](https://docs.astral.sh/uv/getting-started/installation/) installed.

**Step 1: Clone and Navigate**
```bash
# Clone the repository
git clone https://github.com/rafiattrach/m3.git
cd m3
```

**Step 2: Create `UV` Virtual Environment**
```bash
# Create virtual environment
uv venv
```

**Step 3: Install M3**
```bash
uv sync
# Remember to prefix any subsequent commands with `uv run` so they use the `uv` virtual environment
```

### 🗄️ Database Configuration

After installation, choose your data source:

#### Option A: Local Demo (DuckDB + Parquet)

**Perfect for learning and development - completely free!**

1. **Initialize demo dataset**:
```bash
m3 init mimic-iv-demo
```

2. **Setup MCP Client**:
```bash
m3 config
```

*Alternative: For Claude Desktop specifically:*
```bash
m3 config claude --backend duckdb --db-path /Users/you/path/to/m3_data/databases/mimic_iv_demo.duckdb
```

3. **Restart your MCP client** and ask:

- "What tools do you have for MIMIC-IV data?"
- "Show me patient demographics from the ICU"
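
The `--db-path` argument in the Claude Desktop alternative is an absolute path, which is easy to mistype. As a minimal sketch, assuming the `m3_data/databases/mimic_iv_<dataset>.duckdb` layout shown in the commands in this section, the command line can be assembled from the project directory; `claude_config_command` is a hypothetical helper, not part of the `m3` CLI.

```python
from pathlib import Path


def claude_config_command(project_dir: Path, dataset: str = "demo") -> list[str]:
    """Build the `m3 config claude` argument list for a local DuckDB file.

    `dataset` is "demo" or "full"; the file name pattern is taken from the
    README examples and is otherwise an assumption.
    """
    db_path = (
        project_dir / "m3_data" / "databases" / f"mimic_iv_{dataset}.duckdb"
    ).resolve()
    return [
        "m3", "config", "claude",
        "--backend", "duckdb",
        "--db-path", str(db_path),
    ]
```

The list can be passed to `subprocess.run(..., check=True)` from the project directory, for example with `claude_config_command(Path.cwd())`.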

#### Option B: Local Full Dataset (DuckDB + Parquet)

**Run the entire MIMIC-IV dataset locally with DuckDB views over Parquet.**
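
To illustrate what "DuckDB views over Parquet" means, the sketch below builds one `CREATE VIEW` statement per Parquet file using DuckDB's `read_parquet` table function. It is not M3's actual implementation: the `view_statements` helper, the directory argument, and the `<schema>_<table>` view naming are assumptions made for the example.

```python
from pathlib import Path


def view_statements(parquet_dir: Path, schema: str) -> list[str]:
    """Build one CREATE VIEW statement per Parquet file in `parquet_dir`.

    The view naming scheme here is hypothetical.
    """
    statements = []
    for parquet_file in sorted(parquet_dir.glob("*.parquet")):
        view_name = f"{schema}_{parquet_file.stem}"
        statements.append(
            f"CREATE OR REPLACE VIEW {view_name} AS "
            f"SELECT * FROM read_parquet('{parquet_file.as_posix()}')"
        )
    return statements
```

Each statement could then be run with the `duckdb` Python package, for example `duckdb.connect(db_path).execute(statement)`. Because a view stores only the query, the data stays in the Parquet files and the `.duckdb` file remains small.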

1. **Acquire CSVs** (requires PhysioNet credentials):
- Download the official MIMIC-IV CSVs from PhysioNet and place them under:
- `/Users/you/path/to/m3/m3_data/raw_files/mimic-iv-full/hosp/`
- `/Users/you/path/to/m3/m3_data/raw_files/mimic-iv-full/icu/`
- Note: `m3 init`'s auto-download function currently only supports the demo dataset. Use your browser or `wget` to obtain the full dataset.

2. **Initialize full dataset**:
```bash
m3 init mimic-iv-full
```
- This may take up to 30 minutes, depending on your system (for example, about 10 minutes on a MacBook Pro M3)
- Performance knobs (optional):
```bash
export M3_CONVERT_MAX_WORKERS=6 # number of parallel files (default=4)
export M3_DUCKDB_MEM=4GB # DuckDB memory limit per worker (default=3GB)
export M3_DUCKDB_THREADS=4 # DuckDB threads per worker (default=2)
```
Check your system specifications before raising these values, especially the amount of available memory.

3. **Select dataset and verify**:
```bash
m3 use full # optional: the active dataset is already set to full automatically
m3 status
```
- `m3 status` prints the active dataset, local DB path, Parquet presence, quick row counts, and total Parquet size.

4. **Configure MCP client** (uses the full local DB):
```bash
m3 config
# or
m3 config claude --backend duckdb --db-path /Users/you/path/to/m3/m3_data/databases/mimic_iv_full.duckdb
```
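
Before running `m3 init mimic-iv-full`, it can help to check the raw file layout from step 1 and to estimate what the performance knobs from step 2 imply. The sketch below is a hypothetical pre-flight check, not part of the `m3` CLI. It assumes each worker may reach its memory limit at the same time, so the worst case is workers × per-worker limit, and that `M3_DUCKDB_MEM` is given in whole gigabytes such as `4GB`.

```python
import os
from pathlib import Path


def missing_raw_dirs(raw_root: Path) -> list[str]:
    """Return which of the expected subdirectories (hosp, icu) are absent or empty."""
    missing = []
    for name in ("hosp", "icu"):
        directory = raw_root / name
        if not (directory.is_dir() and any(directory.iterdir())):
            missing.append(name)
    return missing


def conversion_budget(env=os.environ) -> dict:
    """Estimate worst-case memory and thread usage from the README's knobs.

    Defaults (4 workers, 3GB, 2 threads) are the ones stated in the README.
    """
    workers = int(env.get("M3_CONVERT_MAX_WORKERS", "4"))
    mem_gb = int(env.get("M3_DUCKDB_MEM", "3GB").upper().removesuffix("GB"))
    threads = int(env.get("M3_DUCKDB_THREADS", "2"))
    return {"memory_gb": workers * mem_gb, "threads": workers * threads}
```

With the defaults this gives 4 × 3 GB = 12 GB and 8 threads; with the example values above (6 workers, 4GB, 4 threads) it gives 24 GB and 24 threads, which is why the memory check matters on smaller machines.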

#### Option C: BigQuery (Full Dataset)

**For researchers needing complete MIMIC-IV data**

##### Prerequisites
- Google Cloud account and project with billing enabled
- Access to MIMIC-IV on BigQuery (requires PhysioNet credentialing)

##### Setup Steps

1. **Install Google Cloud CLI**:

**macOS (with Homebrew):**
```bash
brew install google-cloud-sdk
```

**Windows:** Download from https://cloud.google.com/sdk/docs/install

**Linux:**
```bash
curl https://sdk.cloud.google.com | bash
```

2. **Authenticate**:
```bash
gcloud auth application-default login
```
*This will open your browser - choose the Google account that has access to your BigQuery project with MIMIC-IV data.*

3. **Setup MCP Client for BigQuery**:
```bash
m3 config
```

*Alternative: For Claude Desktop specifically:*
```bash
m3 config claude --backend bigquery --project-id YOUR_PROJECT_ID
```

4. **Test BigQuery Access** - Restart your MCP client and ask:
```
Use the get_race_distribution function to show me the top 5 races in MIMIC-IV admissions.
```

## 🔧 Advanced Configuration

Need to configure other MCP clients or customize settings? Use these commands:

@@ -255,8 +386,8 @@ Generates configuration for any MCP client with step-by-step guidance.
# Quick universal config with defaults
m3 config --quick

# Universal config with custom database
m3 config --quick --backend sqlite --db-path /path/to/database.db
# Universal config with custom DuckDB database
m3 config --quick --backend duckdb --db-path /path/to/database.duckdb

# Save config to file for other MCP clients
m3 config --output my_config.json
@@ -291,7 +422,7 @@ m3 config # Choose OAuth2 option during setup

---

## Available MCP Tools
## 🛠️ Available MCP Tools

When your MCP client processes questions, it uses these tools automatically:

@@ -323,15 +454,17 @@ Try asking your MCP client these questions:
- `Prompt:` *What tables are available in the database?*
- `Prompt:` *What tools do you have for MIMIC-IV data?*

## Troubleshooting
## 🎩 Pro Tips

- Want to pre-approve the use of all tools in Claude Desktop? Use the prompt below, then select **Always Allow**:
- `Prompt:` *Can you please call all your tools in a logical sequence?*

## 🔍 Troubleshooting

### Common Issues

**SQLite "Database not found" errors:**
```bash
# Re-download demo database
m3 init mimic-iv-demo
```
**Local "Parquet not found" or view errors:**
Rerun the `m3 init` command for your chosen dataset.

**MCP client server not starting:**
1. Check your MCP client logs (for Claude Desktop: Help → View Logs)
@@ -409,11 +542,11 @@ m3-mcp-server

## Roadmap

- **Local Full Dataset**: Complete MIMIC-IV locally (no cloud costs)
- **Advanced Tools**: More specialized medical data functions
- **Visualization**: Built-in plotting and charting tools
- **Enhanced Security**: Role-based access control, audit logging
- **Multi-tenant Support**: Organization-level data isolation
- 🏠 **Complete Local Full Dataset**: Complete support for `mimic-iv-full` (download CLI)
- 🔧 **Advanced Tools**: More specialized medical data functions
- 📊 **Visualization**: Built-in plotting and charting tools
- 🔐 **Enhanced Security**: Role-based access control, audit logging
- 🌐 **Multi-tenant Support**: Organization-level data isolation

## Contributing

9 changes: 5 additions & 4 deletions pyproject.toml
@@ -19,7 +19,7 @@ maintainers = [
]
readme = "README.md"
license = "MIT"
keywords = ["mimic-iv", "clinical-data", "mcp", "llm", "medical", "healthcare", "sqlite", "bigquery"]
keywords = ["mimic-iv", "clinical-data", "mcp", "llm", "medical", "healthcare", "duckdb", "bigquery"]
classifiers = [
"Development Status :: 4 - Beta",
"Intended Audience :: Science/Research",
@@ -50,14 +50,15 @@ dependencies = [
"cryptography>=41.0.0", # Cryptographic operations for JWT
"python-jose[cryptography]>=3.3.0", # Additional JWT support with crypto
"httpx>=0.24.0", # Modern HTTP client for OAuth2 token validation
"duckdb>=1.4.1",
]

[project.dependency-groups]
[dependency-groups]
dev = [
"ruff>=0.4.0",
"pre-commit>=3.0.0",
"pytest>=7.4.0",
"pytest-asyncio>=0.23.0",
"pytest>=8.0.0",
"pytest-asyncio>=0.24.0",
"pytest-mock>=3.10.0",
"aiohttp>=3.8.0", # For MCP client testing
]