Making the dockerfile and code ready for kagent. #63

papagala · 2025-11-09T20:27:55Z

Add Kubernetes Deployment Support and Build Automation

Summary

This PR adds Kubernetes/Docker deployment capabilities to M3, enabling the MCP server to run in containerized environments with HTTP transport. It also includes build automation via Makefile and comprehensive documentation for AI agent integration.

Changes

🐳 Dockerfile Enhancements

- Added HTTP transport configuration for lite and bigquery stages:
  - MCP_TRANSPORT=http - enables HTTP mode instead of STDIO
  - MCP_HOST=0.0.0.0 - binds to all interfaces for container networking
  - MCP_PORT=3000 - exposes MCP server on port 3000
  - MCP_PATH=/sse - configures server-sent events endpoint
- Exposed port 3000 for intra-cluster and external access

🔧 MCP Server Transport Flexibility (src/m3/mcp_server.py)

- Added transport mode detection via the MCP_TRANSPORT environment variable
- HTTP/SSE mode support using FastMCP's streamable-http transport for Kubernetes
- Configurable network settings via environment variables (host, port, path)
- Backward compatibility - defaults to STDIO mode for desktop clients
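
A minimal sketch of how the transport selection described above could look, assuming the four environment variables from the Dockerfile section. The helper name `resolve_transport`, the returned dictionary shape, and the fallback values used when a variable is unset are hypothetical and not taken from the PR; the actual `src/m3/mcp_server.py` may differ. The commented-out `mcp.run(...)` hand-off is likewise only an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_transport(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: map MCP_* environment variables to run settings."""
    env = os.environ if env is None else env
    mode = env.get("MCP_TRANSPORT", "stdio").strip().lower()
    if mode != "http":
        # Default stays STDIO so desktop clients keep working unchanged.
        return {"transport": "stdio"}
    return {
        # Transport name as given in the PR description.
        "transport": "streamable-http",
        # Fallbacks mirror the Dockerfile values listed above (assumed defaults).
        "host": env.get("MCP_HOST", "0.0.0.0"),
        "port": int(env.get("MCP_PORT", "3000")),
        "path": env.get("MCP_PATH", "/sse"),
    }


if __name__ == "__main__":
    settings = resolve_transport()
    print(settings)
    # Assumed hand-off; the exact signature is not confirmed by the PR text:
    # mcp.run(**settings)
```

Keeping the environment lookup in a pure function like this makes the STDIO default easy to check without starting a server.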

📦 Build Automation (New Makefile)

- Interactive Docker registry prompt - prompts for registry/username if not set
- Flexible container runtime - supports both Docker (default) and Podman via optional DOCKER variable
- Complete build workflow:
  - download-db - downloads MIMIC-IV demo database using uv
  - build / build-bigquery - builds lite and BigQuery Docker images
  - push / push-bigquery - pushes images to registry
  - test-image - validates built image
  - clean - removes downloaded database files
- Version management - configurable IMAGE_TAG (default: 0.0.3)
- Non-interactive support - allows registry override via CLI or env var
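
The registry resolution order described above (CLI value, then environment variable, then interactive prompt) can be sketched in Python as follows. This is an illustration only: the Makefile itself is not shown in this description, the function names `resolve_registry` and `image_ref` are hypothetical, and the `<registry>/<image>:<tag>` layout is an assumption about how the Makefile composes the image reference.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_registry(
    cli_value: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    prompt: Callable[[str], str] = input,
) -> str:
    """Pick the registry: CLI value first, then DOCKER_REGISTRY, then a prompt."""
    env = os.environ if env is None else env
    registry = cli_value or env.get("DOCKER_REGISTRY") or prompt("Docker registry/username: ")
    registry = registry.strip().rstrip("/")
    if not registry:
        raise ValueError("a registry or username is required")
    return registry


def image_ref(registry: str, name: str = "m3-mimic-demo", tag: str = "0.0.3") -> str:
    """Compose an image reference; the layout is an assumption, not from the PR."""
    return f"{registry}/{name}:{tag}"


if __name__ == "__main__":
    # Stub the prompt so the example runs non-interactively.
    print(image_ref(resolve_registry(env={}, prompt=lambda _: "myusername")))
```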

📚 Documentation (README.md)

- New Kubernetes Deployment section:
  - Build and push instructions with Makefile
  - Service endpoint format for MCP clients
  - Note about external Helm charts repository
- New AI Agent Integration section:
  - Core workflow best practices for querying MIMIC-IV
  - Key tables reference with column names
  - Query patterns and best practices
  - Sample questions organized by complexity level
  - Emphasis on schema verification to avoid column name errors
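
To illustrate the schema-verification point, here is a small sketch using Python's standard `sqlite3` module against an in-memory stand-in table. It does not use the M3 tools themselves; the helper names `table_columns` and `select_checked` are hypothetical, and the two rows are made-up placeholders rather than MIMIC-IV data. It checks requested column names against the real schema before querying and always applies a LIMIT.

```python
import sqlite3
from typing import List, Sequence, Tuple


def table_columns(conn: sqlite3.Connection, table: str) -> List[str]:
    """Return the column names of `table` via SQLite's PRAGMA table_info."""
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [row[1] for row in rows]  # index 1 holds the column name


def select_checked(
    conn: sqlite3.Connection, table: str, columns: Sequence[str], limit: int = 10
) -> List[Tuple]:
    """Run a SELECT only after confirming every requested column exists."""
    known = table_columns(conn, table)
    if not known:
        raise ValueError(f"unknown table: {table}")
    missing = [c for c in columns if c not in known]
    if missing:
        raise ValueError(f"unknown column(s) {missing}; available: {known}")
    col_sql = ", ".join(f'"{c}"' for c in columns)
    query = f'SELECT {col_sql} FROM "{table}" LIMIT ?'
    return conn.execute(query, (int(limit),)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patients "
        "(subject_id INTEGER, gender TEXT, anchor_age INTEGER, anchor_year INTEGER)"
    )
    # Placeholder rows for the demo only.
    conn.executemany(
        "INSERT INTO patients VALUES (?, ?, ?, ?)",
        [(1, "F", 52, 2180), (2, "M", 67, 2155)],
    )
    print(select_checked(conn, "patients", ["subject_id", "anchor_age"], limit=1))
    try:
        select_checked(conn, "patients", ["age"])  # 'age' is not a real column
    except ValueError as exc:
        print(exc)
```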

Usage Examples

Build and Deploy

# Interactive (will prompt for registry, uses Docker by default)
make all

# With Podman (optional)
make all DOCKER=podman

# Non-interactive with custom registry
make all DOCKER_REGISTRY=myusername

# Combining options (Podman + custom registry)
make all DOCKER_REGISTRY=myusername DOCKER=podman

# Using environment variable
export DOCKER_REGISTRY=myusername
make all

simonprovost

Thanks very much @papagala! Here are some preliminary comments prior to @rafiattrach's more important review 🫡

Cheers!

simonprovost · 2025-11-09T22:16:55Z

src/m3/mcp_server.py

    """Main entry point for MCP server."""
-    # Run the FastMCP server
-    mcp.run()
+    # Check if we should run in HTTP mode (for Kubernetes/Docker)


Inline comments are difficult to maintain in OSS; I'd say it is much better to add a Notes section in the docstrings of functions, classes, and so on! Let's see if that's fine with @rafiattrach though!

~> Note that this point applies only to Python files.

PS: I am aware that the codebase already contains such inline comments; I believe we already discussed some tech debt with Rafi and the team, so let's try to avoid adding more, if I may say?

Cheers

Of course it repeats many times


simonprovost · 2025-11-09T22:18:45Z

README.md

+## 🤖 AI Agent Integration
+
+### Agent Instructions
+
+Copy these instructions into your AI agent configuration for optimal MIMIC-IV querying:
+
+**Core Workflow:**
+1. Always start with schema discovery using get_database_schema()
+2. Use get_table_info(table_name) to see columns and sample data
+3. Check sample data for actual formats (dates, column names, values)
+4. Write queries with proper JOINs and LIMIT clauses
+5. Provide context and interpretation with results
+
+**Key Tables:**
+- patients: Demographics (subject_id, gender, anchor_age, anchor_year)
+- admissions: Hospital stays (hadm_id, admittime, dischtime, admission_type)
+- icustays: ICU episodes (stay_id, intime, outtime, los)
+- labevents: Lab results (itemid, value, valuenum, valueuom)
+- prescriptions: Medications (drug, dose_val_rx, route)
+
+**Best Practices:**
+- Always use LIMIT to prevent returning too many rows
+- Verify column names from sample data (e.g., 'anchor_age' not 'age')
+- Handle NULLs explicitly in clinical data
+- Use convenience functions for common patterns
+- Explain results, don't just dump data


Is that not redundant with previous sections? I may be mistaken!


simonprovost · 2025-11-09T22:21:22Z

README.md

+- 🌐 **Multi-tenant Support**: Organization-level data isolation

-## Contributing
+## 🐳 Kubernetes Deployment


Shouldn’t we start using collapsible, details-based sections in the README—keeping only the ultimate essentials for end users (such as clinicians), and placing variations like Kubernetes deployment inside collapsible sections?

Would you mind trying @papagala ?

It's becoming pretty long!

@rafiattrach Would you agree with this ?

Ref: https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/organizing-information-with-collapsed-sections


simonprovost · 2025-11-09T22:21:35Z

README.md

+### Sample Questions
+
+**Basic Exploration:**
+- What tables are available in the database?
+- Show me the structure of the patients table
+- Give me a sample of 5 rows from the icustays table
+
+**Patient Analysis:**
+- How many patients are in the database?
+- Show me the age and gender distribution
+- What's the average ICU length of stay?
+
+**Clinical Queries:**
+- Show me patients with diabetes diagnoses
+- What are the most common admission types?
+- Find patients with both high glucose and kidney problems
+- Compare ICU length of stay between emergency and elective admissions
+
+**Deep Dives:**
+- Give me the complete medical history for patient 10001
+- Show all ICU stays, diagnoses, and medications for patient 10006
+- What were the lab trends during a patient's last admission?



Yup I think this is redundant

simonprovost · 2025-11-09T22:23:58Z

Makefile

+# Makefile for M3 Docker Image Build and Push
+DOCKER ?= docker
+IMAGE_NAME := m3-mimic-demo
+IMAGE_TAG ?= 0.0.3


May I ask why 0.0.3, @papagala?


rafiattrach

I think this needs a rebase first, and the other big PR should get merged faster since it's nearly done. I can test and review this after that merges and after the rebase. Thanks for the addition and contribution!

rafiattrach · 2025-11-16T15:56:48Z

Makefile

+# Makefile for M3 Docker Image Build and Push
+DOCKER ?= docker
+IMAGE_NAME := m3-mimic-demo
+IMAGE_TAG ?= 0.0.3


Perhaps 0.3.0 was meant, since that's the latest version on GitHub/PyPI? In any case, perhaps we can already set 0.4.0, since we can bump it up after it gets merged.

rafiattrach · 2025-11-16T15:57:48Z

README.md

+- 🌐 **Multi-tenant Support**: Organization-level data isolation

-## Contributing
+## 🐳 Kubernetes Deployment


Yeah, good idea, but perhaps we can keep that for a separate follow-up PR, since this isn't the only section that needs such a change.

rafiattrach · 2025-11-16T16:10:44Z

src/m3/mcp_server.py

    """Main entry point for MCP server."""
-    # Run the FastMCP server
-    mcp.run()
+    # Check if we should run in HTTP mode (for Kubernetes/Docker)


Fair point about consistency: some functions might be over-commented while others could use more context. I think both docstrings and inline comments serve their purpose, though. I wouldn't really call it tech debt, more like style standardization. Happy to discuss commenting standards as a team, though.

rafiattrach · 2025-11-16T16:11:43Z

README.md

+## 🤖 AI Agent Integration
+
+### Agent Instructions
+
+Copy these instructions into your AI agent configuration for optimal MIMIC-IV querying:
+
+**Core Workflow:**
+1. Always start with schema discovery using get_database_schema()
+2. Use get_table_info(table_name) to see columns and sample data
+3. Check sample data for actual formats (dates, column names, values)
+4. Write queries with proper JOINs and LIMIT clauses
+5. Provide context and interpretation with results
+
+**Key Tables:**
+- patients: Demographics (subject_id, gender, anchor_age, anchor_year)
+- admissions: Hospital stays (hadm_id, admittime, dischtime, admission_type)
+- icustays: ICU episodes (stay_id, intime, outtime, los)
+- labevents: Lab results (itemid, value, valuenum, valueuom)
+- prescriptions: Medications (drug, dose_val_rx, route)
+
+**Best Practices:**
+- Always use LIMIT to prevent returning too many rows
+- Verify column names from sample data (e.g., 'anchor_age' not 'age')
+- Handle NULLs explicitly in clinical data
+- Use convenience functions for common patterns
+- Explain results, don't just dump data


If possible, @papagala, we can remove redundant things, especially in the README, since it quickly gets bloated with repetitive content, especially regarding prompting or examples.

Making the dockerfile and code ready for kagent.

b79475e

simonprovost reviewed Nov 9, 2025


simonprovost requested a review from rafiattrach November 9, 2025 22:27

rafiattrach reviewed Nov 16, 2025

3 participants
