
BiomedParse

[Notice] This is v2 of the BiomedParse model, with improved code and model architecture using BoltzFormer, supporting end-to-end 3D inference. Check v1 if you are looking for the original version.

[Paper] [Demo] [Model] [Data] [BibTeX]

This repository hosts the code and resources for BiomedParse, aka "A Foundation Model for Joint Segmentation, Detection, and Recognition of Biomedical Objects Across Nine Modalities" (Nature Methods). BiomedParse is designed for comprehensive biomedical image analysis. It offers a unified approach to perform segmentation, detection, and recognition across diverse biomedical imaging modalities. By consolidating these tasks, BiomedParse provides an efficient and flexible tool tailored for researchers and practitioners, facilitating the interpretation and analysis of complex biomedical data.

Example Predictions

What's New in v2?

Since the publication of BiomedParse, we've been continuously collecting feedback from the community and working to improve and expand its capabilities and usability. The v2 release provides:

  • Larger pretraining data at the million scale, covering 200+ anatomies across different modalities.
  • Improved segmentation performance for small objects using the BoltzFormer architecture.
  • SOTA 3D segmentation performance supporting end-to-end volumetric inference (CVPR Challenge).
  • Built-in object existence detection to suppress false positives (no separate mask checking required).

Should I use v1 or v2?

Short answer: v2 for the 3D modalities, and v1 for the rest.

| Version | Image type | Modalities | # tasks | Existence detection |
|---------|------------|------------|---------|---------------------|
| v1 | 2D | CT, MRI, Ultrasound, X-Ray, Pathology, Endoscopy, Dermoscopy, Fundus, OCT | 100+ | Post-inference K-S test |
| v2 | 3D | CT, MRI, Ultrasound, PET, 3D Microscopy (EM, lightsheet) | 200+ | Built-in ISD module |
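
For context, v1's post-inference existence check is a statistical test on the predicted mask probabilities. The snippet below is a conceptual sketch only: the helper name is hypothetical, and the exact statistic, decision rule, and target_dist.json format used in v1 may differ.

import numpy as np
from scipy.stats import ks_2samp

def passes_ks_check(pred_probs, reference_sample, alpha=0.05):
    # Conceptual sketch of a v1-style post-inference existence check (hypothetical helper).
    # pred_probs: predicted mask probabilities for one object.
    # reference_sample: values drawn from the expected "object present" distribution
    # (in v1, derived from target_dist.json).
    statistic, p_value = ks_2samp(np.ravel(pred_probs), np.ravel(reference_sample))
    # A very small p-value means the prediction deviates from the reference distribution,
    # which can be treated as evidence that the object is absent (a likely false positive).
    return p_value >= alpha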

News

  • Oct. 15, 2025: BiomedParse v2 release is complete with full support for inference and finetuning!
  • Jun. 11, 2025: BiomedParse is #1 in the CVPR 2025: Foundation Models for Text-guided 3D Biomedical Image Segmentation Challenge! We upgraded our model and finetuned it on the challenge dataset for wider and more comprehensive coverage of 3D biomedical imaging data. Check out our model as a containerized [docker image] for direct inference. Please acknowledge the original challenge if you use this version of the model.
  • Jan. 9, 2025: Refined all object recognition scripts and added a notebook with examples.
  • Dec. 12, 2024: Uploaded extra datasets for finetuning on [Data]. Added random rotation feature for training.
  • Dec. 5, 2024: The loading process of target_dist.json is optimized by automatic downloading from HuggingFace.
  • Dec. 3, 2024: We added inference notebook examples in inference_example_RGB.ipynb and inference_example_NIFTI.ipynb.
  • Nov. 22, 2024: We added a negative prediction p-value example in inference_example_DICOM.ipynb.
  • Nov. 18, 2024: BiomedParse is officially online in Nature Methods!

Installation

git clone https://github.com/microsoft/BiomedParse.git

Conda Environment Setup

conda create -n biomedparse_v2 python=3.10.14
conda activate biomedparse_v2

Install dependencies

pip install -r assets/requirements/requirements.txt 

# The above requirements file assumes your environment uses CUDA 12.4. Adjust accordingly for your system/environment.

pip install azureml-automl-core
pip install opencv-python
pip install git+https://github.com/facebookresearch/detectron2.git
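
Optionally, you can sanity-check that PyTorch sees your GPU before moving on (assuming torch was installed by the requirements file above):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"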

Model Weights

We provide model weights trained on the CVPR 2025 Text-guided 3D Segmentation Challenge dataset. Please acknowledge the original challenge if you use this version of the model. We also refer readers to the original dataset for the necessary image preprocessing.

Option 1: HuggingFace Hub

You can download the pretrained model weights directly from the HuggingFace Hub.

First, install the required package:

pip install huggingface_hub

Then, download the checkpoint file using the HuggingFace Hub API:

from huggingface_hub import hf_hub_download

# Download the checkpoint file
file_path = hf_hub_download(
    repo_id="microsoft/BiomedParse",
    filename="biomedparse_v2.ckpt"
)

print("Model weights downloaded to:", file_path)

Option 2: Direct Download via Command Line

You can also download the file directly using wget or curl:

wget https://huggingface.co/microsoft/BiomedParse/resolve/main/biomedparse_v2.ckpt

or

curl -L -o biomedparse_v2.ckpt https://huggingface.co/microsoft/BiomedParse/resolve/main/biomedparse_v2.ckpt

💡 Note: If the repository is private, log in with your HuggingFace token using:

huggingface-cli login

before attempting to download.

Now you should have the model weights ready for use!

Model Inference

BiomedParse v2 supports segmentation of 3D volumes in a slice-by-slice manner, with 3D context from neighboring slices encoded around each slice in RGB format.
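
To illustrate the idea, here is a minimal sketch of packing each slice together with its neighbors into a 3-channel image. The actual encoding is handled by process_input in the example below and may differ in detail (e.g., context spacing and normalization).

import numpy as np

def slices_to_rgb(volume):
    # Sketch: encode each axial slice with its neighbors as RGB channels.
    # volume: (D, H, W) array. Returns (D, H, W, 3) where channels 0/2 hold the
    # previous/next slices (clamped at the volume boundaries) and channel 1 is the slice itself.
    depth = volume.shape[0]
    prev_idx = np.clip(np.arange(depth) - 1, 0, depth - 1)
    next_idx = np.clip(np.arange(depth) + 1, 0, depth - 1)
    return np.stack([volume[prev_idx], volume, volume[next_idx]], axis=-1)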

Inference 3D Examples

import numpy as np
import torch
import torch.nn.functional as F
import hydra
from hydra import compose
from hydra.core.global_hydra import GlobalHydra
from utils import process_input, process_output, slice_nms
from inference import postprocess, merge_multiclass_masks
from skimage import segmentation
from huggingface_hub import hf_hub_download

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

GlobalHydra.instance().clear()
hydra.initialize(config_path="configs/model", job_name="example_prediction")
cfg = compose(config_name="biomedparse_3D")
model = hydra.utils.instantiate(cfg, _convert_="object")
model.load_pretrained(hf_hub_download(
  repo_id="microsoft/BiomedParse", filename="biomedparse_v2.ckpt"))
model = model.to(device).eval()

# Example image and prompt
file_path = "examples/imgs/CT_AMOS_amos_0018.npz"

npz_data = np.load(file_path, allow_pickle=True)
imgs = npz_data["imgs"]
text_prompts = npz_data["text_prompts"].item()

print("Loaded image shape:", imgs.shape)
print("Text prompts:", text_prompts)

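# Collect the integer class IDs and join their text prompts with "[SEP]" into a single query string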
ids = [int(_) for _ in text_prompts.keys() if _ != "instance_label"]
ids.sort()
text = "[SEP]".join([text_prompts[str(i)] for i in ids])

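# Preprocess the volume into 512x512 slices; the returned padding metadata is later used by process_output to restore the original shape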
imgs, pad_width, padded_size, valid_axis = process_input(imgs, 512)

imgs = imgs.to(device).int()

input_tensor = {
    "image": imgs.unsqueeze(0),  # Add batch dimension
    "text": [text],
}

with torch.no_grad():
    output = model(input_tensor, mode="eval", slice_batch_size=4)

mask_preds = output["predictions"]["pred_gmasks"]
mask_preds = F.interpolate(mask_preds, size=(512, 512), mode="bicubic", align_corners=False, antialias=True)

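# Apply the built-in object existence detection to suppress masks for objects predicted to be absent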
mask_preds = postprocess(mask_preds, output["predictions"]["object_existence"])
mask_preds = merge_multiclass_masks(mask_preds, ids)
mask_preds = process_output(mask_preds, pad_width, padded_size, valid_axis)
print("Processed mask shape:", mask_preds.shape)
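
As a purely illustrative follow-up (the output filename and format below are our own choice, not part of the repository), you could save the merged mask for later inspection:

pred = mask_preds.cpu().numpy() if torch.is_tensor(mask_preds) else np.asarray(mask_preds)  # handle tensor or array output
np.savez_compressed("CT_AMOS_amos_0018_pred.npz", mask=pred)  # hypothetical output path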

Please refer to the inference notebook for more examples.

Evaluation

Prepare the public model checkpoint and evaluation data under <YOUR MODEL AND DATA DIR> and point to it in evaluate_biomedparse.yaml as

mounts:
  external: <YOUR MODEL AND DATA DIR>

Save the model checkpoint under <YOUR MODEL AND DATA DIR>. Download the validation set of the CVPR 2025 Text-guided 3D Segmentation Challenge dataset. Save the validation images to <YOUR MODEL AND DATA DIR>/data/test, and validation masks to <YOUR MODEL AND DATA DIR>/data/test_mask. Run

python -m azureml.acft.image.components.olympus.app.main \
  --config-path <YOUR ABSOLUTE CONFIG DIRECTORY PATH> \
  --config-name evaluate_biomedparse
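
For reference, assuming the checkpoint keeps the biomedparse_v2.ckpt name from the download step above, the resulting layout of <YOUR MODEL AND DATA DIR> should look roughly like:

<YOUR MODEL AND DATA DIR>/
├── biomedparse_v2.ckpt
└── data/
    ├── test/        # validation images
    └── test_mask/   # validation masks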

Fine-tuning

Want to improve performance on your specific tasks? Here are detailed instructions for end-to-end fine-tuning on your own data: FINETUNING

Supported Tasks

  • CT: oncology/pathology (adrenocortical carcinoma, kidney lesions/cysts L/R, liver tumors, lung lesions, pancreas tumors, head–neck cancer, colon cancer primaries, COVID-19, whole-body lesion, lymph nodes); thoracic (lungs L/R, lobes LUL/LLL/RUL/RML/RLL, trachea, airway tree); abdomen/pelvis (spleen, liver, gallbladder, stomach, pancreas, duodenum, small bowel, colon, esophagus); GU/endocrine (kidneys L/R, adrenal glands L/R, bladder, prostate, uterus); vascular (aorta/tree, SVC, IVC, pulmonary vein, brachiocephalic trunk, subclavian/carotid arteries L/R, brachiocephalic veins L/R, left atrial appendage, portal/splenic vein, iliac arteries/veins L/R); cardiac (heart); head/neck (carotids L/R, submandibular/parotid/lacrimal glands L/R, thyroid, larynx glottic/supraglottic, lips, buccal mucosa, oral cavity, cervical esophagus, cricopharyngeal inlet, arytenoids, eyeball segments ant/post L/R, optic chiasm, optic nerves L/R, cochleae L/R, pituitary, brainstem, spinal cord); neuro/cranial (brain, skull, Circle of Willis CTA); spine/MSK (sacrum, vertebrae C1–S1, humeri/scapulae/clavicles/femora/hips L/R, gluteus maximus/medius/minimus L/R, autochthon L/R, iliopsoas L/R).
  • MRI: abdomen/pelvis (spleen, liver, gallbladder, stomach, pancreas, duodenum, small bowel, colon whole, esophagus, bladder, prostate, uterus); colon segments (cecum, appendix, ascending, transverse, descending, sigmoid, rectum); GU (prostate transition zone, prostate lesion); cardiac CMR (LV, RV, myocardium, LA, RA); thoracic (lungs L/R); vascular (aorta, pulmonary artery, SVC, IVC, portal/splenic vein, iliac arteries/veins L/R, carotid arteries L/R, jugular veins L/R); neuro tumors/ischemia (brain, brain tumor, stroke lesion, GTVp/GTVn tumor, vestibular schwannoma intra/extra-meatal, cochleae L/R); glioma components (non-enhancing tumor core, non-enhancing FLAIR hyperintensity, enhancing tissue, resection cavity); white matter disease (WM hyperintensities FLAIR/T1); neurovascular (Circle of Willis MRA); spine/MSK (sacrum, vertebrae regional, discs, spinal canal/cord, humeri/femora/hips L/R, gluteus maximus/medius/minimus L/R, autochthon L/R, iliopsoas L/R).
  • Ultrasound: cardiac (LV, myocardium, LA), neck (thyroid, carotid artery, jugular vein), neuro (brain tumor), calf MSK (soleus, gastrocnemius medialis/lateralis).
  • PET: whole-body lesion.
  • Electron Microscopy: endolysosomes, mitochondria, nuclei, neuronal ultrastructure, synaptic clefts, axon.
  • Lightsheet Microscopy: brain neural activity, Alzheimer’s plaque, nuclei, vessel.

Citation

Please cite our paper if you use the code, model, or data.

@article{zhao2025foundation,
  title={A foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities},
  author={Zhao, Theodore and Gu, Yu and Yang, Jianwei and Usuyama, Naoto and Lee, Ho Hin and Kiblawi, Sid and Naumann, Tristan and Gao, Jianfeng and Crabtree, Angela and Abel, Jacob and others},
  journal={Nature methods},
  volume={22},
  number={1},
  pages={166--176},
  year={2025},
  publisher={Nature Publishing Group US New York}
}

If you use the v2 code or model, please also cite the BoltzFormer paper:

@inproceedings{zhao2025boltzmann,
  title={Boltzmann Attention Sampling for Image Analysis with Small Objects},
  author={Zhao, Theodore and Kiblawi, Sid and Usuyama, Naoto and Lee, Ho Hin and Preston, Sam and Poon, Hoifung and Wei, Mu},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={25950--25959},
  year={2025}
}

Usage and License Notices

The model described in this repository is provided for research and development use only. The model is not intended for use in clinical decision-making or for any other clinical use, and the performance of the model for clinical use has not been established. You bear sole responsibility for any use of this model, including incorporation into any product intended for clinical use.
