A deep learning library for training foundation models on the BubbleML 2.0 dataset, focusing on boiling phenomena—an inherently chaotic, multiphase process central to energy and thermal systems.
*Figure 1: Overview of the BubbleML 2.0 dataset and Bubbleformer downstream tasks*
Bubbleformer is a transformer-based spatiotemporal model that forecasts stable and long-range boiling dynamics (including nucleation, interface evolution, and heat transfer) without dependence on simulation data during inference. The project combines:
- Bubbleformer: A novel transformer architecture for forecasting multiphase fluid dynamics
- BubbleML 2.0: A comprehensive dataset of boiling simulations across diverse fluids and configurations
Together, they enable machine learning models to generalize across different fluids, boiling regimes, and physical configurations, setting new benchmarks for ML-based modeling of complex thermophysical systems.
Bubbleformer makes three core contributions to the field:

- Beyond prediction to forecasting
  - Operates directly on full 5D spatiotemporal tensors while preserving temporal dependencies
  - Learns nucleation dynamics end-to-end, enabling long-range forecasting
  - Requires no compressed time representations or injected future bubble positions

- Generalizing across fluids and flow regimes
  - Conditions on thermophysical parameters for cross-scenario generalization
  - Handles diverse fluids (cryogens, refrigerants, dielectrics)
  - Supports multiple boiling configurations (pool/flow boiling) and geometries (single/double-sided heaters)
  - Covers all flow regimes, from bubbly to annular up to dryout

- Physics-based evaluation
  - Introduces interpretable metrics beyond pixel-wise error:
    - Heat flux divergence
    - Eikonal PDE residual for signed distance functions
    - Mass conservation
  - Evaluates physical correctness in chaotic systems (a minimal sketch of the Eikonal check appears after this list)
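As a concrete example of these metrics, the Eikonal check exploits the fact that a valid signed distance field φ satisfies |∇φ| = 1 everywhere, so the mean deviation of the gradient norm from 1 quantifies how physically consistent a predicted interface field is. The NumPy sketch below is a minimal version of this idea, not necessarily the paper's exact implementation:

```python
import numpy as np

def eikonal_residual(sdf: np.ndarray, dx: float = 1.0, dy: float = 1.0) -> float:
    """Mean deviation of |grad(phi)| from 1 for a 2D signed distance field.

    A perfect SDF satisfies the Eikonal PDE |grad(phi)| = 1, so this
    residual is 0 for a physically consistent interface representation.
    """
    # Central-difference gradients of the (H, W) field; np.gradient returns
    # the axis-0 (y) derivative first, then the axis-1 (x) derivative.
    dphi_dy, dphi_dx = np.gradient(sdf, dy, dx)
    grad_norm = np.sqrt(dphi_dx**2 + dphi_dy**2)
    return float(np.abs(grad_norm - 1.0).mean())
```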
The primary models available in Bubbleformer are:
- AViT (Axial Vision Transformer): A transformer-based model with factored spacetime blocks
- UNet (Modern UNet): A modern UNet architecture for spatiotemporal prediction
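For intuition about the AViT design, the toy PyTorch block below illustrates factored (axial) spacetime attention: rather than attending jointly over all of space and time, attention is applied separately along the time, height, and width axes. The tensor layout (B, T, H, W, C) and all names are assumptions for illustration, not Bubbleformer's actual layers:

```python
import torch
from torch import nn

class FactoredSpacetimeBlock(nn.Module):
    """Illustrative axial attention over a (B, T, H, W, C) tensor."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.h_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.w_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    @staticmethod
    def _axial(attn, x, axis):
        # Fold every axis except `axis` and channels into the batch so
        # that attention mixes information along `axis` only.
        x = x.movedim(axis, -2)                     # (..., L, C)
        lead = x.shape[:-2]
        x = x.reshape(-1, x.shape[-2], x.shape[-1])
        out, _ = attn(x, x, x)
        return out.reshape(*lead, out.shape[-2], out.shape[-1]).movedim(-2, axis)

    def forward(self, x):                           # x: (B, T, H, W, C)
        x = x + self._axial(self.time_attn, self.norm(x), 1)  # time axis
        x = x + self._axial(self.h_attn, self.norm(x), 2)     # height axis
        x = x + self._axial(self.w_attn, self.norm(x), 3)     # width axis
        return x

# Example: y = FactoredSpacetimeBlock(dim=64)(torch.randn(2, 8, 16, 16, 64))
```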
BubbleML 2.0 is the most comprehensive boiling dataset to date, significantly expanding the original BubbleML with new fluids, boiling configurations, and flow regimes.
- 160+ high-resolution 2D simulations spanning:
  - Pool boiling and flow boiling configurations
  - Diverse physics (saturated, subcooled, and single-bubble nucleation)
  - Three fluid types:
    - FC-72 (dielectric)
    - R-515B (refrigerant)
    - LN₂ (cryogen)
- Experimental conditions:
  - Constant heat flux boundary conditions
  - Double-sided heater configurations
  - Full range of flow regimes (bubbly, slug, and annular, up to dryout)
- Simulation framework: all simulations performed using Flash-X
- Data format: HDF5 files
- Resolution:
  - Spatial and temporal resolution varies by fluid based on characteristic scales
  - Adaptive Mesh Refinement (AMR) used where needed
  - AMR grids interpolated onto regular grids (a SciPy sketch of this scheme appears after this list) with:
    - Linear interpolation
    - Nearest-neighbor interpolation for boundary NaN values
- Contents: each simulation includes the following fields (a minimal loading sketch also follows):
  - Temperature fields
  - Velocity components (x/y)
  - Signed distance functions (bubble interfaces)
  - Thermophysical parameters
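For reference, the two-stage regridding described above (linear interpolation, then a nearest-neighbor fill for boundary NaNs) can be reproduced with SciPy roughly as follows. The released HDF5 files are already regridded; all names below are illustrative:

```python
import numpy as np
from scipy.interpolate import griddata

def amr_to_regular(points, values, grid_x, grid_y):
    """Interpolate scattered AMR samples onto a regular grid.

    points: (N, 2) AMR cell-center coordinates, values: (N,) field samples,
    grid_x/grid_y: 2D target coordinates, e.g. from np.meshgrid.
    """
    # Linear interpolation; cells outside the convex hull of the AMR
    # points come back as NaN.
    field = griddata(points, values, (grid_x, grid_y), method="linear")
    # Fill the remaining (boundary) NaNs with nearest-neighbor values.
    nan_mask = np.isnan(field)
    if nan_mask.any():
        field[nan_mask] = griddata(
            points, values, (grid_x[nan_mask], grid_y[nan_mask]), method="nearest"
        )
    return field
```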
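And a minimal h5py sketch for loading one simulation. The file path and field keys ("temperature", "velx", "vely", "dfun") are assumptions for illustration; inspect `f.keys()` on your download to confirm the actual layout:

```python
import h5py
import numpy as np

# Path and key names are placeholders -- check the release you downloaded.
with h5py.File("samples/pool_boiling_fc72.hdf5", "r") as f:
    print(list(f.keys()))                 # discover the actual field names
    temp = np.asarray(f["temperature"])   # (T, H, W) temperature field
    velx = np.asarray(f["velx"])          # x-velocity component
    vely = np.asarray(f["vely"])          # y-velocity component
    sdf = np.asarray(f["dfun"])           # signed distance to bubble interfaces
```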
For additional details on boundary conditions, numerical methods, and experimental validation, please refer to Appendix B of the Bubbleformer paper.
Create the conda environment and install the package in editable mode:

```bash
conda env create -f env/bubbleformer_gpu.yaml
conda activate bubbleformer
pip install -r env/requirements.txt
pip install -e .
```
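As a quick sanity check, the package should then import cleanly:

```bash
python -c "import bubbleformer"
```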
The repository is organized as follows:

```
.
├── bubbleformer/          # Main package directory
│   ├── config/            # Configuration files
│   │   ├── data_cfg/      # Dataset configurations
│   │   ├── model_cfg/     # Model configurations
│   │   ├── optim_cfg/     # Optimizer configurations
│   │   └── scheduler_cfg/ # Learning rate scheduler configurations
│   ├── data/              # Data loading and processing modules
│   ├── layers/            # Model layer implementations
│   ├── models/            # Model architecture implementations
│   └── utils/             # Utility functions (losses, plotting, etc.)
├── env/                   # Environment configuration files
├── examples/              # Example notebooks
├── samples/               # Sample data files
└── scripts/               # Training and inference scripts
```
To train a model using the default configuration:
```bash
python scripts/train.py
```
To train with a specific configuration:
```bash
python scripts/train.py nodes=1 devices=1 max_epochs=400 batch_size=8
```
The repository provides two ways to run inference:
- Using the Python script:

  ```bash
  python scripts/inference.py --model_path /path/to/model --data_path /path/to/data
  ```

- Using the Jupyter notebook: `scripts/inference_autoregressive.ipynb`
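The notebook demonstrates autoregressive rollout. For orientation, the loop below is a generic sketch of that pattern, with the model interface, window sizes, and tensor layout all assumed for illustration (it is not the notebook's exact code):

```python
import torch

@torch.no_grad()
def autoregressive_rollout(model, init_window, steps):
    """Chain `steps` model calls into a long forecast.

    model:       maps past frames (B, T_in, C, H, W) to future frames
                 (B, T_out, C, H, W)
    init_window: ground-truth frames used to start the rollout
    """
    window, frames = init_window, []
    for _ in range(steps):
        pred = model(window)          # forecast the next window of frames
        frames.append(pred)
        # Slide the input window forward so later steps are conditioned
        # on the model's own predictions.
        window = torch.cat([window, pred], dim=1)[:, -init_window.shape[1]:]
    return torch.cat(frames, dim=1)   # (B, steps * T_out, C, H, W)
```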