This repository contains the source code for the paper "Four Principles for Physically Interpretable World Models": https://arxiv.org/abs/2503.02143
This repository implements our framework for building Physically Interpretable World Models, following the principles described in our paper. We provide scripts to collect data, train foundational models (VAE and LSTM), and run experiments for each of the three core principles.

To install the dependencies:

```bash
pip install -r requirements.txt
```
To generate datasets for training, validation, and testing:

```bash
python dataCollect.py
```

This script collects observation-action pairs and organizes them into train, val, and test splits.
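For reference, below is a minimal sketch of this kind of collection loop. The toy dynamics, trajectory length, and 80/10/10 split ratios are illustrative assumptions, not the script's actual settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def step(state, action):
    # Toy dynamics standing in for the real simulator queried by the script.
    return state + 0.1 * action + 0.01 * rng.standard_normal(state.shape)

# Roll out a trajectory of observation-action pairs.
state = rng.standard_normal(2)
pairs = []
for _ in range(1000):
    action = rng.uniform(-1.0, 1.0, size=2)
    pairs.append((state.copy(), action))
    state = step(state, action)

# A sequential 80/10/10 split keeps temporal order intact within each split,
# which matters later for training the LSTM on consecutive steps.
n = len(pairs)
train = pairs[: int(0.8 * n)]
val = pairs[int(0.8 * n): int(0.9 * n)]
test = pairs[int(0.9 * n):]
```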
Train a Variational Autoencoder (VAE)

```bash
python vae.py
```

This trains a VAE that compresses high-dimensional observations into latent representations.
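For intuition, here is a minimal PyTorch sketch of such a VAE; the layer sizes, latent dimension, and unit-weight KL term are illustrative assumptions rather than the repository's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, obs_dim=64, latent_dim=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, obs_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)   # reparameterization trick
        return self.dec(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    # Reconstruction error plus KL divergence to the standard normal prior.
    recon_loss = F.mse_loss(recon, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl

model = VAE()
x = torch.randn(32, 64)                        # dummy batch of observations
recon, mu, logvar = model(x)
vae_loss(x, recon, mu, logvar).backward()
```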
Train an LSTM for Prediction

```bash
python lstm.py
```

This LSTM model is used across all principles to perform temporal prediction in latent space.
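A minimal sketch of a latent-space predictor of this kind, assuming the LSTM is conditioned on past latents and actions; the dimensions and the one-step-ahead training target are illustrative:

```python
import torch
import torch.nn as nn

latent_dim, action_dim, hidden_dim = 8, 2, 64  # illustrative sizes

class LatentLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(latent_dim + action_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, latent_dim)

    def forward(self, z_seq, a_seq):
        # Condition the prediction on both past latents and past actions.
        h, _ = self.lstm(torch.cat([z_seq, a_seq], dim=-1))
        return self.head(h)                    # predicted latent at each step

model = LatentLSTM()
z = torch.randn(16, 10, latent_dim)            # batch of latent sequences
a = torch.randn(16, 10, action_dim)
pred = model(z, a)
# Train so that the prediction at time t matches the latent at time t+1.
loss = nn.functional.mse_loss(pred[:, :-1], z[:, 1:])
loss.backward()
```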
To encode observations into modular latent components (e.g., physical state, image features):

```bash
python seperate_encoding.py
```

This script implements separate encoding branches for each latent subspace.
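As a hedged sketch, separate branches can be structured as below; the choice of two subspaces and their sizes is an illustrative assumption:

```python
import torch
import torch.nn as nn

class ModularEncoder(nn.Module):
    def __init__(self, obs_dim=64, phys_dim=4, feat_dim=8):
        super().__init__()
        # One branch for interpretable physical state (e.g., position, velocity).
        self.phys_branch = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(),
                                         nn.Linear(32, phys_dim))
        # A second branch for the remaining image features.
        self.feat_branch = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(),
                                         nn.Linear(32, feat_dim))

    def forward(self, x):
        # The latent code is the concatenation of the two subspaces, so each
        # component can be supervised or inspected independently.
        return torch.cat([self.phys_branch(x), self.feat_branch(x)], dim=-1)

z = ModularEncoder()(torch.randn(32, 64))      # -> shape (32, 12)
```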
To train the VAE with alignment constraints (e.g., transformations and their expected effects):

```bash
python translation_loss.py
```

This loss promotes latent invariance/equivariance aligned with physical transformations.
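A sketch of one such alignment constraint: an equivariance penalty requiring a known input transformation to produce a known latent change. The assumption that latent coordinate 0 tracks the shifted quantity, and the roll-based stand-in for a physical shift, are illustrative:

```python
import torch
import torch.nn.functional as F

def translation_loss(enc, x, x_shifted, delta):
    z, z_shifted = enc(x), enc(x_shifted)
    # The known transformation (a shift by `delta`) should move the
    # corresponding latent coordinate by the same amount.
    offset = torch.zeros_like(z)
    offset[:, 0] = delta
    return F.mse_loss(z_shifted, z + offset)

enc = torch.nn.Linear(64, 8)                   # stand-in encoder
x = torch.randn(32, 64)
x_shifted = torch.roll(x, shifts=1, dims=-1)   # stand-in for a shifted observation
loss = translation_loss(enc, x, x_shifted, delta=1.0)
loss.backward()
```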
To incorporate mixed supervision signals during training (fully labeled, weakly labeled, and unlabeled):

```bash
python partial_supervision.py
```

This script uses weak supervision techniques (e.g., temporal smoothness) to improve interpretability.
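A sketch of how such signals might be combined into one objective; the smoothness weight and the specific terms are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def mixed_supervision_loss(z_labeled, labels, z_seq, w_smooth=0.1):
    # Full supervision: match latents to ground-truth physical states.
    sup = F.mse_loss(z_labeled, labels)
    # Weak supervision on unlabeled sequences: physical quantities should
    # vary smoothly over time, so penalize large step-to-step jumps.
    smooth = (z_seq[:, 1:] - z_seq[:, :-1]).pow(2).mean()
    return sup + w_smooth * smooth

z_labeled = torch.randn(32, 4, requires_grad=True)   # latents with labels
labels = torch.randn(32, 4)
z_seq = torch.randn(16, 10, 4, requires_grad=True)   # unlabeled latent sequences
mixed_supervision_loss(z_labeled, labels, z_seq).backward()
```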
To compare results with and without access to velocity estimation, run:

```bash
python vel_estimation.py
```
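For context, a velocity signal is often obtained by finite-differencing consecutive position observations; the sketch below is illustrative (the time step and position source are assumptions), not the script's implementation:

```python
import numpy as np

def estimate_velocity(positions, dt=0.05):
    # First-order finite difference between consecutive position samples.
    return (positions[1:] - positions[:-1]) / dt

positions = np.cumsum(np.random.default_rng(0).standard_normal((100, 2)), axis=0)
velocities = estimate_velocity(positions)      # shape (99, 2)
```

If you use this repository in your work, please cite: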
```bibtex
@article{sutera2025piwm,
  title={Four Principles for Physically Interpretable World Models},
  author={Sutera, A. and Mao, P. and Geng, M. and Pan, T. and Ruchkin, I.},
  journal={arXiv preprint arXiv:2503.02143},
  year={2025}
}
```