Welcome to ROMAN (Robust Object Map Alignment Anywhere). ROMAN is a view-invariant global localization method that maps open-set objects and uses the geometry, shape, and semantics of objects to find the transformation between a current pose and a previously created object map. This enables loop closure between robots even when a scene is observed from opposite views.
Included in this repository is code for open-set object mapping and object map registration using our robust data association algorithm. For ROS1/2 integration, please see the roman_ros repo. Further information, including demo videos, can be found here.
If you find ROMAN useful in your work, please cite our paper:
M.B. Peterson, Y.X. Jia, Y. Tian, A. Thomas, and J.P. How, "ROMAN: Open-Set Object Map Alignment for Robust View-Invariant Global Localization," Robotics: Science and Systems, 2025.
@inproceedings{peterson2025roman,
  title={ROMAN: Open-Set Object Map Alignment for Robust View-Invariant Global Localization},
  author={Peterson, Mason B and Jia, Yi Xuan and Tian, Yulun and Thomas, Annika and How, Jonathan P},
  booktitle={Robotics: Science and Systems (RSS)},
  pdf={https://www.roboticsproceedings.org/rss21/p029.pdf},
  year={2025}
}
ROMAN has three modules: mapping, data association, and pose graph optimization. The front-end mapping pipeline tracks segments across RGB-D images to generate segment maps. The data association module feeds the semantic and shape attributes of objects in submaps, along with a gravity direction prior, into the ROMAN alignment module to align maps and detect loop closures. These loop closures, together with VIO, are then used for pose graph optimization.
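To give a flavor of the robust data association step, the sketch below keeps only those putative object-to-object associations whose pairwise inter-object distances agree between the two maps. This is an illustrative stand-in, not ROMAN's actual implementation: ROMAN uses the graph-theoretic CLIPPER solver and also weights associations by semantic and shape similarity.

```python
import numpy as np

def consistent_associations(map_a, map_b, pairs, tol=0.5):
    """Filter putative associations by pairwise geometric consistency.

    map_a, map_b: (N, 3) / (M, 3) arrays of object centroids.
    pairs: list of (i, j) tuples proposing object i in map_a
    matches object j in map_b.
    """
    n = len(pairs)
    C = np.eye(n, dtype=bool)
    for u in range(n):
        for v in range(u + 1, n):
            (i1, j1), (i2, j2) = pairs[u], pairs[v]
            # Two associations agree if the inter-object distance is
            # (nearly) preserved across the two maps: rigid transforms
            # preserve distances, so inliers are mutually consistent.
            da = np.linalg.norm(map_a[i1] - map_a[i2])
            db = np.linalg.norm(map_b[j1] - map_b[j2])
            C[u, v] = C[v, u] = abs(da - db) < tol
    # Greedily discard the least-consistent association until the
    # remaining set is fully mutually consistent (a simple heuristic
    # for the dense-subgraph problem that CLIPPER solves properly).
    active = set(range(n))
    while active:
        scores = {u: int(C[u, list(active)].sum()) for u in active}
        worst = min(active, key=scores.get)
        if scores[worst] == len(active):
            break
        active.remove(worst)
    return [pairs[u] for u in sorted(active)]
```

For example, three correct associations between a translated copy of a map survive this filter, while an association to a spurious far-away object is rejected.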
The roman package has a Python submodule corresponding to each pipeline module. Code for creating open-set object maps can be found in roman.map. Code for finding loop closures via data association of object maps can be found in roman.align. Finally, code for interfacing ROMAN with Kimera-RPGO for pose graph optimization can be found in roman.offline_rpgo.
Direct dependencies, CLIPPER and Kimera-RPGO, are installed with the install script.
If you would like to use Kimera-RPGO with ROMAN (required for full demo), please also follow the Kimera-RPGO dependency instructions.
To get this working, you may need to add to your LD_LIBRARY_PATH with the following: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
First, activate any virtual environment you would like to use with ROMAN.
Then, clone and install with:
git clone [email protected]:mit-acl/roman.git roman
./roman/install.sh
A short demo is available to run ROMAN on a small subset of the Kimera Multi Data.
The subset includes two robots (sparkal1 and sparkal2) traveling along a path in opposite directions.
We demonstrate ROMAN's open-set object mapping and object-based loop closure to estimate the two robots' cumulative 200 m long trajectories with 1.2 m RMSE absolute trajectory error using our vision-only pipeline.
Instructions for running the demo:
- Download a small portion of the Kimera Multi Data that is used for the ROMAN SLAM demo. The data subset is available for download here.
- In your .bashrc or in the terminal where you will run the ROMAN demo, export the following environment variables:
export ROMAN_DEMO_DATA=<path to the demo data>
export ROMAN_WEIGHTS=<path to this repo>/weights
Note that by default, FastSAM and CLIP are run on GPU.
If your computer does not have a GPU, you can use -p params/demo_no_gpu, which specifies that the CPU should be used for running FastSAM and additionally disables CLIP semantic embeddings, since these are extremely slow to compute on CPU. Even without CLIP semantic embeddings, the demo may run slower than real time without a GPU.
cd into this repo and run the following to start the demo:
mkdir demo_output
python3 demo/demo.py \
-p params/demo \
-o demo_output
Here, the -p argument specifies the parameter directory and the -o argument specifies the output directory.
Optionally, the mapping process can be visualized with the -m argument, which shows the map projected on the camera image as it is created, or the -3 argument, which shows a 3D visualization of the map. However, these will cause the demo to run slower.
The output includes map visualization, loop closure accuracy results, and pose graph optimization results including root mean squared absolute trajectory error.
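The reported root mean squared absolute trajectory error can be understood with a minimal sketch. This assumes two time-aligned trajectories already expressed in a common frame; full evaluations first apply a rigid (e.g. Umeyama) alignment between the estimated and ground-truth frames.

```python
import numpy as np

def rmse_ate(est, gt):
    """RMSE absolute trajectory error between two time-aligned
    trajectories of positions, each of shape (N, 3), assumed to be
    expressed in a common frame."""
    # Per-timestep Euclidean position error, then root-mean-square.
    err = np.linalg.norm(est - gt, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))
```

For instance, an estimate offset from ground truth by a constant 1 m yields an RMSE ATE of exactly 1.0 m.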
After running the demo, you can create a post-processed visualization of the object matches that have been found (see example here).
To create this visualization, run
python3 ./demo/association_vid.py ./demo_output ./demo_output/association_vid.mp4 --runs sparkal1 sparkal2
You will be shown the results of aligning object submaps, including the ground truth distance between submaps and the error in the estimated submap alignment poses. You will then be prompted to enter a submap index pair to visualize (robot 1 along the y-axis, then robot 2 along the x-axis). For example, try
4 7
Then, a video will be created showing the associations that were found between the submaps you entered. One thing you may notice is that the cars that are parked along the sidewalk in this example have changed (the two sessions were not taken at the same time), but ROMAN is still able to use the geometry of the newly parked cars to successfully align submaps.
ROMAN requires RGB images, depth data (either as RGB-aligned depth images or point clouds), odometry information, and transforms between data sources. ROMAN should be runnable on any data with this information, using robotdatapy to interface with pose data, image data, and optionally point cloud data. Currently supported data types include ROS1/2 bags, zip files of images, and CSV files of poses, with additional data sources in development. Click here for more information on running on custom data.
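As a rough per-robot checklist, the required inputs listed above can be summarized as follows. The field names here are purely illustrative and are not ROMAN's actual configuration schema; see the custom-data documentation for the real format.

```python
from dataclasses import dataclass

@dataclass
class RomanInputs:
    """Hypothetical summary of the data ROMAN needs for one robot;
    field names do not reflect ROMAN's actual config schema."""
    rgb_images: str     # e.g. a ROS1/2 bag topic or a zip of images
    depth: str          # RGB-aligned depth images or point clouds
    odometry: str       # e.g. a CSV of poses or a bag odometry topic
    extrinsics: str     # transforms between the data sources' frames
```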
This research is supported by Ford Motor Company, DSTA, ONR, and ARL DCIST under Cooperative Agreement Number W911NF-17-2-0181.
Additional thanks to Yun Chang for his help in interfacing with Kimera-RPGO.