Interactive4D: Interactive 4D LiDAR Segmentation

Ilya Fradlin, Idil Esen Zulfikar, Kadir Yilmaz, Theodora Kontogianni, Bastian Leibe

RWTH Aachen University, ETH AI Center

Overview

Interactive4D (ICRA 2025) is an interactive neural model for 4D LiDAR segmentation that jointly segments multiple objects across consecutive LiDAR scans in a single step. It supports interactive 4D multi-object segmentation, where a user collaborates with a deep learning model by providing clicks to segment multiple 3D objects simultaneously across multiple frames—improving efficiency, ensuring consistency, and simplifying tracking and annotation.

[Project Webpage] [arXiv]

Table of Contents
  1. News
  2. Installation
  3. Data Preprocessing
  4. Training and Evaluation
  5. Interactive Tool
  6. BibTeX
  7. Acknowledgment

News 📰

  • [27/01/2025]: Interactive4D was accepted to ICRA 2025.
  • [28/01/2025]: Code release.

Installation 🔨

The main dependencies of the project are the following:

Python: 3.7
CUDA: 11.6

You can set up a conda environment as follows:

Step 1: Create an environment

git clone https://github.com/Ilya-Fradlin/Interactive4D.git
cd Interactive4D
conda create --name interactive4d python=3.7 pip=22.2*
conda activate interactive4d

Step 2: Install PyTorch

# adjust your CUDA version accordingly!
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 -f https://download.pytorch.org/whl/torch_stable.html

Step 3: Install Minkowski

3.1 Prepare for installation:

conda install openblas-devel -c anaconda
# adjust your CUDA path accordingly!
export CUDA_HOME=/usr/local/cuda
# adjust to your corresponding C++ compiler
export CXX=g++-10

3.2 Installation:

# run the installation command
pip install -U git+https://github.com/NVIDIA/MinkowskiEngine -v --no-deps --install-option="--blas_include_dirs=${CONDA_PREFIX}/include" --install-option="--blas=openblas"

If you run into issues, please also refer to Minkowski's official instructions.
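
To quickly verify the setup, you can run the following in a Python shell. This is a minimal sketch that only confirms the key packages import and that CUDA is visible; it is not part of the repository:

# minimal installation check: confirms imports succeed and CUDA is visible
import torch
import MinkowskiEngine as ME

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("MinkowskiEngine imported:", ME.__name__)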

Step 4: Install additional required packages

pip install -r requirements.txt --no-deps

Data preprocessing 📐

Dataset Preprocessing Overview

After setting up your environment, you can preprocess datasets for both training and validation. Provide the path to your raw dataset, and the preprocessing script will generate a JSON file for that dataset, collecting information for each scan (including scene_id, scan_id, pcd_filepath, label_filepath, and pose). This JSON is required for the subsequent training or evaluation runs; a quick inspection sketch appears at the end of this section.

To preprocess SemanticKITTI, simply run:

python -m datasets.preprocessing.semantickitti_preprocessing preprocess \
--data_dir "PATH_TO_RAW_SEMANTICKITTI_DATASET" \
--save_dir "datasets/jsons/"

To preprocess nuScenes, run:

python -m datasets.preprocessing.nuscenes_preprocessing preprocess \
--data_dir "PATH_TO_RAW_NUSCENES_DATASET" \
--save_dir "datasets/jsons/"

Preprocessing KITTI360 requires a bit more preparation:

  1. Single-scan Annotations
  • Use the recoverKITTI360label script to produce single-scan annotations.
  • To obtain instance labels in addition to semantic labels, modify accumulation.py within recoverKITTI360label as follows (in every relevant occurrence):
superpcd_static[:,[0,1,2,6]] -> superpcd_static[:,[0,1,2,7]]
superpcd_dynamic[:,[0,1,2,6]] -> superpcd_dynamic[:,[0,1,2,7]]
  2. Once you have the per-point labels for each single scan, run:
# --label_dir should point to the recovered single-scan labels
python -m datasets.preprocessing.kitti360_preprocessing preprocess \
--data_dir "PATH_TO_KITTI360_DATASET" \
--label_dir "PATH_TO_KITTI360_SINGLE_SCAN_LABELS" \
--save_dir "datasets/jsons/"
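
Whichever dataset you preprocess, you can sanity-check the resulting JSON before starting a run. The sketch below assumes the fields listed above (scene_id, scan_id, pcd_filepath, label_filepath, pose) and a simple top-level layout (a list of per-scan entries); the actual filename and structure produced by the script may differ:

# sketch: inspect a preprocessed JSON; the filename and the list-of-dicts
# layout are assumptions -- adjust them to the actual output of the script
import json

with open("datasets/jsons/semantickitti_validation.json") as f:
    scans = json.load(f)

print("number of scans:", len(scans))
first = scans[0]
for key in ("scene_id", "scan_id", "pcd_filepath", "label_filepath", "pose"):
    print(f"{key}: {first.get(key, '<missing>')}")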

Training and Evaluation 📉

Configuration Setup

Before running any training or evaluation jobs, ensure that the config.yaml file is properly adjusted. Below are the key sections you may need to modify; a short loading sketch follows the list.

General
  • ckpt_path / weights
    • Set the path to the checkpoint weights.
    • During training, this can be used to resume from a previous checkpoint.
    • During validation, this should point to the final or desired checkpoint.
  • max_num_clicks
    • The total click budget allowed per object per scene.
  • max_clicks_per_obj
    • An upper limit that prevents excessive clicks on a single object.
    • Helps avoid “click waste” during unproductive refinements.
  • mode
    • Specifies whether to run in train or validate mode.
    • Each mode references a dedicated section in the config (e.g., modes.train or modes.validate).
Data
  • dataset
    • Which dataset to use (requires a preprocessed JSON).
    • Note: Training is currently only supported on SemanticKITTI, but adjustments can be made for other sets (e.g., nuScenes, KITTI360).
  • window_overlap
    • The number of scans overlapping between consecutive temporal windows.
  • sweep
    • The number of LiDAR scans concatenated for each sample.
  • data_dir
    • The directory containing the preprocessed JSON files.
Clicking Strategy
  • rank_error_strategy
    • Dictates which error region to target each time the model needs an additional click (options: SI, BD).
  • initial_clicking_strategy
    • Specifies where to click within the error region upon the first encounter (options: centroid, random, boundary_dependent, dbscan).
  • refinement_clicking_strategy
    • Defines where to click in subsequent encounters for refining the segmentation (same options as initial).
Logging
  • WandB Integration
    • Adjust project_name, workspace, and entity to match your Weights & Biases setup.
    • An API key may be required if running on a remote server or cluster.
  • visualization_frequency
    • Defines how often to log visualizations to WandB (e.g., point clouds, ground truth).
  • save_predictions
    • If set to true, saves predictions locally to the specified save_dir.
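
As a quick orientation, the sketch below loads config.yaml with plain PyYAML and reports the options discussed above. The key names follow the descriptions in this section, but the file path and the exact nesting are assumptions, and Hydra/OmegaConf-style interpolations (if used) would appear unresolved:

# sketch: report the config options discussed above; since the exact nesting
# of config.yaml is not assumed here, we search the YAML tree recursively
import yaml

KEYS_OF_INTEREST = {
    "ckpt_path", "weights", "max_num_clicks", "max_clicks_per_obj", "mode",
    "dataset", "window_overlap", "sweep", "data_dir", "rank_error_strategy",
    "initial_clicking_strategy", "refinement_clicking_strategy",
    "visualization_frequency", "save_predictions",
}

def find_keys(node, prefix=""):
    """Yield (dotted_path, value) pairs for any key of interest in the tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}{key}"
            if key in KEYS_OF_INTEREST:
                yield path, value
            yield from find_keys(value, path + ".")
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_keys(item, f"{prefix}{i}.")

with open("config.yaml") as f:  # adjust the path if the file lives in a subfolder
    cfg = yaml.safe_load(f)

for path, value in find_keys(cfg):
    print(f"{path} = {value}")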

Training 🚀

Training Details
  • Dataset Support
    • Training is currently supported on SemanticKITTI. (It can be adapted to nuScenes, etc., but modifications to the data loader would be required.)
  • Multi-GPU Setup
    • To train our model, we use 16× NVIDIA A40 GPUs, each with 40GB of memory.
    • If training runs on a SLURM cluster, ensure that the number of nodes and GPUs match both your SLURM command and the trainer settings in config.yaml, e.g.:
    • sbatch --nodes=4 --ntasks-per-node=4 --gres=gpu:4 \
             --output=outputs/%j_Interactive4d.txt \
             scripts/job_submissions/run_on_node.sh
                
  • Batch Size & Learning Rate
    • The default batch size is 1 to handle memory constraints.
    • When running multi-GPU training, however, the effective batch size is the number of GPUs multiplied by the local batch size, i.e., 16 in the setup above (see the sketch below).
    • The learning rate is automatically scaled according to the number of GPUs.
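
To make the scaling concrete, here is a small arithmetic sketch. The base learning rate is a placeholder and the linear scaling rule is an assumption; check the trainer settings in config.yaml for the actual values and rule:

# sketch: effective batch size and (assumed linear) learning-rate scaling
num_nodes = 4
gpus_per_node = 4
local_batch_size = 1      # default per-GPU batch size
base_lr = 1e-4            # placeholder value, not taken from the repository

world_size = num_nodes * gpus_per_node                 # 16 GPUs, as above
effective_batch_size = world_size * local_batch_size   # 16
scaled_lr = base_lr * world_size                       # linear scaling assumption

print(f"world size: {world_size}")
print(f"effective batch size: {effective_batch_size}")
print(f"scaled learning rate: {scaled_lr:.2e}")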

Run the training script:

./scripts/train.sh

Evaluation 📈

After training the model, or using the provided pretrained weights, which you can download here:

  • Interactive4D - 3D setup (sweep=1) - here
  • Interactive4D - 4D setup (sweep=4) - here
Evaluation Details
  • Checkpoint Weights
    • Update ckpt_path in config.yaml to point to the desired checkpoint if you are evaluating a different set of weights.
  • Single-GPU Setup
    • Evaluation typically runs on a single NVIDIA RTX 3090 GPU with 24GB of memory.
  • Logging & Visualization
    • The visualization_frequency controls how frequently point clouds are uploaded to WandB (e.g., every n steps).
    • Excessive logging may slow down the evaluation.
  • Saving Predictions
    • If save_predictions is set to true, predictions will be saved to the directory specified in prediction_dir.
    • These can be used for further analysis, e.g., calculating panoptic quality (see the sketch below).
  • Multi-Sweep Models
    • Multi-sweep setups (e.g., sweep=4 vs. sweep=10) require compatible weights.
    • Ensure your training and evaluation sweeps match, unless you specifically want to test generalization to a different number of sweeps.
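
For reference, once the saved predictions have been matched to ground-truth segments, panoptic quality follows the standard definition. This is a generic sketch of the metric, not the repository's evaluation code; it assumes matches were already collected at the usual IoU > 0.5 threshold:

# sketch: panoptic quality (PQ) from already-matched segments.
# matched_ious holds the IoU of each true-positive (prediction, GT) pair;
# unmatched predictions count as FP, unmatched GT segments count as FN.
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    tp = len(matched_ious)
    denom = tp + 0.5 * num_false_positives + 0.5 * num_false_negatives
    if denom == 0:
        return 0.0
    sq = sum(matched_ious) / tp if tp > 0 else 0.0   # segmentation quality
    rq = tp / denom                                  # recognition quality
    return sq * rq                                   # PQ = SQ * RQ

print(panoptic_quality([0.9, 0.75, 0.8], num_false_positives=1, num_false_negatives=2))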

Run the evaluation script:

./scripts/evaluate.sh

Interactive Tool 💻

The interactive tool is a user-friendly interface based on the AGILE3D Indoor annotator, enhanced to handle large and sparse outdoor environments effectively. This tool simplifies the process of segmenting LiDAR data by enabling real-time interaction with the model, making it intuitive for both researchers and practitioners.

User Guide: A comprehensive guide to using the tool, including setup and interaction steps, is available in the Interactive Tool Documentation.


BibTeX 📜

@inproceedings{fradlin2024interactive4d,
  title     = {{Interactive4D: Interactive 4D LiDAR Segmentation}},
  author    = {Fradlin, Ilya and Zulfikar, Idil Esen and Yilmaz, Kadir and Kontogianni, Theodora and Leibe, Bastian},
  booktitle = {Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
  year      = {2025},
  note      = {Accepted}
}

Acknowledgment 🙏

We sincerely thank all the volunteers who participated in our user study! The computing resources for most of the experiments were granted by the Gauss Centre for Supercomputing e.V. through the John von Neumann Institute for Computing on the GCS Supercomputer JUWELS at Jülich Supercomputing Centre. Theodora Kontogianni is a postdoctoral research fellow at the ETH AI Center and her research is partially funded by the Hasler Stiftung Grant project (23069). Idil Esen Zulfikar’s research is funded by the BMBF project NeuroSys-D (03ZU1106DA). Kadir Yilmaz's research is funded by the Bosch-RWTH LHC project Context Understanding for Autonomous Systems.

Portions of our code are built upon the foundations of Mask4Former and AGILE3D.
