R-ACP: Real-Time Adaptive Collaborative Perception Leveraging Robust Task-Oriented Communications

Research Problem

In collaborative perception systems, multiple unmanned ground vehicles (UGVs) equipped with cameras work together to enhance sensing capabilities and perception accuracy. However, dynamic environments pose significant challenges:

Extrinsic Calibration Errors: Sudden movements, accidents, or terrain changes frequently disrupt camera calibration parameters, causing perception inaccuracies.
Timeliness and Data Freshness: Ensuring real-time updates and fresh data streams is essential, especially in scenarios requiring timely reactions (e.g., pedestrian tracking, emergency alerts).
Communication Constraints: Bandwidth limitations and unreliable communication channels further complicate real-time collaborative perception.

Figure 1 illustrates how unpredictable accidents involving UGVs impact extrinsic calibration parameters, significantly increasing perception errors.

Key Features

1. Channel-Aware Self-Calibration

Re-ID Based Feature Matching: Uses pedestrian re-identification for robust key-point matching
Adaptive Quantization: Adjusts feature precision based on channel quality
Real-time Calibration: No need for calibration targets or manual intervention
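
As a rough illustration of the adaptive quantization idea, the sketch below picks a bit depth from a channel-quality estimate and uniformly quantizes a feature vector. The SNR thresholds, the bit depths, and the function names `choose_bits` and `quantize` are hypothetical and are not taken from the R-ACP code.

```python
from typing import List

def choose_bits(snr_db: float) -> int:
    """Map a channel SNR estimate (dB) to a bit depth; thresholds are illustrative."""
    if snr_db >= 20.0:
        return 8
    if snr_db >= 10.0:
        return 4
    return 2

def quantize(features: List[float], bits: int) -> List[float]:
    """Uniformly quantize values to 2**bits levels, clamping inputs to [0, 1]."""
    levels = (1 << bits) - 1
    return [round(min(max(x, 0.0), 1.0) * levels) / levels for x in features]

features = [0.12, 0.5, 0.87]
for snr in (25.0, 12.0, 3.0):
    bits = choose_bits(snr)
    print(snr, bits, quantize(features, bits))
```

A poorer channel yields fewer bits per feature value, so less data is sent at the price of coarser features.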

2. Information Bottleneck Compression

Task-Oriented Encoding: Compresses features while preserving task-relevant information
Temporal Correlation: Leverages frame-to-frame correlations to reduce redundancy
Dynamic Rate Adjustment: Adapts compression rates based on channel conditions

3. Priority-Aware Robust Fusion

Dynamic Packet Loss Handling: Maintains performance under severe packet loss (up to 40%)
Feature Prioritization: Assigns importance scores to different camera features
Adaptive Masking: Selectively drops less important features during transmission
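
The prioritization and masking steps can be pictured with the minimal sketch below. It assumes one importance score per camera and a fixed number of cameras to keep. The names `select_by_priority` and `mask_features` and the zero-masking strategy are assumptions for illustration, not the behavior of Priority_network.py.

```python
from typing import Dict, List

def select_by_priority(scores: Dict[str, float], keep: int) -> List[str]:
    """Return the `keep` camera IDs with the highest importance scores."""
    ranked = sorted(scores, key=lambda cam: scores[cam], reverse=True)
    return ranked[:max(keep, 0)]

def mask_features(features: Dict[str, List[float]], kept: List[str]) -> Dict[str, List[float]]:
    """Zero out the features of cameras that were not selected for transmission."""
    kept_set = set(kept)
    return {cam: (vec if cam in kept_set else [0.0] * len(vec))
            for cam, vec in features.items()}

scores = {"C1": 0.9, "C2": 0.2, "C3": 0.6}          # made-up importance scores
features = {"C1": [1.0, 2.0], "C2": [3.0, 4.0], "C3": [5.0, 6.0]}
kept = select_by_priority(scores, keep=2)
masked = mask_features(features, kept)
print(kept)
print(masked)
```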

System Model

The system consists of multiple UGVs equipped with edge cameras that collaboratively track pedestrians. Each UGV transmits encoded feature streams to an edge server over wireless channels, where they are used for comprehensive occupancy mapping and real-time calibration.

Performance Metrics

MODA (Multiple Object Detection Accuracy): Primary perception accuracy metric
Communication Cost: Measured in KB, representing transmission overhead
AoPT (Age of Perceived Targets): Novel metric combining data freshness and target relevance
Calibration Error: Rotation and translation errors in camera extrinsic parameters
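
For reference, MODA follows the standard CLEAR definition: one minus the ratio of missed detections plus false positives to the number of ground-truth objects, accumulated over frames. The sketch below uses unit cost weights; the function name and the per-frame counts are illustrative only.

```python
from typing import Sequence

def moda(misses: Sequence[int], false_positives: Sequence[int],
         ground_truth: Sequence[int]) -> float:
    """MODA = 1 - (sum of misses + sum of false positives) / (sum of ground-truth objects)."""
    total_gt = sum(ground_truth)
    if total_gt == 0:
        raise ValueError("MODA is undefined when there are no ground-truth objects")
    return 1.0 - (sum(misses) + sum(false_positives)) / total_gt

# Per-frame counts for three frames (made-up numbers for illustration).
score = moda(misses=[1, 0, 2], false_positives=[0, 1, 0], ground_truth=[10, 10, 10])
print(f"MODA = {score:.4f}")
```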

Code Architecture

The R-ACP framework consists of three main components:

R-ACP/
├── main_coding_and_inference.py          # Main training and inference script
├── Camera_calibration/
│   └── main-multi-gpu.py                 # Re-ID based camera calibration
├── multiview_detector/
│   ├── models/
│   │   ├── persp_trans_detector.py       # Core perception model with calibration
│   │   └── Priority_network.py           # Priority-aware feature selection
│   ├── datasets/                         # Dataset handling
│   ├── loss/                            # Loss functions
│   └── utils/                           # Utility functions
├── models/                              # Pre-trained model storage
├── models_temp/                         # Temporary model storage
├── temp/Calibration/                    # Calibration error simulation data
└── README.md

Core Components

main_coding_and_inference.py: Main training and inference pipeline that coordinates the entire R-ACP framework
Camera_calibration/main-multi-gpu.py: Re-ID based calibration implementation with multi-GPU support
persp_trans_detector.py: Core perception model implementing:
- Multi-view feature extraction and fusion
- Real-time camera calibration with error simulation
- Information Bottleneck-based compression
- Temporal correlation modeling
Priority_network.py: Lightweight priority network for adaptive feature selection under packet loss conditions

Requirements

To replicate the environment and dependencies used in this project, you will need the following packages:

# Core dependencies
kornia==0.6.1
matplotlib==3.5.3
numpy==1.21.5
pillow==9.4.0
python==3.7.12
pytorch==1.10.0
torchaudio==0.10.0
torchvision==0.11.0
tqdm==4.66.4
pandas>=1.3.0
opencv-python>=4.5.0

# Re-ID and calibration dependencies
torchreid>=1.3.0
ultralytics>=8.0.0  # For YOLOv5
scipy>=1.7.0        # For entropy calculations

Environment Setup

# Create conda environment
conda create -n MVDet_NEXT python=3.7.12

# Activate environment
conda activate MVDet_NEXT

# Install PyTorch and dependencies
conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 -c pytorch

# Install additional requirements
pip install kornia==0.6.1 matplotlib==3.5.3 numpy==1.21.5 pillow==9.4.0 tqdm==4.66.4 pandas opencv-python

# Install Re-ID and calibration dependencies
pip install torchreid ultralytics scipy

# The YOLOv5 pre-trained weights and the OSNet Re-ID model do not need to be
# fetched manually; both are downloaded automatically on first run.

Dataset Preparation

Our experiments employ the Wildtrack dataset from EPFL. This dataset features high-resolution images captured by seven cameras positioned in an urban environment, recording natural pedestrian trajectories [Chavdarova et al., 2018]. Download the Wildtrack dataset and organize it as follows:

/Data/Wildtrack/
├── Image_subsets/
│   ├── C1/
│   ├── C2/
│   ├── C3/
│   ├── C4/
│   ├── C5/
│   ├── C6/
│   └── C7/
├── annotations_positions/
└── calibrations/
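
Before launching training, it can help to confirm that the layout above is in place. The following is a minimal sketch that only checks for the directories listed in this README. The helper name `missing_entries` is hypothetical, and the script does not validate file contents.

```python
from pathlib import Path
from typing import List

# Sub-directories expected under the dataset root, as listed above.
EXPECTED = [f"Image_subsets/C{i}" for i in range(1, 8)] + [
    "annotations_positions",
    "calibrations",
]

def missing_entries(root: str) -> List[str]:
    """Return the expected sub-directories that are absent under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).is_dir()]

if __name__ == "__main__":
    missing = missing_entries("/Data/Wildtrack")
    print("Layout OK" if not missing else f"Missing: {missing}")
```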

Usage Instructions

Phase 1: Camera Calibration (Optional)

Before running the main collaborative perception system, you can perform initial camera calibration using the Re-ID based method:

# Navigate to calibration directory
cd Camera_calibration

# Run Re-ID based calibration with default settings
python main-multi-gpu.py

# Or modify parameters in the script:
# reference_camera_id = 1    # Reference camera ID
# unknown_camera_id = 7      # Camera to calibrate
# N_threshold = 5            # Number of matching pairs

This will generate:

XML log files with foot center coordinates
Communication cost analysis
Feature matching results

Phase 2: Main Training and Inference

# Navigate to project directory
cd /path/to/R-ACP

# Activate conda environment
conda init
source activate
conda activate MVDet_NEXT

# Copy pre-trained model (if available)
cp models/MultiviewDetector.pth models_temp/MultiviewDetector.pth

# Run training and inference
CUDA_VISIBLE_DEVICES=2,3 python main_coding_and_inference.py \
    --dataset_path "/Data/Wildtrack" \
    --model_path "./models_temp/MultiviewDetector.pth" \
    --epochs 30 \
    --tau_1 2 \
    --tau_2 2 \
    --drop_prob 0.7

Parameter Configuration

Core Parameters

--dataset_path: Path to the Wildtrack dataset
--model_path: Path to pre-trained model (leave empty for training from scratch)
--epochs: Number of training epochs (default: 10)
--batch_size: Batch size for training (default: 1)
--lr: Learning rate (default: 0.1)

R-ACP Specific Parameters

--tau_1: Temporal fusion window size for collaborative perception (default: 0)
--tau_2: Temporal entropy modeling window size (default: 1)
--drop_prob: Packet drop probability for robustness testing (default: 0.1)
--cls_thres: Classification threshold (default: 0.4)
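
The flags and defaults above can be summarized as an `argparse` parser. This is a sketch reconstructed from the list in this README, not the parser in main_coding_and_inference.py. The argument types and the `--dataset_path` default are assumptions.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented flags and defaults (illustrative only)."""
    p = argparse.ArgumentParser(description="R-ACP training and inference (sketch)")
    # Core parameters
    p.add_argument("--dataset_path", type=str, default="/Data/Wildtrack")  # assumed default
    p.add_argument("--model_path", type=str, default="")  # empty: train from scratch
    p.add_argument("--epochs", type=int, default=10)
    p.add_argument("--batch_size", type=int, default=1)
    p.add_argument("--lr", type=float, default=0.1)
    # R-ACP specific parameters (integer window sizes are an assumption)
    p.add_argument("--tau_1", type=int, default=0)
    p.add_argument("--tau_2", type=int, default=1)
    p.add_argument("--drop_prob", type=float, default=0.1)
    p.add_argument("--cls_thres", type=float, default=0.4)
    return p

# Same overrides as the example command in the usage instructions.
args = build_parser().parse_args(
    ["--epochs", "30", "--tau_1", "2", "--tau_2", "2", "--drop_prob", "0.7"]
)
print(vars(args))
```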

Calibration Error Simulation

The R-ACP framework includes two main calibration components:

1. Re-ID Based Calibration (`/Camera_calibration/`)

The main-multi-gpu.py script implements the Re-ID based calibration method for real-time camera extrinsic parameter estimation:

# Navigate to calibration directory
cd Camera_calibration

# Run Re-ID based calibration
python main-multi-gpu.py

Key Features:

Multi-GPU Support: Utilizes multiple GPUs (0-3) for parallel processing
Pedestrian Detection: Uses YOLOv5 for real-time person detection
Feature Extraction: Employs OSNet for robust re-identification features
Communication Cost Analysis: Calculates entropy-based transmission costs
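
To make the entropy-based cost idea concrete, the sketch below estimates the transmission cost of a sequence of quantized feature symbols from its empirical Shannon entropy. It is a generic estimate, not the exact computation in main-multi-gpu.py. The function names and the 1 KB = 1024 bytes convention are assumptions.

```python
import math
from collections import Counter
from typing import Sequence

def entropy_bits_per_symbol(symbols: Sequence[int]) -> float:
    """Empirical Shannon entropy (bits/symbol) of a sequence of quantized values."""
    n = len(symbols)
    if n == 0:
        return 0.0
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def estimated_cost_kb(symbols: Sequence[int]) -> float:
    """Entropy-based estimate of the cost of sending `symbols`, in KB (1 KB = 1024 bytes)."""
    return entropy_bits_per_symbol(symbols) * len(symbols) / 8.0 / 1024.0

quantized = [0, 0, 1, 1, 2, 2, 3, 3] * 1024   # four equally likely symbols
print(entropy_bits_per_symbol(quantized))      # 2 bits per symbol
print(estimated_cost_kb(quantized))            # 8192 symbols at 2 bits each: 2 KB
```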

Configuration Parameters:

reference_camera_id = 1    # Reference camera for calibration
unknown_camera_id = 7      # Camera to be calibrated
N_threshold = 5            # Number of matching pairs to consider
frame_start = 0            # Starting frame number
frame_end = 2000           # Ending frame number
frame_step = 5             # Frame sampling interval

Output:

XML log files with foot center coordinates in ./24-JSAC/Re_ID_Test/match_log/
Matched images in ./24-JSAC/Re_ID_Test/match_image/
Communication cost metrics for each frame

2. Error Injection During Testing

The main framework automatically simulates calibration errors during the testing phase (after epoch 10). The error parameters are read from:

/temp/Calibration/calibration_test_rotation_error.csv

This CSV should contain columns:

Epoch: Current epoch number
Translation Error: Translation error magnitude
Rotation Error: Rotation error magnitude
error_camera: Comma-separated list of camera indices to apply errors
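
A file with these columns could be parsed as sketched below. The sample rows are illustrative (the first mirrors the values in the example output of this README), the `error_camera` field is assumed to be quoted so its commas survive CSV parsing, and `load_errors` is a hypothetical helper rather than the framework's own loader.

```python
import csv
import io
from typing import Dict, TextIO

SAMPLE = '''Epoch,Translation Error,Rotation Error,error_camera
15,0.05,0.02,"1,3"
16,0.10,0.04,"2"
'''

def load_errors(handle: TextIO) -> Dict[int, dict]:
    """Index rows by epoch; the comma-separated `error_camera` field becomes a list of ints."""
    table: Dict[int, dict] = {}
    for row in csv.DictReader(handle):
        table[int(row["Epoch"])] = {
            "translation": float(row["Translation Error"]),
            "rotation": float(row["Rotation Error"]),
            "cameras": [int(c) for c in row["error_camera"].split(",") if c.strip()],
        }
    return table

errors = load_errors(io.StringIO(SAMPLE))
print(errors[15])
```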

Execution Phases

The R-ACP system operates in three phases:

Phase 0 (Idle): No targets detected, minimal communication
Phase 1 (Calibration): Camera recalibration using Re-ID based method
Phase 2 (Streaming): Active target tracking with adaptive compression
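
One way to picture the three phases is as a small state selector. The inputs `num_targets` and `needs_calibration` and the transition order below are assumptions made for illustration. The actual switching conditions are defined by the framework and the paper.

```python
from enum import Enum

class Phase(Enum):
    IDLE = 0         # no targets detected, minimal communication
    CALIBRATION = 1  # Re-ID based camera recalibration
    STREAMING = 2    # active target tracking with adaptive compression

def next_phase(num_targets: int, needs_calibration: bool) -> Phase:
    """Pick a phase from two hypothetical inputs; the real transition rules may differ."""
    if num_targets == 0:
        return Phase.IDLE
    if needs_calibration:
        return Phase.CALIBRATION
    return Phase.STREAMING

for targets, recalib in [(0, False), (3, True), (3, False)]:
    print(targets, recalib, next_phase(targets, recalib).name)
```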

Output and Monitoring

During execution, the system will:

Save timestamped training logs to the logs/ directory
Write the current epoch to epoch.log
Generate perception maps in /temp/map_res/
Display MODA (Multiple Object Detection Accuracy) and communication cost metrics

Example Output

Training...
Testing...
maximum_MODA is 85.23%, minimum_bits_loss 12.45 KB
Epoch: 15, Reading CSV for test parameters...
Applied translation error 0.05 and rotation error 0.02 to cameras [1, 3]

GPU Memory Optimization

For systems with limited GPU memory:

# Use single GPU
CUDA_VISIBLE_DEVICES=0 python main_coding_and_inference.py [args]

# Keep the batch size at 1 (the default)
python main_coding_and_inference.py --batch_size 1 [other args]

Hardware Platform

Our R-ACP framework has been validated on a real hardware testbed consisting of multiple UGVs and edge devices:

Hardware Configuration:

Multiple UGV Platforms: Each equipped with Jetson Orin NX 8G computing units and RGB cameras for local feature encoding
Multiple Edge Devices: Jetson Orin NX Super 16G and Raspberry Pi 5 16GB units for feature aggregation and collaborative perception
Wi-Fi Communication: Wireless feature streaming between UGV nodes and edge server
Real-time Processing: Supports real-time collaborative perception with hardware-accelerated inference

This physical implementation demonstrates the practical feasibility of R-ACP in real-world scenarios, validating our theoretical framework under actual communication constraints and computational limitations.

Citation

If you use this code in your research, please cite:

@article{Fang2025RACP,
  author    = {Zhengru Fang and Jingjing Wang and Yanan Ma and Yihang Tao and Yiqin Deng and Xianhao Chen and Yuguang Fang},
  title     = {{R-ACP: R}eal-Time Adaptive Collaborative Perception Leveraging Robust Task-Oriented Communications},
  journal   = {IEEE Journal on Selected Areas in Communications},
  year      = {2025},
  month     = {June},
  publisher = {IEEE}
}

Related Work

Our related work on collaborative perception:

@article{fang2025ton,
  title={Prioritized Information Bottleneck Theoretic Framework with Distributed Online Learning for Edge Video Analytics},
  author={Fang, Z. and Hu, S. and Wang, J. and Deng, Y. and Chen, X. and Fang, Y.},
  journal={IEEE Transactions on Networking},
  year={2025},
  month={Jan.},
  note={DOI: 10.1109/TON.2025.3526148},
  publisher={IEEE}
}

@ARTICLE{fang2025tmc-pacp,
  author={Fang, Zhengru and Hu, Senkang and An, Haonan and Zhang, Yuang and Wang, Jingjing and Cao, Hangcheng and Chen, Xianhao and Fang, Yuguang},
  journal={IEEE Transactions on Mobile Computing}, 
  title={{PACP: P}riority-Aware Collaborative Perception for Connected and Autonomous Vehicles}, 
  year={2024},
  volume={23},
  number={12},
  pages={15003-15018},
  publisher={IEEE}
}

Acknowledgement

We gratefully acknowledge the contributions of the following projects:

MVDet for their invaluable tools and insights into multi-view detection.
TOCOM-TEM for providing task-oriented communication framework for edge video analytics.

Reference

Full Paper (PDF)

Contact

For questions and support, please contact:

Zhengru Fang: [email protected]
Project Repository: github.com/fangzr/R-ACP
