MCTED is a machine-learning-ready dataset of paired terrain-image and DEM patches, enabling training and fine-tuning of machine learning models for DEM generation from single optical images.
MCTED has been developed from CTX-derived orthoimages and DEMs generated with the NASA Ames Stereo Pipeline, published in the Mars Human Exploration Zone DEM Archive (Day et al., 2023).
Access the dataset on HuggingFace!
Check out the paper on arXiv!
The MCTED dataset consists of 80,898 data samples, each consisting of four files of 518x518 px:
- Orthoimage patch - a patch from the original CTX-derived orthoimage showing a fragment of Martian terrain. Each image is monochromatic but saved in RGB format, with all 3 channels identical,
- DEM patch - the elevation model corresponding to the image patch. This is essentially a 2D array of 32-bit floating point values, each representing the elevation in meters relative to the Martian datum for a given point,
- Invalid NaN mask - a binary mask which indicates which pixels in the original sample from Day et al. contained missing data that has been filled in by our processing pipeline,
- Deviation mask - a binary mask which indicates values that were considered elevation outliers and removed as artifacts during processing.
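For intuition, the two masks can be combined to select only pixels whose elevation came straight from the source DEM. The following is a minimal sketch using synthetic arrays in place of real patches; the on-disk file format and file names are not shown here, and the masked-loss usage is an illustration, not the paper's training procedure:

```python
import numpy as np

# Synthetic stand-ins for one MCTED sample (real patches are 518x518).
rng = np.random.default_rng(0)
dem = rng.uniform(-4000.0, 2000.0, size=(518, 518)).astype(np.float32)  # meters
nan_mask = rng.random((518, 518)) < 0.02        # True where source data was missing
deviation_mask = rng.random((518, 518)) < 0.01  # True where outliers were removed

# Pixels untouched by gap filling or artifact removal are those flagged
# by neither mask.
trusted = ~(nan_mask | deviation_mask)
print(f"trusted pixels: {trusted.mean():.1%}")

# Illustrative use: restrict an error metric to trusted pixels only.
prediction = dem + rng.normal(0.0, 5.0, size=dem.shape).astype(np.float32)
mae = np.abs(prediction - dem)[trusted].mean()
print(f"masked MAE: {mae:.2f} m")
```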

To ensure reproducibility, use:
- Python 3.10
- See `requirements.txt` for package dependencies
To fully reproduce the results of this work, the CTX portion of the Day et al. repository is necessary. To download the necessary components, refer to the instructions found here or download the data directly from here.
Reproducing this project requires powerful hardware. A GPU is strongly recommended for training and evaluation; the project was developed using an NVIDIA L40S GPU.
```shell
git clone --recursive https://github.com/esa-datalabs/mcted
cd mcted
```

> [!IMPORTANT]
> The repository needs to be cloned with the `--recursive` flag to also clone the `Mars_DEMs` repository, which contains the original index file that we convert to a `.csv` file and extend with cluster information.
```shell
python3 -m venv mcted
# Activate the environment
# Linux
source mcted/bin/activate
# Windows
.\mcted\Scripts\activate
pip install -r requirements.txt
```

All of the scripts' default parameters have been set to reproduce the results of the paper. You only need to adjust the data paths.
To enable safe dataset division into training and validation splits, we group the samples into clusters.

```shell
python cluster_samples.py --help
```

> [!TIP]
> Refer to the help flag for full CLI options.
> [!IMPORTANT]
> This script creates the `DEM_index.csv` file, which is mandatory for running the training and evaluation scripts.
Generates training/evaluation patches from DEMs and corresponding CTX imagery. This script uses MPI for parallel processing.
```shell
mpiexec -n 2 python patch_generation/process_dataset.py
```

> [!WARNING]
> The complete set of processing parameters can be found in the `patch_generation/config/default_config.py` file. Please refer to it to set your data paths correctly. By default, the script expects the Day et al. repository to be under `ctx_orthoimages` and will save the generated dataset in an `output` directory.

The script will generate various metadata files and two main directories, `accepted_patches` and `rejected_patches`. The accepted patches are what comprises the MCTED dataset.
Trains the U-Net-based monocular depth estimation model on the generated dataset.
```shell
python train_unet.py --data_path path_to_mcted_accepted_patches
```

> [!TIP]
> Refer to the help flag for full CLI options by running `python train_unet.py --help`.
Performs evaluations on the validation split of MCTED for a given trained model and the chosen
version of DepthAnythingV2.
```shell
python evaluate.py --help
```

> [!TIP]
> Refer to the help flag for full CLI options.
Jupyter notebooks for recreating the plots and figures from the paper are available in `paper_plots/`. Open and run the notebooks interactively for visualizations and analysis.
> [!IMPORTANT]
> Please remember to adjust the paths to where the Day et al. repository and the MCTED dataset can be found, as they are not contained inside this repository.
Some of the results contained in the paper have been placed inside the `.artifacts` directory. These include:
- `clustering/`:
  - `dataset_samples_split.yaml` - contains the names of the samples used in the training and validation splits respectively. All patches from a given sample are placed in the same split to ensure no data leakage between splits.
  - `DEM_index.csv` - the modified index file from `Mars_DEMs`, generated by the clustering script.
- `evaluation/` - contains `.csv` files with the results of the evaluation.
- `training/` - contains some of the artifacts generated during model training: the trained model checkpoints, the parameters used for training, and the loss curves and values.
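The split file can be sanity-checked for leakage programmatically. The sketch below assumes the YAML parses to a mapping with `train` and `validation` keys listing sample names (an assumption about the file's structure, not verified here); in practice the dict would come from `yaml.safe_load` on the actual file, and the sample names shown are placeholders:

```python
# Hypothetical structure of dataset_samples_split.yaml after loading;
# key names and sample names are illustrative assumptions.
split = {
    "train": ["sample_0001", "sample_0002"],
    "validation": ["sample_0003"],
}

train, val = set(split["train"]), set(split["validation"])

# Since all patches of a sample live in one split, sample-level
# disjointness is sufficient to rule out patch-level leakage.
assert train.isdisjoint(val), "data leakage: samples appear in both splits"
print(f"{len(train)} training samples, {len(val)} validation samples")
```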
This project is licensed under the European Space Agency Public License (ESA-PL). See LICENCE.txt for full details.