2025.05.14
- We uploaded our paper to arXiv.
TiMo is a novel hierarchical vision transformer foundation model tailored for satellite image time series (SITS) analysis. At its core, we introduce a spatiotemporal gyroscope attention mechanism that dynamically captures evolving multiscale patterns across both time and space. For pre-training, we curate MillionST, a large-scale dataset of one million images from 100,000 geographic locations, each captured across 10 temporal phases over five years, encompassing diverse geospatial changes and seasonal variations. Leveraging this dataset, we adapt masked image modeling to pre-train TiMo, enabling it to learn and encode generalizable spatiotemporal representations. Extensive experiments across multiple spatiotemporal tasks, including deforestation monitoring, land cover segmentation, crop type classification, and flood detection, show that TiMo outperforms state-of-the-art methods.
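To give a rough feel for what masked image modeling on a time series of images involves, the sketch below draws a random mask over the patches of a 10-phase sequence. It is only an illustration, not TiMo's pre-training code: the 10 phases come from the MillionST description above, while the 14x14 patch grid, the 0.75 mask ratio, uniform random sampling, and the function name `random_spacetime_mask` are all assumptions made for this example.

```python
import random


def random_spacetime_mask(num_phases, grid_h, grid_w, mask_ratio, seed=None):
    """Return a [num_phases][grid_h][grid_w] nested list of booleans (True = masked).

    Hypothetical helper: patches are sampled uniformly at random over all
    phases and positions, which may differ from the strategy TiMo uses.
    """
    rng = random.Random(seed)
    total = num_phases * grid_h * grid_w
    num_masked = int(total * mask_ratio)
    # Sample flat patch indices without replacement.
    masked = set(rng.sample(range(total), num_masked))
    return [
        [
            [((t * grid_h + r) * grid_w + c) in masked for c in range(grid_w)]
            for r in range(grid_h)
        ]
        for t in range(num_phases)
    ]


# 10 temporal phases (as in MillionST); grid size and ratio are assumed values.
mask = random_spacetime_mask(num_phases=10, grid_h=14, grid_w=14, mask_ratio=0.75, seed=0)
masked_count = sum(cell for phase in mask for row in phase for cell in row)
print(masked_count, "of", 10 * 14 * 14, "patches masked")
```

During pre-training, a model would see only the unmasked patches and be trained to reconstruct the masked ones; the reconstruction step is omitted here.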
The MillionST dataset will be released soon.
The code will be released soon.
If you find TiMo helpful, please consider giving this repo a ⭐ and citing:
@article{TiMo,
title={TiMo: Spatiotemporal Foundation Model for Satellite Image Time Series},
author={Xiaolei Qin and Di Wang and Jing Zhang and Fengxiang Wang and Xin Su and Bo Du and Liangpei Zhang},
journal={arXiv preprint arXiv:2505.08723},
year={2025}
}