Welcome to the Machine Learning for Geospatial repository! This is a curated collection of resources, tools, programming libraries, and courses dedicated to applying machine learning techniques to geospatial data. Whether you're a beginner or an experienced practitioner, this repository aims to provide everything you need to explore the fascinating intersection of machine learning and geospatial science.
Geospatial data is becoming increasingly important in solving real-world problems like urban planning, disaster management, environmental monitoring, and logistics optimization. By integrating machine learning, we can:
- Automate land cover classification from satellite imagery.
- Predict climate trends and weather anomalies.
- Optimize routes for logistics and transportation.
- Monitor and analyze urban growth.
- Geospatial Data Basics
- Intro to Machine Learning
- Bharatmaps: Bharatmaps.gov.in
- ISRO Bhuvan: Bhuvan, the Indian geo-platform of ISRO
- OpenStreetMap: OpenStreetMap.org
- USGS Earth Explorer: earthexplorer.usgs.gov
- Sentinel Hub: Copernicus Sentinel Data
- NASA Earth Science Data: earthdata.nasa.gov
Hugging Face now lists 272+ geospatial-ready datasets. The tables below highlight 16 heavily requested corpora that pair well with the models in this repo; licenses and hosting differ between datasets, so check each dataset card before use.
| Dataset | Modalities & Volume | Why it matters | Link |
|---|---|---|---|
| TerraMesh 10m Cubes | Sentinel-2 RGB + Sentinel-1 VH/VV + Copernicus DEM, global 10 m tiles | Harmonized cubes for training multi-sensor foundation models out-of-the-box. | TerraMesh |
| FLAIR-HUB | 63B annotated pixels across 19 land-cover classes | High-resolution labels engineered for segmentation and active-learning workflows. | FLAIR-HUB |
| Sen1Floods11 | Paired Sentinel-1 SAR + Sentinel-2 optical chips with flood masks | De facto benchmark for rapid flood mapping and transfer to new disaster regions. | Sen1Floods11 |
| Dataset | Samples | Classes & Notes | Link |
|---|---|---|---|
| EuroSAT | 27K multispectral chips (Sentinel-2) | 10 land-use classes; useful for lightweight transfer checks. | EuroSAT |
| BigEarthNet | 590K image patches (S1 + S2) | 43 multi-label land-cover tags, curated for multi-spectral classification. | BigEarthNet |
| RESISC45 | 31.5K RGB aerial tiles | 45 scene categories with strong baselines for ViT/CNN adapters. | RESISC45 |
| AID | 10K aerial chips | 30 scene classes (220 to 420 images each); handy for quick benchmarking. | AID |
| UC Merced | 2.1K high-res RGB tiles | 21 classes; perfect sanity check before scaling up. | UC Merced |
| Dataset | Focus | Highlights | Link |
|---|---|---|---|
| SpaceNet-8 | Building footprints + roads in multi-sensor stacks | Includes SAR+EO pairs, coastline ports, and routing-ready metadata. | SpaceNet-8 |
| xView | 1M labeled objects, 60 categories | One of the largest object-detection datasets for HR imagery. | xView |
| LEVIR-CD | Bitemporal change detection | 637 paired tiles with pixel-level change masks (buildings & infrastructure). | LEVIR-CD |
| Dataset | Era | Highlights | Link |
|---|---|---|---|
| ERA5-Land | 1950–present reanalysis | Hourly atmospheric + land diagnostics, ready for climate downscaling. | ERA5 |
| MERRA-2 | 1980–present, 160+ variables | NASA reanalysis with aerosol and chemistry products for ESG analytics. | MERRA-2 |
| Dataset | Specialty | Why you'd use it | Link |
|---|---|---|---|
| ForestNet | Tropical deforestation monitoring | Combines Landsat/Sentinel stacks with expert forest-change labels. | ForestNet |
| CropHarvest | Global cropland classification | 90+ crops with extensive metadata for transfer to regional ag programs. | CropHarvest |
| SSL4EO-S12 | Self-supervised Sentinel-1/2 | Pretext dataset for contrastive/MAE training on unlabeled EO swaths. | SSL4EO-S12 |
from datasets import load_dataset
# Stream paired SAR + optical tensors plus labels without downloading the full dataset.
# The repo ID and field names below are illustrative; confirm them on the dataset card.
ds = load_dataset(
    "ibm-nasa-geospatial/sen1floods11",
    split="train",
    streaming=True,  # iterate lazily instead of downloading every shard first
)
sample = next(iter(ds))
print("Example keys:", list(sample.keys()))  # inspect the real field names first
sar = sample["image_sar"]  # assumed: 2 SAR bands (VV, VH), shape [2, H, W]
optical = sample["image_optical"]  # assumed: Sentinel-2 bands, shape [12, H, W]
mask = sample["label"]  # assumed: flood raster (1 = water, 0 = dry)
💡 Explore even more curated lists via geospatial dataset search and the community geospatial dataset collection.
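Because the Sen1Floods11 label is a binary water mask, flood segmentation quality is commonly summarized as intersection over union (IoU) on the flood class. The plain-Python sketch below assumes both masks are equally shaped nested lists where 1 marks water, and it treats a hypothetical `ignore_value` of -1 as no-data; check the dataset card for the real no-data code before reusing it.

```python
from typing import Sequence


def flood_iou(pred: Sequence[Sequence[int]],
              target: Sequence[Sequence[int]],
              ignore_value: int = -1) -> float:
    """IoU of the flood class (value 1) between two equally shaped 2-D masks.

    Pixels where the target equals `ignore_value` (an assumed no-data code)
    are skipped. Returns 1.0 when neither mask contains any flood pixels.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of rows")
    intersection = union = 0
    for pred_row, target_row in zip(pred, target):
        if len(pred_row) != len(target_row):
            raise ValueError("masks must have the same number of columns")
        for p, t in zip(pred_row, target_row):
            if t == ignore_value:
                continue  # no-data pixel: excluded from the score
            p_flood, t_flood = p == 1, t == 1
            intersection += int(p_flood and t_flood)
            union += int(p_flood or t_flood)
    return 1.0 if union == 0 else intersection / union


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [-1, 1, 1]]
print(f"flood IoU: {flood_iou(pred, target):.3f}")  # flood IoU: 0.500
```

For full-size rasters you would normally vectorize this with NumPy; the loop form just keeps the pixel counting explicit.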
- GeoPandas Documentation: geopandas.org
- Satellite Image Analysis with Python: Planet University
- Intro to Google Earth Engine Python API: Community Tutorial (step-by-step guide to authenticating, loading imagery, and running analyses with the Earth Engine Python client).
import ee
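def initialize_ee(project="your-cloud-project"):  # hypothetical ID: use your own Cloud project
    """Authenticate (if needed) and initialize the Earth Engine Python API."""
    try:
        ee.Initialize(project=project)
    except Exception:
        ee.Authenticate()
        ee.Initialize(project=project)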
initialize_ee()
# Build a cloud-filtered Sentinel-2 collection over San Francisco for June 2023.
collection = (
ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")  # S2_SR is deprecated in favor of this collection
.filterDate("2023-06-01", "2023-06-30")
.filterBounds(ee.Geometry.Point(-122.4194, 37.7749))
.filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", 10))
)
# Reduce to a representative median image and compute NDVI.
image = collection.median()
ndvi = image.normalizedDifference(["B8", "B4"]).rename("NDVI")
# Define an export footprint (a simple rectangle around San Francisco).
region = ee.Geometry.Rectangle([-123, 37, -122, 38])
# Kick off a Drive export; monitor progress in the Earth Engine Code Editor.
export_task = ee.batch.Export.image.toDrive(
image=ndvi,
description="sf_ndvi_june2023",
folder="earth-engine",
fileNamePrefix="sf_ndvi_june2023",
region=region.getInfo()["coordinates"],
scale=10,
maxPixels=1e10,
)
export_task.start()
print("Export started. Check the Earth Engine Tasks tab.")
The Notebooks/geopython-tutorials directory now hosts a curated set of hands-on notebooks. Launch any notebook directly in Colab using its badge link.
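In the Earth Engine snippet above, `normalizedDifference(["B8", "B4"])` computes (B8 - B4) / (B8 + B4) per pixel, while cloud handling happens only at the scene level through `CLOUDY_PIXEL_PERCENTAGE`. A common per-pixel complement is the Sentinel-2 `QA60` band, where bit 10 flags opaque clouds and bit 11 flags cirrus. The sketch below mimics both steps on a few hypothetical pixel values in plain Python; it illustrates the arithmetic and is not how Earth Engine evaluates images server-side.

```python
from typing import List, Optional, Sequence

OPAQUE_CLOUD_BIT = 10  # QA60 bit 10: opaque clouds
CIRRUS_BIT = 11        # QA60 bit 11: cirrus clouds
CLOUD_BITS = (1 << OPAQUE_CLOUD_BIT) | (1 << CIRRUS_BIT)


def is_clear(qa60: int) -> bool:
    """True when neither the opaque-cloud bit nor the cirrus bit is set."""
    return (qa60 & CLOUD_BITS) == 0


def normalized_difference(a: float, b: float) -> Optional[float]:
    """(a - b) / (a + b), or None when the denominator is zero."""
    total = a + b
    return None if total == 0 else (a - b) / total


def masked_ndvi(nir: Sequence[float], red: Sequence[float],
                qa60: Sequence[int]) -> List[Optional[float]]:
    """Per-pixel NDVI from B8 (NIR) and B4 (red); None where QA60 flags cloud."""
    return [
        normalized_difference(n, r) if is_clear(q) else None
        for n, r, q in zip(nir, red, qa60)
    ]


# Hypothetical surface reflectances for three pixels; the last is flagged as cirrus.
nir = [0.40, 0.30, 0.50]
red = [0.10, 0.30, 0.05]
qa60 = [0, 0, 1 << CIRRUS_BIT]
print(masked_ndvi(nir, red, qa60))
```

Swapping the band pair gives the other normalized-difference indices in the concepts table, for example NIR with a SWIR band for NDWI or NBR.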
| Library | Purpose | Link |
|---|---|---|
| GeoPandas | Handle and analyze vector geospatial data. | GitHub |
| Rasterio | Read and write raster datasets. | GitHub |
| Shapely | Perform geometric operations like intersections and unions. | GitHub |
| Fiona | Read and write vector geospatial data. | GitHub |
| xarray | Work with multidimensional geospatial datasets. | GitHub |
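Shapely treats coordinates as Cartesian, so lengths, areas, and buffers only carry real-world units when the data is in a projected CRS (for example UTM metres) rather than lon/lat degrees. To make that concrete, the sketch below computes a planar polygon area with the shoelace formula in plain Python; it handles simple rings without holes and is an illustration, not Shapely's implementation.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def shoelace_area(ring: Sequence[Point]) -> float:
    """Planar area of a simple polygon ring (no holes) via the shoelace formula.

    Coordinates are treated as Cartesian, e.g. metres in a projected CRS.
    The ring may be open or closed (first point repeated at the end).
    """
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring = ring[:-1]  # drop the closing vertex; the loop wraps around anyway
    if len(ring) < 3:
        return 0.0
    twice_area = 0.0
    for i, (x1, y1) in enumerate(ring):
        x2, y2 = ring[(i + 1) % len(ring)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0  # abs() makes the result independent of winding order


# A 100 m x 50 m rectangle in hypothetical projected (metre) coordinates.
parcel = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0), (0.0, 50.0), (0.0, 0.0)]
print(shoelace_area(parcel))  # 5000.0
```

With Shapely installed, `Polygon(parcel).area` should return the same 5000.0 for this rectangle.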
| Library | Purpose | Link |
|---|---|---|
| TorchGeo | Deep learning toolkit tailored for geospatial data. | GitHub |
| scikit-learn | Classic ML algorithms for classification, regression, and clustering. | GitHub |
| TensorFlow / PyTorch | Deep learning frameworks for tasks like image segmentation. | TensorFlow · PyTorch |
| LightGBM / XGBoost | Gradient boosting libraries for tabular geospatial datasets. | LightGBM · XGBoost |
| Tool | Purpose | Link |
|---|---|---|
| Leaflet | JavaScript library for mobile-friendly interactive maps. | GitHub |
| Folium | Build interactive maps from Python. | GitHub |
| Kepler.gl | Web-based tool for large-scale geospatial visualizations. | GitHub |
| Deck.gl | High-performance 3D geospatial visualizations. | GitHub |
| Category | Model | Why it matters | References |
|---|---|---|---|
| Foundation & SSL | Prithvi-EO (IBM–NASA) | HLS-pretrained foundation models for broad EO transfer. | Hugging Face · IBM Research · NASA Earthdata |
| Foundation & SSL | SatMAE | Masked autoencoder tuned for temporal/multispectral imagery. | arXiv · NeurIPS 2022 |
| Foundation & SSL | SeCo (Seasonal Contrast) | Seasonal Sentinel-2 contrastive pretraining; strong downstream lifts. | ICCV 2021 |
| Foundation & SSL | Scale-MAE | Scale-aware MAE that improves SpaceNet-style building segmentation. | arXiv · ICCV 2023 |
| Foundation & SSL | SatlasPretrain (AI2) | Ready-to-use encoders for Sentinel/Landsat and aerial imagery. | GitHub |
| Foundation & SSL | TorchGeo pretrained suite | Catalog of EO-specific backbones (EuroSAT, So2Sat, etc.). | Docs |
| Land-use / LULC | EuroSAT pretrained models | CNN/ViT baselines for Sentinel-2 land-use classification. | GitHub |
| Land-use / LULC | BigEarthNet encoders | Multilabel S1/S2 classifiers for regional land-cover tasks. | BigEarthNet |
| Building footprints | TernausNet / TernausNetV2 | U-Net variants widely used on SpaceNet building segmentation. | arXiv |
| Building footprints | Mask R-CNN / U-Net baselines | Strong SpaceNet MVOI baselines for instance & semantic footprints. | Medium |
| Road extraction | CRESI / CRESIv2 | End-to-end road network & speed extraction (SpaceNet-5 baseline). | GitHub · CVPRW 2020 |
| Road extraction | SpaceNet-5 baselines | Routing-quality metrics like APLS_time for road graph scoring. | Medium |
| Change detection | ChangeFormer | Transformer Siamese change detection with open weights. | arXiv · GitHub |
| Change detection | TUNetCD | Transformer-U-Net hybrid for remote-sensing change maps. | PMC |
| Cloud masking | s2cloudless | Lightweight Sentinel-2 cloud probability & mask generator. | GitHub |
| Cloud masking | CloudSEN12 / CloudSEN12+ | Benchmark dataset & models for clouds/shadows. | Nature Sci. Data · Project |
| Cloud masking | CloudS2Mask | DL library for high-accuracy Sentinel-2 cloud/shadow detection. | ScienceDirect |
| Flood mapping | Sen1Floods11 U-Nets | Benchmarks for SAR/optical flood segmentation. | GitHub |
| Object detection | xView baselines & YOLO | Large-scale overhead object detection benchmarks. | xView · Ultralytics Docs |
| SAM for EO | SAMRS | NeurIPS 2023 dataset/code for Segment Anything in remote sensing. | GitHub · arXiv |
| Disaster response | Turkey Building Damage Assessment | Rapid post-quake damage grading using high-res imagery. | Project · Image |
| Disaster response | Building damage assessment (Siamese CNN) | Siamese CNN for global disaster impact estimation. | Publication · Code · Image |
| Agricultural infrastructure | Poultry barn mapping | Detects industrial poultry barns to monitor environmental impact. | Publication · Code · Image |
| Environmental monitoring | Glacier mapping & glacial lakes | Tracks glacier change in the Hindu Kush Himalaya region. | Project · Code · Image |
| Land-use mapping | Land cover mapping (Microsoft Research) | Country-scale LULC maps with label-scarce deep learning. | Project · Downloads · Image |
| Renewable energy siting | Renewable Energy Mapping | Identifies solar infrastructure footprints across India. | Publication · Code · Image |
15+ geospatial models on Hugging Face pair with this repo's workflows; most are released under Apache-2.0 or MIT, but check each model card for the exact license. Start with the foundation table, then grab fine-tuned checkpoints or enterprise Granite variants as needed.
| Model | Params | Focus | License | Link |
|---|---|---|---|---|
| Prithvi-EO-1.0-100M | 100M | HLS pretraining over the contiguous United States; a lightweight encoder suited to edge devices and Colab. | Apache-2.0 | HF |
| Prithvi-EO-2.0-300M | 300M | Flagship EO foundation updated with improved atmospheric normalization. | Apache-2.0 | HF |
| Prithvi-EO-2.0-600M | 600M | Push-button upgrade for higher fidelity segmentation/classification heads. | Apache-2.0 | HF |
| Prithvi-WxC-1.0-2300M | 2.3B | Weather and climate foundation model pretrained on MERRA-2 reanalysis. | Apache-2.0 | HF |
| Prithvi-EO-2.0-300M-TL | 300M | Variant of the 300M encoder that adds temporal and location embeddings. | Apache-2.0 | HF |
| SatCLIP ViT16-L40 | 304M | Microsoft location encoder pretrained by contrasting Sentinel-2 images with their geographic coordinates. | MIT | HF |
| SatCLIP ResNet18 | 11M | Lightweight SatCLIP encoder for embedded inference. | MIT | HF |
| Satlas-Pretrain ViT-B | 86M | AllenAI multi-resolution pretraining on Sentinel/Landsat. | Apache-2.0 | HF |
| Model | Task | Notes | Link |
|---|---|---|---|
| Prithvi-EO-2.0-300M-Sen1Floods11 | Flood segmentation | SAR+optical encoder-decoder with pixel flood masks. | HF |
| Prithvi-EO-2.0-300M-Burn-Scar | Wildfire burn scar mapping | Multi-temporal ingestion to capture pre/post-fire signatures. | HF |
| Prithvi-EO-2.0-300M-CropHarvest | Crop classification | Trained on CropHarvest for agri analytics dashboards. | HF |
| Prithvi-EO-2.0-300M-Multitemporal-Crops | Time-series crop typing | Adds temporal attention heads for season-aware decisions. | HF |
| Model | Primary KPI | Notes | Link |
|---|---|---|---|
| Granite-Geo-Biomass | Biomass estimation | Optimized for ESG reporting with uncertainty heads. | HF |
| Granite-Geo-Land-Cover | Land cover classification | Enterprise-ready model with support contracts and batch APIs. | HF |
| Model | Params | Why it helps | Link |
|---|---|---|---|
| Google PaliGemma-3B-GEO | 3B | Combine textual prompts with EO imagery for question answering, captioning, or retrieval. | HF |
- IBM-NASA Geospatial org (all Prithvi releases)
- IBM Granite geospatial collection
- Community geospatial model hub
- IBM + NASA curated bundle
| Domain | Snapshot | Source |
|---|---|---|
| Agriculture | (image) | Microsoft Research: Poultry barn mapping |
| Weather Forecasting | (image) | Generated with Matplotlib (pressure anomaly contours) |
| Rainfall | (image) | Generated with Matplotlib (monthly rainfall estimate) |
| Land Inspection | (image) | Generated with Matplotlib (land inspection grid) |
| Concept | Why it matters | Learn more |
|---|---|---|
| Raster data | Pixel/grid representation of continuous surfaces (imagery, elevation). | ArcGIS Pro: Raster data |
| Vector data | Points/lines/polygons for discrete features (roads, parcels). | Esri GIS Dictionary: Vector |
| Raster vs Vector | Know when to use each model for analysis and storage. | GISGeography: Data types |
| Coordinate Reference System (CRS) | Defines how coordinates map to Earth; essential for accuracy. | PROJ: About CRS |
| EPSG codes | Standard identifiers for CRSs (e.g., 4326, 3857). | pyproj: CRS reference |
| WGS84 / Lat-Lon (EPSG:4326) | Global geographic CRS used widely in data exchange. | Wikipedia: EPSG:4326 |
| Web Mercator (EPSG:3857) | Web mapping projection (Google/OSM tiles). | Wikipedia: Web Mercator |
| UTM | Projected CRS in 6° zones for low-distortion mapping. | Wikipedia: UTM |
| Map projections (Mercator) | Transform the globe to a flat map; understand the distortion involved. | Britannica: Mercator projection |
| GeoTIFF | Georeferenced raster format standard. | OGC: GeoTIFF |
| Cloud-Optimized GeoTIFF (COG) | GeoTIFF layout optimized for HTTP range reads. | COG: Specification |
| GeoPackage (.gpkg) | SQLite-based container for vectors, rasters, and tiles. | OGC: GeoPackage |
| Shapefile | Legacy vector format (SHP/SHX/DBF triplet). | Esri: Shapefile whitepaper |
| GeoJSON | JSON-based vector data format/spec. | IETF RFC 7946 |
| MBTiles | SQLite container for tiled maps (raster/vector). | Mapbox: MBTiles spec |
| GeoParquet | Columnar (Parquet) storage with geospatial metadata. | GeoParquet v1.1.0 |
| NetCDF | Self-describing array format used for climate/EO. | Unidata: NetCDF |
| HDF5 | High-performance hierarchical scientific data format. | The HDF Group: HDF5 intro |
| GDAL | Core library/CLI for raster & vector I/O and transforms. | GDAL documentation |
| gdalwarp (reproject/warp) | Reprojection, resampling, mosaicking. | GDAL: gdalwarp |
| Resampling methods | Nearest, bilinear, cubic, lanczos, average, etc. | GDAL: raster resize |
| PROJ / pyproj | Geodetic/projection transforms in code. | pyproj: PROJ API |
| GeoPandas | Pandas + geometry for vector data in Python. | GeoPandas: Introduction |
| Spatial joins (GeoPandas) | Combine layers by spatial relationships. | GeoPandas: Spatial joins |
| Shapely | Geometry objects & operations (buffer, union, intersection, ...). | Shapely repository |
| Rasterio | Python raster I/O built on GDAL. | Rasterio: Docs PDF |
| Xarray | Labeled N-D arrays (great for EO/gridded time series). | xarray documentation |
| rioxarray | Geo-enhancements for Xarray (CRS, transform, IO). | rioxarray documentation |
| PDAL & LAS | Point-cloud processing & LAS point format spec. | PDAL docs · ASPRS: LAS 1.4 |
| WMS (OGC) | Request map images (rendered rasters) via web. | OGC: WMS |
| WMTS (OGC) | Tiled map service for fast web maps. | OGC: WMTS |
| WFS (OGC) | Web access to actual vector features. | OGC: WFS |
| STAC | Common spec to catalog/discover spatiotemporal assets. | STAC specification |
| Sentinel-2 (MSI) | 13-band optical imagery (10/20/60 m). | Copernicus: Sentinel-2 |
| Sentinel-1 (SAR) | C-band radar; day/night, cloud-penetrating. | ESA: Sentinel-1 SAR basics |
| Landsat program | Long-running US optical EO archive. | USGS: Landsat missions |
| SRTM DEM | Near-global (60°N to 56°S) elevation dataset; void-filled versions available. | USGS: SRTM |
| Copernicus DEM | Global DSM (GLO-30/90), GeoTIFF/DTED. | Copernicus: DEM |
| NDVI | Vegetation "greenness" index (NIR-Red)/(NIR+Red). | NASA Earthdata: NDVI |
| NDWI (Gao 1996) | Vegetation water content index (NIR-SWIR)/(NIR+SWIR). | ScienceDirect: NDWI |
| NBR | Burn severity index (NIR-SWIR2)/(NIR+SWIR2). | USGS: NBR |
| Cloud masking (QA60) | Sentinel-2 bitmask for clouds (bit 10) and cirrus (bit 11), used in GEE. | Google Earth Engine: Sentinel-2 SR |
| Zonal statistics | Summarize raster values over polygons. | rasterstats: Zonal stats |
| Spatial autocorrelation (Moran's I) | Detect clustering/dispersion in spatial data. | GeoDa: Moran's I |
| Tobler's First Law | "Near things are more related than distant things." | Wikipedia: Tobler's law |
| MAUP (Modifiable Areal Unit Problem) | Bias from aggregating data into arbitrary zones. | Wikipedia: MAUP |
| Reprojection | Change dataset CRS to align analyses/maps. | GDAL: raster reproject |
| Rescaling & resampling | Change raster resolution methodically. | GDAL: raster resize |
| PostGIS | Spatial SQL in PostgreSQL (geometry/geography + ops). | PostGIS documentation |
| QGIS | Open-source desktop GIS (editing, analysis, viz). | QGIS user manual |
| STAC in practice | How agencies expose catalogs (example implementation). | USGS: STAC example |
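Two of the CRS entries above reduce to short formulas. Web Mercator (EPSG:3857) projects WGS84 lon/lat onto a sphere whose radius is the WGS84 semi-major axis (6,378,137 m), and the standard UTM grid assigns 6° zones numbered eastward from 180°W, with EPSG codes 326xx in the northern hemisphere and 327xx in the southern. The sketch below implements both in plain Python; the UTM helper deliberately ignores the Norway and Svalbard zone exceptions, so prefer `pyproj` for production reprojection.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used as the sphere radius by EPSG:3857
MAX_MERCATOR_LAT = 85.0511287798  # latitude where Web Mercator tiles are cut off


def lonlat_to_web_mercator(lon: float, lat: float) -> tuple:
    """Project EPSG:4326 lon/lat degrees to EPSG:3857 x/y metres."""
    if abs(lat) > MAX_MERCATOR_LAT:
        raise ValueError("Web Mercator is undefined near the poles")
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def utm_epsg(lon: float, lat: float) -> int:
    """EPSG code of the WGS84 / UTM zone containing a point.

    Uses the regular 6-degree grid and ignores the Norway/Svalbard exceptions.
    """
    zone = min(int((lon + 180) // 6) + 1, 60)  # lon == 180 falls in zone 60
    return (32600 if lat >= 0 else 32700) + zone


# San Francisco, the same point used in the Earth Engine example.
x, y = lonlat_to_web_mercator(-122.4194, 37.7749)
print(round(x), round(y))
print(utm_epsg(-122.4194, 37.7749))  # 32610
```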
Hugging Face Spaces make it easy to demo, duplicate, and productionize geospatial AI without standing up servers. Below are 8+ interactive experiences plus deployment tips that map directly to the models/datasets highlighted above.
| Space | What you can test | Link |
|---|---|---|
| Prithvi EO 2.0 Cloud Gap | Fill cloud gaps, compare reconstructions, and download clean tiles. | Space |
| Sen1Floods11 Flood Segmentation | Upload SAR + optical chips and return pixel-accurate flood masks. | Space |
| Burn Scar Monitor | Detect wildfire burn scars using bi-temporal Sentinel stacks. | Space |
| Crop Classification Lab | Multi-temporal crop typing with confidence scores and feature export. | Space |
| Prithvi EO Explorer | Inspect embeddings, run zero-shot queries, and download features. | Space |
| Space | Capability | Link |
|---|---|---|
| Granite Geospatial Explorer | Compare Granite biomass/land-cover predictions with ground truth layers. | Space |
| Global Flood Dashboard | Monitor near real-time flood risk with Prithvi + ERA5 fusion layers. | Space |
| Remote Sensing Playground | Leafmap/MapLibre-based viewer for overlaying datasets, embeddings, or masks. | Space |
📦 Voila collection: ready-to-deploy Jupyter apps for Spaces live in voila-dashboards/voila-gallery. Duplicate a template, swap the notebook, and push it to your own Hugging Face Space.
- Launch & explore: open the Space, switch hardware (CPU/GPU/T4) as needed, and watch the console for preprocessing steps.
- Duplicate: click Duplicate Space to clone it into your namespace, then set any required secrets and variables in the Space settings.
- Access via API: use the "Use via API" tab for ready-made `curl`, Python, and JS snippets plus `hf_token` guidance.
- Embed & automate: copy the `iframe` snippet for docs/notebooks, or call the Space endpoint from workflows via `gradio_client.Client`.
# Create and publish your own geospatial Space
pip install -U "huggingface_hub[cli]"
huggingface-cli login
huggingface-cli repo create prithvi-demo --type space --space_sdk gradio
git clone https://huggingface.co/spaces/your-hf-handle/prithvi-demo
cd prithvi-demo
python - <<'EOF'
from pathlib import Path
Path("app.py").write_text(
"import gradio as gr\n"
"def run(image_path):\n"
" return image_path\n"
"iface = gr.Interface(fn=run, inputs=gr.Image(type='filepath'), outputs='image')\n"
"iface.launch()\n"
)
EOF
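huggingface-cli upload your-hf-handle/prithvi-demo . . --repo-type space
# Optional: request upgraded hardware (paid tiers require billing on your account)
python -c "from huggingface_hub import HfApi; HfApi().request_space_hardware('your-hf-handle/prithvi-demo', hardware='cpu-upgrade')"
| Framework | Ideal for | Docs |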
|---|---|---|
| Gradio | Fast MVPs and model cards with sliders/maps. | Docs |
| Streamlit | Data storytelling dashboards with charts + maps. | Docs |
| Voila | Turn notebooks into reproducible apps without rewriting code. | Docs |
| Static HTML + MapLibre | Lightweight viewers that only need tiles/geojson. | Docs |
- All geospatial Spaces on Hugging Face
- IBM-NASA geospatial org (8+ Spaces)
- Prithvi demo collection
- Community remote sensing viewers
- Introduction to GIS
- Satellite Image Processing
- Machine Learning Essentials
  - fast.ai's Deep Learning for Coders (includes image classification tasks that can be applied to geospatial data).
  - Machine Learning with Python (Coursera)
- Advanced GIS and Remote Sensing
- Deep Learning for Geospatial Applications
Machine Learning for Remote Sensing (ML4RS) Workshop @ ICLR 2024 (Microsoft sponsor)
This is a curated list of papers presented at the Machine Learning for Remote Sensing (ML4RS) Workshop at ICLR 2023 & 2024.
- Datasets: Curated datasets for geospatial ML experiments.
- Tutorials: Step-by-step guides and example notebooks.
- Notebooks: Jupyter notebooks demonstrating geospatial machine learning workflows (1 in-repo quickstart).
- Notebooks/earthengine: Vendored Google Earth Engine workflows (cloud masking, classification, change detection) sourced from the community; 42 notebooks across:
  `CloudMasking` (4) · `RasterProcessing` (8) · `ArrayAnalytics` (4) · `VectorAndZonal` (4) · `SpatialJoins` (1) · `ImageCollections` (3) · `Segmentation` (1) · `Detection` (1) · `MachineLearning` (4) · `Terrain` (3) · `WaterMonitoring` (4) · `ChangeMonitoring` (2) · `Visualization` (3)
- Tools: Scripts and utilities for geospatial data processing.
We welcome contributions! Whether you want to add a new resource, share a dataset, or submit a tutorial, here's how you can help:
- Fork the repository.
- Create a new branch for your changes.
- Submit a pull request with a detailed description.
Give this repo a star if it helped you!
For questions, suggestions, or collaborations, feel free to open an issue or connect with me at:
- Email: [[email protected]]
- LinkedIn: [www.linkedin.com/in/curious-susant]
@misc{CuriousSusant,
author = {S Susant Achary},
title = {Machine-Learning-for-Geospatial},
year = {2025}
}



