PatchGen is a modular framework for building deep learning–ready datasets from Earth observation imagery. It leverages Google Earth Engine (GEE), Apache Beam, TensorFlow, and open-source geospatial tools to streamline the extraction, sampling, patching, and exporting of training data for geospatial machine learning.
Whether you're performing land cover classification, vegetation monitoring, or urban mapping, PatchGen provides flexible, YAML-configurable tools to scale your workflow.
| Directory | Description |
|---|---|
| cfg/ | YAML configuration files defining patch generation pipelines |
| data/ | Ancillary data used for stratified sampling and spatial filtering |
| notebooks/ | Example notebooks demonstrating sampling and feature engineering |
| src/ | Main source code, organized into independent components: |
| ├── sampler/ | Stratified point sampling and feature extraction from GEE |
| ├── generator/ | Beam-powered patch generator that creates TFRecords directly from GEE |
| ├── slicer/ | Slices exported multiband rasters into TensorFlow-compatible patches |
| └── exporter/ | Exports co-registered predictor and target images from GEE to GCS |
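To illustrate the kind of tiling the slicer performs, here is a minimal pure-Python sketch. It assumes a raster already loaded into memory as a band-first nested list (`[band][row][col]`). The function name `slice_into_patches`, the default non-overlapping stride, and the choice to drop partial edge windows are illustrative assumptions, not PatchGen's actual API; reading GeoTIFFs and serializing patches to TFRecords are left out.

```python
from typing import List, Optional

# Hypothetical in-memory layout (assumption): raster[band][row][col]
Raster = List[List[List[float]]]


def slice_into_patches(
    raster: Raster, patch_size: int, stride: Optional[int] = None
) -> List[Raster]:
    """Cut a band-first raster into square patch_size x patch_size patches.

    Windows that would extend past the raster edge are dropped, so every
    patch has the same shape: (bands, patch_size, patch_size).
    """
    if stride is None:
        stride = patch_size  # default: non-overlapping tiles
    if patch_size <= 0 or stride <= 0:
        raise ValueError("patch_size and stride must be positive")

    n_rows = len(raster[0])
    n_cols = len(raster[0][0])
    patches: List[Raster] = []
    # Slide the window in row-major order: top-left to bottom-right.
    for r0 in range(0, n_rows - patch_size + 1, stride):
        for c0 in range(0, n_cols - patch_size + 1, stride):
            # Take the same spatial window from every band so bands stay co-registered.
            patch = [
                [row[c0:c0 + patch_size] for row in band[r0:r0 + patch_size]]
                for band in raster
            ]
            patches.append(patch)
    return patches


if __name__ == "__main__":
    # Two 4 x 6 bands: band 0 holds r*6 + c, band 1 holds its negation.
    b0 = [[r * 6 + c for c in range(6)] for r in range(4)]
    b1 = [[-(r * 6 + c) for c in range(6)] for r in range(4)]
    patches = slice_into_patches([b0, b1], patch_size=2)
    print(len(patches))   # 6: a 2 x 3 grid of tiles
    print(patches[0][0])  # [[0, 1], [6, 7]]
```

Passing a `stride` smaller than `patch_size` yields overlapping patches, a common way to get more training samples from a limited scene at the cost of correlated examples.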
- GEE-native sampling + processing: Generate rich predictor/target variables on the fly
- Custom feature sets: Easily configure time-windowed statistics, indices, or radar metrics
- High-throughput patch creation: Apache Beam pipeline generates compressed TFRecords
- Multiple modes: Choose from direct-from-GEE (`generator`) or pre-exported raster slicing (`slicer`)
- Reproducibility-first: YAML-driven configs make experiments traceable and swappable
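As an example of the per-pixel arithmetic behind time-windowed statistics and indices, the sketch below computes NDVI, (NIR - red) / (NIR + red), for each observation of a single pixel and takes the median within named date windows. The observation tuple layout, window names, and function names are hypothetical. In PatchGen this computation presumably runs inside GEE, where the GEE API expresses the same idea over whole images (for instance with `ee.Image.normalizedDifference` and `ee.ImageCollection.filterDate(...).median()`).

```python
import math
from datetime import date
from statistics import median
from typing import Dict, List, Tuple

# Hypothetical per-pixel time series: (acquisition date, red reflectance, NIR reflectance)
Observation = Tuple[date, float, float]


def ndvi(red: float, nir: float) -> float:
    """Normalized difference vegetation index: (NIR - red) / (NIR + red)."""
    denom = nir + red
    if denom == 0:
        return math.nan  # undefined when both bands are zero
    return (nir - red) / denom


def windowed_ndvi_median(
    series: List[Observation], windows: Dict[str, Tuple[date, date]]
) -> Dict[str, float]:
    """Median NDVI per named window (inclusive bounds); NaN if no valid observations."""
    out: Dict[str, float] = {}
    for name, (start, end) in windows.items():
        values = [ndvi(red, nir) for d, red, nir in series if start <= d <= end]
        values = [v for v in values if not math.isnan(v)]  # skip undefined NDVI
        out[name] = median(values) if values else math.nan
    return out


if __name__ == "__main__":
    series = [
        (date(2023, 4, 10), 0.10, 0.30),
        (date(2023, 5, 5), 0.10, 0.50),
        (date(2023, 6, 1), 0.20, 0.60),
        (date(2023, 8, 15), 0.30, 0.30),
    ]
    windows = {
        "spring": (date(2023, 4, 1), date(2023, 6, 30)),
        "summer": (date(2023, 7, 1), date(2023, 9, 30)),
    }
    print(windowed_ndvi_median(series, windows))  # spring ~0.5, summer 0.0
```

The median is used here rather than the mean because it is less sensitive to outliers such as residual cloud contamination; swapping in another statistic only changes the reducer call.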
This project is licensed under the MIT License. See LICENSE for details.