Utility tasks containers for argo
LINZ uses Argo Workflows to run bulk data tasks in AWS. This repository contains utilities that are often needed for these tasks.
Fetch a layer from the LDS and download it as GeoPackage.
Fetch the latest version of layer 50063 - 50063-nz-chatham-island-airport-polygons-topo-150k and save it into ./output:
lds-fetch-layer --target ./output 50063

Multiple layers can be fetched at the same time. Fetch 51002 and 51000:
lds-fetch-layer --target ./output 51002 51000

Generate target path for ODR buckets using collection metadata.
The date can be omitted from the survey name (example: s3://nz-elevation/new-zealand/new-zealand/dem_1m/2193/) by passing the --no-date-in-survey-path flag.
For imagery naming conventions see: https://github.com/linz/imagery/blob/master/docs/naming.md
For elevation naming conventions see: https://github.com/linz/elevation/blob/master/docs/naming.md
generate-path --target-bucket-name nz-imagery s3://linz-workflows-scratch/2024-01/04-is-niwe-hawkes-bay-l7tt4/flat/

List files from AWS and split them into groups for processing.
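The grouping that `list` performs with `--group` and `--group-size` can be pictured with a short Python sketch. This is not the tool's implementation: the function name `group_files`, the `(path, size)` input shape and the rule of closing a group as soon as either limit is reached are assumptions made for illustration.

```python
from typing import Iterable


def group_files(
    files: Iterable[tuple[str, int]], max_count: int, max_bytes: int
) -> list[list[str]]:
    """Greedy grouping sketch (hypothetical helper, not part of the CLI).

    A group is closed as soon as it holds `max_count` files or its
    cumulative size reaches `max_bytes`, whichever comes first.
    """
    groups: list[list[str]] = []
    current: list[str] = []
    current_bytes = 0
    for path, size in files:
        current.append(path)
        current_bytes += size
        if len(current) >= max_count or current_bytes >= max_bytes:
            groups.append(current)
            current = []
            current_bytes = 0
    if current:
        # Keep the trailing, partially filled group.
        groups.append(current)
    return groups
```

With limits of 3 files or 100 bytes, two 60-byte files close the first group on size, and three small files close the next group on count.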
- List all tiffs in a folder:
list s3://linz-imagery/sample --include ".*.tiff$" --output /tmp/list.json
- List tiffs and split them into groups of 10:
list s3://linz-imagery/sample --include ".*.tiff$" --group 10 --output /tmp/list.json
- List tiffs and split them into groups of either 10 files or 100MB, whichever comes first:
list s3://linz-imagery/sample --include ".*.tiff$" --group 10 --group-size 100MB --output /tmp/list.json
- Exclude a specific tiff:
list s3://linz-imagery/sample --include ".*.tiff$" --exclude "BG33.tiff$" --output /tmp/list.json

Format all JSON files within a directory using prettier.
- Format and overwrite files:
pretty-print source/
- Create a copy of the formatted files in another, flattened directory (testing only - does not handle duplicate filenames):
pretty-print source/ --target output/

Generate a manifest of files that need to be copied and their target paths.
create-manifest s3://link-workflow-artifacts/sample/flat --include ".*.tiff$" --exclude "BG33.tiff$" --output /tmp/list.json --target s3://linz-imagery/sample

Copy the files in the manifest between two locations. For manifest creation see create-manifest.
Only copy files which have changed when using the --no-clobber (or --force-no-clobber) option.
Always copy files even if they have changed when using the --force option.
copy ./debug/manifest-eMxkhansySrfQt79rIbAGOGrQ2ne-h4GdLXkbA3O6mo.json --concurrency 10

Group an input list into an array of arrays.
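As a rough illustration of this chunking, the Python sketch below reproduces the result of the example invocation in this section. It assumes that any argument that parses as a JSON array is flattened into the input list before chunking; the function names `flatten_args` and `group` are hypothetical and not the tool's code.

```python
import json


def flatten_args(args: list[str]) -> list[str]:
    """Expand arguments that are JSON arrays; keep everything else verbatim."""
    items: list[str] = []
    for arg in args:
        try:
            parsed = json.loads(arg)
        except json.JSONDecodeError:
            parsed = arg
        if isinstance(parsed, list):
            items.extend(parsed)
        else:
            # Plain strings (and JSON scalars) are kept as the raw argument.
            items.append(arg)
    return items


def group(args: list[str], size: int) -> list[list[str]]:
    """Split the flattened arguments into consecutive chunks of `size`."""
    items = flatten_args(args)
    return [items[i : i + size] for i in range(0, len(items), size)]


print(group(["a", "b", "c", '["1","2","3"]'], 2))
# [['a', 'b'], ['c', '1'], ['2', '3']]
```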
group --size 2 "a" "b" "c" '["1","2","3"]'
# [["a","b"], ["c","1"], ["2", "3"]]

Create a STAC catalog JSON file, given links to a catalog template JSON file and a location to search for collection.json files.
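The sketch below shows, in Python, one way a catalog could be assembled from a template and a set of collection.json files. It is an illustration under assumptions, not the tool's implementation: the function names are hypothetical, the collections are passed in as a mapping of relative href to raw bytes rather than discovered by searching a location, child links are sorted by href, and the link title is taken from the collection's `title` field. The `1220` prefix is the multihash header for a 32-byte SHA-256 digest, which matches the `file:checksum` values in this section's sample output.

```python
import hashlib
import json


def multihash_sha256(data: bytes) -> str:
    """Hex multihash: 0x12 (sha2-256), 0x20 (32-byte digest), then the digest."""
    return "1220" + hashlib.sha256(data).hexdigest()


def build_catalog(template: dict, collections: dict[str, bytes]) -> dict:
    """Return a copy of `template` with one child link per collection."""
    catalog = json.loads(json.dumps(template))  # deep copy; template is untouched
    for href in sorted(collections):  # assumption: deterministic ordering by href
        raw = collections[href]
        collection = json.loads(raw)
        catalog["links"].append(
            {
                "rel": "child",
                "href": href,
                "title": collection.get("title"),
                "file:checksum": multihash_sha256(raw),
                "file:size": len(raw),
            }
        )
    return catalog
```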
stac catalog --template catalog_template.json --output catalog.json /path/to/stac/

Example template file:
{
  "stac_version": "1.0.0",
  "type": "Catalog",
  "id": "linz-imagery",
  "description": "Toitū Te Whenua Land Information New Zealand makes New Zealand's publicly owned aerial and satellite imagery archive freely available to use under an open licence. This public S3 bucket has been made available to enable bulk access and cloud-based data processing. You can also access the imagery through the LINZ Data Service or LINZ Basemaps.",
  "links": [
    { "rel": "self", "href": "https://linz-imagery.s3.ap-southeast-2.amazonaws.com/catalog.json" },
    { "rel": "root", "href": "./catalog.json" }
  ]
}

Output will look like:
{
  "stac_version": "1.0.0",
  "type": "Catalog",
  "id": "linz-imagery",
  "description": "Toitū Te Whenua Land Information New Zealand makes New Zealand's publicly owned aerial and satellite imagery archive freely available to use under an open licence. This public S3 bucket has been made available to enable bulk access and cloud-based data processing. You can also access the imagery through the LINZ Data Service or LINZ Basemaps.",
  "links": [
    {
      "rel": "self",
      "href": "https://linz-imagery.s3.ap-southeast-2.amazonaws.com/catalog.json"
    },
    {
      "rel": "root",
      "href": "./catalog.json"
    },
    {
      "rel": "child",
      "href": "./auckland/auckland_2010-2011_0.125m/rgb/2193/collection.json",
      "title": "Auckland 0.125m Urban Aerial Photos (2010-2011)",
      "file:checksum": "1220670da4eb9d1e9a8ce209ac2894bc523ffc33d805718058ff268d20092f3596fd",
      "file:size": 387938
    },
    {
      "rel": "child",
      "href": "./auckland/auckland_2010-2012_0.5m/rgb/2193/collection.json",
      "title": "Auckland 0.5m Rural Aerial Photos (2010-2012)",
      "file:checksum": "1220fd8793f08d92ca52ebf283db98c847cf2a23730ff10e8da95121bbd753445068",
      "file:size": 23987
    }
  ]
}

Format and push a STAC collection.json file and Argo Workflows parameters file to a GitHub repository. Used by the publish-copy Argo Workflow.
stac github-import --source=SOURCE_S3_URL --target=TARGET_S3_URL [--repo-name=OWNER/REPO] [--ticket=TICKET_REFERENCE] [--copy-option=COPY_OPTION]

- OWNER/REPO defaults to "linz/imagery".
- TICKET_REFERENCE is a Jira ticket ID.
- COPY_OPTION can contain a flag for the TIFF and STAC items copy job. Defaults to "--no-clobber".
stac github-import --source=s3://linz-workflows-scratch/2024-03/13-is-niwe-hawkes-bay-all-blocks-xfcxl/flat/ --target=s3://nz-imagery/hawkes-bay/hawkes-bay_2023-2024_0.25m/rgb/2193/ --repo-name=linz/imagery-test --ticket=AIP-56 --copy-option=--force

Validate STAC file(s) from an S3 location.
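The checksum options of `stac validate` compare a recorded `file:checksum` with the content it points to. The Python sketch below shows what such a comparison could look like for SHA-256 multihashes (hex strings starting with `1220`). It is only an illustration: `checksum_matches`, `invalid_links` and the `fetch` callable are hypothetical names, and the real command may check more than this.

```python
import hashlib
from typing import Callable

SHA256_MULTIHASH_PREFIX = "1220"  # 0x12 = sha2-256, 0x20 = 32-byte digest


def checksum_matches(data: bytes, expected: str) -> bool:
    """Compare `data` against a hex-encoded SHA-256 multihash."""
    if not expected.startswith(SHA256_MULTIHASH_PREFIX):
        raise ValueError(f"unsupported multihash: {expected[:4]}")
    digest = expected[len(SHA256_MULTIHASH_PREFIX):].lower()
    return hashlib.sha256(data).hexdigest() == digest


def invalid_links(links: list[dict], fetch: Callable[[str], bytes]) -> list[str]:
    """Return the hrefs whose fetched content does not match `file:checksum`.

    `fetch` is a caller-supplied function (an assumption of this sketch)
    that returns the bytes behind an href. Links without a checksum are skipped.
    """
    failures: list[str] = []
    for link in links:
        expected = link.get("file:checksum")
        if expected is None:
            continue
        if not checksum_matches(fetch(link["href"]), expected):
            failures.append(link["href"])
    return failures
```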
- Validate a single item:
stac validate s3://linz-imagery-staging/test/stac-validate/item1.json
- Validate multiple items:
stac validate s3://linz-imagery-staging/test/stac-validate/item1.json s3://linz-imagery/test/test/item2.json
- Validate a collection and linked items:
stac validate --recursive s3://linz-imagery-staging/test/stac-validate/collection.json
- Validate a collection without validating linked items:
stac validate s3://linz-imagery-staging/test/stac-validate/collection.json
- Validate the file:checksum of all assets inside a collection:
stac validate --checksum-assets --recursive s3://linz-imagery-staging/test/stac-validate/collection.json
- Validate the file:checksum of all STAC links inside a collection:
stac validate --checksum-links --recursive s3://linz-imagery-staging/test/stac-validate/collection.json
- Validate the file:checksum of all assets and STAC links inside a collection:
stac validate --checksum-assets --checksum-links --recursive s3://linz-imagery-staging/test/stac-validate/collection.json

Map input TIFF files to output tiles based on their location. Validate their alignment to the tile grid and output retiling information.
Outputs files for visualisation of the tiles and a list of output tiles with their input TIFF files for topo-imagery to use for creating the tiles with GDAL.
- input.geojson: GeoJSON file containing the bounding boxes of the source files. Example: input.geojson
- output.geojson: GeoJSON file containing the bounding boxes of the requested target files. Example: output.geojson
- file-list.json: a list of source and target files to be used as an input for topo-imagery. Example: file-list.json
Output a list of tiles to be automatically tiled to an appropriate scale determined by the system, and which tile name they should receive when merged. Example:
tileindex-validate --scale=auto s3://linz-imagery/auckland/auckland_2010-2012_0.5m/rgb/2193/

- Create a pull request in the basemaps-config repo after an imagery layer has been imported:
bm-create-pr --target ["s3://linz-basemaps/3857/gisborne-cyclone-gabrielle_2023_0.2m/01HAAYW5NXJMRMBZBHFPCNY71J/","s3://linz-basemaps/2193/gisborne-cyclone-gabrielle_2023_0.2m/01HAAYW5PMJ90MGRSQCB9YPX0W/"]

Add the --individual flag to import the layer into a standalone individual config file; otherwise it is imported into the aerial map. Add the --vector flag to import a new layer into the vector map.
Get a list of STAC items from source datasets that have changed or been added to the source compared to the optional target collection, based on existing hashes in linked STAC documents. Note: If a target collection has been provided, its item links must be resolvable. If no target is specified, all items will be considered updated/new.
Outputs a file-list.json file for topo-imagery to use for generating hillshades (or retiling with GDAL).
- file-list.json: a list of source and target files to be used as an input for topo-imagery. Example: file-list.json
--target-collection
Target collection.json file that needs to be updated. If not provided, all items will be considered updated/new.
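The comparison described above can be sketched in Python as follows. The data shape is an assumption made for illustration: each collection is reduced to a mapping of item name to the hash recorded for it, and `identify_updated` is a hypothetical name, not the tool's API.

```python
from typing import Optional


def identify_updated(
    source: dict[str, str], target: Optional[dict[str, str]]
) -> list[str]:
    """Return source items that are new or whose hash differs from the target.

    `source` and `target` map an item name to its recorded hash
    (assumed shape). With no target, every source item counts as updated/new.
    """
    if target is None:
        return sorted(source)
    return sorted(
        name for name, checksum in source.items() if target.get(name) != checksum
    )
```

For example, with a source of `{"a": "h1", "b": "h2", "c": "h3"}` and a target of `{"a": "h1", "b": "old"}`, item `b` is reported as changed and item `c` as new.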
identify-updated-items --target-collection s3://nz-elevation/new-zealand/new-zealand/dem-hillshade/2193/collection.json s3://nz-elevation/new-zealand/new-zealand-contour/dem-hillshade_8m/2193/collection.json s3://nz-elevation/new-zealand/new-zealand/dem-hillshade_1m/2193/collection.json

To publish a release, the Pull Request opened by the release-please bot needs to be merged:
- Open the PR and verify that the CHANGELOG contains what you expect in the release. If the latest change you expect is not there, double-check that a GitHub Action is not currently running and has not failed.
- Approve and merge the PR.
- Once the Pull Request is merged to master, a GitHub Action creates the release and publishes a new container tagged for this release.