Reformat weather datasets into zarr.
See the dataset integration guide to add a new dataset to be reformatted.
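As a rough illustration of what reformatting into zarr involves, here is a minimal sketch using xarray. The file name and chunk sizes are hypothetical examples, and this is not this repository's implementation:

```python
# Minimal sketch: read a weather dataset, rechunk it for cloud-friendly
# access, and write it out as zarr. Illustrative only.
import xarray as xr

ds = xr.open_dataset("forecast.nc")  # hypothetical source file
ds = ds.chunk({"time": 1, "latitude": 360, "longitude": 360})  # example chunk sizes
ds.to_zarr("forecast.zarr", mode="w")
```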
We use
- `uv` to manage dependencies and python environments
- `ruff` for linting and formatting
- `mypy` for type checking
- `pytest` for testing
- `pre-commit` to automatically lint and format as you git commit
Setup:
- Install `uv`
- Run `uv run pre-commit install` to set up the git hooks
- If you use VSCode, you may want to install the extensions (ruff, mypy) it will recommend when you open this folder
Commands:
- `uv run main --help`
- `uv run main <DATASET_ID> update-template`
- `uv run main <DATASET_ID> backfill-local <INIT_TIME_END>`
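For example, with a hypothetical dataset ID (yours will differ): `uv run main noaa-gefs-forecast update-template` would update that dataset's stored template, and `uv run main noaa-gefs-forecast backfill-local 2024-01-01T00:00` would reformat data locally up to that init time. Check `uv run main --help` for the exact arguments each command accepts.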
- Add dependency: `uv add <package> [--dev]`. Use `--dev` to add a development-only dependency.
- Lint: `uv run ruff check`
- Type check: `uv run mypy`
- Format: `uv run ruff format`
- Tests:
  - Run tests in parallel on all available cores: `uv run pytest`
  - Run tests serially: `uv run pytest -n 0` (the `-n` flag is pytest-xdist's worker count; `0` disables parallelism)
To reformat a large archive, we parallelize work across multiple cloud servers.
We use
- `docker` to package the code and dependencies
- `kubernetes` indexed jobs to run work in parallel
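For context, kubernetes indexed jobs give each pod a distinct `JOB_COMPLETION_INDEX` environment variable, which a worker can use to pick its slice of the work without coordinating with other pods. A minimal sketch of that pattern (the function and argument names are hypothetical, not this repo's actual work-assignment code):

```python
# Sketch of the indexed-job work-assignment pattern. Kubernetes sets
# JOB_COMPLETION_INDEX in each pod of an indexed job; striding by the pod
# count splits the work with no coordination between pods.
import os


def select_work(all_chunks: list[str], num_pods: int) -> list[str]:
    index = int(os.environ["JOB_COMPLETION_INDEX"])
    return all_chunks[index::num_pods]
```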
Setup:
- Install `docker` and `kubectl`. Make sure `docker` can be found at /usr/bin/docker and `kubectl` at /usr/bin/kubectl.
- Set up a docker image repository and export the DOCKER_REPOSITORY environment variable in your local shell, eg. `export DOCKER_REPOSITORY=us-central1-docker.pkg.dev/<project-id>/reformatters/main`
- Set up a kubernetes cluster and configure kubectl to point to your cluster, eg. `gcloud container clusters get-credentials <cluster-name> --region <region> --project <project>`
- Create a kubectl secret containing your Source Coop S3 credentials: `kubectl create secret generic source-coop-storage-options-key --from-literal=contents='{"key": "...", "secret": "..."}'`
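The secret's JSON payload matches the `storage_options` shape that s3fs/fsspec accept. As an illustrative sketch of how such a secret is typically consumed once mounted into a pod (the mount path below is hypothetical, and this may not be how this repo reads it):

```python
# Sketch: load mounted storage-option JSON and build an S3 filesystem.
# The path is a hypothetical secret mount point, not this repo's layout.
import json

import fsspec

with open("/secrets/source-coop-storage-options-key/contents") as f:
    storage_options = json.load(f)

# s3fs accepts "key" and "secret" kwargs as credentials.
fs = fsspec.filesystem("s3", **storage_options)
```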
Run a backfill:
- `DYNAMICAL_ENV=prod uv run main <DATASET_ID> backfill-kubernetes <INIT_TIME_END> <JOBS_PER_POD> <MAX_PARALLELISM>`
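For example, with hypothetical values, `DYNAMICAL_ENV=prod uv run main noaa-gefs-forecast backfill-kubernetes 2024-01-01T00:00 10 20` would launch a backfill as a kubernetes indexed job; the last two arguments presumably control how many jobs each pod processes and the maximum number of pods running at once, but check `uv run main <DATASET_ID> backfill-kubernetes --help` for their exact semantics.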