How to upload large test data to Zenodo
Any large datasets needed for regression or integration testing should be stored in OpenFE's Software Dev Testing Zenodo Community.
Each new test dataset should be its own Zenodo entry. If you are simply updating or extending an existing dataset, follow the instructions for updating an existing dataset.
Link to an example submission: https://zenodo.org/records/15042470
You cannot link to a Zenodo dataset until it has been published, so it is best to first develop your tests with your data stored locally.
When your PR is ready for review:
- Upload your data to Zenodo (see Uploading to Zenodo).
- Create (or update) a pooch test fixture with the correct DOI and registry hashes (see Accessing datasets with pooch).
- Verify that data is gathered correctly and tests pass locally, then request review on your PR.
- If changes to the dataset are requested during PR review, edit the data and follow the instructions for updating an existing dataset. Increment the version as a bugfix (patch) release.
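The registry hashes mentioned in the checklist above can be computed locally from the compressed archives. The following is a minimal sketch, not the project's actual tooling: it assumes MD5 hashes written in pooch's "md5:<hexdigest>" form (swap in another algorithm if your registry uses one), and the helper names md5_registry_entry and build_registry are hypothetical. pooch also provides a file_hash function that serves the same purpose.

```python
import hashlib
from pathlib import Path


def md5_registry_entry(path, chunk_size=1 << 20):
    """Return a pooch-style hash string ("md5:<hexdigest>") for one file."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return f"md5:{digest.hexdigest()}"


def build_registry(archives):
    """Map each archive's file name to its hash string (hypothetical helper)."""
    return {Path(archive).name: md5_registry_entry(archive) for archive in archives}
```

For example, build_registry(["rbfe_results_serial_repeats.tar.gz"]) returns a dictionary that can be pasted into the registry of the pooch fixture once the same file has been uploaded.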
Uploading to Zenodo
- Compress your data with a descriptive name. You can add multiple compressed archives to a single Zenodo upload, but only do this if you want the data to be version-controlled together. For example, the following data is the same test dataset, but represented in two different file structures:
tar -czvf rbfe_results_serial_repeats.tar.gz rbfe_results_serial_repeats/
tar -czvf rbfe_results_parallel_repeats.tar.gz rbfe_results_parallel_repeats/
- Go to https://zenodo.org/communities/openfedev and click "New upload".
- Add your data.
- Fill out all required information, and add a detailed description.
- We follow semantic versioning for our test data, so new datasets will likely be v1.0.0 (you may want to use v0.1.0 in some early development situations).
- Click "Publish to Community". Metadata can be corrected after publishing, but if you need to make edits to the actual dataset(s), follow the instructions for updating an existing dataset.
Updating an existing dataset
- Navigate to your existing dataset's Zenodo page, then click the orange "Edit" button.
- Upload your new dataset(s).
- Bump the version according to semantic versioning.
- Create or update the CHANGELOG. Treat this like a software version release, in that you want other developers to understand the changes and their context!
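The version bump in the steps above follows the usual MAJOR.MINOR.PATCH arithmetic. As a small illustration, the sketch below shows how each kind of bump changes a version string such as v1.0.0. The helper name bump_version is hypothetical. Which part to bump for a given change is a judgment call, apart from review-requested fixes, which this guide treats as bugfix (patch) bumps.

```python
def bump_version(version, part):
    """Increment one part ("major", "minor", or "patch") of a version string.

    Accepts versions with or without a leading "v" and keeps the prefix.
    """
    prefix = "v" if version.startswith("v") else ""
    major, minor, patch = (int(piece) for piece in version[len(prefix):].split("."))
    if part == "major":
        # A major bump resets the minor and patch numbers.
        return f"{prefix}{major + 1}.0.0"
    if part == "minor":
        # A minor bump resets the patch number.
        return f"{prefix}{major}.{minor + 1}.0"
    if part == "patch":
        return f"{prefix}{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")
```

A dataset at v1.0.0 that is corrected during PR review would become bump_version("v1.0.0", "patch"), that is, v1.0.1.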
Accessing datasets with pooch
To access the test data stored on Zenodo, we use the pooch library. Follow pooch's documentation for downloading files using a DOI, as pooch has Zenodo-specific support.
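As a rough illustration of the DOI-based approach, the sketch below builds a pooch object for one Zenodo record and fetches an archive from it. It is not the project's actual fixture. The helper names and the cache directory name are hypothetical, and the DOI pattern assumes a Zenodo-minted DOI of the form 10.5281/zenodo.<record id>. pooch is imported inside the functions so that the module itself can be imported without pooch installed or network access.

```python
ZENODO_DOI_PREFIX = "10.5281"  # DOI prefix of Zenodo-minted DOIs


def zenodo_base_url(record_id):
    """Build a pooch base_url for a Zenodo record.

    The trailing slash matters: pooch joins base_url and the file name directly.
    """
    return f"doi:{ZENODO_DOI_PREFIX}/zenodo.{record_id}/"


def make_zenodo_pooch(record_id, registry, cache_name="openfe_test_data"):
    """Create a pooch.Pooch for the files of one Zenodo record (hypothetical helper).

    registry maps file names to hash strings, e.g. {"data.tar.gz": "md5:..."}.
    cache_name is an assumed cache directory name, not a project convention.
    """
    import pooch  # lazy import, see the note above

    return pooch.create(
        path=pooch.os_cache(cache_name),
        base_url=zenodo_base_url(record_id),
        registry=registry,
    )


def fetch_archive(zenodo, fname):
    """Download fname if needed, verify its hash, and unpack the tar archive.

    Returns the list of extracted file paths produced by pooch.Untar.
    """
    import pooch

    return zenodo.fetch(fname, processor=pooch.Untar())
```

In a test suite, make_zenodo_pooch could be called once from a session-scoped pytest fixture, with the record id of the published dataset and the registry hashes of its archives, so that downloads are cached and shared between tests.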