Data handling standards
Lars Vilhuber edited this page on Oct 21, 2018 (1 revision).
Please add to this list.
- Keep "external" data separate from generated data
- Keep "temporary" data separate from "permanent" data
- Deposit value-added data in a data repository (e.g., openICPSR, Dataverse, Zenodo) as soon as possible
- You should never do interactive or manual data processing
  - If you did something by hand, it should be possible to code it
  - If possible, use data-driven programs.
  - If nothing else solves the problem, use code that identifies the particular cell or observation, but condition the edit on the current ("need-to-edit") value, so that later changes to the data are not silently overwritten.
- Keep external data in a separate folder (e.g., `data/external`)
  - The folder should have a README.md or SRC.md identifying the source and, if necessary, the procedure to acquire the data.
  - Ideally, also add a BibTeX file with the data citation (optional here, but the citation is not optional in the references!)
- If you use data, cite the data! (See References.)
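The folder-separation rules above can be sketched as a small Python helper. This is a minimal sketch under assumptions: only `data/external` comes from this page; the `data/generated` and `data/temp` names, the one-subfolder-per-source convention, and the function names are hypothetical illustrations.

```python
from pathlib import Path
import shutil

# Hypothetical layout: only data/external comes from the guidelines;
# the other folder names are illustrative assumptions.
LAYOUT = {
    "external": "data/external",    # acquired from outside sources, never modified
    "generated": "data/generated",  # permanent outputs created by our code
    "temp": "data/temp",            # scratch files that can be deleted at any time
}
PROVENANCE_FILES = ("README.md", "SRC.md")


def setup_layout(root: Path) -> None:
    """Create the data folders if they do not exist yet."""
    for rel in LAYOUT.values():
        (root / rel).mkdir(parents=True, exist_ok=True)


def missing_provenance(root: Path) -> list[str]:
    """Return names of external-data subfolders lacking a README.md or SRC.md.

    Assumes (hypothetically) one subfolder per external source under data/external.
    """
    external = root / LAYOUT["external"]
    missing = []
    for source_dir in sorted(p for p in external.iterdir() if p.is_dir()):
        if not any((source_dir / name).is_file() for name in PROVENANCE_FILES):
            missing.append(source_dir.name)
    return missing


def clean_temp(root: Path) -> None:
    """Delete and recreate the temp folder; external and generated data are untouched."""
    temp = root / LAYOUT["temp"]
    shutil.rmtree(temp, ignore_errors=True)
    temp.mkdir(parents=True)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        setup_layout(root)
        (root / "data/external/census").mkdir()
        (root / "data/external/cps").mkdir()
        (root / "data/external/cps/SRC.md").write_text("Source and acquisition steps.\n")
        print(missing_provenance(root))  # ['census']
```

Because temporary files live in their own folder, `clean_temp` can wipe them without any risk to permanent or external data, and the provenance check can run as part of a routine build.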
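One way to combine the advice on data-driven programs with the rule on conditional cell edits: keep cell-level corrections in a table (data, not code), and apply each one only if the cell still holds the value it was meant to fix. This is a minimal Python sketch; the `CORRECTIONS_CSV` table, its column names, and `apply_corrections` are hypothetical illustrations, not an established tool.

```python
import csv
import io

# Hypothetical corrections table; in practice this would live in its own
# file under version control rather than being typed into the program.
CORRECTIONS_CSV = """id,column,expected,replacement
17,state,Nwe York,New York
42,year,199,1999
"""


def apply_corrections(rows, corrections):
    """Apply cell-level corrections, but only where the cell still holds the
    value we intended to fix. Returns the list of corrections that were skipped."""
    by_id = {row["id"]: row for row in rows}
    skipped = []
    for fix in corrections:
        row = by_id.get(fix["id"])
        if row is None or row.get(fix["column"]) != fix["expected"]:
            # The data changed since the correction was written: leave the
            # cell alone and flag the correction instead of overwriting.
            skipped.append(fix)
            continue
        row[fix["column"]] = fix["replacement"]
    return skipped


rows = [
    {"id": "17", "state": "Nwe York", "year": "2001"},
    {"id": "42", "state": "Ohio", "year": "2000"},  # already fixed upstream
]
corrections = list(csv.DictReader(io.StringIO(CORRECTIONS_CSV)))
skipped = apply_corrections(rows, corrections)
print(rows[0]["state"])                # New York
print([fix["id"] for fix in skipped])  # ['42']
```

Skipped corrections are returned rather than silently dropped, so someone can check whether each fix is still needed after the source data change.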