
Data handling standards

Lars Vilhuber edited this page Oct 21, 2018 · 1 revision

Expand me

Please add to this

Data directory structure

  • Keep "external" data separate from generated data
  • Keep "temporary" data separate from "permanent" data
  • Deposit value-added data in a data repository (e.g., openICPSR, Dataverse, Zenodo) as soon as possible
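
One way to keep these kinds of data apart is to create the folders from code, so that every project starts with the same layout. The following is a minimal sketch, not a prescribed standard: the folder names data/generated and data/temporary and the function create_layout are assumptions made for illustration (only data/external appears on this page).

```python
from pathlib import Path

# Hypothetical folder names; only data/external is mentioned on this page.
LAYOUT = {
    "external": "data/external",    # downloaded or acquired data, never modified
    "generated": "data/generated",  # permanent data produced by your code
    "temporary": "data/temporary",  # scratch files that are safe to delete
}


def create_layout(root):
    """Create the data folders under root and return their paths by name."""
    paths = {}
    for name, relative in LAYOUT.items():
        path = Path(root) / relative
        # exist_ok=True makes the call safe to repeat on an existing project
        path.mkdir(parents=True, exist_ok=True)
        paths[name] = path
    return paths
```

Calling create_layout(".") from the project root creates any missing folders and leaves existing ones untouched. Keeping temporary data in its own folder makes it easy to exclude from version control and from any repository deposit.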

(Almost) never do interactive data processing

  • Avoid interactive or manual data processing; treat any exception as a last resort
  • If you did it by hand, it should be possible to code it
  • If possible, use data-driven programs.
  • If nothing else solves the problem, write code that identifies the particular cell or observation to edit, and apply the edit only if the cell still holds the value that needs correcting (so that later changes to the data are not silently overwritten).
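
The last point can be sketched as follows. This is an illustration under assumptions, not a required implementation: the function apply_correction, the "id" and "wage" field names, and the sample values are all hypothetical. The correction is applied only when the targeted cell still contains the known bad value.

```python
def apply_correction(records, key, field, expected_old, new_value, id_field="id"):
    """Correct one field of the record identified by key.

    The edit is applied only if the field still holds expected_old,
    so a value that was already fixed upstream is left alone.
    Returns the number of cells changed.
    """
    changed = 0
    for record in records:
        if record[id_field] == key and record[field] == expected_old:
            record[field] = new_value
            changed += 1
    return changed


# Hypothetical data: the second record has a data-entry error.
records = [
    {"id": "A17", "wage": 52000},
    {"id": "B03", "wage": 5200000},
]

first_run = apply_correction(records, "B03", "wage", 5200000, 52000)
second_run = apply_correction(records, "B03", "wage", 5200000, 52000)
```

Here first_run is 1 and second_run is 0: once the value has been corrected, or if a newer version of the source data no longer contains the error, the code does nothing. Logging the returned count makes it visible when a hand-coded correction stops applying.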

Downloaded or otherwise acquired data

  • Keep it in a separate folder (e.g., data/external)
  • The folder should have a README.md or SRC.md identifying the source and, if necessary, the procedure to acquire the data.
  • Ideally, also add a BibTeX file with the data citation (optional here, but not optional in the references!)
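
If the data are acquired by a script, the same script can write the source note. The sketch below assumes a simple layout for SRC.md; the function write_source_note, the section headings, and the example URL are hypothetical and should be adapted to the project.

```python
from datetime import date
from pathlib import Path


def write_source_note(folder, source_url, procedure, accessed=None):
    """Write a SRC.md file into folder describing where the data came from."""
    # Default to today's date in ISO format if no access date is given
    accessed = accessed or date.today().isoformat()
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    text = (
        "# Source\n\n"
        f"- Source: {source_url}\n"
        f"- Accessed: {accessed}\n\n"
        "## Procedure to acquire\n\n"
        f"{procedure}\n"
    )
    note = folder / "SRC.md"
    note.write_text(text, encoding="utf-8")
    return note
```

For example, write_source_note("data/external", "https://example.org/data.csv", "Downloaded manually from the landing page.") records a placeholder source together with the access date.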

Citing data

  • If using data, cite the data! (References)
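
As a rough sketch of how a data citation could be generated alongside the data, the helper below formats a BibTeX @misc entry from a dictionary of fields. The function bibtex_entry and every field value shown are placeholders, not a real citation; use the citation recommended by the data provider where one exists.

```python
def bibtex_entry(key, fields):
    """Format a BibTeX @misc entry from a citation key and a dict of fields."""
    lines = [f"@misc{{{key},"]
    for name, value in fields.items():
        # Each field is written as: name = {value},
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Placeholder values for illustration only.
entry = bibtex_entry(
    "example_dataset",
    {
        "author": "Example Data Provider",
        "title": "Example Dataset",
        "year": "2018",
        "howpublished": "https://example.org/data",
    },
)
```

Writing entry to a .bib file in the same folder as the data keeps the citation next to the source note.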