-
Notifications
You must be signed in to change notification settings - Fork 32
Added genetic diversity fields - Fixes #1610 #1611
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: staging
Are you sure you want to change the base?
Conversation
…nto ac-genetic-diversity-Issue1610
…nto ac-genetic-diversity-Issue1610
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job! It's well organised, I've left some comments so we can discuss a few things
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which project/dataset is this for?
Hi @hannes-ucsc this PR is not for a specific project/dataset. These are the recommendation metadata fields from the HCA Genetic Diversity TaskForce to record the genetic, geographic and in generally human diversity in the HCA studies. Do you have any specific concerns? |
I may be out of the loop, but if this isn't going to be used for any actual projects, why is it being added to the schema? I am worried that this isn't sufficiently modular, risking for human-specific and medical history modules to become a kitchen sink of fields, i.e. a flat, unstructured list of fields, some related and some not. This will make comprehending the schema (and the JSON documents compliant with it) increasingly difficult. For example, |
Thank you for your comment Hannes. Bionetwork coordinators have requested around 250 fields of Tier 2 bionetwork-specific metadata that do not exist currently into our schema. Since the Tier 2 collection has not been officially started yet, we cannot be sure of how frequent all of those fields will be filled. We've shared our concerns with bionetwork coordinators on the feasibility of the metadata collection, but we trust their confidence on the collection. Regarding the modularity. There are indeed some fields that could be clustered together but we choose this modelling to avoid extensive "module-in-a-module" structure since we've avoided this until now (with the exception of ontology modules). I am happy to encapsulate similar fields in modules either inside |
If I understand you correctly, this PR is just the first slate of a series of fairly involved changes. Since it is extremely difficult to fix the schema once metadata using it has been released, it is very important that we get this right from the beginning. Is there any substantive documentation about this effort that you could share?
Given the sheer number of new fields you cite, lack of modularity is a serious concern. I don't see why nesting modules is problematic. Hierarchical structures are commonplace in computer science—in biology, too, for that matter—and have proven to be a useful modeling approach. |
Hi @hannes-ucsc I understand your concerns whether this definitions are going to be adopted by contributors in this way or not. Given your feedback on modularity, I refactored some fields in a more modular way. Let me know if this works better for you. I could split into different PRs for each module but please let me know your comments here before we move into new PRs. |
Release notes
#1610
For
human_specific.json
schema:For
residence.json
schema:For
medical_history.json
schema:For
reproduction_history.json
schema:Reviews requested