Hi,
I have been on this quest for a while now because our project team is tasked with aligning OBIS quality checks with the Core Tests and Assertions from TDWG BDQ TG2. I talked to people from GBIF Norway, GBIF Helpdesk, OBIS Secretariat, WoRMS, TDWG BDQ TG2 and a couple of GBIF/OBIS nodes specifically about this question, but the answers I got all differ. The lack of consensus is very frustrating. Hence I am opening this issue to summarize what I understood, and it would be great if we could find a consensus and a solution together.
edit: This issue is specifically about the usage of WoRMS LSID in Occurrence
As some of the discussions took place through Slack, email or in person, I could not link all of the information as GitHub issues here. Please correct me if I am wrong in any way.
Definitions
dwc:taxonID
An identifier for the set of dwc:Taxon information. May be a global unique identifier or an identifier specific to the data set.
dwc:Taxon
A group of organisms (sensu http://purl.obolibrary.org/obo/OBI_0100026) considered by taxonomists to form a homogeneous unit.
dwc:scientificNameID
An identifier for the nomenclatural (not taxonomic) details of a scientific name.
Why does OBIS recommend using the dwc:scientificNameID field for WoRMS LSID?
From the response I received from the WoRMS helpdesk (a thread migrated from OBIS Slack), there are a few reasons:
- It is the easiest option to understand for the data providers and managers who need to populate the field.
- The information WoRMS receives really is a scientificName. A WoRMS LSID is linked 1-to-1 to the name provided, regardless of whether the taxon is accepted or not. When a WoRMS LSID is provided as scientificNameID, people can find their way to WoRMS, where the taxonomic status, the current accepted name and other taxon information are documented.
- OBIS hopes to reduce the burden of keeping track of names, taxonomic status and synonyms (which may change over time) by recommending the use of WoRMS LSID in dwc:scientificNameID (so that WoRMS tracks those changes) and leaving out any Taxon-related fields. (This is from this comment.)
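To make the recommendation above concrete, here is a minimal Python sketch of an occurrence record that carries a WoRMS LSID in dwc:scientificNameID, plus a helper that extracts the numeric AphiaID from it. It assumes the usual LSID form `urn:lsid:marinespecies.org:taxname:<AphiaID>`. The record values (occurrenceID, name and AphiaID) are placeholders, and `aphia_id_from_lsid` is a hypothetical helper name, not part of any OBIS or WoRMS tool.

```python
import re

# Assumed WoRMS LSID form: urn:lsid:marinespecies.org:taxname:<AphiaID>
WORMS_LSID = re.compile(r"^urn:lsid:marinespecies\.org:taxname:(\d+)$")


def aphia_id_from_lsid(value):
    """Return the AphiaID as an int if value is a WoRMS LSID, else None."""
    if not value:
        return None
    match = WORMS_LSID.match(value.strip())
    return int(match.group(1)) if match else None


# Placeholder record: the name and AphiaID are illustrative only and
# have not been checked against WoRMS.
occurrence = {
    "occurrenceID": "example:occurrence:1",
    "scientificName": "Genus species",
    "scientificNameID": "urn:lsid:marinespecies.org:taxname:123456",
}

aphia_id = aphia_id_from_lsid(occurrence["scientificNameID"])
```

With the identifier in place, a consumer can look up status, accepted name and classification in WoRMS itself instead of relying on Taxon fields stored in the dataset.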
Can WoRMS LSID be used for dwc:taxonID?
Opinion from WoRMS
The response I received from the WoRMS helpdesk (via email) is that taxonID is an identifier for a taxon concept, not a name, and that WoRMS does not have such a concept. They also remarked that the marine community links observations to names, not to concepts.
This made me wonder whether there is confusion between dwc:taxonID and dwc:taxonConceptID.
Implementation concern from GBIF
GBIF Helpdesk once responded that it may not be a good idea to have WoRMS LSIDs as taxonIDs because they are not stable. TaxonIDs in the GBIF context should be identical between versions of the dataset, and they could potentially change if they come from unstable LSIDs.
The stability concern, I believe, refers to WoRMS not having stable identifiers for taxon concepts.
Please see more in the comments:
- DwC field scientificNameID is not used at all gbif/pipelines#217 (comment)
- DwC field scientificNameID is not used at all gbif/pipelines#217 (comment)
Why should WoRMS LSID be used for dwc:taxonID?
WoRMS is not an authoritative source of information on nomenclatural acts
This is perhaps the strongest argument I received for why WoRMS LSID should not be used in the dwc:scientificNameID field. @chicoreus mentioned in the comment that the definition of dwc:scientificNameID explicitly points at authoritative sources of information on nomenclatural acts, that is, nomenclators. Since WoRMS is not an authoritative source of information on nomenclatural acts, it is not appropriate to use dwc:scientificNameID for WoRMS LSID. @mdoering also raised this concern in this comment.
dwc:taxonID is an identifier without a particular meaning to the instance of the Taxon class
Following @chicoreus's comment, which aligns well with the Darwin Core definition of dwc:taxonID:
It is an identifier for the package of information associated with a Taxon class, without linking a particular meaning (name string, nomenclatural act, taxon concept, taxon concept including classification) to the instance of the Taxon class. The dwc:taxonID serves as the identifier for the set of information in the terms in a dwc:Taxon instance, without applying additional semantics to the dwc:Taxon instance.
My perspective as a data manager for both GBIF and OBIS nodes
Differences in interpretation make collaboration difficult
As a data manager for both GBIF and OBIS nodes, I find it VERY difficult to work when interpretations differ on whether WoRMS LSID should be populated under dwc:taxonID or dwc:scientificNameID. For example, I had this conversation at a workshop organized by Nansen Legacy and GBIF Norway. GBIF Norway thinks that WoRMS LSID should be populated under dwc:taxonID, but OBIS and WoRMS insisted that it should be populated under dwc:scientificNameID, for the reasons stated above. Furthermore, dwc:scientificNameID is a mandatory field for OBIS. I appreciate that @pieterprovoost was pragmatic and mentioned that he will look for a solution, such as parsing dwc:taxonID in OBIS data processing. The data could be interpreted better if there were a consensus here.
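The pragmatic workaround mentioned above (also reading dwc:taxonID during processing) could look roughly like the sketch below. This is only an illustration of the idea, not OBIS's actual pipeline: the function name `find_worms_lsid`, the field order and the sample records are all assumptions.

```python
import re

# Assumed WoRMS LSID form: urn:lsid:marinespecies.org:taxname:<AphiaID>
WORMS_LSID = re.compile(r"^urn:lsid:marinespecies\.org:taxname:(\d+)$")


def find_worms_lsid(record, fields=("scientificNameID", "taxonID")):
    """Return (field, lsid) for the first field holding a WoRMS LSID, else None.

    Fields are checked in the given order, so scientificNameID wins when
    both fields contain a WoRMS LSID.
    """
    for field in fields:
        value = (record.get(field) or "").strip()
        if WORMS_LSID.match(value):
            return field, value
    return None


# Hypothetical records published under the two conventions.
obis_style = {"scientificNameID": "urn:lsid:marinespecies.org:taxname:123456"}
gbif_style = {"taxonID": "urn:lsid:marinespecies.org:taxname:123456"}
local_id_only = {"taxonID": "my-dataset:taxon:17"}

found_obis = find_worms_lsid(obis_style)
found_gbif = find_worms_lsid(gbif_style)
found_local = find_worms_lsid(local_id_only)
```

A fallback like this lets one dataset serve both networks, but it only hides the disagreement in the processing layer. It does not settle which term the identifier belongs in.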
Implications for the future
The new unified data model
I really hope we can find a consensus now rather than carrying this over to the new data model (see screenshot below). Screenshot taken today, 2023-07-20.
I am aware that the model is still in an immature state. Based on my email conversation with WoRMS, the same issue seems to persist: to WoRMS, it makes sense to add observedScientificNameID to the ReportedAbundance table.
My questions
Can we reach a consensus on whether WoRMS LSID should be used for dwc:taxonID or dwc:scientificNameID?
Right now the standard seems to suggest that dwc:taxonID should be used for WoRMS LSID, but the implementation side seems to suggest otherwise. So what exactly should a data manager like me do? This is so frustrating!
Is there anything unclear about the usage of dwc:taxonID, dwc:taxonConceptID or dwc:scientificNameID that should be improved in Darwin Core documentation?
If so, what is it? What leads to different interpretations between different people/organizations? If we could identify that, a term change request should perhaps be submitted.
Thank you
Thank you to everyone who talked to me and helped me understand this in any way! I hope I summarized the issue well. I am definitely not the most tactful person, so my apologies if I bruised your ego. Please correct me if I said anything wrong!