
Should WoRMS LSID be the value of dwc:taxonID or dwc:scientificNameID in Occurrence core/extension? #203

@ymgan

Hi,

I have been on this quest for a while now because our project team is tasked with aligning OBIS quality checks with the Core Tests and Assertions from TDWG BDQ TG2. I talked to folks from GBIF Norway, GBIF Helpdesk, OBIS Secretariat, WoRMS, TDWG BDQ TG2 and a couple of GBIF/OBIS nodes specifically about this question, but the answers I got all differ. The lack of consensus is very frustrating to me. Hence I am opening this issue to summarize what I have understood, and it would be great if we could find a consensus and a solution together.

edit: This issue is specifically about the usage of WoRMS LSID in Occurrence

As some of the discussions took place through Slack, email or in person, I could not link all of the information as GitHub issues here. Please correct me if I got anything wrong.

Definitions

dwc:taxonID

An identifier for the set of dwc:Taxon information. May be a global unique identifier or an identifier specific to the data set.

dwc:Taxon

A group of organisms (sensu http://purl.obolibrary.org/obo/OBI_0100026) considered by taxonomists to form a homogeneous unit.

dwc:scientificNameID

An identifier for the nomenclatural (not taxonomic) details of a scientific name.

Why does OBIS recommend using the dwc:scientificNameID field for WoRMS LSID?

From the response I received from the WoRMS helpdesk (it was a thread migrated from OBIS Slack), there are a few reasons:

  1. It is the easiest option to understand for the data providers and managers who need to populate the field.
  2. The information that WoRMS receives really is a scientificName. A WoRMS LSID is linked 1-1 to the name provided, regardless of whether the taxon is accepted or not. When a WoRMS LSID is provided as scientificNameID, people can find their way to WoRMS, where the taxonomic status, the current accepted name and other taxon information are documented.
  3. OBIS hopes to reduce the burden of keeping track of names, taxonomic status and synonyms (which may change over time) by recommending the use of WoRMS LSID in dwc:scientificNameID (having WoRMS track those changes) and leaving out any Taxon-related fields. (This is from this comment.)
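To make the recommendation above concrete, here is a minimal sketch of an Occurrence record that carries only the WoRMS LSID in dwc:scientificNameID and no Taxon fields. The helper name `worms_lsid`, the record values and the AphiaID 123456 are placeholders chosen for illustration; only the LSID pattern `urn:lsid:marinespecies.org:taxname:<AphiaID>` reflects how WoRMS LSIDs are written.

```python
# Prefix shared by WoRMS LSIDs; the trailing number is the AphiaID.
WORMS_LSID_PREFIX = "urn:lsid:marinespecies.org:taxname:"


def worms_lsid(aphia_id: int) -> str:
    """Build a WoRMS LSID string from an AphiaID (hypothetical helper)."""
    return f"{WORMS_LSID_PREFIX}{aphia_id}"


# Hypothetical Occurrence record: the name as reported plus its WoRMS LSID.
# No taxonID, taxonomicStatus or acceptedNameUsage is supplied; those would
# be looked up from WoRMS by whoever consumes the record.
occurrence = {
    "occurrenceID": "example:occ:1",     # placeholder identifier
    "scientificName": "Genus species",   # placeholder name
    "scientificNameID": worms_lsid(123456),  # placeholder AphiaID
}

print(occurrence["scientificNameID"])
```

The point of the sketch is only that the provider records the name and its identifier once, while status and synonymy stay with WoRMS.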

Can WoRMS LSID be used for dwc:taxonID?

Opinion from WoRMS

The response I received from the WoRMS helpdesk (via email) is that taxonID is an identifier for a taxon concept and not a name, and that WoRMS does not have such a concept. They also remarked that the marine community links observations to names, not to concepts.

This made me wonder whether there is confusion between dwc:taxonID and dwc:taxonConceptID.

Implementation concern from GBIF

GBIF Helpdesk once responded that it may not be a good idea to have WoRMS LSIDs as taxonIDs because they are not stable. TaxonIDs in the GBIF context should be identical between versions of the dataset, and they could potentially change if they come from unstable LSIDs.

The stability concern, I believe, refers to the fact that WoRMS does not have stable identifiers for taxon concepts.

Please see more in the comments.

Why should WoRMS LSID be used for dwc:taxonID?

WoRMS is not an authoritative source of information on nomenclatural acts

This is perhaps the biggest argument I have received for why WoRMS LSID should not be used in the dwc:scientificNameID field. @chicoreus mentioned in the comment that the definition of dwc:scientificNameID explicitly points at authoritative sources of information on nomenclatural acts, that is, nomenclators. Since WoRMS is not an authoritative source of information on nomenclatural acts, it is not appropriate to use dwc:scientificNameID for WoRMS LSID. @mdoering also raised this concern in this comment.

dwc:taxonID is an identifier without a particular meaning to the instance of the Taxon class

Following @chicoreus's comment, which aligns well with the Darwin Core definition of dwc:taxonID:

It is an identifier for the package of information associated with a Taxon class, without linking a particular meaning (name string, nomenclatural act, taxon concept, taxon concept including classification) to the instance of the Taxon class. The dwc:taxonID serves as the identifier for the set of information in the terms in a dwc:Taxon instance, without applying additional semantics to the dwc:Taxon instance.

My perspective as a data manager for both a GBIF node and an OBIS node

Difference in interpretation leads to difficulty in collaboration

It is VERY difficult for me as a data manager for both a GBIF node and an OBIS node when interpretations differ on whether WoRMS LSID should be populated under dwc:taxonID or dwc:scientificNameID. One example is a conversation I had when I attended a workshop organized by Nansen Legacy and GBIF Norway. GBIF Norway thinks that WoRMS LSID should be populated under dwc:taxonID, but OBIS and WoRMS insisted that it should be populated under dwc:scientificNameID for the reasons stated above. Furthermore, dwc:scientificNameID is a mandatory field for OBIS. I appreciate that @pieterprovoost was pragmatic and mentioned that he will look for a solution, such as parsing dwc:taxonID in OBIS data processing. The data could be interpreted better if there were a consensus here.
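As an illustration of the pragmatic workaround mentioned above (parsing dwc:taxonID as well), the following is a rough sketch and not OBIS's actual data processing. It assumes records arrive as plain dictionaries keyed by Darwin Core term names; the function name `aphia_id_from_record` and the AphiaID 123456 are made up for the example.

```python
import re
from typing import Optional

# WoRMS LSIDs have the form urn:lsid:marinespecies.org:taxname:<AphiaID>.
WORMS_LSID = re.compile(r"^urn:lsid:marinespecies\.org:taxname:(\d+)$")


def aphia_id_from_record(record: dict) -> Optional[int]:
    """Return the AphiaID from a WoRMS LSID found in the record.

    Checks dwc:scientificNameID first (the OBIS recommendation), then
    falls back to dwc:taxonID (the GBIF Norway practice). Returns None
    when neither field holds a WoRMS LSID.
    """
    for term in ("scientificNameID", "taxonID"):
        value = (record.get(term) or "").strip()
        match = WORMS_LSID.match(value)
        if match:
            return int(match.group(1))
    return None


# Both conventions resolve to the same AphiaID (placeholder value).
obis_style = {"scientificNameID": "urn:lsid:marinespecies.org:taxname:123456"}
gbif_style = {"taxonID": "urn:lsid:marinespecies.org:taxname:123456"}
print(aphia_id_from_record(obis_style), aphia_id_from_record(gbif_style))
```

A fallback like this lets a consumer read datasets published under either convention, but it does not settle which term the standard intends, which is the question this issue is about.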

Implications for the future

The new unified data model

I really hope we can find a consensus now rather than carrying this over to the new data model (see screenshot below). Screenshot taken today, 2023-07-20.

Screenshot 2023-07-20 at 15 17 53

I am aware that the model is still in an immature state. Based on my email conversation with WoRMS, the same issue seems to persist: to WoRMS, it makes sense to add observedScientificNameID to the ReportedAbundance table.

My questions

Can we reach a consensus on whether WoRMS LSID should be used for dwc:taxonID or dwc:scientificNameID?

Right now the standard seems to suggest that dwc:taxonID should be used for WoRMS LSID, but the implementation side seems to suggest otherwise. So what exactly should a data manager like me do? This is so frustrating!

Is there anything unclear about the usage of dwc:taxonID, dwc:taxonConceptID or dwc:scientificNameID that should be improved in Darwin Core documentation?

If so, what is it? What leads to different interpretations between different people/organizations? If we could identify that, a term change request should perhaps be submitted.

Thank you

Thank you to everyone who talked to me and helped me understand this in any way! I hope I summarized the issue well. I am definitely not the most tactful person, so apologies if I stepped on your ego. Please correct me if I said anything wrong!
