Member

@asmacdo asmacdo commented Apr 22, 2025

Example of a Dandiset DOI (note: no version): 10.80507/dandi.000004

Remaining TODOs:

  • Update Design document for the Zenodo like DOI per dandiset #2012, which should be merged before this comes out of "draft PR"
  • Implement support in to_datacite within dandi-schema: enh: allow creation of dandiset dois (contrasted to a version doi) dandi-schema#297
  • Use DJANGO_DANDI_DOI_PUBLISH to determine whether DOIs are Findable
  • Handle Datacite validation failures (if not enough to make findable, revert to draft)
  • On update, if draft doi, "publish" to Findable
  • Reduce repetition in Datacite API usage, probably create a minimal client?
  • add tests
  • Move (and combine?) "mid level" functions from publish/ dandiset/ views/version/ unembargo/ to doi.py
  • Add Dandiset DOI to vue for draft version
  • management command to populate DOIs for existing public dandisets
  • adjust web "e2e" (frontend) tests in e2e/tests/dandisetLandingPage.spec.ts to ensure we have DOIs
  • Add hand-QA checklist to this PR
  • For handle publication DOIs, catch datacite client exceptions since we are dealing with 2 DOIs (1 create, 1 update)
  • Be more consistent about exception handling and logging
  • consider adding audit related records on DOI operations
  • reconsider using schema's Dandiset instead of constructing unvalidated
  • Decide how to handle "delete": there is no usage of the REST endpoint AFAICT! Admin delete uses a different endpoint
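
A minimal sketch of the "revert to draft" item above, assuming a hypothetical `send` callable that raises `ValidationFailed` when Datacite rejects a payload. The `event: publish` attribute is how the Datacite REST API requests the Findable state; the function names and the exception class are illustrative and are not the code in this PR.

```python
from typing import Callable


class ValidationFailed(Exception):
    """Hypothetical error: Datacite rejected the payload as not publishable."""


def build_payload(doi: str, metadata: dict, publish: bool) -> dict:
    # Without an event the DOI is created as a Draft;
    # event='publish' asks Datacite for the Findable state.
    attributes = {'doi': doi, **metadata}
    if publish:
        attributes['event'] = 'publish'
    return {'data': {'type': 'dois', 'attributes': attributes}}


def create_doi(doi: str, metadata: dict, publish: bool,
               send: Callable[[dict], None]) -> str:
    """Return the state the DOI ended up in: 'findable' or 'draft'."""
    if publish:
        try:
            send(build_payload(doi, metadata, publish=True))
            return 'findable'
        except ValidationFailed:
            # Metadata is not complete enough to be Findable: fall back to a draft.
            pass
    send(build_payload(doi, metadata, publish=False))
    return 'draft'
```

Here `publish` would come from the DJANGO_DANDI_DOI_PUBLISH setting; injecting `send` keeps the fallback logic testable without talking to Datacite.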

May be later as part of this depending on what comes in first

  • Rebase schema PR on Isaac's changes

Member Author

asmacdo commented May 14, 2025

Getting closer to completion now, but there are a few things I have not yet been able to verify:

  • Deletion of dandisets: I should have superuser, but I don't see anything on /admin, so I can't actually delete the dandisets to verify DOI delete/hide behavior.
  • Findable/Registered DOIs: when DANDI_DOI_PUBLISH is true, we create findable DOIs which might behave differently. I'll need to patch the DOIs manually to prevent collisions in our test datacite api, and modify dandischema to allow that. A PITA, so I'm doing other stuff first.
  • Unembargo workflow: I ran into 3 issues:

self.api_url = settings.DANDI_DOI_API_URL
self.api_user = settings.DANDI_DOI_API_USER
self.api_password = settings.DANDI_DOI_API_PASSWORD
self.api_prefix = settings.DANDI_DOI_API_PREFIX
Member Author

Previously, we would use settings.DANDI_DOI_API_PREFIX or '10.80507'

If the API prefix is not set, DOI API operations should be prevented by is_configured(), so I think this is appropriate.
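
A minimal sketch of that guard, assuming `is_configured()` simply checks that all four settings are present. `DoiSettings` is a stand-in for the Django settings object and `dandiset_doi` is a hypothetical helper; neither is taken from this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DoiSettings:
    # Stand-in for django.conf.settings; field names follow the snippet above.
    DANDI_DOI_API_URL: Optional[str] = None
    DANDI_DOI_API_USER: Optional[str] = None
    DANDI_DOI_API_PASSWORD: Optional[str] = None
    DANDI_DOI_API_PREFIX: Optional[str] = None


class DoiClient:
    def __init__(self, settings: DoiSettings):
        self.api_url = settings.DANDI_DOI_API_URL
        self.api_user = settings.DANDI_DOI_API_USER
        self.api_password = settings.DANDI_DOI_API_PASSWORD
        self.api_prefix = settings.DANDI_DOI_API_PREFIX

    def is_configured(self) -> bool:
        # All four values are required; there is no hard-coded prefix fallback.
        return all([self.api_url, self.api_user, self.api_password, self.api_prefix])

    def dandiset_doi(self, identifier: str) -> Optional[str]:
        # Hypothetical helper: returns None instead of guessing a prefix.
        if not self.is_configured():
            return None
        return f'{self.api_prefix}/dandi.{identifier}'
```

With the prefix unset, no DOI string is ever built, so nothing can be sent under a default prefix by accident.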

@asmacdo asmacdo force-pushed the enh-dandiset-dois branch 5 times, most recently from 5316e66 to 73de829 Compare May 22, 2025 20:29
Member Author

TODO: this should still be a Celery task, but we should make sure it succeeds or roll back the transaction.

Member

I don't agree that it must pass, or we should roll back the transaction. For starters, we would have to move all of this code into the celery task to ensure that. More practically, why should the outcome of the datacite API call influence this? Handling failure conditions for interacting with the datacite API should be done anyway. It wouldn't help anyone to prevent the deletion of a dandiset due to the datacite API being down (for example). We would just need to ensure that it is eventually deleted.

For example, one way to approach this is to, upon dandiset deletion, place the DOI of the draft version into a table called DeletableDOI (or something). Then, a scheduled celery task could pull values from this table, call out to datacite to delete the DOI, and then, in the same transaction, delete that row (or mark it as "done") so it's not picked up by the next scheduled task.

This would ensure that the DOI eventually gets deleted, even if there is a transient error during interaction with the datacite API. The same procedure could be implemented for DOI creation as well.

Member Author

My main concern was not to orphan the DOI, but this seems like a reasonable approach.

@asmacdo asmacdo force-pushed the enh-dandiset-dois branch from 138e463 to c1a0215 Compare May 22, 2025 22:47
Member

@yarikoptic yarikoptic left a comment

Just posting an exception we observed while trying this locally. Ideally it should be handled more gracefully, I think.

Member Author

asmacdo commented Jun 3, 2025

+1 we can handle that more gracefully. This exception occurred because the Dandiset was created by the superuser without a name. So FWIW even if we relax how to_datacite sets that value it will still be an invalid DOI payload.

@asmacdo asmacdo force-pushed the enh-dandiset-dois branch 3 times, most recently from a5b9d24 to 3fd6081 Compare June 16, 2025 20:14
@yarikoptic
Member

@waxlamp @jjnesbitt This item is on our weekly meeting notes under "beyond 30 minutes of Roni", so we rarely get to it. Could someone give it an initial review? This "feature" has been in the kitchen for way too long with the design doc PR (#2012), and now this PR has grown big, with potentially growing conflicts.

@jjnesbitt
Member

> @waxlamp @jjnesbitt This item is on our weekly meeting notes under "beyond 30 minutes of Roni", so we rarely get to it. Could someone give it an initial review? This "feature" has been in the kitchen for way too long with the design doc PR (#2012), and now this PR has grown big, with potentially growing conflicts.

Yes, I am in the process of reviewing it.

Comment on lines +167 to +168
# Retry with PUT if DOI already exists
update_url = f'{self.api_url}/{doi}'
Member

If the DOI already exists, what does making a PUT request do? Update the metadata?

Member Author

Yeah exactly
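
A minimal sketch of that create-then-update fallback, with the HTTP layer injected as a hypothetical `request` callable so it runs without network access. The 422 status used to detect an existing DOI is an assumption here; the same status may also signal other validation errors, so real code would need to inspect the response body.

```python
from typing import Callable

# (method, url, payload) -> HTTP status code
Transport = Callable[[str, str, dict], int]

DOI_EXISTS_STATUS = 422  # assumption: status returned when the DOI is already taken


def create_or_update_doi(api_url: str, doi: str, payload: dict,
                         request: Transport) -> str:
    """Return 'created' or 'updated'; raise if the final request failed."""
    status = request('POST', api_url, payload)
    action = 'created'
    if status == DOI_EXISTS_STATUS:
        # Retry with PUT if DOI already exists; the PUT updates its metadata.
        update_url = f'{api_url}/{doi}'
        status = request('PUT', update_url, payload)
        action = 'updated'
    if status >= 400:
        raise RuntimeError(f'Datacite request failed with status {status}')
    return action
```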

Comment on lines +86 to +87
draft_version = dandiset.versions.filter(version='draft').first()
if draft_version and draft_version.doi is not None:
Member

I have a couple of suggestions. First, we know a draft version will always exist on a dandiset, as it is created in the same transaction as the dandiset. So we don't need to check that it exists here.

Second, I actually think we should prevent deleting the dandiset until the DOI is generated. Imagine someone creates and then immediately deletes a dandiset, at the same time the celery task to create the DOI is running. This code block would be skipped, and the dandiset would still be deleted, but the DOI could have been created during that time. In that case, we would have "orphaned DOIs" to worry about.

I think it's acceptable to block deletion on draft DOI creation, and raise an appropriate error message from the request.
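
A small sketch of what blocking could look like, using plain dataclasses instead of the real models. `DoiPending` and `doi_to_delete` are hypothetical names; in the view, the exception would be translated into an error response.

```python
from dataclasses import dataclass
from typing import Optional


class DoiPending(Exception):
    """Hypothetical error surfaced to the API caller while the DOI task runs."""


@dataclass
class DraftVersion:
    doi: Optional[str] = None


def doi_to_delete(draft: DraftVersion) -> str:
    # The draft version always exists, so only the DOI needs checking.
    if draft.doi is None:
        raise DoiPending('Draft DOI is still being created; retry deletion shortly.')
    return draft.doi
```

Raising instead of skipping means a dandiset can never be deleted while its DOI creation is still in flight, which is what prevents orphaned DOIs.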

Comment on lines +138 to +148
# For unpublished dandisets, update or create the draft DOI
# to keep it in sync with the latest metadata
if not locked_version.dandiset.embargoed:
transaction.on_commit(
lambda: update_draft_version_doi_task.delay(locked_version.id)
)
else:
logger.debug(
'Skipping DOI update for embargoed Dandiset %s.',
locked_version.dandiset.identifier,
)
Member

If we're going to keep DOI metadata up to date any time a draft version is updated, then I'm even more in favor of what I proposed in my other comment (a table for keeping track of this). Doing it in this fashion is just a disaster waiting to happen with regard to synchronization of metadata.

@yarikoptic
Member

Hi @jjnesbitt, should we wait for you to implement that extra table (and if so, when do you think you would have time), or should we attempt it ourselves?

@jjnesbitt
Member

> Hi @jjnesbitt, should we wait for you to implement that extra table (and if so, when do you think you would have time), or should we attempt it ourselves?

I can take this on, as I've already begun the implementation.

@jjnesbitt jjnesbitt force-pushed the enh-dandiset-dois branch 2 times, most recently from 03bee6c to 65e8306 Compare September 3, 2025 16:40
@jjnesbitt jjnesbitt force-pushed the enh-dandiset-dois branch 2 times, most recently from 95ef02b to 39dd8a8 Compare September 4, 2025 20:22
@yarikoptic
Member

@jjnesbitt how do you feel about this PR/work overall -- could/should we get it closer/over the finish line in the near future?

@jjnesbitt
Member

Apologies for the delay, I've gotten back to working on this and have made significant progress. For ease of development I've made a separate branch based off this one, and will open that PR once it's ready (hopefully soon).
