Introducing Dandiset DOIs #2350
base: master
Conversation
Getting closer to completion now, but there are a few things I have not yet been able to verify:
self.api_url = settings.DANDI_DOI_API_URL
self.api_user = settings.DANDI_DOI_API_USER
self.api_password = settings.DANDI_DOI_API_PASSWORD
self.api_prefix = settings.DANDI_DOI_API_PREFIX
Previously, we would use settings.DANDI_DOI_API_PREFIX or '10.80507'.
If the API prefix is not set, DOI API operations should be prevented by is_configured(), so I think this is appropriate.
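A minimal sketch of what such a guard could look like. The is_configured() name comes from the comment above; the client class shape and constructor arguments are assumptions based on the snippet, not the PR's actual code:

```python
class DandiDOIClient:
    """Hypothetical sketch of a DOI API client with a configuration guard."""

    def __init__(self, api_url, api_user, api_password, api_prefix):
        self.api_url = api_url
        self.api_user = api_user
        self.api_password = api_password
        self.api_prefix = api_prefix

    def is_configured(self) -> bool:
        # All four settings must be present before any DOI API call is made;
        # callers skip DOI operations entirely when this returns False.
        return all(
            [self.api_url, self.api_user, self.api_password, self.api_prefix]
        )
```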
(force-pushed from 5316e66 to 73de829)
TODO: this should still be a Celery task, but we should make sure it passes, or roll back the transaction.
I don't agree that it must pass, or we should roll back the transaction. For starters, we would have to move all of this code into the celery task to ensure that. More practically, why should the outcome of the datacite API call influence this? Handling failure conditions for interacting with the datacite API should be done anyway. It wouldn't help anyone to prevent the deletion of a dandiset due to the datacite API being down (for example). We would just need to ensure that it is eventually deleted.
For example, one way to approach this is, upon dandiset deletion, to place the DOI of the draft version into a table called DeletableDOI (or something). Then, a scheduled celery task could pull values from this table, call out to DataCite to delete the DOI, and then, in the same transaction, delete that row (or mark it as "done") so it's not picked up on the next scheduled run.
This would ensure that the DOI eventually gets deleted, even if there is a transient error during interaction with the datacite API. The same procedure could be implemented for DOI creation as well.
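The proposed pattern can be sketched framework-free as below. The names (pending_dois, drain_pending_dois) are placeholders; the real implementation would use a Django model such as the suggested DeletableDOI and a scheduled Celery task:

```python
# Sketch: queue DOIs for deletion, then let a scheduled task drain the
# queue. A transient DataCite failure leaves the entry queued, so the
# DOI is eventually deleted on a later run.

pending_dois: set[str] = set()  # stands in for the DeletableDOI table


def queue_doi_for_deletion(doi: str) -> None:
    # Would run in the same transaction as the dandiset deletion.
    pending_dois.add(doi)


def drain_pending_dois(datacite_delete) -> None:
    # Scheduled task: delete each DOI at DataCite, then remove the entry.
    for doi in list(pending_dois):
        try:
            datacite_delete(doi)
        except Exception:
            continue  # kept in the queue; retried on the next scheduled run
        pending_dois.discard(doi)
```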
My main concern was not to orphan the DOI, but this seems like a reasonable approach.
(force-pushed from 138e463 to c1a0215)
Just posting an exception we observed while trying locally. Ideally this should be handled more gracefully, I think.
+1, we can handle that more gracefully. This exception occurred because the Dandiset was created by the superuser without a name. So FWIW, even if we relax how to_datacite sets that value, it will still be an invalid DOI payload.
(force-pushed from a5b9d24 to 3fd6081)
@waxlamp @jjnesbitt This item is on our weekly meeting notes "beyond 30 minutes of Roni", so we rarely get to it. Could someone give the initial review? This "feature" has been in the kitchen for way too long with the design doc PR (#2012), and now it has grown big and is accumulating conflicts.
Yes, I am in the process of reviewing it.
# Retry with PUT if DOI already exists
update_url = f'{self.api_url}/{doi}'
If the DOI already exists, what does making a PUT request do? Update the metadata?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah exactly
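The fallback being discussed could be sketched as below. The function name and the injected session object are assumptions for illustration, not the PR's actual code; DataCite's REST API typically answers 422 when a DOI has already been taken:

```python
def create_or_update_doi(session, api_url, doi, payload):
    """Sketch of a POST-then-PUT fallback against the DataCite DOI API.

    POST creates the DOI; if DataCite reports it already exists (422),
    retry with a PUT to /dois/{doi}, which updates its metadata instead.
    """
    resp = session.post(api_url, json=payload)
    if resp.status_code == 422:  # DOI already exists at DataCite
        update_url = f'{api_url}/{doi}'
        resp = session.put(update_url, json=payload)
    resp.raise_for_status()
    return resp
```

Injecting the session (e.g. a requests.Session in production) keeps the sketch testable with a fake that never touches the network.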
draft_version = dandiset.versions.filter(version='draft').first()
if draft_version and draft_version.doi is not None:
I have a couple of suggestions. First, we know a draft version will always exist on a dandiset, as it is created in the same transaction as the dandiset. So we don't need to check that it exists here.
Second, I actually think we should prevent deleting the dandiset until the DOI is generated. Imagine someone creates and then immediately deletes a dandiset, at the same time the celery task to create the DOI is running. This code block would be skipped, and the dandiset would still be deleted, but the DOI could have been created during that time. In that case, we would have "orphaned DOIs" to worry about.
I think it's acceptable to block deletion on draft DOI creation, and raise an appropriate error message from the request.
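A sketch of the suggested guard (names are assumed, not the PR's actual code): refuse deletion while the draft DOI is still being created, so a concurrently running DOI-creation task cannot leave an orphaned DOI behind:

```python
from typing import Optional


class DandisetDeletionBlocked(Exception):
    """Raised so the request layer can return a clear error message."""


def ensure_deletable(draft_doi: Optional[str], doi_task_pending: bool) -> None:
    # If the creation task is still running and no DOI has been recorded
    # yet, deletion must wait; otherwise the task could mint a DOI for a
    # dandiset that no longer exists.
    if doi_task_pending and draft_doi is None:
        raise DandisetDeletionBlocked(
            'Draft DOI creation is in progress; retry deletion shortly.'
        )
```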
# For unpublished dandisets, update or create the draft DOI
# to keep it in sync with the latest metadata
if not locked_version.dandiset.embargoed:
    transaction.on_commit(
        lambda: update_draft_version_doi_task.delay(locked_version.id)
    )
else:
    logger.debug(
        'Skipping DOI update for embargoed Dandiset %s.',
        locked_version.dandiset.identifier,
    )
If we're going to keep DOI metadata up to date anytime a draft version is updated, then I'm even more in favor of what I proposed in my other comment (a table for keeping track of this). Doing it in this fashion is just a disaster waiting to happen with regard to metadata synchronization.
Hi @jjnesbitt, should we wait for you to implement that extra table (if so, when do you think you would have time), or should we attempt it ourselves?
I can take this on, as I've already begun the implementation.
(force-pushed from 03bee6c to 65e8306)
(force-pushed from 95ef02b to 39dd8a8)
- Dandiset DOI will redirect to the DLP
- Example: 10.80507/dandi.000004
- Dandiset DOI is stored in the doi field of the draft version
- Dandiset DOI metadata (on Datacite) will match the draft version until first publication
- Once a Dandiset is published, the Dandiset DOI metadata will match the latest publication

See the design document for more details: dandi#2012
The datacite_client_unsafe bypasses the safety check, but can still be used safely when mocking requests.
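One way this can work (a hypothetical illustration; the function and client names are assumptions, not the branch's actual test code) is to mock the HTTP layer, so no real request ever reaches DataCite even though the safety check is bypassed:

```python
from unittest import mock


def publish_doi(client, payload):
    # Stand-in for code under test that talks to DataCite via the client.
    return client.post('https://api.datacite.org/dois', json=payload)


# A mocked client records calls instead of performing network I/O,
# which is what makes using an "unsafe" client safe in tests.
fake_client = mock.Mock()
fake_client.post.return_value = mock.Mock(status_code=201)
```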
(force-pushed from 39dd8a8 to 988bb7b)
@jjnesbitt how do you feel about this PR/work overall -- could/should we get it closer/over the finish line in the near future?
Apologies for the delay, I've gotten back to working on this and have made significant progress. For ease of development I've made a separate branch based off this one, and will open that PR once it's ready (hopefully soon).
Example of Dandiset DOI (note -- no version) : 10.80507/dandi.000004
Remaining TODOs:
- e2e/tests/dandisetLandingPage.spec.ts to ensure we have DOIs
- audit related records on DOI operations (may be later as part of this, depending on what comes in first)