Batch edit for relationships #6283
base: production
Permissions
- Verify user with batch edit permissions can batch edit
- Verify there is no batch edit button in Query results for that user
Readonly fields
- Verify the fields are readonly
To-one dependent fields
- Verify a new record (New Cell) is created for rows that had an empty record before edit
- Clear all values of the relationship and verify the record was deleted (highlighted as Deleted Cell)
To-many dependent fields
- Verify a new record (New Cell) is created for rows that had an empty record before edit
- Clear all values of the relationship and verify the record was deleted (highlighted as Deleted Cell)
To-one independent fields
- Verify a new record (New Cell) is created if entered data does not match
- If entered data matches an existing record, verify the cell is highlighted as Matched And Changed
- Change some fields of the relationship in a row that already had a related record (e.g., change the last name of the cataloger of a CO)
- Verify a new record was created and other fields of the new record are cloned from the original
- Clear all values of the relationship when it already exists and verify No Change is highlighted but the relationship is updated (you may have to change the null check preference)
To-many independent
- Verify a new record is created for all entered data
- Verify clearing values does not delete the record
- Verify matching does not occur
Tree queries
- Verify columns that use (any rank) are readonly
- Test matching an entire Tree path
- If values are new, a new node is created
- Test missing ranks dialog with one tree relationship in the query
- Test missing ranks dialog with different trees in the relationships (e.g., CO -> Determination -> Taxon and CO -> CollectingEvent -> Locality -> Geography in the same dataset)
- General test Batch Edit with different trees in the dataset
Cell Types
- Verify Updated Cell is highlighted only for simple fields of a record
- Verify Matched And Changed is highlighted when data for a relationship matches an existing record (will depend on Batch Edit preferences)
- Verify Deleted Cell is highlighted when data is cleared for a relationship (will depend on Batch Edit preferences)
- Verify New Cell is highlighted when new data is entered for a relationship (like in the Workbench)
Preferences
Use only visible fields for match:
- When selected, verify a match happens with visible fields
- When not selected, verify a match no longer happens with visible fields
- (optional) When not selected, verify a match happens only if you match data for ALL fields (including hidden fields that contain data) of a relationship
Use only visible fields for empty record check
- When selected, verify cleared cells are deleted with visible fields
- When not selected, verify cleared cells are no longer deleted with visible fields
- (optional) When not selected, cleared cells are only deleted if you clear data for ALL fields (including hidden fields that contain data) of a relationship
User Preferences
- Verify the preference works
- Verify rollback is not visible in batch edit
General test
- General test batch edit with a combination of different field types in the dataset together
Rollback
- Test rollback with different base tables and types of relationships in the dataset
BE from a Record Set query
- Test Batch Edit from the query
Disallowed tables
- Verify batch edit button is disabled for disallowed tables
- Verify there is a tooltip when you hover on the button
Test #6248
- Verify catalog numbers that do not follow the COType's formatter are marked as invalid
Rollback doesn't delete new Taxon records that were created when batch editing. I didn't get a chance to fully test it, but it happens at least through CO -> Determinations -> Taxon; I'm not sure whether it happens through other routes. Other than that, it looks good!
04-28_14.24.mp4
@emenslin Thanks for the review! The current implementation of rollback cannot delete a newly created independent relationship (Determination -> Taxon in this case). Rollback is a batch edit upload of the original dataset. Batch edit can only delete dependent relationships, which means rollback can also only delete dependent relationships. I think we need to document this as a limitation rather than a bug. That said, there is currently a bug with rolling back a dependent to-many, but that issue is already written up here: #6416
UPDATE: I investigated this a bit more, and it turns out the root cause here is the same as #6416. I was able to delete new independent records on rollback when they are not part of a to-many relationship.
Permissions
- Verify user with batch edit permissions can batch edit
- Verify there is no batch edit button in Query results for that user
Readonly fields
- Verify the fields are readonly
To-one dependent fields
- Verify a new record (New Cell) is created for rows that had an empty record before edit
- Clear all values of the relationship and verify the record was deleted (highlighted as Deleted Cell)
To-many dependent fields
- Verify a new record (New Cell) is created for rows that had an empty record before edit
- Clear all values of the relationship and verify the record was deleted (highlighted as Deleted Cell)
To-one independent fields
- Verify a new record (New Cell) is created if entered data does not match
- If entered data matches an existing record, verify the cell is highlighted as Matched And Changed
- Verify a new record was created and other fields of the new record are cloned from the original
- Clear all values of the relationship when it already exists and verify No Change is highlighted but the relationship is updated (you may have to change the null check preference)
To-many independent
- Verify a new record is created for all entered data
- Verify clearing values does not delete the record
- Verify matching does not occur
Tree queries
- Map a tree relationship (e.g., CO -> Determination -> Taxon)
- Verify columns that use (any rank) are readonly
- Create a new dataset with columns that map to specific ranks
- Test matching an entire Tree path
- If values are new, a new node is created
- Test missing ranks dialog with one tree relationship in the query
- Test missing ranks dialog with different trees in the relationships (e.g., CO -> Determination -> Taxon and CO -> CollectingEvent -> Locality -> Geography in the same dataset)
- General test Batch Edit with different trees in the dataset
Cell Types
- Verify Updated Cell is highlighted only for simple fields of a record
- Verify Matched And Changed is highlighted when data for a relationship matches an existing record (will depend on Batch Edit preferences)
- Verify Deleted Cell is highlighted when data is cleared for a relationship (will depend on Batch Edit preferences)
- Verify New Cell is highlighted when new data is entered for a relationship (like in the Workbench)
Preferences
Use only visible fields for match:
- When selected, verify a match happens with visible fields
- When not selected, verify a match no longer happens with visible fields
- (optional) When not selected, verify a match happens only if you match data for ALL fields (including hidden fields that contain data) of a relationship
Use only visible fields for empty record check
- When selected, verify cleared cells are deleted with visible fields
- When not selected, verify cleared cells are no longer deleted with visible fields
- (optional) When not selected, cleared cells are only deleted if you clear data for ALL fields (including hidden fields that contain data) of a relationship
User Preferences
- Verify the preference works
- Verify rollback is not visible in batch edit
General test
- General test batch edit with a combination of different field types in the dataset together
Rollback
- Test rollback with different base tables and types of relationships in the dataset
BE from a Record Set query
- Test Batch Edit from the query
Disallowed tables
- Verify batch edit button is disabled for disallowed tables
- Verify there is a tooltip when you hover on the button
Test #6248
- Verify catalog numbers that do not follow the COType's formatter are marked as invalid
I couldn't find an issue for it, so I don't think this has been reported, but changing the locality name in a CO batch edit prevents rollback.
05-01_10.12.mp4
Error:
{
"uploaderstatus": {
"operation": "unuploading",
"taskid": "df0e1c89-6bca-4e84-9dc1-a08350a5b60b"
},
"taskstatus": "FAILURE",
"taskinfo": "RollbackFailure('Unable to roll back Collectingevent object (36060) because it is now referenced by another record.')"
}
I found another issue: sometimes you cannot select which taxon trees you want to add, but it adds multiple trees anyway. I'm not sure why you can select them in some cases but not here. My guess is that it happens when there are other trees in the query (in this example, Taxon and Geography), but I'm not entirely sure.
05-02_11.26.mp4
Link to the data set: https://ojsmnh20250404batcheditlatest-issue-6126.test.specifysystems.org/specify/workbench/513. Once again, I don't think this issue has been reported before, but let me know if it has.
Looks good. I did run into some weird behavior when testing a large data set, but it did end up working, so I am not too worried about it. When rolling back a large data set, it stays on the first status dialog for a long time, but it did end up rolling back successfully after a while.
Fixes #6126, #6248
Warning
Use a database that was used for testing other batch edit PRs, or create a new one.
Adding docs from #4929:
Batch-editing
Implementation and design
Batch edit behaviors
Make a query with columns in the base table, and select relationships to edit. In general, there are four different types of relationships. Some example relationships with Collection Object as the base table:
Fields
The following fields are readonly. All other simple fields will be updated when changed.
NOTE: Columns that map to (formatted) or (aggregated) values will also be readonly.
To-one dependent (for ex. collectionobjectattribute)
These relationships get directly updated. If the to-one is not in the db, it'll create one.
This also includes collectingevent when embedded.
Test cases to consider:
To-many dependent (for ex. determinations)
Same as to-one dependent. These relationships get directly updated. If the corresponding record is not present, a new one gets created. Does not consider matching records.
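The update-or-create behavior of dependent relationships can be sketched as follows. This is a minimal in-memory illustration, not Specify's implementation: `DependentRecord`, `apply_dependent` and `apply_to_many_dependent` are hypothetical names. The result labels mirror the cell highlights described in this PR. The empty-record check deliberately ignores the null-check preference, and pairing to-many rows by position is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-in for one dependent record (e.g. a
# collectionobjectattribute, or one determination of a to-many).
@dataclass
class DependentRecord:
    values: dict = field(default_factory=dict)

def all_cells_empty(cells: dict) -> bool:
    # Cells count as empty when every value is None or blank.
    return all(v is None or str(v).strip() == "" for v in cells.values())

def apply_dependent(existing: Optional[DependentRecord], cells: dict):
    """Return (result, record) for the cells mapped to one dependent record."""
    if all_cells_empty(cells):
        # Every cell cleared: the dependent record is deleted, if there is one.
        if existing is None:
            return "NoChange", None
        return "Deleted", None
    if existing is None:
        # Nothing in the database yet: create a record, without any matching.
        return "New", DependentRecord(dict(cells))
    if all(existing.values.get(k) == v for k, v in cells.items()):
        return "NoChange", existing
    # Otherwise the existing record is updated in place.
    existing.values.update(cells)
    return "Updated", existing

def apply_to_many_dependent(existing: list, rows: list) -> list:
    # Same rule for each to-many item; rows are paired by position here,
    # which is an assumption of this sketch.
    results = []
    for i, cells in enumerate(rows):
        current = existing[i] if i < len(existing) else None
        results.append(apply_dependent(current, cells)[0])
    return results
```

Unmapped values on an existing record (such as `text2` in the test) are left untouched by an update. Only the mapped cells are written.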
Test cases to consider:
To-one independent (for ex. cataloger)
These relationships are matched, and uploaded if no match is found. During upload, the record is cloned (all non-unique fields and dependents are copied). The clone takes mapped relationships into account. That is, if an agent needs to be cloned and you have mapped agentspecialty, the mapped agentspecialty is used rather than cloning the previous record's agentspecialty.
Test cases to consider:
To-many independent
Same as to-many dependent. The only difference is that we always perform an update (we don't delete these). If a mapped record is not present, it'll create one, without any matching.
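The two independent cases can be contrasted in a small sketch. It assumes an in-memory `Rec` type, a hypothetical `UNIQUE_FIELDS` set, and hypothetical function names; none of these come from Specify. A to-one is matched first and cloned only when nothing matches, with mapped cells and mapped dependents taking precedence over cloned ones. A to-many is updated or created but never matched or deleted.

```python
import itertools
from dataclasses import dataclass, field
from typing import Iterator, Optional

# Hypothetical stand-in for an independent record (e.g. the Agent used as
# cataloger); "dependents" holds related rows such as agentspecialties.
@dataclass
class Rec:
    id: int
    fields: dict = field(default_factory=dict)
    dependents: dict = field(default_factory=dict)

# Hypothetical: fields treated as unique and therefore never cloned.
UNIQUE_FIELDS = {"guid"}

def matches(rec: Rec, cells: dict) -> bool:
    return all(rec.fields.get(k) == v for k, v in cells.items())

def resolve_to_one(current: Optional[Rec], cells: dict, mapped_deps: dict,
                   candidates: list, ids: Iterator[int]):
    """To-one independent: try to match, otherwise clone the current record."""
    for cand in candidates:
        if matches(cand, cells):
            if current is not None and cand.id == current.id:
                return "NoChange", cand
            return "MatchedAndChanged", cand
    # No match: copy non-unique fields and dependents of the current record,
    # then let the mapped cells and mapped dependents take precedence.
    old_fields = current.fields if current else {}
    old_deps = current.dependents if current else {}
    new_fields = {k: v for k, v in old_fields.items() if k not in UNIQUE_FIELDS}
    new_fields.update(cells)
    new_deps = {name: list(rows) for name, rows in old_deps.items()}
    new_deps.update(mapped_deps)
    return "New", Rec(next(ids), new_fields, new_deps)

def apply_to_many(existing: list, rows: list, ids: Iterator[int]) -> list:
    """To-many independent: update or create, never delete, never match."""
    results = []
    for i, cells in enumerate(rows):
        if i < len(existing):
            rec = existing[i]
            if matches(rec, cells):
                results.append("NoChange")
            else:
                # Cleared cells are written as None; the record itself is kept.
                rec.fields.update(cells)
                results.append("Updated")
        elif any(v not in (None, "") for v in cells.values()):
            existing.append(Rec(next(ids), dict(cells)))
            results.append("New")
        else:
            results.append("NoChange")
    return results
```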
Test cases to consider:
Trees
There are two different routes to perform tree updates.
Workbench method:
If you want to modify a specific rank, or, say, reassign the species of a determination, add that specific rank to the query. In this case, Batch Edit always matches and uploads (possibly cloning), so no updates are performed.
The query builder enforces that you select a complete branch of the tree. That is, if your query contains the ranks "genus" and "species", it requires you to add all ranks from "genus" down to "species". If the tree is used as part of a relationship, it requires all ranks from "genus" down to the lowest rank in the tree.
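The complete-branch rule can be illustrated with a small function that lists the ranks the query builder would still ask for. This is a sketch under assumptions, not the query builder's code. `missing_ranks` is a hypothetical name, `tree_ranks` is assumed to be ordered from the highest rank to the lowest, and the `via_relationship` flag stands in for "the tree is used as part of a relationship".

```python
def missing_ranks(tree_ranks: list, selected: set, via_relationship: bool) -> list:
    """Ranks still required to complete the branch, per the rule above.

    tree_ranks is ordered from the highest rank (e.g. Kingdom) to the lowest.
    """
    if not selected:
        return []
    positions = [tree_ranks.index(rank) for rank in selected]
    top = min(positions)  # highest selected rank, e.g. Genus
    # A tree queried directly stops at the lowest selected rank; a tree reached
    # through a relationship must continue to the lowest rank in the tree.
    bottom = len(tree_ranks) - 1 if via_relationship else max(positions)
    return [rank for rank in tree_ranks[top:bottom + 1] if rank not in selected]
```

For example, with ranks Kingdom, Family, Genus, Subgenus, Species, Subspecies and a query on Genus and Species, the direct case asks only for Subgenus. The relationship case also asks for Subspecies.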
Update method:
If the query has no visible tree rank field, direct modifications (and thus updates) to the tree table are allowed. This is useful if you want to, say, update remarks for the nodes that match the name "ploia".
In both of the above methods, fullname, nodenumber and highestchildnodenumber are always readonly.
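The choice between the two tree methods and the always-readonly tree fields can be summarized in a short sketch. `TreeColumn`, `tree_edit_method` and `editable_columns` are hypothetical names. A `rank` of `None` is assumed to mean the column does not go through a specific rank.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TreeColumn:
    field: str            # e.g. "name", "remarks", "fullname"
    rank: Optional[str]   # e.g. "Species"; None when no specific rank is used

# Fields described above as readonly in both methods.
READONLY_TREE_FIELDS = {"fullname", "nodenumber", "highestchildnodenumber"}

def tree_edit_method(columns: list) -> str:
    # Any column tied to a specific rank -> Workbench method (match/upload,
    # possibly cloning); otherwise tree fields are updated directly.
    if any(c.rank is not None for c in columns):
        return "workbench"
    return "update"

def editable_columns(columns: list) -> list:
    return [c for c in columns if c.field.lower() not in READONLY_TREE_FIELDS]
```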
Additionally, tree relationship columns that use (any rank) are disabled in Batch Edit. Since (any rank) is only meant to be used for editing existing tree nodes, it is only editable when the base table of your query is a Tree, not when the tree is part of a relationship.
Example:
Collection Object -> Determination -> Taxon -> (any rank) -> name: READONLY
Taxon -> (any rank) -> name: EDITABLE
Results
There are four new types of results:
NoChange
Reported when the record was meant to be updated, but no change occurred. That is, all the values from the db were the same. This is not visible to the user.
Updated
Reported when the record's simple fields were changed. This does not consider relationships (they are reported with different result).
NOTE: Sometimes an update can come purely from an internal change. For example, for Locality records the Workbench updates lat1text and long1text internally, which Batch Edit will record as an update to a simple field.
Deleted
Reported when a record is deleted. Happens when a dependent relationship's cells are all empty (depending on the Batch Edit preference chosen for that dataset).
MatchedAndChanged
Reported when a to-one independent relationship was matched to an existing record that is different from the current one.
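The four result types can be read as a simple decision order for one record. This sketch follows the descriptions above. The function name, its parameters, and the precedence between the checks are assumptions made for illustration.

```python
from typing import Optional

def result_for_record(old_values: dict, new_values: dict, *, dependent: bool = False,
                      all_empty: bool = False, current_id: Optional[int] = None,
                      matched_id: Optional[int] = None) -> str:
    # Deleted: a dependent relationship whose cells are all empty.
    if dependent and all_empty:
        return "Deleted"
    # MatchedAndChanged: a to-one independent matched a different existing record.
    if matched_id is not None and matched_id != current_id:
        return "MatchedAndChanged"
    # Updated: at least one simple field differs from the database value.
    if any(old_values.get(k) != v for k, v in new_values.items()):
        return "Updated"
    # NoChange: every value already equals the database (not shown to the user).
    return "NoChange"
```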
Preferences
There are four different preference options.
Batch Edit Preferences (2)
Defer For Match
This preference controls whether database fields are included for matching or not. Defaults to true.
Defer For Null
This preference controls whether database fields are included when determining whether a record is null. For dependents, null records are deleted, so this preference controls how cautious batch edit is. Defaults to false.
The preferences can be accessed by going to Data Mapper > Batch Edit Preferences.
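The effect of the two Batch Edit preferences can be sketched as choosing which values a check looks at. This assumes that "deferring" means database-only fields are left out, which matches the "Use only visible fields for match / empty record check" wording in the testing instructions. The function names are hypothetical. `db_record` stands for the values the related record currently has in the database.

```python
def values_for_check(visible: dict, db_record: dict, use_only_visible: bool) -> dict:
    """Values considered by a match or empty-record check (sketch)."""
    if use_only_visible:
        return dict(visible)
    # Include database fields that are not mapped in the dataset;
    # values from visible cells take precedence.
    merged = dict(db_record)
    merged.update(visible)
    return merged

def is_empty_record(visible: dict, db_record: dict, defer_for_null: bool = False) -> bool:
    values = values_for_check(visible, db_record, defer_for_null)
    return all(v in (None, "") for v in values.values())

def is_match(candidate: dict, visible: dict, db_record: dict,
             defer_for_match: bool = True) -> bool:
    values = values_for_check(visible, db_record, defer_for_match)
    return all(candidate.get(k) == v for k, v in values.items())
```

With the default Defer For Null of false, clearing the visible cells of a dependent does not make it "empty" while a hidden field such as remarks still holds data. This matches the "(optional) When not selected" checklist items.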

User Preferences (1)
Number of query rows
Determines how many query results are used for batch edit. Defaults to 5000.
Can enable/disable rollback
Rollbacks
Rollbacks are complicated to perform. In the current design, whenever a user creates a batch-edit dataset, two datasets are made internally, and the user can only see one of them. The second is a "backer" of the first and contains a FK to the first (so the backer of a dataset can be found later). When a rollback is requested, the original row for every row in the main dataset is found in the backer, and the regular batch-edit update is performed on it. Essentially, it re-applies the original snapshot.
This is highly experimental, so it is recommended to always take a backup of the database first, although it should work in many cases.
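The backer lookup and snapshot re-application described above can be sketched as follows. `Dataset`, `find_backer` and `rollback` are hypothetical names. Pairing main and backer rows by position is an assumption, and `batch_edit_row` stands in for the regular batch-edit update.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Dataset:
    id: int
    rows: list
    parent_id: Optional[int] = None  # set only on the hidden "backer" dataset

def find_backer(datasets: list, main: Dataset) -> Dataset:
    # The backer holds a FK to the dataset the user sees.
    backer = next((d for d in datasets if d.parent_id == main.id), None)
    if backer is None:
        raise ValueError(f"no backer found for dataset {main.id}")
    return backer

def rollback(datasets: list, main: Dataset,
             batch_edit_row: Callable[[list], None]) -> None:
    backer = find_backer(datasets, main)
    # Assumes rows are aligned by position between the main and backer datasets.
    for _edited, original in zip(main.rows, backer.rows):
        # Re-applying the original snapshot is just a regular batch-edit update.
        batch_edit_row(original)
```

Because rollback goes through the normal update path, it shares batch edit's limits. For example, a record that is now referenced elsewhere cannot be rolled back, as in the `RollbackFailure` reported in the review above.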
Known Limitations for Rollback
Record Set
Disallowed Tables
The batch edit button will be disabled for the following tables:
System tables
Hierarchy tables
Misc tables
Misc Behaviors
Scope Change Error
If a change in a scoping attribute (collection, division, discipline, etc.) is detected when validating/committing, a validation error will be raised for that row. Users must delete that row from the dataset before continuing. This can happen if the query that was used contained records from tables that are scoped at different levels and the Workbench internally tries to change the scope of a record.
For example:
Exception
Scope change is allowed only for Loan records. This is because Loan records created from Interactions are scoped only at the discipline level, while Loan records created from the Workbench are scoped at both the discipline and division levels. If you batch edit Loan records that were created from Interactions, batch edit will internally add division-level scoping to them (since BE uses the Workbench internally). This exception allows Loan records to be batch edited without running into a scope change error.
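The scope check and its Loan exception can be sketched as a per-row validation. `scope_change_error` and the `SCOPE_FIELDS` names are hypothetical. The error text is illustrative, not the message Specify actually shows.

```python
from typing import Optional

# Hypothetical names for the scoping attributes compared before and after.
SCOPE_FIELDS = ("collection", "discipline", "division")

def scope_change_error(table: str, before: dict, after: dict) -> Optional[str]:
    changed = [f for f in SCOPE_FIELDS if before.get(f) != after.get(f)]
    if not changed:
        return None
    if table.lower() == "loan":
        # Exception: Loans created from Interactions gain division scoping.
        return None
    return f"Scope change detected ({', '.join(changed)}); delete this row to continue."
```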
Behavior with multiple trees
- The name field is required when querying on a tree rank.
- When querying on a specific tree rank (not (any rank)), Batch Edit expects to have all ranks lower than the chosen rank in the query.
- For example, given Tree 1 and Tree 2 with the ranks Tree 1 -> Species and Tree 1 -> Subspecies, and Tree 2 -> Species (lowest rank), a query containing Tree 1 -> Species -> name expects:
  - Tree 1 -> Species -> name
  - Tree 1 -> Subspecies -> name
  - Tree 2 -> Species -> name
Batch editing a Tree path
Editing a single node
(any rank) query field
Checklist
self-explanatory (or properly documented)
Testing instructions
NOTE: Refer to the docs above for more info on tests
I would suggest testing on very small datasets so that it is easy to track what changes were made. For the general test section in the instructions below, feel free to use a larger dataset.
Permissions
Readonly fields
To-one dependent fields
Use only visible fields for empty record check
To-many dependent fields
Use only visible fields for empty record check. Refer to the video below: https://github.com/user-attachments/assets/313c1bc7-9213-479e-978f-820d2b99358e
To-one independent fields
Matched And Changed
CO -> Cataloger -> First Name and CO -> Cataloger -> Last Name
First Name and Last Name cells
To-many independent
Tree queries
Map a tree relationship (eg: CO -> Determination -> Taxon)
Map a column to (any rank) and to a simple field of the tree
Batch Edit
Verify columns that use (any rank) are readonly
Create a new dataset with columns that map to specific ranks
Test matching an entire Tree path
If values are new, a new node is created
NOTE: The same behaviors as in Batch edit for multiple trees #6196 apply: if some values are new, nodes are cloned and updated instead of being updated directly.
Test missing ranks dialog with one tree relationship in the query
Test missing ranks dialog with different trees in the relationships (e.g., CO -> Determination -> Taxon and CO -> CollectingEvent -> Locality -> Geography in the same dataset)
General test Batch Edit with different trees in the dataset
Cell Types
Updated Cell is highlighted only for simple fields of a record
Matched And Changed is highlighted when data for a relationship matches an existing record (will depend on Batch Edit preferences)
Deleted Cell is highlighted when data is cleared for a relationship (will depend on Batch Edit preferences)
New Cell is highlighted when new data is entered for a relationship (like in the Workbench)
Preferences
Go to Data Mapper > Batch Edit Preferences
Use only visible fields for match:
Use only visible fields for empty record check
User Preferences
General test
Rollback
BE from a Record Set query
Disallowed tables
Test #6248