Batch edit for multiple trees #6196
The query builder currently only shows rank names to filter on even when you have multiple trees. If you have multiple trees with the same rank name and |
RE #6196 (comment).
Yeah, this is the correct idea! But it shouldn't need the params at all. I'm not sure why they were included in the initial query builder by Ben.
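RE: #6196 (comment). Yeah, that's what I was stuck on before, and it was a hard problem. Actually, there are at least two ways of approaching it:
First way:
Allow the query builder to support specific trees (related: #5251). This would involve a bit of frontend change, although I have no memory or idea of how hard that will be; the best person to ask might actually be Jason. On the backend (assuming treedef is available in queryfieldspec):
a. We still need to make the worst-case assumption that the user selects a rank from any tree (so we still need to take the maximum of treedefitems). This part is already done. See specify7/specifyweb/stored_queries/query_construct.py, lines 48 to 52 in 7e478b9.
b. Need to restrict this to just one tree (specify7/specifyweb/stored_queries/query_construct.py, lines 68 to 71 in 7e478b9). The simplest way will be to add a treeid member field to TreeRankQuery when parsing, somewhere here: specify7/specifyweb/stored_queries/queryfieldspec.py, lines 189 to 194 in 7e478b9.
c. Now, when a new queryfieldspec with a treeid is encountered, the case-when block will select the correct case where the treedefitem_id is encountered. In the current implementation, if ranks from, say, treedefs 5 and 6 are searched, then both are searched. This is because of the following two places: specify7/specifyweb/stored_queries/query_construct.py, lines 92 to 94 in befd6a6, and lines 139 to 142 in befd6a6. (The or clause makes the magic. The comment in the second is outdated, though: I removed the model param because it was buggy.)
d. By the time you get here, the tree query will work for a specific tree. Next is to make it work with batch-edit. All we need to do is make a format like the one taken from #5091 (comment). Note: I'm pretty sure treeId is no longer necessary (double-check this; it makes things simpler, though we can still make things work either way).
e. Add a function like get_wb_name to TreeRankQuery. Assuming that there is a treeId member field, it returns the tree ID (the ID rather than the tree name, which makes things simpler later) followed by "~>" and the rank name. Please use the constant for "~>".
f. Replace specify7/specifyweb/stored_queries/batch_edit.py, line 399 in 7e478b9, by node.name.lower() if not isinstance(node, TreeRankQuery) else node.get_wb_name().
And that should be all. This is because the relationship name (and hence the rank) will contain treedefid. If the "treeId" prop is required (as in specifyweb/frontend/js_src/lib/tests/fixtures/uploadplan.1.json, line 319 in eb3c820), trace the code out in TreeRecord.py (as introduced in #5091) and see if it is actually needed. I can spend time on a more in-depth review later. That file (after 5091) is waaay too bloated (and I intended on refactoring it as part of the merge of production and 4929). If it looks like it is actually needed, construct TreeRankRecord itself. To do that, extract the id from the key (which was added by get_wb_name) and then get the name of the treedef. The key in question is at specify7/specifyweb/stored_queries/batch_edit.py, line 934 in 7e478b9. Yeah, quite hacky, but things are unnecessarily complicated by the bloat in TreeRecord.py.
g. You'd not need to make adjustments to the parsing of the batch-edit pack.
h. Things should just work out of the box by this point.
Second way:
a. Don't make any changes to the frontend.
b. Before getting the row plan map in the batch-edit query (specify7/specifyweb/stored_queries/batch_edit.py, line 1034 in 7e478b9), augment the query such that if a rank appears, you add all trees that contain that rank (in that discipline). So, essentially, you are making the choice for the user here (that's why there are no frontend changes).
That is, suppose you have two trees, Fossil and Rocks, and both contain the rank Species. If the user's query was (Determination -> Taxon -> Species -> Name), then rewrite it to contain two query fields:
(Determination -> Taxon -> Species (Fossil) -> Name)
(Determination -> Taxon -> Species (Rocks) -> Name)
So, each such field will diverge into however many trees contain it. If you have some kind of filter, things become more complicated (in the worst case, you'd need to generate four fields instead of two; the extra two will be for OR null).
This approach is quite complicated to implement correctly. However, all the backend changes from method 1 will still work here (so those are common). Maybe ask UX to see what is acceptable?
I can look over the changes in TreeRecord.py more deeply later (on Tuesday/Thursday morning/evening). Support for querying a specific tree is needed on the backend regardless. Definitely see if you have a better idea. On the frontend, by the way, it'll probably be beneficial to add the treedef name in the list in BatchEdit.tsx that asks you to select extra fields. Also, if you feel things can be simplified in batch-edit in general (even for the simple fields PR), definitely do that (and lmk! only fair since I called other code bloated). |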
Vinnie
You are keeping all this stuff in your head? I'm worried about you.
James H. Beach
…On Sun, Feb 9, 2025 at 12:51 AM Vinayak Jha ***@***.***> wrote:
RE: #6196 (comment).
Yeah, that's what I was stuck on before and was a hard problem. Actually,
there are at least two ways of approaching it:
First way:
Allow query builder to support specific trees (related: #5251). This would involve a
bit of frontend change, although, I've no memory / idea of how hard that
will be, best person might actually be Jason. On the backend (assuming
treedef is available in queryfieldspec)
a. We still need to make the worst case assumption that the user selects a
rank from any tree (so we still need to take the maximum of treedefitems).
This part is already done. See
https://github.com/specify/specify7/blob/7e478b967a4b2be2e5cfbc868535a574f8535c88/specifyweb/stored_queries/query_construct.py#L48-L52
b. Need to restrict this to just one tree.
https://github.com/specify/specify7/blob/7e478b967a4b2be2e5cfbc868535a574f8535c88/specifyweb/stored_queries/query_construct.py#L68-L71
Simplest way will be to add a treeid member field in TreeRankQuery when
parsing. Somewhere here:
https://github.com/specify/specify7/blob/7e478b967a4b2be2e5cfbc868535a574f8535c88/specifyweb/stored_queries/queryfieldspec.py#L189-L194
c. Now, when say a new queryfieldspec with a treeid is encountered, the
case-when block will select the correct case where the treedefitem_id is
encountered. In the current implementation if say ranks from treedefs 5 and
6 are searched, then *both* are searched. This is because of the
following two places:
https://github.com/specify/specify7/blob/befd6a625df3ef0999d5df8bc14b4aea4302acd8/specifyweb/stored_queries/query_construct.py#L92-L94
https://github.com/specify/specify7/blob/befd6a625df3ef0999d5df8bc14b4aea4302acd8/specifyweb/stored_queries/query_construct.py#L139-L142
(The *or* clause makes the magic. Although, the comment in second is
outdated --- I removed model param because it was buggy).
d. By the time you get here, the tree query will work for a specific tree.
Next is to make it work with batch-edit. *All* we need to do is make a
format like this (taken from #5091 (comment)).
"mustMatchTreeRecord": {
  "ranks": {
    "SomeTreeName~>Class": {
      "treeNodeCols": {
        "name": {
          "matchBehavior": "ignoreAlways",
          "nullAllowed": true,
          "default": null,
          "column": "Class"
        }
      },
      "treeId": 1
    }
  }
}
Note: I'm pretty sure treeId is no longer necessary (double-check
this; it makes things simpler. We can still make things work
though.).
e. Add a function like get_wb_name to TreeRankQuery. Assuming that there
is a treeId member field, it'll be something like the following. *Yes,
getting self.treeId (so the ID) instead of the treeName -- not a mistake!
-- it makes things simpler later*
def get_wb_name(self):
    # Please use the constant for "~>" rather than the literal.
    return f"{self.treeId}~>{self.name}"
f. Replace
https://github.com/specify/specify7/blob/7e478b967a4b2be2e5cfbc868535a574f8535c88/specifyweb/stored_queries/batch_edit.py#L399
by node.name.lower() if not isinstance(node, TreeRankQuery) else
node.get_wb_name().
And that should be all. This is because the relationship name (and hence
the rank) will contain treedefid. If the "treeId" prop is required (like in
here:
https://github.com/specify/specify7/blob/eb3c8200bad92af9f43c063655ba47f8a02badf2/specifyweb/frontend/js_src/lib/tests/fixtures/uploadplan.1.json#L319).
Trace the code out in TreeRecord.py (as introduced in #5091) and see if it actually
is needed. I can spend time on a more in-depth review later. That file
(after 5091) is waaay too bloated (and I intended on refactoring as part of
the merge of production and 4929). If it looks like it is actually needed,
construct TreeRankRecord itself. To do that, extract the id from the key
here (which was added by get_wb_name, and then later get the name of the
treedef). The key, in question, is here:
https://github.com/specify/specify7/blob/7e478b967a4b2be2e5cfbc868535a574f8535c88/specifyweb/stored_queries/batch_edit.py#L934.
Yeah, quite hacky, but things are unnecessarily complicated from bloating
in TreeRecord.py.
g. You'd not need to make adjustments to parsing of batch-edit pack.
h. Things should just work out of the box by this time now.
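Steps e and f above can be sketched roughly as follows. This is a minimal illustration and not the project's code: `TreeRankQuerySketch`, `PlainField`, `RANK_KEY_DELIMITER`, `parse_wb_name`, and `column_key` are hypothetical names, and the real `TreeRankQuery` class and the real "~>" constant live in the Specify 7 codebase. `parse_wb_name` shows how the tree ID could later be recovered from the key if TreeRankRecord turns out to be needed.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical stand-in for the existing "~>" constant mentioned in step e.
RANK_KEY_DELIMITER = "~>"


@dataclass(frozen=True)
class TreeRankQuerySketch:
    """Simplified stand-in for TreeRankQuery with the proposed treeId field."""
    name: str
    treeId: int

    def get_wb_name(self) -> str:
        # Use the tree ID (not the tree name) so the key is unambiguous.
        return f"{self.treeId}{RANK_KEY_DELIMITER}{self.name}"


@dataclass(frozen=True)
class PlainField:
    """Simplified stand-in for a regular field or relationship."""
    name: str


def column_key(node: Union[TreeRankQuerySketch, PlainField]) -> str:
    # Mirrors step f: regular nodes keep the lower-cased name,
    # tree ranks use the tree-qualified key.
    if isinstance(node, TreeRankQuerySketch):
        return node.get_wb_name()
    return node.name.lower()


def parse_wb_name(key: str) -> Tuple[int, str]:
    # Split only on the first delimiter so the rank name is kept whole.
    tree_id, rank_name = key.split(RANK_KEY_DELIMITER, 1)
    return int(tree_id), rank_name
```

With this shape, a rank named Class in treedef 1 gets the key `1~>Class`, and the ID can be read back from the key without a lookup by tree name.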
Second way:
a. Don't make any changes to the frontend.
b. Before getting row plan map in batch-edit query below, augment the
query such that if a rank appears, then you add all trees that contain
that rank (in that discipline). So, essentially, you are making the choice
for the user here (that's why no frontend changes).
https://github.com/specify/specify7/blob/7e478b967a4b2be2e5cfbc868535a574f8535c88/specifyweb/stored_queries/batch_edit.py#L1034
That is, suppose you have two trees Fossil and Rocks and both contain rank
species. And user's query was (Determination -> Taxon -> Species -> Name),
then rewrite this to contain two query fields:
(Determination -> Taxon -> Species (Fossil) -> Name)
(Determination -> Taxon -> Species (Rocks) -> Name)
So, each such field will get diverged into however many trees contain it.
If you have some kind of filter, things become more complicated (in the
worst case, you'd need to generate four fields instead of two -- the extra
two will be for OR null).
This approach is quite complicated to implement correctly. However,
all the backend changes from method 1 will still work here (so those are
common). Maybe ask UX to see what is acceptable?
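The query rewrite described in the second way could look something like the sketch below. It only covers the unfiltered case and assumes a hypothetical in-memory mapping from tree name to rank names; `expand_rank_fields` is an illustrative helper, not an existing function, and the "Species (Fossil)" labels simply mirror the notation used in the example above.

```python
from typing import Dict, List, Tuple

Path = Tuple[str, ...]


def expand_rank_fields(path: Path, rank: str, trees: Dict[str, List[str]]) -> List[Path]:
    """Return one copy of `path` per tree that contains `rank`.

    In each copy the rank is qualified with the tree name, so a single
    user-selected field diverges into one field per matching tree.
    """
    expanded = []
    for tree_name, ranks in trees.items():
        if rank in ranks:
            expanded.append(
                tuple(f"{part} ({tree_name})" if part == rank else part for part in path)
            )
    return expanded


# Hypothetical discipline with two trees that both contain Species.
trees = {"Fossil": ["Genus", "Species"], "Rocks": ["Species"]}
fields = expand_rank_fields(
    ("Determination", "Taxon", "Species", "Name"), "Species", trees
)
```

Filters are deliberately left out here; as noted above, they can double the number of generated fields.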
I can look over the changes in TreeRecord.py more deeply later (on
Tuesday/Thursday morning/evening). Querying a specific tree support on the
backend is needed regardless. Definitely see if you have a better idea. On
the frontend btw, it'll probably be beneficial to add the treedef name
in the list in BatchEdit.tsx that asks you to select extra fields. Also, if
you feel things can be simplified in batch-edit in general (even for simple
fields PR), definitely do that.
|
Eh, just wanted to give Sharad more context about this - I recall how hard it was to fix bugs in xml-editor, and that's after we had videos from Max! If it makes you feel any better, I don't remember much beyond changes I did in batch-edit and nested-to-manys. |
huh that is weird. By the way, is the Deltodus duplicated? What does the query look like? |
Weird, turns out one of the Deltodus actually belongs to Species instead: https://calvertmarinemuseum20250206-production.test.specifysystems.org/specify/view/taxon/38726/ |
Ah that's fine (repeat of that taxon). Curious what the sql query itself looks like on this branch for that query not working. It should be log level debug |
@realVinayak Prod: https://pastebin.com/imefgzi0 Here's a diffcheck in case pastebin doesn't like me: https://www.diffchecker.com/PxUxsDFd/ Prod looks at the treedefid as well in the case whens. Looks like I missed something somewhere |
that's perfect thanks. I can look at that in an hour or so. I did see the treedefid discrepancy in the code, but that would not matter (so yes, the code from production is redundant). |
This is a wacky bug, and it also happens in issue-5413 (so batch edit too) but I think you already knew that. The diffs helped. Specifically, I noticed this: specify7/specifyweb/stored_queries/execution.py Lines 882 to 889 in 4814f11
The reason is that TreeRankQuery instances are dynamically constructed. The proper way to fix this would be to use NamedTuples for fields and relationships. See #628. I guess a decent way could be to
EDIT: I jumped the gun on the explanation. The bug originates here:
The way ORs work is by constructing the same queryfieldspec but implicitly ORing them. Above, the fs.fieldspecs are all different in the cases where TreeRankQuery occurs in the join_path (because the class instances are dynamic). Not a problem for regular fields and relationships because they are created just once at the beginning and are then "static". |
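A small, self-contained illustration of why dynamically constructed instances break this grouping, and why value-based NamedTuples would not. It is not the code from execution.py: `DynamicRank`, `RankKey`, and `group_by_join_path` are hypothetical names used only to show the equality behavior.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, NamedTuple, Sequence, Tuple


class DynamicRank:
    """Built anew for every query field; compares by identity by default."""

    def __init__(self, name: str) -> None:
        self.name = name


class RankKey(NamedTuple):
    """Value-based alternative: equal contents means equal keys."""
    name: str


def group_by_join_path(paths: Sequence[Tuple[Hashable, ...]]) -> Dict[tuple, List[int]]:
    # Fields that share a join path land in the same group and get ORed.
    groups: Dict[tuple, List[int]] = defaultdict(list)
    for index, path in enumerate(paths):
        groups[path].append(index)
    return dict(groups)


# Two fields on the "same" rank, each with its own freshly built object.
dynamic_groups = group_by_join_path(
    [("taxon", DynamicRank("Species")), ("taxon", DynamicRank("Species"))]
)
# The same two fields, but keyed by value.
value_groups = group_by_join_path(
    [("taxon", RankKey("Species")), ("taxon", RankKey("Species"))]
)
```

Here `dynamic_groups` ends up with two separate groups, so the two fields are never ORed together, while `value_groups` has a single group containing both.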
NOTE:
|
@realVinayak Since the frontend queries are not to be changed and we want to add missing fields to the dataset directly, I think the second way you mentioned sounds appropriate. Can you clarify what you mean by adding two extra fields for Also, I tried hardcoding some things around to get a feel for what changes need to be made. When I make a change to a Taxon and commit, it gets highlighted as a new cell and a new node gets uploaded to the tree. The behavior is the same in #4929 before the prod merge (commit 2d86ba4). Is this expected? |
Sorry for the delay. Question 1
Note here that the OR is needed. Otherwise, you'd be constructing
which can never be true. So, I thought this could be fixed by making the below (which is what the message above meant)
But the last one is actually wrong because then the cases where both are null will also be added. In conclusion, the first variant is correct. Since only one of the OR conditions can be true at a time anyway, it's good.
Question 2
And you bring it in batch-edit, like below, and change the name (the yellow), it'll try cloning all the attributes of the previous "TestSpeciesOther" and try inserting like the workbench does. So, you can never actually do batch-edit on tree ranks. This might make it seem like there's no need to support tree ranks, but bulk changing determinations can be done this way (the changed node will be matched, and created if not found). Does this make sense? I'd recommend asking Grant if he has some time to discuss this; I did demo this quite a bit before.
BTW, here is the corresponding query (for the above screenshots). You can access the query at http://localhost:5050/specify/query/261/. Lmk if that link doesn't work /j.
Misc
Why do we care about the "highest" rank? "Highest" means counting from the bottom: the lower the rankid, the higher the rank sits in the tree.
There is no harm in always showing all the ranks to the user. That is, every time a relationship to a tree (with ranks) appears, we can show all the ranks for all the trees. This simplifies implementation quite a bit (all that findMissingRank stuff won't be needed). However, this is bad for usability because, you know, some trees have like 27 ranks, and multiple trees can't make it better. So, we show the fewest ranks in the fewest trees needed to get stuff done.
For a single tree, it'll be all the ranks below the highest. But, wait -- do we always need that? If the user selects "Species" and "Genus", are there cases where we can show just those two ranks? Yes, in fact, if taxon is the base table, you can do that. But if you have something like CO->det->taxon, we need to show all the ranks below the highest (genus). (However, what about below the lowest rank: instead of everything below genus, what about everything below species? Why would that not work? -- think about this! There are cases where this won't work.)
In the case of multiple trees, we can do the exact same thing, but for all the trees the selected ranks can be part of. In fact, we can show just those trees that the user selected via ranks (rather than showing all trees). Need to be careful here: there can be more than one path to the exact same tree. Think about "CO->CollectingEvent->CEA->taxon" and "CO->det->taxon".
misc again
Batch-edit is two steps as you know:
I actually got the first part done for multiple trees yesterday (sunday) - I was curious about it. It does follow the requirements that were mentioned. I don't expect you to use it (that's why I have not made a PR on sp7 repo). However, if you're curious about it, here is the PR equivalent on my fork of sp7 repo (comparing my changes to this branch): https://github.com/realVinayak/specify7/pull/1. The main function you'd be tracing is this. The above screenshots are actually from that branch. You might recognize the different tree names. Here are some brief feature notes: Good things
Bad things:
Reflections |
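The single-tree rule from the Misc notes above (show every rank from the highest selected rank downward, where a lower rankid means a higher rank) can be sketched as below. This is an illustration under stated assumptions, not the implementation in the linked branch: `ranks_to_show` is a hypothetical helper and the rank IDs are made-up example values.

```python
from typing import Dict, List, Set


def ranks_to_show(tree_ranks: Dict[str, int], selected: Set[str]) -> List[str]:
    """Return the selected ranks plus every rank below the highest selected one.

    `tree_ranks` maps rank name to rankid; a lower rankid is higher in the tree.
    """
    highest_id = min(tree_ranks[name] for name in selected)
    ordered = sorted(tree_ranks.items(), key=lambda item: item[1])
    return [name for name, rank_id in ordered if rank_id >= highest_id]


# Illustrative rank IDs for a single taxon tree.
taxon = {"Kingdom": 10, "Family": 140, "Genus": 180, "Species": 220, "Subspecies": 230}
selected = {"Genus", "Species"}

shown = ranks_to_show(taxon, selected)
# Ranks the user did not pick but that would be added to the dataset.
missing = [name for name in shown if name not in selected]
```

For multiple trees, the same function could be applied once per tree that contains any of the selected ranks.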
@realVinayak Thank you! Your approach is a lot cleaner than what I was trying to do. I'm gonna use your code if that's okay. I'm not sure TreeRankRecord is needed for the upload plan. It seems upload works the same as long as the key follows the format |
Yeah that's fine - it's open source and public. Still, if you feel you can simplify logic more, please do so!
Yeah, I ended up making a conservative assumption that I need to use TreeRankRecord. I think the apply_batch_edit_pack for TreeRecord would need some adjustment (since, internally, the batch_edit.py uses the key of format EDIT: I just looked at the treerecord.py file. The changes necessary (from what it seems like) should be here (and not in apply_batch_edit_pack, spoke too fast). specify7/specifyweb/workbench/upload/treerecord.py Lines 1077 to 1081 in aa81431
Specifically, just format the key to be something like
I haven't tested above but that's the idea. Still need to trace out later to see if there are any gotchas from the prod code changes. |
@emenslin Should be fixed now |
Test #6011
- Verify all tree fields can be queried with
Test batch edit with a single tree in db
- Edit a single node
- Test validate/commit
- Verify edited cells were highlighted as Updated Cell
- Test rollback
- Edit a tree path
- Test validate/commit/rollback
- Verify changes are highlighted as New Cell and the tree path is a new upload to the Tree
- Test missing ranks
- Verify there is a missing rank dialog indicating the lower ranks will be added
- Verify missing ranks were added correctly
Test batch edit with multiple trees in the db
- Repeat steps for editing single node
- Repeat steps for editing a tree path
- Repeat steps for missing ranks
- Verify missing ranks dialog allows you to select trees to edit with
- Verify ranks of selected trees are added correctly
- Verify ALL trees are used when no trees are selected in the dialog
Looks good!
Notes
After completing testing on the Test Panel, I was doing a quick gen test and BE validation stopped working on calvertmarinemuseum_batch_edit_2025_02_10. Looks to be the same as #6431.
Working locally so I will approve this PR.
Test #6011
- Create a query with a Tree base table (eg: Taxon, Storage, Geography, Lithostrat, Chronostrat, Tectonic Unit)
- Verify all tree fields can be queried with
Test batch edit with a single tree in db
- Edit a single node
- Make a tree query with (any rank) and some other simple fields
- Click Batch Edit
- Edit fields
- Test validate/commit
- Verify edited cells were highlighted as Updated Cell
- Test rollback
- Edit a tree path
- Make a tree query with multiple rank names
- Batch edit
- Edit fields
- Test validate/commit/rollback
- Verify changes are highlighted as New Cell and the tree path is a new upload to the Tree
- Test missing ranks
- Make a tree query with a rank that isn't the lowest rank
- Click batch edit
- Verify there is a missing rank dialog indicating the lower ranks will be added
- Click continue
- Verify missing ranks were added correctly
Test batch edit with multiple trees in the db
- Repeat steps for editing single node
- Repeat steps for editing a tree path
- Repeat steps for missing ranks
- Verify missing ranks dialog allows you to select trees to edit with
- Verify ranks of selected trees are added correctly
- Verify ALL trees are used when no trees are selected in the dialog
Looks good, all trees can be batch edited. However, I couldn't roll back any of the data sets. Maybe due to Batch Edit: Rollback can only re-add data that was present in the original dataset #6427? Here is the error message:

Here is a data set that won't roll back: https://ciscollectionsbatchedit220250210-issue-6127.test.specifysystems.org/specify/workbench/28 |
@lexiclevenger Can you try again? Should be fixed now |
Created a different issue for cases where rollback fails due to a disambiguation error: #6443 Datasets without a disambiguation error should have a working rollback |
Test #6011
- Verify all tree fields can be queried with
Test batch edit with a single tree in db
- Edit a single node
- Test validate/commit
- Verify edited cells were highlighted as Updated Cell
- Test rollback
- Edit a tree path
- Test validate/commit/rollback
- Verify changes are highlighted as New Cell and the tree path is a new upload to the Tree
- Test missing ranks
- Verify there is a missing rank dialog indicating the lower ranks will be added
- Verify missing ranks were added correctly
Test batch edit with multiple trees in the db
- Repeat steps for editing single node
- Repeat steps for editing a tree path
- Repeat steps for missing ranks
- Verify missing ranks dialog allows you to select trees to edit with
- Verify ranks of selected trees are added correctly
- Verify ALL trees are used when no trees are selected in the dialog
Missing ranks are added in a random order, is this supposed to happen? This is an example with a single taxon tree but it also happens with multiple trees.
04-25_13.55.mp4
Also I did run into this api rank problem causing unload protect a couple times, it seems to only happen when batch editing on 'any rank' but I'm not sure if it happens anywhere else.
04-25_14.35.mp4
@emenslin Actually, I had to revert the change. Trying to fix column ordering breaks tree datasets in relationships. I think we'll have to push it to a different milestone. Can you create an issue for it? |
Test batch edit with a single tree in db
- Edit a single node
- Test validate/commit
- Verify edited cells were highlighted as Updated Cell
- Test rollback
- Edit a tree path
- Test validate/commit/rollback
- Verify changes are highlighted as New Cell and the tree path is a new upload to the Tree
- Test missing ranks
- Verify there is a missing rank dialog indicating the lower ranks will be added
- Verify missing ranks were added correctly
Test batch edit with multiple trees in the db
- Repeat steps for editing single node
- Repeat steps for editing a tree path
- Repeat steps for missing ranks
- Verify missing ranks dialog allows you to select trees to edit with
- Verify ranks of selected trees are added correctly
- Verify ALL trees are used when no trees are selected in the dialog
Looks good! I didn't run into any issues and was able to roll back all of the data sets as long as they didn't have disambiguation errors.
Fixes #6127
Fixes #6011
Fixes #6315
Warning
This PR does not directly contain a migration but since it's based on #5417, make sure to use a dataset where #5417 was tested.
This PR enables batch editing on Tree tables and enables support for querying on all tree fields when choosing a specific rank (no longer limited to Author and Full Name). Functionally, this PR will work the same as the workbench when uploading to a tree rank.
Batch editing with tree as base table
name field is required when querying on a tree rank
(any rank), Batch Edit expects to have all ranks lower than the chosen rank in the query
Tree 1 and Tree 2:
Tree 1 -> Species and Tree 1 -> Subspecies
Tree 2 -> Species (lowest rank)
Tree 1 -> Species -> name
Tree 1 -> Species -> name
Tree 1 -> Subspecies -> name
Tree 2 -> Species -> name
Batch editing a Tree path
Editing a single node
(any rank) query field
Checklist
self-explanatory (or properly documented)
Testing instructions
Warning
NOTE: This PR will have the same capabilities as in the workbench when uploading to a Tree table if you edit the name field (i.e: Edited records will not be considered 'updated' cells but rather be a new upload to the tree).
However, other simple fields of the Tree can still be edited and will highlight as updated cell.
Test #6011
Test batch edit with a single tree in db
(any rank) and some other simple fields
Test batch edit with multiple trees in the db