The visit merge problem
What should AQTS do when you have a FlowTracker2 measurement taken on Wednesday and a CSV of some extra parameter readings taken at that location?
You have two data files and want a single Wednesday visit that contains activities from both files.
At a high level, AQTS should handle the complexity of merging the two bits of data together.
Merging the individual visit activities together is simple in concept, and even simple in practice for many activities, but it gets rather complex with discharge activities, and those measurements are the AQTS bread and butter. Implementing discharge merge wrongly, poorly, or by making up some data is not going to retain our customers' trust, so a solution has been delayed until we can find one adaptable enough to meet all customer needs, for both large and small organizations.
As of 2022-July, this problem remains open.
Here is an outline of what the merge logic might be when trying to merge existing visit V1 and new visit V2 (on the same day and for the same location), along with any expected difficulties.
- Extend the visit start and end times to contain all of V1 and V2.
- Perform a boolean OR of both V1.CompletedActivities and V2.CompletedActivities
- The Party, CollectionAgency fields should be merged with the distinct values from V1 and V2.
- The Weather and Comments fields are multiline. Merge any unique lines of text from V1 and V2.
- These activities are just simple lists, so just combine all the activity lists from V1 and V2 together.
- These activities are also simple lists, so just combine all the activity lists from V1 and V2 together.
- Just combine all the visit attachments together
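The simple parts of the outline above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Visit` class, its field names, the `"; "` delimiter for the party field, and the activity flag names are all hypothetical stand-ins, not the AQTS data model.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Visit:
    # Hypothetical, simplified stand-in for a field visit (not the AQTS model).
    start: datetime
    end: datetime
    completed_activities: dict = field(default_factory=dict)  # flag name -> bool
    party: str = ""
    weather: str = ""
    comments: str = ""
    readings: list = field(default_factory=list)
    attachments: list = field(default_factory=list)


def merge_distinct(a: str, b: str, sep: str) -> str:
    """Keep each non-empty item once, in first-seen order."""
    seen = []
    for item in a.split(sep) + b.split(sep):
        item = item.strip()
        if item and item not in seen:
            seen.append(item)
    return sep.join(seen)


def merge_simple_fields(v1: Visit, v2: Visit) -> Visit:
    """Merge only the uncontroversial parts of two same-day, same-location visits."""
    names = v1.completed_activities.keys() | v2.completed_activities.keys()
    flags = {
        name: v1.completed_activities.get(name, False)
        or v2.completed_activities.get(name, False)
        for name in names
    }
    return Visit(
        start=min(v1.start, v2.start),  # widen the visit to contain both
        end=max(v1.end, v2.end),
        completed_activities=flags,  # boolean OR of each flag
        party=merge_distinct(v1.party, v2.party, "; "),  # assumed delimiter
        weather=merge_distinct(v1.weather, v2.weather, "\n"),  # unique lines
        comments=merge_distinct(v1.comments, v2.comments, "\n"),
        readings=v1.readings + v2.readings,  # simple lists: concatenate
        attachments=v1.attachments + v2.attachments,
    )
```

None of this touches the one-per-visit activities or discharge activities, which is where the difficulties described below begin.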
Both Control Condition and Gage Height at Zero Flow activities are limited to one-per-visit.
So if V1 has a Control Condition measured but V2 doesn't, then the obvious result should be to use the activity from V1. Similarly, if only V2 has a Control Condition, then that's what should be retained.
The difficulty arises when both V1 and V2 have an only-one-per-visit activity.
- Should merge fail if both V1 and V2 have a control condition?
- Should it fail even if all the values in both control conditions are the same? Maybe just silently keeping V1 is fine?
- Should a "first wins" approach be used? Retain an existing control condition from V1 and ignore everything in V2? Or log any ignored V2 values in the comments?
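To make the open questions above concrete, here is a minimal sketch that treats each answer as a configurable policy. Everything in it is an assumption for illustration: activities are modelled as plain dictionaries, and `ConflictPolicy`, `MergeConflict`, and `merge_single_activity` are hypothetical names, not part of any AQTS API.

```python
from enum import Enum


class ConflictPolicy(Enum):
    FAIL = "fail"              # refuse to merge when both visits have the activity
    FIRST_WINS = "first_wins"  # keep V1's activity, note what was ignored from V2


class MergeConflict(Exception):
    """Raised when a one-per-visit activity exists in both visits."""


def merge_single_activity(name, a1, a2, policy, fail_on_identical=False):
    """Return (kept_activity, comment_lines) for a one-per-visit activity.

    a1 and a2 are dicts of field values, or None when the visit lacks the activity.
    """
    # Only one side has the activity: the obvious result is to keep it.
    if a1 is None or a2 is None:
        return (a1 if a1 is not None else a2), []

    # Both sides agree on every value: silently keep V1 unless told otherwise.
    if a1 == a2 and not fail_on_identical:
        return a1, []

    if policy is ConflictPolicy.FAIL:
        raise MergeConflict(f"Both visits have a {name} activity")

    # FIRST_WINS: keep V1 and record every differing V2 value for the comments.
    ignored = [
        f"Ignored {name}.{key}={value!r} from V2"
        for key, value in sorted(a2.items())
        if a1.get(key) != value
    ]
    return a1, ignored
```

The sketch does not pick a winner among the policies. It only shows that each question maps onto a small, explicit switch, so the choice could be left to the organization rather than hard-coded.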
A discharge activity is a discharge summary (all the values documented here including the gage height measurements, quality metadata, and any adjustments) PLUS one or more channel measurements of the 5 specific measurement types (PointVelocity, ADCP, Volumetric, Flume, or Other).
Merging together discharge activities is difficult because of a few factors:
- A visit can have any number of discharges, and each discharge can have 1 or more channel measurements (even though a single channel measurement per discharge is the rule for 99.9% of all discharge measurements).
- Often customers will have two different files of the same measurement, with different levels of detail.
- a summary CSV that says Stage=12.5 m and Discharge=100 m^3/s at noon
- a detailed FlowTracker2 measurement showing the entire point-velocity cloud for all 30 segments that yield the 100 m^3/s discharge at noon
- often the timestamps within each file are close, but not exact matches.
This two-files-same-measurement-at-different-detail-levels case is the most common scenario that our customers ask about. Ideally we want to merge the two measurements into a single discharge measurement. However, we don't have a reliable way to distinguish a detailed measurement that only supplements the summary info (but uses different times) from a separate measurement that the hydrologist didn't like and quickly redid.
In other words, the FlowTracker2 file that we are trying to merge was a hiccup, and the hydrologist is about to upload a second FlowTracker2 measurement that is the real measurement that yields 100 m^3/s.
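A naive way to pair a summary with a detailed measurement is to compare their times and discharge values within tolerances. The sketch below assumes a hypothetical `DischargeSummary` record and arbitrary default tolerances (30 minutes, 2%). It illustrates the heuristic and its blind spot; it is not a proposed solution.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DischargeSummary:
    # Hypothetical minimal record: just enough to compare two measurements.
    time: datetime
    discharge: float  # m^3/s
    source: str


def looks_like_same_measurement(a, b, max_gap=timedelta(minutes=30), rel_tol=0.02):
    """Guess whether two records describe the same physical measurement.

    The tolerances are illustrative assumptions, not recommended values.
    """
    close_in_time = abs(a.time - b.time) <= max_gap
    scale = max(abs(a.discharge), abs(b.discharge))
    close_in_value = scale == 0 or abs(a.discharge - b.discharge) / scale <= rel_tol
    return close_in_time and close_in_value


summary = DischargeSummary(datetime(2022, 7, 6, 12, 0), 100.0, "summary.csv")
detailed = DischargeSummary(datetime(2022, 7, 6, 11, 52), 100.4, "FlowTracker2")
hiccup = DischargeSummary(datetime(2022, 7, 6, 11, 40), 87.0, "FlowTracker2")

same = looks_like_same_measurement(summary, detailed)      # close in time and value
rejected = looks_like_same_measurement(summary, hiccup)    # value differs too much
```

The heuristic rejects the abandoned measurement here only because its discharge differs noticeably. A redo that happens to land near the same discharge, or a real measurement whose clock differs by more than the tolerance, would be misclassified, which is exactly the ambiguity described above.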
Real-world field visit data is quite messy, and we don't yet have a decent story for intelligently combining values from the same measurement while ignoring the noise from other measurements.
That is the hard problem preventing visit merge from being automatic without human intervention.