Sharing 1.5 - Single Whitelist Persistence #351
Replies: 1 comment
-
Thanks for exploring this @jbee . I'm adding my comments below:
I am not sure this is correct - I would assume a simple join between a user and it's user groups would be enough to simplify the query if that is the case. Regardless, I do see the point in the complexity overall.
A couple of comments: Regardless of the previous comment, now we end up with data duplication. For each level of access a user or usergroup has, it will now have either 1 (r) 2 (rw/w) 3 (dr) or 4 (dw) representations, one for each type of access - Provided we assume a level of access implies the lower levels of access. An alternative solution here, could be that each UID is just present in it's highest level of access, and the query to check actually check all 4 columns, exiting at the first hit. In regards to both comments above, storing all the access as jsonb in the form of tuples ({uid: , access: }) I think would allow us to check the presence of any uids, without regard to access, while at the same time get all access of a specific type, and avoid duplicating uids. In this case I do think that we would need to use a format of the access to be able to denote all levels in one string, like we have today. The hardest problem I foresee, is how we deal with updates - When a user group is deleted, when a user is deleted, etc. Would this be something we could easily switch into today, or do we need a new process for handling this? Finally, how do we represent the special access "owner"? Do we keep it at the top level like today, or will this also be moved into the jsonb?
No, we cannot do that. Being able to write data, does not imply you can change the metadata. And reading metadata definitively does not mean you can read data. This is a much bigger concern in tracker though, since it's individual level data. There are also convoluted sharing setups in tracker to allow partial access to data (Like a hospital being able to create an event for a lab test, and a lab being able to fill out the event without accessing the patient information; Campaigns also have a convoluted setup, but I cant as easily describe that)
I'm not sure if I follow, I assume you mean in your example if it's contained in either of the columns, access is allowed (In the case of checking READ access and UID is in either READ or WRITE column?)
How would this look in extreme cases? I assume you might easily hit the max length supported for a query if you add a significant list of usergroup uids. This also does not handle the case of owner or public, so you would still be left with 2 more conditions?
Thinking about the sharing dialogue in the UI: When I want to show the user who currently has what access, how do I get that information? With this setup of mixed UIDs, there is no way to identify which is a user and which is a usergroup, without manually checking each one?
That is fine, but again, mixing different types of values smells like trouble. The alternative I suggested earlier might work fine, both by adding a new field "type" which can be "user", "group" or "public"(Or something else)
I agree, and without testing, I would expect it to be pretty good for most of our usecases, where we know either uid or both uid and access. I'm not sure if that requires 1 or 2 indexes though. I am not entirely sure how jsonb gets indexed. The r{uid} idea thought means we would need string operations to extract the uid, and string operations are incredibly expensive in postgres.
This is definetively a problem we want to address, but I am not sure this is the right side to start from, I think we need to look into the metadata filtering first. Might be a good idea to identify the requirements for performance there first, before we decide on an approach here.
Data sharing( data read or data write ) both imply metadata read access; IE, they need to see the metadata/know of it's existence to be able to write to it. It data read or data write does not mean they can change the metadata Overall, I like the idea, and I think we should continue exploring, but I think there's a few things about the proposed solution we need to figure out first! |
Beta Was this translation helpful? Give feedback.
-
Motivation
The current sharing system is multi-dimensional making it unnecessary hard to understand and work with.
This impacts both complexity and performance.
However, for practical reasons it is hard to switch to a fundamentally different system.
Ideally the user experience and interaction with sharing should not change to avoid consts of transitioning.
Practical Example
With the current
Sharing
model there are multiple fields to check.Two are maps of UID => access pattern.
A SQL to evaluate this has multiple parts because there are multiple fields to check.
For the maps with each group a user is a member of the query gets another expression making the SQL long, complex and thus it is fair to assume costly in terms of performance.
A filter looks something like this
Proposal
This proposal is a solution that only changes how the sharing information is stored and processed but it should be possible to reconstruct the current API layout from the structure.
The sharing is a whitelist of UIDs and special tokens for read and write.
Empty sets are omitted.
In Java
The proposal is also to drop the data vs metadata distinction and to always imply both.
If this should be maintained more sets could be used or UIDs can be extended with a prefix character (see tokens).
Lookup
For a lookup it is always known upfront which of the 2/4 lists to check.
So the check itself is always a check for set intersection.
The users set of associated UIDs and tokens is checked against the whitelist of sharing.
If there is at least 1 contained in both access is allowed.
This way the SQL needed to perform sharing checks is a single X in Y where X and Y are JSON arrays of string.
For an in memory check this equally is a
Set<String>
containsAny check.A filter would always look like this
Tokens
In the whitelists sets UIDs of users and user groups would be mixed.
The set would also allow non-UID tokens with special meaning. For example, a token to allow public access, instead of putting an UID in the set
"p"
could be added to symbolize that public access is allowed.UIDs could also be prefixed with a token to mark them as read/write/dataread/datawrite instead of having multiple lists. E.g.
r{uid}
to give metadata read access to the UID.Performance
From a performance standpoint it makes most sense to use an actual JSON array as the sole structure as that can be indexed in postgres AFAIK.
In such a form it is clear that tokens need to be used, e.g.
r{uid}
(user/group can read) andw{uid}
(user/group can write) etceteraJSON
Open Issues
The issues are not related to the representation but to the sharing concept itself. These are good to keep in mind when changing the design to maybe solve some of them in the process.
fields=ref[a,b,c]
to see a,b,c of the reference from another root.Beta Was this translation helpful? Give feedback.
All reactions