[FOLIO-535] agree on the approach for joining/filtering (single module) associations Created: 03/Apr/17 Updated: 15/Jul/20 |
|
| Status: | Open |
| Project: | FOLIO |
| Components: | None |
| Affects versions: | None |
| Fix versions: | None |
| Type: | Umbrella | Priority: | P3 |
| Reporter: | Jakub Skoczen | Assignee: | Jakub Skoczen |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | for-dublin | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | 4 hours | ||
| Original estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||||||||||||||||||||||||||||||
| Issue links: |
|
||||||||||||||||||||||||||||||||||||||||
| Sprint: | |||||||||||||||||||||||||||||||||||||||||
| Development Team: | Core: Platform | ||||||||||||||||||||||||||||||||||||||||
| Description |
|
We have discussed that the current approach to mattypes and PGs is not consistent and confusing: mattypes use a client-controlled "code", while PGs use a server-side enforced UUIDs to establish an association with Item and User records respectively. The client-controlled "code" especially is considered among the front-enders as problematic and not friendly for users/UX. We would also want a consistent approach for "controlled vocabulary"-like (CV) entities (mattypes, pgroups, and soon loan_types, locations, etc) The two proposed solution (it actually could be combined into a single-solution): a) always generate a UUID for the CV item and use that to link the item to the entity. Do not attempt to "lookup" the display name when retrieving the entity or when searching/filtering. Use the raw UUID and perform a display-time look-up in the UI by retrieving the ID-to-displayname map (we can ensure that this map will be fairly short for the considered item types) |
| Comments |
| Comment by shale99 [ 03/Apr/17 ] |
|
i think that using the UUID may be abit confusing will mention a few things that i thought about - comments welcomed... (the below isnt a design suggestion, just how it is done in other systems, there is logic in the process below) lets think about how 99% of the material types are populated, lets assume an import of marc records as this will be a large chunk of many tenants' data example: book | {'marcfield':'a', 'marcvalue':'b'}| etc... <--- when value b appears in field a in the marc then this is a book code (by the way this isnt far from what it looks like) 3. transforming the marc records into folio items will include the step that maps the material type to each folio item 4. at this point we have a folio record which has either 'book' in the MT or 1234567890 so at this point there are a few advantages to having book 1. as mentioned, there is no need for any lookups , i18n can be applied immediately i think for an admin needing to manage this - mapping UUIDs to values will be something that will be just way too complex - also looking at these files will be very hard to understand. |
| Comment by shale99 [ 04/Apr/17 ] |
|
I also believe that folio will come out of the box with probably 10-15 material types out of the box which will probably cover the bulk of all items (as part of the tenant provisioning process) - and each tenant will probably only need to add some more unique MTs that are needed by their institution. So i think most of these codes wont be created by end users, i think most will be packaged into folio and library admins will add / customize that. This will also allow us to package with folio a basic i18n in the decided upon languages, which contains default translations for those codes out of the box. |
| Comment by Jakub Skoczen [ 04/Apr/17 ] |
|
shale99 You are throwing a lot of things into one basket, let me comment on some:
|
| Comment by shale99 [ 04/Apr/17 ] |
|
first a general comment: i18n - if i tell you - what does this mean getLabeli18n("de", "materialTypes", "123456") and then ask you - what does this mean: getLabeli18n("de", "materialTypes", "book") without looking at anything else - which one can you give me an answer to and which one do you have to start looking into code / db to find out? A UI interface to i18n - will be a UI screen where you assign labels in different languages to codes. this will exist in folio at one point we should accept this as a given. the days where a librarian opens a file and starts editing, well, we dont want to go back there. so how would you create this interface if all you have are numbers to map to? |
| Comment by Niels Erik Nielsen [ 04/Apr/17 ] |
|
If there's a unique constraint on the material type label, you can have UUID for primary key and still implement the desired interface - using descriptive identifiers to retrieve one and only one row – if you want/need. |
| Comment by shale99 [ 04/Apr/17 ] |
|
Hi NE, do you mean to maintain a uuid and a unique (index) descriptive identifier - instead of just the descriptive identifier? just want to make sure i understood? |
| Comment by Niels Erik Nielsen [ 04/Apr/17 ] |
|
Yes. |
| Comment by shale99 [ 04/Apr/17 ] |
|
right, I agree, there should be a primary key with a uuid and that can be used when needed. and i agree there should be a unique descriptive code as well. my view is that the descriptive code should be put in the item and not the uuid |
| Comment by shale99 [ 05/Apr/17 ] |
|
guys, whichever way we chose , lets decide - i have a big change that i am either scrapping or checking in which changed the material type api to work with the descriptive codes instead of the uuid |
| Comment by Niels Erik Nielsen [ 05/Apr/17 ] |
|
Right. If I understand you correctly, that means using the descriptive code as foreign key in Item. I get the convenience of doing so, but this is where you lock the label in. It can no longer be modified just like that and then we have people discussing the options: a) Yes, the label/descriptive code can be modified, but only if it's not referenced by any other modules yet. We need to know which modules might reference it and inquire if they have any actual references to it and reject a modification if any of them do b) Sure, the label/descriptive code can be modified; but we need to know which modules reference it and ask that they all update their references to the new label as we modify it in the base table. c) No, we cannot know which modules might reference a given label and we cannot safely ask them all to update their references in one big transaction. Modifying labels after the fact is not allowed. While c) is certainly the simplest and safest option, it also provides for a unforgiving, "gotcha" user experience, where any typo or inconsistency between users in applying labels gets frozen in time. If we use that approach for foreign keys, we also make it legitimate for every module writer to do the same and I'm pretty sure we'll regret that. |
| Comment by shale99 [ 05/Apr/17 ] |
|
i dont believe that indicates a problem with the inventory module itself putting the code into the items as the inventory module controls its own data, so i dont believe there is a cross module issue here, correct? if this is the workflow - i know that this is done this way in other systems, then it seems that these codes will really be immutable - if this isnt the case then your point for changing the api hold water - but they dont necessarily indicate that placing the code in the items is problematic. |
| Comment by shale99 [ 05/Apr/17 ] |
|
moreover, the cross module issue will actually always be an issue - lets take the DELETE scenario. if i want to delete a material type - can i do it? i can never really know what other modules are using - will i break a module if i do? isnt this a micro service issue in general? |
| Comment by Jakub Skoczen [ 05/Apr/17 ] |
|
Guys, I am still catching up with the previous comments but for the sake of expediency, if the decision is needed now let's go with UUIDs. So any kind of "controlled vocabulary" endpoint is similar and would look like: GET|POST /items/:uuid GET|PUT|DELETE /items/:uuid In the "parent" record (like User) we would include raw UUIDs of the associated record like so:
{
"username": "shale99",
"patron_groups": [""123-456-789-123", "345-678-910-111"]
}
Searching and filtering would be done using UUIDs and the UI would need to lookup the Item objects for labels etc. Now, the open question is whether we include a seperate endpoint for controlling associations, directly on the vocabulary item like so POST|GET /items/:uuid/users/ DELETE /items/:uuid/users/:uuid It probably makes sense. In some case we may even skip the step of "joining" items to the parent object, especially if we don't need searching/filtering functionality on the item. |
| Comment by shale99 [ 05/Apr/17 ] |
|
if we want this for material types as well, please let me know and i will make the change |
| Comment by Jakub Skoczen [ 05/Apr/17 ] |
|
Yes, I am aware of that and I am proposing we align mattype with this. I know that you are coming from the perspective where mat types are shared among different tenants/folio installation and then codes might make sense but so far there's no information to indicate that this would actually happen. So they are just pain in the ass. |
| Comment by shale99 [ 05/Apr/17 ] |
|
that isnt at all where i am coming from - i am coming from understanding the biz module and usage of this in a real system - but i will make the change |
| Comment by Mike Taylor [ 05/Apr/17 ] |
|
Please hold off on a decision for another hour. I need to read what's been said before I comment. |
| Comment by Jakub Skoczen [ 05/Apr/17 ] |
|
Mike Taylor Ok, but I would like to avoid prolonging discussion about this much more. I've taken both shale's and NE's arguments into account and UUIDs are the most flexible option. |
| Comment by Mike Taylor [ 05/Apr/17 ] |
|
Well, there are a couple of things going on here. We all agree that for each of our "controlled vocab" record types – patronGroups, materialTypes, permissionSets, etc. – we need a permanent, unchanging unique ID; and that we also need a displayName. I think we all also agree that So there is a real argument for having IDs that are somewhat readable ("book", "dvd" rather than "0bfb4036-69f6-4b98-b087-7a8776a1218e", "4cc1d7ea-e5f1-471c-80c5-fa7a552ea31b"). I would be sad if the CQL queries that the UI sends to Okapi changed from materialType="Book"
to materialType="0bfb4036-69f6-4b98-b087-7a8776a1218e"
(I also lament that individual user views in the Items app have URLs like http://localhost:3000/users/view/B076F7CB-9043-466A-853D-8F463B0E8C31/jacklyn
rather than http://localhost:3000/users/view/jacklyn
but let that pass for now.) So can we use nicer unique-IDs like "book" and "dvd"? The arguments for not doing this seem to be: Did I miss any? If not, then these all seem pretty weak to me: Putting it all together, I am not convinced the arguments for the nuclear option of UUIDs are strong enough to merit the nastiness they entail. I think we can be more ambitious. |
| Comment by Matt Connolly [ 05/Apr/17 ] |
|
Mike has expressed much more eloquently what I was contemplating myself. I don't see a problem with auto-generating unique identifiers that are at least close to the initial desired display label. |
| Comment by Jakub Skoczen [ 05/Apr/17 ] |
|
Mike Taylor I think auto-generating IDs that are close to the original labels gives you a false sense that the values are something more than opaque identifiers. When you do this, they will eventually become an integral part of data model (and interfaces) as people will consider them to be representative of the objects they identify. And then we face all the issues NE mentioned – what it the data they represent change? Do we re-create the identifiers and scan the entire system for references to update them? Do we leave them untouched and risk that people relying on their apparent meaning keep on using them unaware that the real, underlying meaning has changed? No, I don't think this is an acceptable solution. I think it's much more honest to say – we are using UUIDs precisely because we want to convey that the ID values have no meaning. The meaning is within objects, and the meaning might change. It's the duty of whoever consumes the data to keep the meaning in sync. Now, to specifically address the query/display problem: as I said in the description for pure convenience we can add the ability to query/display using the values behind the ID by performing a join on ther server during retrieval, especially if the data resides within the same module. In the PG example specifically the returned object would include e.g:
{
...
"patron_groups": [{"id": "123-456-789", "text":"Students" }]
}
and CQL may include a special index or modifier:
patron_groups/text = "Students"
But the "maintenance" of the values (edits, associations with other object) should be done using opaque identifiers. And that's the crucial piece. |
| Comment by shale99 [ 05/Apr/17 ] |
|
guys, from a technical standpoint - remember that this will require a join, unless we say that api and ui users alike need to make multiple calls to retrieve all item info. but if we are going with ids it is not a problem i will tweak the implementation, it is already mostly id based. |
| Comment by Mike Taylor [ 05/Apr/17 ] |
I agree 100% that this is a real danger. The solution is in the developer culture and documentation. We absolutely must take steps to prevent people from interpreting them wrongly.
The second one, for sure. The first is horribly error-prone.
I am sympathetic to that (though even then I would prefer small integers. Librarians remember that the 245 field means title-statement; they would never remember that 0bfb4036-69f6-4b98-b087-7a8776a1218e means title-statement.) Still, we need to give some thought to what the queries, the URLs and the internal objects look like. They don't matter to the UI, but they matter enormously to the developer experience – and for several years yet, that is going to be much more important than the user experience. |
| Comment by Jakub Skoczen [ 05/Apr/17 ] |
|
Ok, I think I presented above how can we increase developer experience when it comes to querying/displaying data. For now I would like shale99 to go ahead and perform changes to no longer rely on hand-code "codes" and but UUIDs. Thank you. |
| Comment by Marc Johnson [ 05/Apr/17 ] |
|
Will the change to make material types behave this way be a separate issue, or associated with this? |
| Comment by Jakub Skoczen [ 05/Apr/17 ] |
|
I think a seperate one and one that is aligned with any required UI change that potentially needs to be performed by Matt. |
| Comment by Niels Erik Nielsen [ 05/Apr/17 ] |
|
It seems to be a concern that UUID's are visible in URL's, back-end requests and log files. If we find this all that important I'd expect that we can avoid it a lot of the time when we have alternate keys, like the material type label. This can be demonstrated with the UI for Users, which also has an alternate key, besides the UUID, the "username". We currently have the patrons UUID in the URL: With these changes to Users and UserView : (don't mind the double name in the URL, it's due a previous identifier confusion requiring us to have both ID and name on the route; something that can obviously be just one unique identifier) With a bit careful design I'm pretty sure we can use alternate keys (if available) in many instances, instead of the UUID, if visible UUIDs is a major concern. |
| Comment by Jeremy Huff [ 05/Apr/17 ] |
|
Also I think that users are used to ugly URLs: |
| Comment by Mike Taylor [ 05/Apr/17 ] |
|
Niels Erik Nielsen, we should definitely make that change in Users – it's bothered me for a long time that our URLs include both the ugly-ass ID and the human-readable one. (By the way, usernames are a nice example of an ID that is created by humans to be human-readable, but which is nevertheless unique and usable as a key.) |
| Comment by Niels Erik Nielsen [ 06/Apr/17 ] |
|
Right. As long as we realize that the UUID is still the more robust choice for linking. When we create links on data items subject to change, like the username is, then those links would obviously break if a modification happened mid-air. Bookmarks would break too of course. |
| Comment by shale99 [ 06/Apr/17 ] |
|
guy, i am moving forward with the uuid implementation |
| Comment by Niels Erik Nielsen [ 06/Apr/17 ] |
|
Change of username is what's currently implemented at least, according to
Which made me wonder ... so I went to the public demo and changed the username of roma to roma2. If you look up roma2 on folio-uidemo.aws.indexdata.com now you should get: Click the alert away, you should see that Pamela Price still has her loans, but apparently no permissions anymore. This is what using UUID's everywhere is supposed to safeguard against. |
| Comment by shale99 [ 06/Apr/17 ] |
|
if a user name is update-able , i don't think it is a candidate for the use case we are discussing. |
| Comment by Mike Taylor [ 06/Apr/17 ] |
|
I certainly agree that fields which can change are not suitable to be used as IDs. (This should hardly need saying.) But it does seem bizarre to me that we want to support changing username! |
| Comment by Niels Erik Nielsen [ 06/Apr/17 ] |
|
Well, we want to be able to merge two libraries' user bases, or a library might want to use email address for username. It's a feature anyway, and you run a risk by making such assumptions, constraining yourself upfront with poor data design (saying "poor" while always still acknowledging the challenges of doing proper data design in a micro-services context). |
| Comment by Mike Taylor [ 06/Apr/17 ] |
|
Merging two libraries sets of users is a horrifying scenario. But, yes, an important one. I don't know what the right answer is when we merge MIT and Yale, and both of them have a user called mike. Ze'ev, do you have any experience of this? At any rate, one thing is clear: if we do go with somewhat-human-readable identifiers, we need to be really clear about which such fields are identifiers, and thus immutable; and which are not (and so can be changed). ... All of which I admit is an argument for using UUIDs, which no-one would ever mistake for readable. |
| Comment by shale99 [ 06/Apr/17 ] |
|
i havent heard of a scenario where two tenants are merged into one - there is always a tenant namespace (in our case db schema) , unless i missed the merging scenario |
| Comment by Niels Erik Nielsen [ 06/Apr/17 ] |
|
If modules only ever referenced mike by UUID, then the respective diku_admins could go and rename their patrons mikea and mikeb in mod-users/mod-auth and that would be it. If any modules referenced those users by "mike" (like permissions does?), then they would have to go find all those references and update them at the same time as the base user table is updated. |
| Comment by Mike Taylor [ 06/Apr/17 ] |
|
I would go further: if username is used as a foreign key anywhere in any schema, then it has to be immutable. (And except in the merge-two-libraries scenario, that seems to me like an eminently reasonable constraint.) |