[FOLIO-611] Document general-purpose aggregation in business-modules Created: 19/May/17 Updated: 15/Jan/19 |
|
| Status: | In Progress |
| Project: | FOLIO |
| Components: | None |
| Affects versions: | None |
| Fix versions: | None |
| Type: | Task | Priority: | P3 |
| Reporter: | Mike Taylor | Assignee: | Jakub Skoczen |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | core, doc, dublin, sprint15, sprint18, sprint19 | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | 1 hour | ||
| Original estimate: | Not Specified | ||
| Issue links: |
|
||||||||||||||||||||||||||||||||||||
| Sprint: | |||||||||||||||||||||||||||||||||||||
| Development Team: | Core: Platform | ||||||||||||||||||||||||||||||||||||
| Description |
|
As discussed in a break-out session on Dublin day 2. |
| Comments |
| Comment by Mike Taylor [ 19/May/17 ] |
|
As far as I remember, we didn't take permanent notes on any of this. I remember what was said (at least, I think I do), so I will briefly write it up. Then we can decide who will actually do it |
| Comment by Mike Taylor [ 19/May/17 ] |
|
Problem statement Much of what we have talked about as being the work of the business-logic modules is not really business-login, in the sense that it involves no knowledge of the business. An example of actual business knowledge would be knowing that when you create a loan record for item foo to user bar, you also need to set the status of item foo to "On Loan". But the generic functionality that we have needed repeatedly is trivial joining of multiple low-level records to a high-level one. For example, when we fetch user baz, the low-level user record that mod-users retrieves contains a patronGroupId. But we want the record returned to the client to include the whole patron-group record corresponding to that ID (not least so we can see the patron-group's human-readable name, to show to the user). But more than this: we also need to attach multiple instances of other kinds of objects. For example, the user typically has multiple loans. There is no loanId in the low-level user record; a separately addressable loan object expresses the loan of an item to a user. So we need to check the loans API for all of user baz's loans, and attach an object for each of these. So in general, we need the ability to join both sub-objects that are directly linked from the main object and those that are associated with it by a link table. |
| Comment by Mike Taylor [ 19/May/17 ] |
|
How to express the request We need to let the client specify which of the many possible joined objects to include in the records that it gets – for example, we might want a user's patron-group but not want to bother fetching the loans. The JSON-API people have a generic way of expressing requests, that we might adopt. See http://jsonapi.org/format/#fetching-includes – briefly, you ask for /users?include=patronGroups,loans JSON-API also specifies a record format to be returned, containing these associated objects. I worry that this is overly pedantic and verbose. |
| Comment by Mike Taylor [ 19/May/17 ] |
|
Permissions There is – or we thought there was – an issue with what included information the logged-in user has access to. For example, I may be logged in as a user who has permission to see patron-groups, but not to see user-permissions. We ran out of time to think this through, but as it's turned over in my mind since then, I suspect there is actually no problem: the back-end should simply include whichever of the included subrecords there is permission for, but omit those that the user is not permitted to get. The aggregate response would come back as a 200 OK provided that the main object could be fetched and provided that the only kind of error fetching the sub-objects is permission denied. |
| Comment by Mike Taylor [ 19/May/17 ] |
|
I think I have now captured everything we discussed – but please, people, shout if I missed anything. |
| Comment by Jason Skomorowski [ 19/May/17 ] |
|
(The overly pedantic and verbose format for multiple-record-responses ie. compound documents: http://jsonapi.org/format/#document-resource-identifier-objects ) |
| Comment by Kurt Nordstrom [ 19/May/17 ] |
|
Didn't we also have the issue of how to express schemas (for RAML) for the business logic modules? That is, how to build a new schema using references to existing ones, without having conflicts? |
| Comment by Mike Taylor [ 19/May/17 ] |
|
Ah yes, so we did! (My brain discarded that bit, because it's back-end only |
| Comment by Marc Johnson [ 22/May/17 ] |
|
The issue for covering sharing of schema and interface definitions is
Hugs |
| Comment by shale99 [ 22/May/17 ] |
|
have been experimenting with the "oneOf" validation on the rmb layer will need coding to support this |
| Comment by shale99 [ 23/May/17 ] |
|
an update on oneOf. a usage example may look like this: "patronGroup": { "type": "object", "oneOf": [ { "$schema": "http://json-schema.org/draft-04/schema#", "title": "Id Reference", "type": "object", "properties": { "id": { "type": "string", "pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$" } } }, { "type": "object", "$ref": "usergroup.json" } ], "additionalProperties": false } where the id schema should be more of a public schema and referenced by any implementation. the current problem is that jsonschema2pojo which is used to generate the pojos in rmb and also creates the validation annotations in the pojos used by the rmb validator does not support this (does not inject annotations / creates an empty pojo) - there is an open issue on this. so if we want to go this route it would mean |
| Comment by Jakub Skoczen [ 30/May/17 ] |
|
Breakout about this planned for Thursday 01.06 at 15:00 |
| Comment by Kurt Nordstrom [ 01/Jun/17 ] |
|
Ok, so here's how I see things going down:
|
| Comment by Mike Taylor [ 02/Jun/17 ] |
|
Consider users-bl trying to implement this query:
We can implement that as: const mt = MaterialTypes.findbyName('book'); return Users.find(`username=a* and materialTypeId=${mt}`); That is easy. But now consider a query that looks very similar:
There is no analogous implementation. Instead we have to do something much more complex: const users1 = Users.find(`username=a*`); const users2 = UserPermissions.find(`permission=items.read`); return someHowCombineResults(users1, users2); That requires that we search in two separate modules, and combine the results. |
| Comment by shale99 [ 02/Jun/17 ] |
|
I will admit that my preferences are as follows: Mike Taylor - what field would contain data that can be used to join records from the permission and user responses (which field do they share that can be used to match and combine)? in any case - large result sets returned from each service break this implementation. i have an unpublished feature locally which allows a module to request content from a module (for caching) - and ask to be notified when that data is updated for that request (by indicating a url to call) - this would probably be better suited for a producer / consumer messaging framework - but if we are trying to avoid an indexing layer for this - then this may work for medium (1000's - 10,000's) sized result sets |
| Comment by Mike Taylor [ 02/Jun/17 ] |
User-ID. In the Users records, it's just `id`, and in the User-Permissions records, it's ... currently missing.
My point exactly. |
| Comment by shale99 [ 02/Jun/17 ] |
|
one more issue i guess while on the topic - looking at the loans - if we really only need two fields out of lets say 30 - we are holding the course and the standard is to bring in the entire record - if in the future an optimization is needed, we will handle on a per case basis |
| Comment by Marc Johnson [ 02/Jun/17 ] |
|
When I referred to the loan schema in the circulation module, it was more as a possible example of the idea that Adam Dickmeiss referred to by having a separate id and de-referenced representation properties in the schema and not as an exemplar of all aspects we talked about yesterday. shale99 I agree that one of the considerations we need to keep in mind when sharing schema is that they will be used in different contexts. For example, when being included for de-referenced representations we might find more properties are included than necessary, or when sharing between storage and business logic modules, the de-referenced representation might never be provided by the storage module. I agree that there will likely be limitations to any current implementation that might need further work in the future. Does the proposed interface allow us to describe the questions and responses we want in a way which means we can incrementally improve the implementation? |
| Comment by Kurt Nordstrom [ 02/Jun/17 ] |
|
To join users and user-permissions associations, you would use the "username" field. I suppose we could switch things so that the user-permission association table tracks by userid instead, but it's a LOT more human friendly to go off of username. Regarding the loan schema...yes, I was referring to it more in regard to the id/object thing as opposed to the referential thing. Here's my compositeUser schema with IDs and objects. https://github.com/folio-org/raml/blob/add-mod-user-bl/schemas/mod-users-bl/compositeUser.json |
| Comment by Mike Taylor [ 02/Jun/17 ] |
|
Joining on something other than the PK make me itchy. |
| Comment by Kurt Nordstrom [ 02/Jun/17 ] |
|
In the case of the permissionUsers table, the username is the primary key. https://github.com/folio-org/raml/blob/master/schemas/mod-permissions/permissionUser.json |
| Comment by Mike Taylor [ 02/Jun/17 ] |
|
How can that make sense when it's not the PK of the table that permissionsUsers joins? |
| Comment by Kurt Nordstrom [ 02/Jun/17 ] |
|
It's a unique field...you won't have two users with the same username on a given tenant. It was implemented before the decision was made to use opaque userids for the user object. That said, I do prefer it, but I suppose we could retool the permissions association to go by userid as opposed to username. Would this screw up much of the workings of UI components, though? |
| Comment by Mike Taylor [ 02/Jun/17 ] |
|
The UI's use of permissions associations would hardly even notice if this change was made: all it does is fetch http://localhost:9130/perms/users/auth_test1/permissions?full=true and it would be easy enough to use (say) http://localhost:9130/perms/users/77136CF1-FF93-4C45-B7B5-187BCB711745/permissions?full=true instead. (Not that I am seriously suggesting we change it at this point; but clearly the way it works now is not in accordance with our general guidelines.) |
| Comment by shale99 [ 04/Jun/17 ] |
|
i would like to summarize current status: 1. when a schema needs to contains the option of either an id or a de-referenced object (injected object of another schema into a schema). The schema will include 2 entries: Using the RMB cross-module join you can easily auto-populate the de-referenced field in the schema - by joining id's received from the response json with the fieldNameId field value (it is a one liner to populate). consider moving to separate lib so that we can centralize this type of functionality and add features / maintain one tool The above outlines read functionality for smaller result sets. Medium to large sets will break this and will require a more robust solution, for example, an indexing layer (solr / elastic). Question: if we flag a module as needing to provide medium / large result set joins - hence we need to index (do something else to) its data - we should consider using a single method for all types of joins for that module. |
| Comment by Kurt Nordstrom [ 05/Jun/17 ] |
|
shale99 Good summary! Thanks! My preference for sharing schemas would be to use a central repo. I am not dogmatically opposed to doing it differently, if somebody can outline how it will work. |
| Comment by Mike Taylor [ 05/Jun/17 ] |
|
> WHAT IS THE DECISION HERE? submodules from central raml repo, submodule per module - please vote. No preference. (I guess because, as a front-end guy, I am only really interested in the API.) |
| Comment by shale99 [ 06/Jun/17 ] |
|
one more minor issue up for a vote the include query param raml does not support comma seperated params as multi values - but rather as a single value with commas - what that means is the implementer will need to parse this there is support however for multi valued query params in the following manner include=field_a&include=field_b..... this will also pass a list of strings (field names) - to implementing functions which is more convenient. seems like there is no html standard for this - but i did see people lean towards multiple fields (not commas) , although this does lengthen the url i have implemented the raml trait in this way but i dont find this critical, i guess this is more to the ui guys, what is more convenient for you? |
| Comment by Mike Taylor [ 06/Jun/17 ] |
|
As one who will be dealing only with the API and not with the implementation, I strongly prefer adhering to the syntax specified in the JSON-API guidelines. |
| Comment by Marc Johnson [ 06/Jun/17 ] |
|
My understanding from last weeks meeting was that we left ourselves with two ownership options:
And two sharing options (although it sounds like we've decided on using sub-modules for the moment):
I'm only referring to schema, however interfaces might take a similar form. It might be worth branching the conversation about sharing schema onto
|
| Comment by Jakub Skoczen [ 06/Jun/17 ] |
|
Please put the conclusion in a markdown file e.g "Aggregation Guidelines" in the central repository as indicated by David Crossley |
| Comment by David Crossley [ 07/Jun/17 ] |
|
Perhaps at dev.folio.org/doc/business-modules-aggregation Would someone please summarise the "conclusion". Then i could create the doc, expand it into full text, etc. |
| Comment by shale99 [ 13/Jun/17 ] |
|
summarize recent call: 1. include list passed in by the client will indicate to the service what data to retrieve for example: composite schema will not contain both
"patronGroupId": {
"type": "string"
},
"patronGroup": {
"type": "object",
"$ref": "../mod-users/usergroup.json"
},
but rather only
"patronGroup": {
"type": "object",
"$ref": "../mod-users/usergroup.json"
}
if the include requested by the client does not request the patron group , this objecct will not be included in the response. note that the patron group id may exist in the user object which may be requested and included in the composite response comments? |
| Comment by Marc Johnson [ 15/Jun/17 ] |
|
shale99 I think I might need help in my understanding. Taking the user business logic example, would this be a sensible representation of the user at this level, based upon what was agreed about composite schema (assuming that the request states including the group information): { , } |
| Comment by shale99 [ 15/Jun/17 ] |
|
yes |
| Comment by Jakub Skoczen [ 15/Jun/17 ] |
|
I think this looks sensible. What is the behavior when the client did not requested a particular object to be requested? Would we return e.g patronGroup: "null" or omit it the entry from the response? Since we state in the schema that the entry is optional, maybe we should omit it entirely – returning null may give impression that the object is indeed null. |
| Comment by shale99 [ 15/Jun/17 ] |
|
omitted in my opinion |
| Comment by Marc Johnson [ 15/Jun/17 ] |
|
My preference would be to omit it entirely. From the experiments I've tried with schema, if we decide to validate the response, the patronGroup would be expected to be an object and I don't think null validates again that definition (unless we say that patronGroup can be an object or null). |
| Comment by Mike Taylor [ 15/Jun/17 ] |
|
Agreed; omit. |
| Comment by Marc Johnson [ 27/Jun/17 ] |
|
As part of this change, we seem to be changing to using nested properties for the primary record. Where we used to have the user or item as the top level JSON object, we now represent them as compositions, where one of the properties is for this primary record e.g.
{
"user": {
"id": ...,
"username": ...,
...
}
}
instead of:
{
"id": ...,
"username": ...,
...
}
If we have business logic interfaces that don't currently have any related records and so no inclusions, do we still want to nest the primary record (as this makes it more consistent and allow for extension to these related records later without necessarily breaking compatibility)? |
| Comment by Marc Johnson [ 07/Jul/17 ] |
|
As I work on
1. Up until now, the collection resource type has used the same resource representation for read and write (GET, POST and PUT) meaning that a client can GET a representation and modify it to make write requests if it chooses to. Following the discussions about aggregation, we are defining composite representations (e.g. https://github.com/folio-org/raml/blob/82f83f5dab40f60de9aa664b0940df41b77a3e76/schemas/mod-users-bl/compositeUser.json) for read (GET) responses. Do we want to preserve the symmetry between read and write requests or use the none-composite representation of the primary record for write operations (POST/PUT)? 2. Given we intend to share the schema between storage and business logic modules, if a business logic module needs to extend the representation, should it do it by adding properties to the composite schema (meaning that it doesn’t only do aggregation of related record representations)? |
| Comment by Jakub Skoczen [ 25/Jul/17 ] |
|
Let's put a page under the website GH repo about aggregation. |