[FOLIO-664] Get feedback on Bulk User Import prerequisites Created: 12/Jun/17 Updated: 12/Nov/18 Resolved: 10/Jul/17 |
|
| Status: | Closed |
| Project: | FOLIO |
| Components: | None |
| Affects versions: | None |
| Fix versions: | None |
| Type: | New Feature | Priority: | P2 |
| Reporter: | Nagy István | Assignee: | Katalin Lovagné Szűcs |
| Resolution: | Done | Votes: | 0 |
| Labels: | for-next-sprint, sprint16, sprint17, team1 | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | 1 day | ||
| Original estimate: | Not Specified | ||
| Issue links: |
|
| Sprint: | |
| Description |
|
Bulk user import will be implemented with institution-made scripts that perform institution-specific data retrieval and transformation and call the FOLIO API to update the users database. During the UM-SIG meetings it was identified that FOLIO's mod-users has existing API endpoints that can support the bulk user import functionality. However, there are some mismatches that need to be resolved:
I see two ways to solve this:
|
| Comments |
| Comment by Kurt Nordstrom [ 14/Jun/17 ] |
|
How much special infrastructure do we need to support bulk user imports? Let's assume we can send a single POST request to add a given user to the system (likely this will need to be exposed by the mod-users-bl module, since a practical import is going to have to include permissions and login credentials information). It does not seem terribly prohibitive to me to need to make a single call per user that we're importing. Sure, the overhead is higher than some process that takes a single file containing multiple users, but it's the sort of thing that only needs to happen once in the lifetime of a given system. For dealing with pre-existing users, why not just pay attention to the failure response in the case of a failure? If you try to POST a new user and get a 422 (or maybe a 409?), then that tells the script that the user already exists. If need be, we can send a query request after the failed POST in order to verify this. Not terribly hard from a scripting standpoint. |
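The create-first approach described above could be sketched roughly as follows. This is only an illustration: `create`, `find_id` and `update` are hypothetical callables standing in for whatever HTTP calls the import script makes, and which status code actually signals an existing user (422 or 409) is still an open question in this thread.

```python
from typing import Callable, Optional

# Assumption: either code may signal "user already exists"; the thread has not settled this.
DUPLICATE_STATUSES = {409, 422}


def import_user(
    create: Callable[[dict], int],             # hypothetical: POSTs the user, returns the HTTP status
    find_id: Callable[[dict], Optional[str]],  # hypothetical: queries for the user, returns its id or None
    update: Callable[[str, dict], int],        # hypothetical: PUTs the user by id, returns the HTTP status
    user: dict,
) -> str:
    """Try to create the user; after a duplicate-style failure, verify and update instead."""
    status = create(user)
    if 200 <= status < 300:
        return "created"
    if status in DUPLICATE_STATUSES:
        # The failed POST does not tell us the id, so look it up before updating.
        existing_id = find_id(user)
        if existing_id is not None and 200 <= update(existing_id, user) < 300:
            return "updated"
    return "failed"
```

The lookup only happens for users whose create attempt failed, so a first-time import costs one request per user.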
| Comment by Jakub Skoczen [ 15/Jun/17 ] |
|
Yes, a single POST with user metadata (including related metadata that is currently kept in separate endpoints, like potentially contact) seems good enough for this. I think the try-catch approach that Kurt proposes sounds good. We can also extend the webservice with an "addressable" PUT: if the ID for PUT is provided and the user exists, the user will get updated; if not, it will be created with the given ID (I think some endpoints should already work like that). What's your thinking, Kurt? |
| Comment by Nagy István [ 15/Jun/17 ] |
|
Signaling collisions with HTTP error codes is good enough, and shouldn't be hard to handle. I'm thinking that maybe we should supply some kind of example script skeletons anyway to demonstrate how an import tool will behave and to show use cases like updating an existing user. I have some basic experience with RAML. I guess it should be possible to compose the accepted JSON object structure of the import endpoint from different subtype definitions (like reusing the already defined address block). That way you can minimize the issue of the data structures drifting away from each other across multiple API endpoints. I see one (actually not so small) issue with the "provide-every-user-data-in-one-object" structure. |
| Comment by VBar [ 16/Jun/17 ] |
|
I agree that the user creation and user updating should be separate: the former a POST and the latter a PUT. The proposal would be:
But if step 2 returns a 4xx error code it won't be able to return the ID of the identified user. So the ID would not be available in step 3 to do an incremental update (if it becomes supported). The result would be to completely overwrite the existing user record. Therefore the issue of not having an ID starts before associated collections and already exists with the user record. |
| Comment by Cate Boerema (Inactive) [ 19/Jun/17 ] |
In this scenario, I'd think we can just assume that the address list would be reduced to just one address (your option b). This seems like the simplest, workable option. |
| Comment by Nagy István [ 19/Jun/17 ] |
|
Even if we consider the b) option, how will the system know if the single address you just provided is
By associations I mean that an address might be, e.g., used in an existing loan as a retrieval address, and we just cannot delete it. But we cannot update it either, because we don't have any specific ID like "2nd address of the 1st user" to refer to it explicitly. |
| Comment by Cate Boerema (Inactive) [ 19/Jun/17 ] |
|
Oh, I see. I don't actually know of any scenarios in which user addresses have "associations" in other objects. Retrieval address doesn't make sense to me, as you wouldn't retrieve/pick up a loan at a user address (you'd pick up at a library location). Addresses may be tied to notifications but my understanding is that email is, by far, the preferred method of communication with users. If snail mail notifications are sent, we might just want to archive the notification itself or key details such as when and to where it was sent. That way the address could be later removed without impacting the notification audit trail. We probably should check with the SIG whether there are other associations for user addresses. |
| Comment by Katalin Lovagné Szűcs [ 21/Jun/17 ] |
|
We had a discussion about address updates at today's SIG meeting. The conclusion is that we should only update the addresses that originally came from the external system. A user can have one address per address type, which can serve as the identifier between the FOLIO data and the external system's data. Manually created addresses in FOLIO should not be updated from the external source. When an address is deleted in the external system, an empty or flagged (deleted) nested address object could be sent to FOLIO. While discussing the user update, a new idea came up: maybe we should first ask the system whether the users to insert/update already exist (e.g. by their external system id) and, if so, retrieve the id of the user so that an update action can be sent instead of a failing insert. Can we use the existing user search API for this purpose? |
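One possible reading of the address rule above, as a minimal sketch: the incoming record only touches addresses whose type it mentions, and everything else (including manually added addresses of other types) is left alone. The field names `addressType` and `deleted` are assumptions made for illustration, not the actual schema.

```python
from typing import Dict, List


def merge_addresses(existing: List[Dict], incoming: List[Dict]) -> List[Dict]:
    """Merge externally supplied addresses into a user's address list, keyed by address type."""
    by_type = {addr["addressType"]: addr for addr in existing}
    for addr in incoming:
        kind = addr["addressType"]
        # Anything besides the type and the deletion flag counts as real address data.
        payload = {k: v for k, v in addr.items() if k not in ("addressType", "deleted")}
        if addr.get("deleted") or not payload:
            # Empty or flagged object: the address was removed in the external system.
            by_type.pop(kind, None)
        else:
            by_type[kind] = {k: v for k, v in addr.items() if k != "deleted"}
    # Address types not mentioned by the external system are returned unchanged.
    return list(by_type.values())
```

Note that under this reading a manually created address would still be overwritten if it shares an address type with an incoming one, which follows from allowing one address per type.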
| Comment by Kurt Nordstrom [ 21/Jun/17 ] |
|
To the best of my knowledge, we are not currently tracking addresses as actual entities. Rather, they are simply sub-fields within the user record. Is there a use case that would require that addresses be treated as discrete objects and linked to the user record by identifier? Regarding querying for IDs, it is certainly possible to query the user and get the ID if they exist. However, I think that if we do this for all users, we end up making a lot more requests than if we simply attempt to create and then take action after a failed create attempt. |
| Comment by Katalin Lovagné Szűcs [ 22/Jun/17 ] |
|
Yes, we discussed the problem of changing and deleting address blocks that originated from the external system. These should be distinguished from the ones that were manually added in FOLIO. That is why we thought about using address types as an "identifier" to know which addresses should be updated, removed or left as is. Does that make sense to you? We talked about an option to check all user data in one query (not one per user). It came up because some systems cannot handle deltas but resend the whole user set every time, and this would cause a lot of insert calls to fail and fall back to update. |
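If the existing users can be fetched in one query, the script could split the incoming set up front instead of relying on failed inserts. A minimal sketch, assuming the lookup result is available as a mapping from external system id to FOLIO id; the field name `externalSystemId` and the shape of that mapping are assumptions:

```python
from typing import Dict, List, Tuple


def partition_users(
    incoming: List[Dict], existing_ids: Dict[str, str]
) -> Tuple[List[Dict], List[Dict]]:
    """Split incoming users into creates and updates using a single prior lookup."""
    to_create: List[Dict] = []
    to_update: List[Dict] = []
    for user in incoming:
        folio_id = existing_ids.get(user["externalSystemId"])
        if folio_id is None:
            to_create.append(user)
        else:
            # Attach the FOLIO id so the update can address the existing record.
            to_update.append({**user, "id": folio_id})
    return to_create, to_update
```

For a source system that resends its whole user set every time, this replaces one failed insert per existing user with one lookup for the whole batch.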
| Comment by Katalin Lovagné Szűcs [ 23/Jun/17 ] |
|
When importing users, how should the credentials be treated? Do we want to import user passwords via the bulk import? Should we generate default passwords instead of importing them? What about SSO? How should we treat SSO user identifiers when importing user data? |
| Comment by Kurt Nordstrom [ 23/Jun/17 ] |
|
Katalin Lovagné Szűcs: I think we have to plan for creating an endpoint in the Users Business Logic module that takes a composite record, as opposed to just one particular record. So something could be posted like:
{
"user" : { "username" : ..., "id" : ..., "address" : ... },
"permissions" : { "permissions" : [ ... ] },
"credentials" : { "username" : ..., "password" : ... },
"saml_credentials" : { "username" : ..., "external_id" : ... }
}
And the module would take care of contacting all responsible submodules and populating them with appropriate data (if it existed). |
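As a rough illustration of the "if it existed" part, the composite record could be split into the sections that are actually present before anything is forwarded. The section names are taken from the example above; how each section would be routed to its submodule is not specified in this thread and is left out.

```python
from typing import Dict, List, Tuple

# Section names as they appear in the composite record example.
SECTIONS = ("user", "permissions", "credentials", "saml_credentials")


def split_composite(record: Dict) -> List[Tuple[str, Dict]]:
    """Return (section name, payload) pairs for every non-empty section of the record."""
    return [(name, record[name]) for name in SECTIONS if record.get(name)]
```

A record without SSO data would then simply produce no `saml_credentials` entry, and nothing would be sent for it.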
| Comment by Cate Boerema (Inactive) [ 26/Jun/17 ] |
|
Katalin Lovagné Szűcs, were you going to update this issue with the notes from our discussion with Istvan and Kurt? Also, we are now starting Sprint 17. It would be great if we could close this (assuming it is complete). Thanks! |
| Comment by Katalin Lovagné Szűcs [ 26/Jun/17 ] |
|
Conclusions of Friday's meeting:
|
| Comment by Cate Boerema (Inactive) [ 26/Jun/17 ] |
|
Some other key points:
|
| Comment by Nagy István [ 26/Jun/17 ] |
|
I think the requirements are very close to the final list. In my opinion the only thing left to decide here is whether the bulk import will have its very own import endpoint or whether the existing mod-users (or mod-users-bl) insert/update endpoint can be extended/modified to fit these use case scenarios, as well as handling UI operations. If Kurt Nordstrom's idea (to use the existing insert/update user endpoint) is supported by others, I think we can
|
| Comment by Nagy István [ 26/Jun/17 ] |
|
I've reopened this since it has made its way to sprint17, so it's still relevant. Also see my previous comment. |
| Comment by Katalin Lovagné Szűcs [ 10/Jul/17 ] |
|
An import script was created. It uses the currently existing endpoints. |