[FOLIO-664] Get feedback on Bulk User Import prerequisites Created: 12/Jun/17  Updated: 12/Nov/18  Resolved: 10/Jul/17

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: New Feature Priority: P2
Reporter: Nagy István Assignee: Katalin Lovagné Szűcs
Resolution: Done Votes: 0
Labels: for-next-sprint, sprint16, sprint17, team1
Remaining Estimate: Not Specified
Time Spent: 1 day
Original estimate: Not Specified

Issue links:
Blocks
blocks MODUSERS-22 Implement first pass user import Closed
is blocked by MODUSERBL-11 Do Work Needed to Support User Import Closed
is blocked by MODUSERS-24 Check if user query functionality sui... Closed
is blocked by FOLIO-698 document RESTful API guidelines for m... Draft
Relates
relates to FOLIO-660 Discussion: Controlling Login to FOLIO Closed
Sprint:

 Description   

Bulk user import will be implemented with institution-made scripts that perform institution-specific data retrieval and transformation and call the FOLIO API to update the users database. During the UM-SIG meetings it was identified that FOLIO's mod-users has existing API endpoints that can support the bulk user import functionality. However, there are some mismatches that need to be resolved:

  • The user import action needs to be an atomic / transactional operation, and modifications need to be rolled back if an issue occurs. The user insert endpoint only inserts base user data; additional structures such as address entities have to be added in a separate call. If an address insertion fails, the script has to roll back the changes manually, which is cumbersome.
  • The insert API call (correctly) fails if the user already exists in the system. However, the script needs a way to do an "update-if-exists" operation when some fields of an existing user change in the source system. Currently this can only be done by running a separate query to check whether the user already exists and, if so, calling the PUT endpoint instead of the POST endpoint. This again adds complexity to the script.

I see two ways to solve this:

  • Create an additional “insert-or-update-with-all-user-data” API call which atomically inserts or updates a user with each and every field. This solves the issue; however, it goes against REST principles.
  • Leave the existing REST calls intact (creating no duplicates) but add transaction opening/committing/rollback capability to the existing REST API. As an example of how this can be done, see Fedora's REST API Transaction support: https://wiki.duraspace.org/display/FEDORA40/RESTful+HTTP+API+-+Transactions This might also be usable in other scenarios of the FOLIO REST API calls, not just mod-users, but it's definitely a more complex thing to implement.


 Comments   
Comment by Kurt Nordstrom [ 14/Jun/17 ]

How much special infrastructure do we need to support bulk user imports?

Let's assume we can send a single POST request to add a given user to the system (likely this will need to be exposed by the mod-users-bl module, since a practical import is going to have to include permissions and login credentials information). It does not seem terribly prohibitive to me to need to make a single call per user that we're importing. Sure, the overhead is higher than some process that takes a single file containing multiple users, but it's the sort of thing that only needs to happen once in the lifetime of a given system.

For dealing with pre-existing users, why not just pay attention to the failure response? If you try to POST a new user and get a 422 (or maybe a 409?), that tells the script that the user already exists. If need be, we can send a query request after the failed POST in order to verify this. Not terribly hard from a scripting standpoint.

Comment by Jakub Skoczen [ 15/Jun/17 ]

Yes, a single POST with user metadata (including related metadata that is currently kept in separate endpoints, like potentially contact) seems good enough for this.

I think the try-catch approach that Kurt proposes sounds good. We can also extend the web service with an "addressable" PUT: if the ID for the PUT is provided and the user exists, the user will get updated; if not, it will be created with the given ID (I think some endpoints should already work like that). What's your thinking, Kurt?

Comment by Nagy István [ 15/Jun/17 ]

Signaling collision with HTTP error codes is good enough, and shouldn't be hard to handle. I'm thinking that maybe we should supply some kind of example script skeletons anyway to demonstrate how an import tool will behave and to show use cases like updating an existing user.

I have some basic experience with RAML. I guess it should be possible to compose the accepted JSON object structure of the import endpoint from different subtype definitions (like reusing the already defined address block). That way you can minimize the risk of the data structures drifting apart across multiple API endpoints.

I see one (actually not so small) issue with the "provide-every-user-data-in-one-object" structure.
Let's say you are about to update a user who previously had two addresses, and you supply just a single address in the update. As far as I remember, address blocks don't have separate IDs, so you cannot explicitly specify which address you are referring to (and even if they do, there might be other list fields that don't, and the issue will be the same). How will FOLIO know whether you meant to a) update one of the addresses, b) reduce the address list to just one address, or c) add a new address?
Updating associated collections seems difficult. We can say that if you would like to modify a collection field, you have to supply all elements and they will be replaced, but that still has issues with references. Existing objects might already be referencing an older address entry, and this link has to be kept.

Comment by VBar [ 16/Jun/17 ]

I agree that the user creation and user updating should be separate: the former a POST and the latter a PUT. The proposal would be:

  1. Attempt a POST on the user
  2. Check response code and see if user already exists
  3. If user already exists then attempt a PUT operation instead.

But if step 2 returns a 4xx error code it won't be able to return the ID of the identified user. So the ID would not be available in step 3 to do an incremental update (if it becomes supported). The result would be to completely overwrite the existing user record. Therefore the issue of not having an ID starts before associated collections and already exists with the user record.
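
The three steps above, plus the extra ID lookup this comment points out, could be sketched roughly as follows. The `UsersClient` interface, the `FakeClient` test double, the choice of `username` as the match key and the specific status codes are all assumptions for illustration, not part of any FOLIO library.

```python
from typing import Optional, Protocol


class UsersClient(Protocol):
    """Hypothetical minimal client interface; not an actual FOLIO library."""

    def post_user(self, user: dict) -> int: ...  # returns an HTTP status
    def find_id(self, username: str) -> Optional[str]: ...
    def put_user(self, user_id: str, user: dict) -> int: ...


def import_user(client: UsersClient, user: dict) -> str:
    """Return 'created', 'updated' or 'failed' for one user record."""
    status = client.post_user(user)
    if status == 201:
        return "created"
    if status not in (409, 422):  # assumed "already exists" codes
        return "failed"
    # The 4xx response carries no ID, so look it up before the PUT.
    user_id = client.find_id(user["username"])
    if user_id is None:
        return "failed"
    status = client.put_user(user_id, {**user, "id": user_id})
    return "updated" if 200 <= status < 300 else "failed"


class FakeClient:
    """In-memory fake used only to exercise the flow above."""

    def __init__(self) -> None:
        self.by_id: dict = {}

    def post_user(self, user: dict) -> int:
        if self.find_id(user["username"]) is not None:
            return 422
        new_id = str(len(self.by_id) + 1)
        self.by_id[new_id] = {**user, "id": new_id}
        return 201

    def find_id(self, username: str) -> Optional[str]:
        for uid, rec in self.by_id.items():
            if rec["username"] == username:
                return uid
        return None

    def put_user(self, user_id: str, user: dict) -> int:
        if user_id not in self.by_id:
            return 404
        self.by_id[user_id] = user  # full overwrite, as noted above
        return 204
```

Note that the PUT in this sketch replaces the whole record, which is exactly the overwrite behaviour the comment warns about.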

Comment by Cate Boerema (Inactive) [ 19/Jun/17 ]

Let's say you are about to update a user who previously had two addresses, and you supply just a single address in the update. As far as I remember, address blocks don't have separate IDs, so you cannot explicitly specify which address you are referring to (and even if they do, there might be other list fields that don't, and the issue will be the same). How will FOLIO know whether you meant to a) update one of the addresses, b) reduce the address list to just one address, or c) add a new address?

In this scenario, I'd think we can just assume that the address list would be reduced to just one address (your option b). This seems like the simplest, workable option.

Comment by Nagy István [ 19/Jun/17 ]

Even if we consider the b) option, how will the system know whether the single address you just provided is
1) a 4th new address, so the previous 3 should be deleted, or
2) an updated variant of one of the previous three, whose associations need to be preserved while the other two are deleted?

By associations I mean that an address might be used, e.g., in an existing loan as a retrieval address, so we cannot just delete it. But we cannot update it either, because we don't have any specific ID like "2nd address of the 1st user" to refer to it explicitly.

Comment by Cate Boerema (Inactive) [ 19/Jun/17 ]

Oh, I see. I don't actually know of any scenarios in which user addresses have "associations" in other objects. Retrieval address doesn't make sense to me, as you wouldn't retrieve/pick up a loan at a user address (you'd pick up at a library location). Addresses may be tied to notifications but my understanding is that email is, by far, the preferred method of communication with users. If snail mail notifications are sent, we might just want to archive the notification itself or key details such as when and to where it was sent. That way the address could be later removed without impacting the notification audit trail.

We probably should check with the SIG whether there are other associations for user addresses.

Comment by Katalin Lovagné Szűcs [ 21/Jun/17 ]

We had a discussion about address updates at today's SIG meeting.

The conclusion is that we should only update the addresses that originally came from the external system. A user can have one address per address type, which can serve as the identifier between the FOLIO data and the external system's data. Manually created addresses in FOLIO should not be updated from the external source.

When an address is deleted in the external system, an empty or flagged (deleted) nested address object could be sent to FOLIO.
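
A minimal sketch of this matching rule, assuming one address per type, might look like the following. The field names `addressTypeId` and `deleted` are assumptions chosen for the example, not a confirmed schema.

```python
from typing import Dict, List


def merge_addresses(existing: List[dict], incoming: List[dict]) -> List[dict]:
    """Merge imported addresses into existing ones, keyed by address type.

    Field names ("addressTypeId", "deleted") are assumptions for illustration.
    Types that the import does not mention are left untouched.
    """
    by_type: Dict[str, dict] = {a["addressTypeId"]: a for a in existing}
    for addr in incoming:
        type_id = addr["addressTypeId"]
        if addr.get("deleted"):
            by_type.pop(type_id, None)  # flagged as removed upstream
        else:
            by_type[type_id] = addr  # insert or replace this type
    return list(by_type.values())
```

Because the address type is the only key, an address added manually in FOLIO survives an import as long as the external system never sends data for that type.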

While discussing the user update, a new idea came up: maybe we should first ask the system whether the users to insert/update already exist (e.g. by their external system id) and, if so, retrieve the id of each user so that an update action can be sent instead of a failing insert. Can we use the existing user search API for this purpose?

Comment by Kurt Nordstrom [ 21/Jun/17 ]

To the extent of my knowledge, we are not currently tracking addresses as actual entities. Rather, they are simply sub-fields within the user record. Is there a use case that would require that addresses be treated as discrete objects and link to the user record by identifier?

Regarding querying for IDs, it is certainly possible to query the user and get the ID if they exist. However, I think that if we do this for all users, we end up making a lot more requests than if we simply attempt to create and then take action after a failed create attempt.

Comment by Katalin Lovagné Szűcs [ 22/Jun/17 ]

Yes, we discussed the problem of changing and deleting address blocks that originated from the external system. These should be distinguished from the ones that were manually added in FOLIO. That is why we thought about using address types as an "identifier" to know which addresses should be updated, removed or left as is. Does that make sense to you?

We talked about an option to check all user data in one query (not one per user). It came up because some systems cannot handle deltas but resend the whole user set every time, and this would cause a lot of insert calls to fail and fall back to update.
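
One way to check many users per request instead of one per user is to batch the external IDs into a small number of search queries. The sketch below only builds the query strings; the CQL-style syntax, the `externalSystemId` field name and the default batch size are assumptions, and how the query is sent to the search endpoint is left out.

```python
from typing import Iterable, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_lookup_queries(external_ids: Iterable[str], batch_size: int = 10) -> List[str]:
    """Build one search query per batch of external system IDs.

    The query syntax and field name are assumptions for illustration.
    IDs are assumed not to contain double quotes (no escaping is done).
    """
    queries = []
    for batch in chunked(list(external_ids), batch_size):
        terms = " or ".join(f'externalSystemId=="{ext_id}"' for ext_id in batch)
        queries.append(terms)
    return queries
```

The script could then treat every ID returned by a query as "update" and every ID missing from the results as "insert".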

Comment by Katalin Lovagné Szűcs [ 23/Jun/17 ]

When importing users, how should the credentials be treated? Do we want to import user passwords via the bulk import? Should we generate default passwords instead of importing them?

What about SSO? How should we treat SSO user identifiers when importing user data?

Comment by Kurt Nordstrom [ 23/Jun/17 ]

Katalin Lovagné Szűcs: I think we have to plan for creating an endpoint in the Users Business Logic module that takes a composite record, as opposed to just one particular record. So something could be posted like:

{
 "user" : { "username" : ..., "id" : ..., "address" : ... },
 "permissions" : { "permissions" : [ ... ] },
 "credentials" : { "username" : ..., "password" : ... },
 "saml_credentials" : { "username" : ..., "external_id" : ... }
}

And the module would take care of contacting all responsible submodules and populating them with appropriate data (if it existed).
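
To make the fan-out idea concrete, here is a small Python sketch that splits such a composite record into per-target payloads. The section names follow the example above; the target labels in `SECTION_TARGETS` are placeholders invented for the sketch and do not name real modules or endpoints.

```python
from typing import Dict

# Section name -> placeholder target label; the mapping is an assumption.
SECTION_TARGETS: Dict[str, str] = {
    "user": "users",
    "permissions": "permissions",
    "credentials": "login",
    "saml_credentials": "sso",
}


def split_composite(record: dict) -> Dict[str, dict]:
    """Return {target: payload} for each section that is present and non-empty."""
    unknown = set(record) - set(SECTION_TARGETS)
    if unknown:
        raise ValueError(f"unknown sections: {sorted(unknown)}")
    return {
        SECTION_TARGETS[name]: payload
        for name, payload in record.items()
        if payload  # skip sections that were omitted or left empty
    }
```

A business-logic endpoint built this way would still need to decide what to do when one of the downstream calls fails after others have succeeded, which is the atomicity concern raised in the description.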

Comment by Cate Boerema (Inactive) [ 26/Jun/17 ]

Katalin Lovagné Szűcs, were you going to update this issue with the notes from our discussion with Istvan and Kurt? Also, we are now starting Sprint 17. It would be great if we could close this (assuming it is complete). Thanks!

Comment by Katalin Lovagné Szűcs [ 26/Jun/17 ]

Conclusions of Friday's meeting:

  • Addresses will be identified by address type for a specific user when updating user data.
  • We should use a search endpoint (with a limited number of users per call) to check whether users in the import should be inserted or updated.
  • SSO user identifiers have to be stored in the separate module for SSO authentication.
  • We should generate random passwords when importing user data the first time so that we do not have to deal with importing the passwords of the users.
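
For the random-password point, a minimal sketch using Python's standard `secrets` module could look like this. The alphabet, the default length and the minimum length are assumptions, not agreed requirements.

```python
import secrets
import string

# Letters and digits only; the character set is an assumption.
ALPHABET = string.ascii_letters + string.digits


def generate_password(length: int = 16) -> str:
    """Return a random password drawn from a cryptographically secure source."""
    if length < 8:
        raise ValueError("length must be at least 8")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

The generated value would be set only on first import, so later imports of the same user would leave the credentials alone.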

Comment by Cate Boerema (Inactive) [ 26/Jun/17 ]

Some other key points:

  • SIG has now decided that there will only be one address per type (which is what allows us to use the type as the match point)
  • Address types are library-defined (UIU-79 Closed may go in this sprint (sprint 17))
  • Kurt believes that addresses can be updated in the same call/via same endpoint as the user import update (@jakub, this differs from what you were previously thinking, I believe - we wanted to make sure you had the chance to weigh in)
  • We discussed the SIG's suggestion that "manually created addresses in FOLIO should not be updated from the external source." I think we concluded that we wouldn't explicitly prevent this from happening, but we also wouldn't expect there to be many issues here. Suppose there are 4 address types defined for library A. The import may push in types 1 and 3 for John Doe. An administrator may add types 2 and 4 via the FOLIO UI. We wouldn't expect the FOLIO addresses to be overwritten unless the import suddenly pushed data (or a null value) for types 2 and 4.

Comment by Nagy István [ 26/Jun/17 ]

I think the requirements are very close to the final list.

In my opinion the only thing left to decide here is whether the bulk import will have its very own import endpoint, or whether the existing mod-users (or mod-users-bl) insert/update endpoint can be extended/modified to fit these use case scenarios as well as handling UI operations.

If Kurt Nordstrom's idea (to use existing insert/update user endpoint) is supported by others, I think we can

  • add the tasks necessary to retrofit the existing endpoint to support the import requirements (if there's any actual need to modify it)
  • add a task to create a demo script to serve as a starting point for the SIG to create their own institution-specific versions (as a side note, Qulto is planning to try this out with its own ILS system as an additional validation step)

Comment by Nagy István [ 26/Jun/17 ]

I've reopened this since it has made its way into sprint17, so it's still relevant. Also see my previous comment.

Comment by Katalin Lovagné Szűcs [ 10/Jul/17 ]

An import script was created. It uses the currently existing endpoints.

Generated at Thu Feb 08 23:07:27 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.