Bulk import of users needs performance improvements

Description

We need performant bulk import of users for both migration and ongoing operations.

The current user import module submits users one-by-one, which is rather slow at a scale of, say, 90K users, especially when trying to test data loading at scale. SysOps foresees similar issues with loading data into other modules.

There are two general scenarios:

  1. At migration time, sites will need to do an initial bulk import from their existing systems, and new UUIDs will somehow need to be minted as part of the migration process.

  2. In ongoing operations, sites will do regular bulk updates from their campus identity management (IdM) infrastructure to refresh their campus users. In this case, existing UUIDs and some internal FOLIO fields will need to be preserved.

The migration scenario here seems like a special case of the bulk import to support operations.
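One way to picture the two scenarios as a single code path is sketched below. It is only an illustration: the function name `merge_users`, the `preserved_fields` parameter, and the in-memory dictionary keyed by `externalSystemId` are assumptions made for the example, not the behavior of any existing FOLIO module.

```python
import uuid


def merge_users(existing_by_ext_id, incoming, preserved_fields=("id",)):
    """Split incoming records into creates and updates.

    existing_by_ext_id: stored users keyed by externalSystemId (hypothetical lookup).
    preserved_fields: fields whose stored values must survive an update (assumption).
    """
    to_create, to_update = [], []
    for record in incoming:
        current = existing_by_ext_id.get(record["externalSystemId"])
        if current is None:
            # No match: mint a fresh UUID, as in the migration scenario.
            to_create.append({**record, "id": str(uuid.uuid4())})
        else:
            # Match: take the incoming values, then restore the preserved fields.
            merged = {**current, **record}
            for field in preserved_fields:
                if field in current:
                    merged[field] = current[field]
            to_update.append(merged)
    return to_create, to_update
```

With an empty `existing_by_ext_id`, every record takes the create branch, which is the sense in which migration looks like a special case of the operational import.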

There will be a need for bulk imports across FOLIO modules for both migration and operations. Users seems like a good starting point for considering common strategies or techniques for bulk import that could inform best practices for the project as a whole.

This is related to .

Environment

None

Potential Workaround

None

Attachments

2

Activity

patty.wanninger February 24, 2020 at 8:16 PM

Ian and I are reviewing this ticket. Would this be superseded by Modusers-3?

Still looking for a benchmark of 70 records per second.

István Bender September 28, 2018 at 1:18 PM

We performed a new bulk import test in the following environment:

16GB DDR4 RAM, i7-7820HQ CPU, 240GB SSD

  • base vagrant box : folio/testing version: 5.0.0-20180925.1100

  • mod_users, version: 15.3.0

  • mod_user_import, version: 3.1.1

  • mod_permissions, version: 5.4.0

All modules run inside the Vagrant box; none of them is hosted locally.

Experiences:

  • The Vagrant box and its services start much more slowly than usual

  • User import performance is much lower than in our last measurement: importing 1,000 users took more than 5 minutes!

  • We saw many postgres processes running at the same time in the box, consuming a large share of CPU (see attached image)

We didn't have time to dig deeper and identify the root cause of the poor performance, but something may have changed for the worse since our last benchmark. I cannot say that the slowdown is caused by the mod-users or mod-permissions modules; the cause may be something completely different.

What should we do now? What do you suggest, considering that we don't have much time left before our contract ends?

Zoltan Erdos September 11, 2018 at 9:24 AM

Test environment:

16GB DDR4 RAM, i7-7820HQ CPU, 240GB SSD

Base Vagrant box: folio/testing
Modules replaced and redeployed from local source code with embedded postgres:
mod-users, mod-permissions, mod-user-import, mod-users-bl

These 4 modules (run from IntelliJ IDEA) used at most 2.5GB of RAM when I imported 100,000 users at once. The JSON file with the users was 92MB.

Import time test results:

5k insert: 1.0 min, 0.78 min, 0.85 min, 0.72 min
5k update: 0.5 min, 0.47 min, 0.46 min, 0.47 min, 0.5 min
100k insert: 24.5 min, 25 min
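To compare these timings with a records-per-second target, the conversion can be sketched as follows. The run times are copied from the results above; the helper name `records_per_second` is just for illustration.

```python
def records_per_second(records, minutes):
    """Convert a run of `records` imported in `minutes` into records per second."""
    return records / (minutes * 60)


# Run times in minutes, taken from the test results in this comment.
timings = {
    "5k insert": (5_000, [1.0, 0.78, 0.85, 0.72]),
    "5k update": (5_000, [0.5, 0.47, 0.46, 0.47, 0.5]),
    "100k insert": (100_000, [24.5, 25.0]),
}

for label, (records, runs) in timings.items():
    rates = [records_per_second(records, m) for m in runs]
    print(f"{label}: {min(rates):.0f} to {max(rates):.0f} records/s")
```

By this arithmetic the 5k inserts land at roughly 83 to 116 records per second, the 5k updates at roughly 167 to 181, and the 100k inserts at roughly 67 to 68.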

40% of the import time is spent in mod-users fetching users by external ID.
20% is spent on the POST or PUT of users into mod-users.
The remaining 40% is spent on the POST of permissions information into mod-permissions.
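This breakdown gives a rough upper bound on what optimizing a single phase can buy. The sketch below applies the usual Amdahl-style estimate; the 10x speedup is a purely hypothetical figure chosen for illustration, not a measurement, and `remaining_time_fraction` is a name invented for the example.

```python
def remaining_time_fraction(phase_share, phase_speedup):
    """Fraction of the original run time left if one phase gets faster.

    phase_share: share of total time spent in the phase (e.g. 0.4 for 40%).
    phase_speedup: factor by which that phase alone is accelerated.
    """
    return (1 - phase_share) + phase_share / phase_speedup


# Hypothetical: the external-ID lookup (40% of the time) becomes 10x faster.
fraction = remaining_time_fraction(0.4, 10)
print(f"100k insert estimate: {24.5 * fraction:.1f} min instead of 24.5 min")
```

Even if the lookup phase became free, the run would still take 60% of its original time, so the permissions phase would need attention as well.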

It is important to temporarily allocate more RAM to mod-user-import, mod-users, and mod-permissions when importing a large number of users (more than 10k).

The slowest parts of the user import are:
Querying mod-users by external ID (bulk GET with 10-30 users at once).
Inserting permissions into mod-permissions.
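For readers unfamiliar with the batched lookup, here is a minimal sketch of how external IDs might be grouped into one query per batch. It assumes a CQL-style `externalSystemId==("a" or "b")` query string; the exact query that mod-user-import sends is not shown in this ticket, and the helper names are hypothetical.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def build_external_id_query(external_ids):
    """Build one CQL-style query matching any of the given external IDs (assumed syntax)."""
    quoted = " or ".join(
        '"' + ext_id.replace("\\", "\\\\").replace('"', '\\"') + '"'
        for ext_id in external_ids
    )
    return f"externalSystemId==({quoted})"


# Example: 25 IDs with a batch size of 10 produce three queries.
ids = [f"ext-{n}" for n in range(25)]
queries = [build_external_id_query(batch) for batch in chunked(ids, 10)]
```

Each query still has to be resolved against the users table, which is why the cost of this step depends on how externalSystemId is looked up on the mod-users side.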

Recommendation for mod-users developers:
The user import would be faster if there were an index on the externalSystemId field of the users table.

Recommendation for mod-permissions developers:
The postPermsUsers implementation needs a review, because a POST of a permissionUser takes longer than we expected: about 10 times longer than a POST of a new user into mod-users (30-50 ms vs. 3-5 ms).

István Bender September 5, 2018 at 9:10 AM

will do it in the current sprint. He will also estimate how much effort it would take to integrate mod-user-import into mod-users. I hope we will be able to answer these questions by the end of this week.

Cate Boerema September 5, 2018 at 9:01 AM

Hi. Any test results yet?

Details

Created June 20, 2018 at 4:14 PM
Updated January 19, 2021 at 8:52 PM