[MODUIMP-4] Bulk import of users needs performance improvements Created: 20/Jun/18 Updated: 19/Jan/21 |
|
| Status: | Open |
| Project: | mod-user-import |
| Components: | None |
| Affects versions: | None |
| Fix versions: | None |
| Type: | Tech Debt | Priority: | P3 |
| Reporter: | Tod Olson | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | back-end, migration-load, qulto, sprint46, sprint47, sprint48 | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original estimate: | Not Specified | ||
| Attachments: |
|
| Issue links: |
|
| Sprint: | |
| Description |
|
We need performant bulk import of users for both migration and ongoing operations. The current user import module submits users one-by-one, which is rather slow at a scale of, say, 90K users, especially when trying to test data loading at scale. SysOps foresees similar issues with loading data into other modules. There are two general scenarios:
The migration scenario here seems like a special case of the bulk import that supports operations. Bulk imports will be needed across FOLIO modules for both migration and operations. Users seem like a good starting point for working out common strategies or techniques for bulk import that could inform best practices for the project as a whole. This is related to
|
| Comments |
| Comment by Cate Boerema (Inactive) [ 21/Jun/18 ] |
|
Jakub Skoczen, István Bender, thoughts on how we can improve performance here? Is it time to add bulk loading functionality per
|
| Comment by István Bender [ 25/Jun/18 ] |
|
I can profile which part takes a long time, but my gut feeling is that calling the mod-users API twice per user (create/update the user and add an empty permission set) must be the bottleneck. Bulk user creation would definitely improve the performance of user import. |
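To put the two-calls-per-user pattern into numbers, here is a minimal sketch. It is not the module's code: `batch_size`, the helper names, and the idea of one bulk request per batch are assumptions for illustration only, and no existing bulk endpoint is implied.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# From the comment above: one create/update call plus one empty-permission-set call.
CALLS_PER_USER = 2


def chunked(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def requests_one_by_one(user_count: int) -> int:
    """HTTP requests needed when every user is submitted individually."""
    return user_count * CALLS_PER_USER


def requests_batched(user_count: int, batch_size: int) -> int:
    """HTTP requests needed if a hypothetical bulk endpoint took one batch per call."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batches = -(-user_count // batch_size)  # ceiling division
    return batches * CALLS_PER_USER


if __name__ == "__main__":
    print(requests_one_by_one(90_000))    # one-by-one submission
    print(requests_batched(90_000, 500))  # assumed batches of 500 users
```

For the 90K users mentioned in the description, the one-by-one pattern means 180,000 requests, while the assumed batches of 500 would need 360.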
| Comment by István Bender [ 28/Aug/18 ] |
|
Tod Olson, could you send me a sample user import JSON file containing 90k users? I will do some profiling and performance tests. |
| Comment by Jon Miller [ 29/Aug/18 ] |
|
I generated a sample file of 100,000 users and attached it to this issue. The file just contains generated data and a subset of the JSON attributes. I'm thinking this will get you going. I can create a more realistic file if needed. |
| Comment by István Bender [ 30/Aug/18 ] |
|
Jon Miller, I generated a new one (using http://www.mockaroo.com/) because some attributes (patronGroup, addresses, type, dateOfBirth) were missing from your sample file. I cannot upload it here because the compressed size is 15 MB and Jira only allows uploads of up to 10 MB. An example user from my sample JSON:
{
  "active": true,
  "barcode": "19-971-1678",
  "externalSystemId": "8294b291-f227-4cb8-8635-762fadc8ce7f",
  "patronGroup": "graduate",
  "personal": {
    "addresses": [
      {
        "addressLine1": "74 Fuller Point",
        "addressTypeId": "Payment",
        "city": "San Francisco",
        "postalCode": "94132",
        "primaryAddress": true,
        "region": "California"
      }
    ],
    "dateOfBirth": "1968-04-23",
    "email": "aguppie0@vinaora.com",
    "firstName": "Aubrey",
    "lastName": "Guppie",
    "mobilePhone": "235(862)784-4412",
    "phone": "48(210)504-6148",
    "preferredContactTypeId": "mail"
  },
  "type": "patron",
  "username": "969402e4-cf2d-4255-be18-8673c58939f3"
}
Feel free to use my schema: https://www.mockaroo.com/c238b770 |
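For anyone without access to the Mockaroo schema, a test file with the same fields as the example user above could be generated locally. This is only a sketch: the name lists, value ranges, `example.org` addresses, and the function names `make_user` and `make_users` are invented for illustration, and the fixed values (`graduate`, `Payment`, `mail`, `patron`) are simply copied from the example rather than taken from any reference data.

```python
import datetime
import json
import random
import uuid
from typing import Any, Dict, List

# Hypothetical name pools; any values would do for load testing.
FIRST_NAMES = ["Aubrey", "Jordan", "Riley", "Morgan"]
LAST_NAMES = ["Guppie", "Nowak", "Kovacs", "Olsen"]


def random_uuid(rng: random.Random) -> str:
    """Build a version-4 UUID from the seeded generator so output is reproducible."""
    return str(uuid.UUID(int=rng.getrandbits(128), version=4))


def make_user(rng: random.Random) -> Dict[str, Any]:
    """Return one synthetic user with the same fields as the example above."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    birth = datetime.date(1950, 1, 1) + datetime.timedelta(days=rng.randrange(18_000))
    return {
        "active": True,
        "barcode": f"{rng.randint(10, 99)}-{rng.randint(100, 999)}-{rng.randint(1000, 9999)}",
        "externalSystemId": random_uuid(rng),
        "patronGroup": "graduate",
        "personal": {
            "addresses": [
                {
                    "addressLine1": f"{rng.randint(1, 999)} Fuller Point",
                    "addressTypeId": "Payment",
                    "city": "San Francisco",
                    "postalCode": "94132",
                    "primaryAddress": True,
                    "region": "California",
                }
            ],
            "dateOfBirth": birth.isoformat(),
            "email": f"{first}.{last}@example.org".lower(),
            "firstName": first,
            "lastName": last,
            "mobilePhone": f"555-{rng.randint(1000, 9999)}",
            "phone": f"555-{rng.randint(1000, 9999)}",
            "preferredContactTypeId": "mail",
        },
        "type": "patron",
        "username": random_uuid(rng),
    }


def make_users(count: int, seed: int = 0) -> List[Dict[str, Any]]:
    """Generate count users; the same seed always yields the same list."""
    rng = random.Random(seed)
    return [make_user(rng) for _ in range(count)]


if __name__ == "__main__":
    print(json.dumps(make_users(2), indent=2))
```

Seeding the generator keeps repeated benchmark runs comparable, since every run imports exactly the same records.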
| Comment by Jon Miller [ 30/Aug/18 ] |
|
István Bender Thanks |
| Comment by Cate Boerema (Inactive) [ 05/Sep/18 ] |
|
Hi István Bender. Any test results yet? |
| Comment by István Bender [ 05/Sep/18 ] |
|
Zoltan Erdos will do it in the current sprint. He will also estimate how much effort it would take to integrate mod-user-import into mod-users. I hope we will be able to answer these questions by the end of this week. |
| Comment by Zoltan Erdos [ 11/Sep/18 ] |
|
Test environment: 16 GB DDR4 RAM, i7-7820HQ CPU, 240 GB SSD. Base Vagrant box: folio/testing. The 4 modules (run from IntelliJ IDEA) used at most 2.5 GB of RAM when I imported 100,000 users at once. The JSON file with the users was 92 MB.
Import time test results: 5k insert - 1.0 min; 0.78 min; 0.85 min; 0.72 min.
About 40% of the import time is spent in mod-users getting users by external IDs. It is important to temporarily allocate more RAM to mod-user-import, mod-users, and mod-permissions when importing a large number of users (more than 10k).
The slowest parts of the user import are: Recommendation for mod-users developers: Recommendation for mod-permissions developers: |
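The timings above can be converted into throughput with a little arithmetic. The sketch below assumes the four times are repeated runs of the same 5k import (the comment does not say so explicitly) and applies the reported 40% lookup share uniformly to each run; the function names are illustrative only.

```python
from typing import List

RECORDS_PER_RUN = 5_000
RUN_MINUTES: List[float] = [1.0, 0.78, 0.85, 0.72]  # timings reported above
LOOKUP_SHARE = 0.40  # reported share of time spent fetching users by external ID


def records_per_second(records: int, minutes: float) -> float:
    """Throughput of a single run."""
    return records / (minutes * 60)


def lookup_seconds(minutes: float, share: float = LOOKUP_SHARE) -> float:
    """Seconds of a run attributed to external-ID lookups under the assumed share."""
    return minutes * 60 * share


def overall_rate(records: int, runs: List[float]) -> float:
    """Total records divided by total time across all runs."""
    return records * len(runs) / (sum(runs) * 60)


if __name__ == "__main__":
    for minutes in RUN_MINUTES:
        rate = records_per_second(RECORDS_PER_RUN, minutes)
        print(f"{minutes:.2f} min: {rate:.1f} users/s, "
              f"{lookup_seconds(minutes):.1f} s in lookups")
    print(f"overall: {overall_rate(RECORDS_PER_RUN, RUN_MINUTES):.1f} users/s")
```

Under that assumption the individual runs work out to roughly 83 to 116 users per second.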
| Comment by István Bender [ 28/Sep/18 ] |
|
We performed a new bulk import test in the following environment: 16 GB DDR4 RAM, i7-7820HQ CPU, 240 GB SSD.
All modules were running in the Vagrant box; none of them were hosted locally. Experiences:
We didn't have time to dig deeper and identify the root cause of the poor performance, but something may have changed for the worse since our last benchmark. I cannot say that mod-users or mod-permissions is the cause of the slowdown; the reason may be something completely different. Jakub Skoczen, what should we do now? What do you suggest, considering that we don't have much time left before our contract ends? |
| Comment by patty.wanninger [ 24/Feb/20 ] |
|
Cate Boerema, Ian and I are reviewing this ticket. Would this be superseded by Modusers-3? We are still looking for a benchmark of 70 records per second. |