Overview

Tests were run within the scope of PERF-386 to address the following:

Determine time it takes to complete import
Determine main modules that are involved in the process (if obvious or if known)
Test specific scenarios: data import (DI) while Check-in and Check-out (CICO) is in progress with 5 concurrent users, and concurrent DI jobs from multiple tenants in the same cluster.

Summary

A successful data import for tenant ptf-ncp5-00 takes approximately 15 sec for 1k records, 1 min for 5k records, 2 min for 10k records, 4 min 30 sec for 22.7k records, and 9 min 37 sec for 50k records.
Main modules that are involved in the process:
1. mod-quick-marc
2. mod-source-record-storage
3. mod-inventory
4. mod-source-record-manager
5. mod-data-import
6. mod-di-converter-storage
7. mod-search
8. nginx-okapi
9. mod-inventory-storage
10. okapi
11. mod-entities-links
DI with CICO: no degradation of data import time, but Check-in and Check-out times degrade by up to 3 times during data import.
Multitenant testing: concurrent jobs from different tenants and consecutive jobs from tenants ptf-ncp5-01 and ptf-ncp5-02 both completed with errors, with all records discarded; sometimes only one run per DI file was successful. Jobs for 2 or 3 tenants simultaneously were tested but never finished because of the error MODSOURCE-581*. Because the jobs were stopped by the user due to the error (about 10-15% done after 5 hours), those results are not meaningful.
Memory utilization grows for 3 modules: mod-source-record-manager:3.6.2 from 83% to 105%, mod-source-record-storage:5.6.5 from 86% to 89%, mod-inventory-storage:26.0.0 from 42% to 54%. Jira ticket PERF-541 was opened for this. All other modules were stable during data import.
On 17/05/2023, a series of tests was performed as described in PERF-541. Memory growth for mod-source-record-manager was not significant and stabilized after some time. Heap dump analysis was performed for all modules and did not reveal memory leaks.
Most CPU-consuming modules: mod-quick-marc - 79%, mod-source-record-storage - 74%, mod-inventory - 69%, mod-source-record-manager - 67%; all other modules used less than 30%.

* MODSOURCE-581 (SPIKE: Multiple tenant DI testing - import jobs are hanging; status CLOSED) is reproducible on the Orchid release with the following module configuration:
mod-source-record-storage: cpu:1024 memory:4096/3688, DB_MAXPOOLSIZE=30, DB_CONNECTION_TIMEOUT=40
mod-source-record-manager: cpu:1024 memory:4096/3688, DB_MAXPOOLSIZE=30
It is planned to be retested with an increased database size (PERF-544) and with all required trigger functions (PERF-547).

Recommendations & Jiras (Optional)

Jiras

MODSOURMAN-982 - Do not process chunks when the DI job is completed

PERF-541 - Investigate potential memory leak for DI modules

MODDATAIMP-809 - Investigate why records are discarded for jobs completed with errors

Test Runs & Results

Job Profile "KG Create authority" - https://bugfest-nolana.int.aws.folio.org/settings/data-import/job-profiles/view/d3271c74-97ec-4dd9-9470-97b2154d63fd?query=KG&sort=name

Baseline test

Test #	# of records	% creates	File	Time it takes to complete import
1	1,000	100	https://folio-org.atlassian.net/wiki/download/attachments/1385982/1k_marc_authority.mrc?api=v2	14 sec
2	5,000	100	https://folio-org.atlassian.net/wiki/download/attachments/1385982/LC_SUBJ_msplit00000000.mrc?api=v2	55 sec
3	10,000	100	https://folio-org.atlassian.net/wiki/download/attachments/1385982/msplit00000000.mrc?api=v2	1 min 59 sec
4	22,778	100	https://folio-org.atlassian.net/wiki/download/attachments/1385982/msplit00000013.mrc?api=v2	4 min 31 sec
5	50,000	100	https://folio-org.atlassian.net/wiki/download/attachments/1385982/50000_authorityrecords.mrc?api=v2	9 min 48 sec
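
To compare runs of different sizes, the baseline durations can be converted to throughput in records per second. The following is a minimal sketch, assuming durations are written in the `N min M sec` form used in the table; the helper names `to_seconds` and `throughput` are illustrative and not part of any test tooling.

```python
import re

def to_seconds(duration: str) -> int:
    """Convert a duration such as '9 min 48 sec' or '14 sec' to whole seconds."""
    minutes = re.search(r"(\d+)\s*min", duration)
    seconds = re.search(r"(\d+)\s*sec", duration)
    if minutes is None and seconds is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    total = int(minutes.group(1)) * 60 if minutes else 0
    total += int(seconds.group(1)) if seconds else 0
    return total

def throughput(records: int, duration: str) -> float:
    """Records imported per second for one run."""
    return records / to_seconds(duration)

# (number of records, import time) copied from the baseline table
BASELINE = [
    (1_000, "14 sec"),
    (5_000, "55 sec"),
    (10_000, "1 min 59 sec"),
    (22_778, "4 min 31 sec"),
    (50_000, "9 min 48 sec"),
]

for records, duration in BASELINE:
    print(f"{records:>6} records: {throughput(records, duration):.1f} records/sec")
```

For the baseline runs this works out to roughly 71 to 91 records per second.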

Test with CICO 5 concurrent users

Test #	# of records	Time it takes to complete import	CI time Avg (sec)	CI Avg delta vs baseline	CI time 95th pct (sec)	CI 95th pct delta vs baseline	CO time Avg (sec)	CO Avg delta vs baseline	CO time 95th pct (sec)	CO 95th pct delta vs baseline
1	1,000	14 sec	0.585	+21%	0.778	+37%	1.012	+34%	1.426	+62%
2	5,000	56 sec	0.914	+90%	1.467	+157%	1.305	+73%	2.403	+173%
3	10,000	1 min 54 sec	0.907	+89%	1.759	+209%	1.408	+86%	2.721	+209%
4	22,778	4 min 32 sec	0.853	+78%	1.616	+184%	1.425	+89%	2.497	+183%
5	50,000	9 min 37 sec	0.862	+80%	1.471	+158%	1.510	+100%	2.403	+173%

Baseline	Avg (sec)	95th pct (sec)
CI	0.480	0.569
CO	0.755	0.881
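
The delta columns in the CICO table express each measured time relative to the baseline values above. The following is a minimal sketch of that calculation, assuming delta = (measured - baseline) / baseline; the function name `pct_delta` is illustrative, and for some rows the result differs from the table by about one percentage point, presumably because of rounding.

```python
# Baseline CI/CO response times from the baseline table
BASELINE = {
    "CI": {"avg": 0.480, "p95": 0.569},
    "CO": {"avg": 0.755, "p95": 0.881},
}

def pct_delta(measured: float, baseline: float) -> float:
    """Relative change of a measured time against its baseline, in percent."""
    return (measured - baseline) / baseline * 100.0

# Test #3 (10,000 records) from the CICO table
run = {
    "CI": {"avg": 0.907, "p95": 1.759},
    "CO": {"avg": 1.408, "p95": 2.721},
}

for op in ("CI", "CO"):
    for stat in ("avg", "p95"):
        delta = pct_delta(run[op][stat], BASELINE[op][stat])
        print(f"{op} {stat}: {delta:+.0f}%")
```

A delta of +209% means the measured time is a little over 3 times the baseline, which is where the "up to 3 times" degradation figure comes from.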

Multitenant testing

Tests 1-5: DI on each tenant consecutively (5 jobs from 3 tenants = 15 test runs).
Tests 6-8: DI jobs from 2 tenants simultaneously with 1 min ramp-up.
Test 9: DI jobs from 3 tenants simultaneously with 1 min ramp-up.

Test #	# of records	Tenant ptf-ncp5-00 time	Comment	Tenant ptf-ncp5-01 time	Comment	Tenant ptf-ncp5-02 time	Comment
1	1,000	15 sec	COMMITTED	56 sec / 17 sec	1 time COMMITTED / other ERROR	13 sec - 30 min	ERROR; one of the jobs stuck for 30 min*
2	5,000	1 min	COMMITTED	58 sec	1 time COMMITTED / other ERROR	47 sec - 55 min	1 time COMMITTED / other ERROR; one of the jobs stuck for 30 min
3	10,000	2 min 02 sec	COMMITTED	1 min 36 sec	1 time COMMITTED	19 min 22 sec	ERROR
4	22,778	4 min 20 sec	COMMITTED	11 min 52 sec	ERROR	-
5	50,000	9 min 53 sec	COMMITTED	3 min 56 sec	ERROR	-
6	Tenant-00 + Tenant-01, 50,000 records	Stopped by user					MODSOURCE-581
7	Tenant-01 + Tenant-02, 50,000 records	Stopped by user					MODSOURCE-581
8	Tenant-00 + Tenant-02, 50,000 records	Stopped by user					MODSOURCE-581
9	Tenant-00 + Tenant-01 + Tenant-02, 50,000 records	Stopped by user					MODSOURCE-581

Jobs were always successful for tenant ptf-ncp5-00. For the other 2 tenants, jobs completed with errors and all records were discarded; sometimes only one run per DI file was successful.
Jobs for 2 or 3 tenants simultaneously were tested but never finished because of the error MODSOURCE-581. Because the jobs were stopped by the user due to the error (about 10-15% done after 5 hours), those results are not meaningful.

Multitenant testing errors and warnings:

mod-source-record-manager

11:16:50 [] [] [] [] ERROR KafkaConsumerWrapper businessHandlerCompletionHandler:: Error while processing a record - id: 2 subscriptionPattern: SubscriptionDefinition(eventType=DI_PARSED_RECORDS_CHUNK_SAVED, subscriptionPattern=ncp5\.Default\.\w{1,}\.DI_PARSED_RECORDS_CHUNK_SAVED) offset: 1947
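
When many such errors appear in a module log, it can help to extract the event type and offset from each line before grouping or counting them. The following is a rough sketch, assuming every line follows the format of the entry above; the regular expression and the function name `parse_consumer_error` are illustrative and not part of any FOLIO tooling.

```python
import re
from typing import Optional

# Pattern written against the single log line shown above (an assumption:
# other log lines may use a different layout and will simply not match).
ERROR_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) .*?ERROR\s+KafkaConsumerWrapper .*?"
    r"id: (?P<id>\d+) .*?eventType=(?P<event_type>\w+),.*?offset: (?P<offset>\d+)"
)

def parse_consumer_error(line: str) -> Optional[dict]:
    """Return the time, id, event type, and offset of an error line, or None."""
    match = ERROR_RE.search(line)
    if match is None:
        return None
    return {
        "time": match["time"],
        "id": int(match["id"]),
        "event_type": match["event_type"],
        "offset": int(match["offset"]),
    }

LINE = (
    r"11:16:50 [] [] [] [] ERROR KafkaConsumerWrapper "
    r"businessHandlerCompletionHandler:: Error while processing a record - id: 2 "
    r"subscriptionPattern: SubscriptionDefinition(eventType=DI_PARSED_RECORDS_CHUNK_SAVED, "
    r"subscriptionPattern=ncp5\.Default\.\w{1,}\.DI_PARSED_RECORDS_CHUNK_SAVED) offset: 1947"
)

print(parse_consumer_error(LINE))
```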
