Overview

This document contains the results of testing Data Import for MARC Bibliographic records in the Orchid release to detect performance trends (PERF-562).

Summary

In the first Orchid test (13/06/2023), Data Import durations were almost twice as long as in Nolana and Morning Glory. The increase was caused by fixing differences in the database schemas, mostly by adding missing trigger functions. For example, 50K MARC Create took 39 min 27 s in that Orchid test, compared with 21 min 11 s in Nolana and 21 min 37 s in Morning Glory, whereas in Lotus it took 32 min 28 s. We can therefore assume that the triggers were missing from our database for the previous two releases.
Memory utilization increases because the modules had been restarted beforehand (by the daily cluster shutdown process). Based on the investigation in PERF-541, no memory leak is suspected in any of the modules.
Average CPU usage did not exceed 130% for any module. mod-data-import shows CPU spikes of up to 270% at the beginning of Data Import jobs.
Database CPU usage reaches approximately 97%.
We cannot compare the R/W split-enabled DI tests to the first test from 13/06/2023 because different module versions were tested. When the same new module versions are compared, enabling R/W split gives no significant improvement in DI duration. Details are in the table below and in the Data Import series of tests (Orchid), which shows the range of durations for each DI test. However, R/W splitting could decrease the load on the writer database instance, so other processes running alongside data import could perform faster.
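To check the "almost twice" and R/W split statements above against the raw numbers, here is a minimal Python sketch that recomputes them from the MARC Create rows of the Results table in this report. The durations are copied from that table; the shorthand column labels and the helper names (to_seconds, pct_change, summarize) are hypothetical and introduced only for this sketch.

```python
"""Recompute summary ratios from the MARC Create rows of the Results table."""


def to_seconds(minutes: int, seconds: int = 0) -> int:
    """Convert a duration given as minutes and seconds into seconds."""
    return minutes * 60 + seconds


def pct_change(baseline_s: int, candidate_s: int) -> float:
    """Percent change of candidate vs. baseline; negative means faster."""
    return (candidate_s - baseline_s) / baseline_s * 100.0


# MARC Create durations as (minutes, seconds), copied from the Results table.
# Shorthand labels (hypothetical, for this sketch only):
#   rw_off      new module versions, R/W split disabled (07/09/2023)
#   rw_on_0709  new module versions, R/W split enabled (07/09/2023)
#   rw_on_0809  new module versions, R/W split enabled (08/09/2023)
#   first       first Orchid test (13/06/2023)
#   nolana      Nolana release
CREATE = {
    "5K":   {"rw_off": (2, 50),  "rw_on_0709": (2, 23),  "rw_on_0809": (2, 3),   "first": (4, 30),  "nolana": (2, 8)},
    "10K":  {"rw_off": (4, 43),  "rw_on_0709": (5, 12),  "rw_on_0809": (3, 58),  "first": (9, 25),  "nolana": (4, 20)},
    "25K":  {"rw_off": (11, 52), "rw_on_0709": (11, 45), "rw_on_0809": (10, 5),  "first": (22, 16), "nolana": (10, 41)},
    "50K":  {"rw_off": (23, 20), "rw_on_0709": (23, 36), "rw_on_0809": (20, 46), "first": (39, 27), "nolana": (21, 11)},
    "100K": {"rw_off": (48, 46), "rw_on_0709": (49, 28), "rw_on_0809": (44, 18), "first": (98, 0),  "nolana": (42, 35)},
}


def summarize(rows: dict) -> dict:
    """Per file size: first-Orchid/Nolana ratio and R/W split deltas vs. rw_off."""
    out = {}
    for size, cols in rows.items():
        s = {label: to_seconds(*mm_ss) for label, mm_ss in cols.items()}
        out[size] = {
            "first_vs_nolana": s["first"] / s["nolana"],
            "rw_on_0709_pct": pct_change(s["rw_off"], s["rw_on_0709"]),
            "rw_on_0809_pct": pct_change(s["rw_off"], s["rw_on_0809"]),
        }
    return out


if __name__ == "__main__":
    for size, r in summarize(CREATE).items():
        print(f"{size:>4}: first/Nolana x{r['first_vs_nolana']:.2f}, "
              f"R/W on 07/09 {r['rw_on_0709_pct']:+.1f}%, "
              f"R/W on 08/09 {r['rw_on_0809_pct']:+.1f}%")
```

For 50K MARC Create, the first Orchid test comes out at about 1.86 times the Nolana duration; across the five Create sizes the ratio ranges from about 1.86 to 2.30. For the 07/09/2023 R/W split run, the Create durations differ from the R/W-split-disabled run by between about -16% and +10%.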

Recommendations and Jiras

Implement a database schema comparison process for each cluster deployment to avoid database misconfiguration.
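One possible starting point for such a comparison is to diff the trigger inventories of a reference database and a freshly deployed one, since missing triggers were the main schema difference found here. The Python sketch below assumes the rows have already been fetched with the shown query against PostgreSQL's information_schema.triggers view (which only lists triggers on tables the querying role owns or has a non-SELECT privilege on) and that both clusters use the same schema (tenant) names. The Trigger type, diff_triggers function and the example trigger names are hypothetical.

```python
"""Diff trigger inventories of two databases (sketch for a schema comparison step)."""

from __future__ import annotations

from typing import Iterable, NamedTuple

# Query to run against each database. information_schema.triggers returns one row
# per trigger event (INSERT/UPDATE/DELETE), so DISTINCT collapses them.
TRIGGER_QUERY = """
SELECT DISTINCT trigger_schema, event_object_table, trigger_name
FROM information_schema.triggers
ORDER BY 1, 2, 3
"""


class Trigger(NamedTuple):
    schema: str
    table: str
    name: str


def diff_triggers(reference: Iterable[Trigger],
                  candidate: Iterable[Trigger]) -> dict[str, set[Trigger]]:
    """Triggers missing from, or only present in, the candidate database."""
    ref, cand = set(reference), set(candidate)
    return {"missing": ref - cand, "extra": cand - ref}


if __name__ == "__main__":
    # Hypothetical rows standing in for TRIGGER_QUERY results from two clusters;
    # driver rows (tuples) can be converted with Trigger(*row).
    reference = [
        Trigger("tenant_mod_example", "records", "trg_update_a"),
        Trigger("tenant_mod_example", "records", "trg_update_b"),
    ]
    candidate = [Trigger("tenant_mod_example", "records", "trg_update_a")]
    for kind, triggers in diff_triggers(reference, candidate).items():
        for t in sorted(triggers):
            print(f"{kind}: {t.schema}.{t.table} -> {t.name}")
```

A non-empty "missing" set on a new deployment would flag the kind of trigger gap described in the Summary before any performance test is run.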

Results

Test	Profile	Orchid, new module versions, R/W split disabled (07/09/2023)	Orchid, new module versions, R/W split enabled (07/09/2023)	Orchid, new module versions, R/W split enabled (08/09/2023)	Orchid, first test (13/06/2023)	Nolana	Morning Glory	Lotus
5K MARC Create	PTF - Create 2	2 min 50 s	2 min 23 s	2 min 3 s	4 min 30 s	2 min 8 s	2 min 20 s	3 min 54 s
5K MARC Update	PTF - Updates Success - 1	-	2 min 48 s	2 min 45 s	4 min 2 s	2 min 10 s	3 min 4 s	4 min 12 s
10K MARC Create	PTF - Create 2	4 min 43 s	5 min 12 s	3 min 58 s	9 min 25 s	4 min 20 s	4 min 33 s	6 min 45 s
10K MARC Update	PTF - Updates Success - 1	-	5 min 23 s	5 min 23 s	8 min 10 s	4 min 8 s	5 min 29 s	8 min 4 s
25K MARC Create	PTF - Create 2	11 min 52 s	11 min 45 s	10 min 5 s	22 min 16 s	10 min 41 s	10 min 55 s	16 min 8 s
25K MARC Update	PTF - Updates Success - 1	-	14 min 12 s	14 min 19 s	19 min 39 s	10 min 40 s	13 min 37 s	19 min 50 s
50K MARC Create	PTF - Create 2	23 min 20 s	23 min 36 s	20 min 46 s	39 min 27 s	21 min 11 s	21 min 37 s	32 min 28 s
50K MARC Update	PTF - Updates Success - 1	-	27 min 52 s	28 min	38 min 30 s (Completed with errors: 1 item discarded) *	20 min 57 s	26 min 10 s	39 min 5 s
100K MARC Create	PTF - Create 2	48 min 46 s	49 min 28 s	44 min 18 s	1 h 38 min	42 min 35 s	44 min 4 s	1 h 11 min
100K MARC Update	PTF - Updates Success - 1	-	57 min 41 s	55 min	1 h 33 min	41 min 56 s	55 min 33 s	1 h 19 min

In the Orchid run with R/W split enabled (07/09/2023), R/W split was enabled for:

mod-data-import
mod-source-record-storage
mod-source-record-manager
mod-di-converter-storage

In the Orchid run with R/W split enabled (08/09/2023), R/W split was enabled for:

mod-data-import
mod-source-record-storage
mod-source-record-manager
mod-di-converter-storage
mod-inventory-storage

Two modules are affected by R/W split the most because of the number of read queries they send to the database:

mod-inventory-storage (mostly)
mod-source-record-manager

All other modules send few or no read queries to the database.
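A toy model of why the benefit depends on read traffic: with R/W splitting, read-only queries can be served by the reader instance while writes must go to the writer, so a module that issues almost no reads moves almost nothing off the writer. The Python sketch assumes a deliberately naive rule that routes plain SELECT statements to the reader and everything else, including SELECT ... FOR UPDATE, to the writer. Target, route and reader_share are hypothetical names, and this is not how the FOLIO modules actually decide where a query goes.

```python
"""Conceptual read/write split router (hypothetical, not FOLIO's implementation)."""

from enum import Enum
from typing import Iterable


class Target(Enum):
    WRITER = "writer"
    READER = "reader"


def route(sql: str) -> Target:
    """Send plain SELECTs to the reader; everything else (and locking reads) to the writer."""
    statement = sql.lstrip().upper()
    if statement.startswith("SELECT") and "FOR UPDATE" not in statement:
        return Target.READER
    return Target.WRITER


def reader_share(statements: Iterable[str]) -> float:
    """Fraction of statements that this router would send to the reader instance."""
    routed = [route(s) for s in statements]
    if not routed:
        return 0.0
    return sum(t is Target.READER for t in routed) / len(routed)


if __name__ == "__main__":
    # Hypothetical write-heavy workload: only a small share can leave the writer.
    workload = ["INSERT INTO records VALUES (1)"] * 9 + ["SELECT * FROM records"]
    print(f"reader share: {reader_share(workload):.0%}")
```

Under this model, a write-heavy module offloads only the small fraction of its statements that are reads, which matches the observation that mod-inventory-storage and mod-source-record-manager are the modules where R/W split matters most.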

* - io.vertx.core.impl.NoStackTraceThrowable: Current retry number 1 exceeded or equal given number 1 for the Item update for jobExecutionId '8fde78a8-2450-44c7-83ac-c98376a90491'

Log entry from ncp5/mod-inventory/:

11:34:53 [] [] [] [] WARN dateItemEventHandler OL error updating Item - ERROR: Cannot update record 955f8ba1-76ac-4931-8236-1c2fb4775379 because it has been changed (optimistic locking): Stored _version is 5, _version of request is 3 (23F09), status code 409. Retry UpdateItemEventHandler handler...
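The footnote and the log line describe an optimistic-locking conflict: the Item update was sent with _version 3 while the stored record was already at _version 5, so the update was rejected with status 409, and the retry limit of 1 was reached. The minimal Python model below illustrates that read-modify-write cycle, assuming a simple per-record version counter that each successful write increments. VersionedStore, update_with_retry and the concurrent-writer hook are hypothetical names; this is not mod-inventory's actual retry logic.

```python
"""Toy model of optimistic locking with a bounded retry count (hypothetical)."""

from dataclasses import dataclass, field
from typing import Callable


class OptimisticLockError(Exception):
    """Stands in for the HTTP 409 returned when _version values do not match."""


@dataclass
class VersionedStore:
    """In-memory store keeping (payload, _version) per record id."""
    records: dict = field(default_factory=dict)

    def get(self, record_id: str) -> tuple:
        return self.records[record_id]

    def update(self, record_id: str, payload: dict, expected_version: int) -> None:
        _, stored_version = self.records[record_id]
        if stored_version != expected_version:
            raise OptimisticLockError(
                f"Stored _version is {stored_version}, "
                f"_version of request is {expected_version}")
        # A successful write bumps the stored version.
        self.records[record_id] = (payload, stored_version + 1)


def update_with_retry(store: VersionedStore, record_id: str,
                      modify: Callable[[dict], dict], max_retries: int,
                      before_write: Callable[[], None] = lambda: None) -> int:
    """Read-modify-write; re-read and retry on conflict up to max_retries times.

    Returns the number of retries used, or re-raises the last conflict.
    """
    retries = 0
    while True:
        payload, version = store.get(record_id)
        before_write()  # hook used to simulate a concurrent writer
        try:
            store.update(record_id, modify(payload), version)
            return retries
        except OptimisticLockError:
            if retries >= max_retries:
                raise
            retries += 1


if __name__ == "__main__":
    store = VersionedStore({"item-1": ({"status": "Available"}, 3)})

    def concurrent_writer() -> None:
        # Another process updates the same Item between our read and our write.
        payload, version = store.get("item-1")
        store.records["item-1"] = (payload, version + 1)

    try:
        update_with_retry(store, "item-1", lambda p: {**p, "note": "updated"},
                          max_retries=1, before_write=concurrent_writer)
    except OptimisticLockError as err:
        print("gave up:", err)
```

When another writer keeps changing the record between the read and the write, a single retry is not enough and the last conflict is surfaced, which is the pattern the footnote reports for the one discarded item.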

Memory Utilization

Memory utilization increases because the modules had been restarted beforehand (by the daily cluster shutdown process). Based on the investigation in PERF-541, no memory leak is suspected in any of the modules.

MARC BIB CREATE

MARC BIB UPDATE

Service CPU Utilization

MARC BIB CREATE

Average CPU usage did not exceed 130% for any module. mod-data-import shows CPU spikes of up to 270% at the beginning of Data Import jobs.

MARC BIB UPDATE

RDS CPU Utilization

MARC BIB CREATE

Database CPU usage reaches approximately 97%.

MARC BIB UPDATE

CPU Utilization for BIB Update 100k records R/W split enabled

RDS Database Connections

MARC BIB CREATE
For the DI Create job, the maximum connection count was 560.

MARC BIB UPDATE

For the DI Update job, the maximum connection count was 500.

Average connection usage

Maximum connection usage

BIB Update 100k records R/W split enabled

From reader DB instance

From writer DB instance

Appendix

Infrastructure

Records count:

mod_source_record_storage.marc_records_lb = 22618121
mod_source_record_storage.raw_records_lb = 22650140
mod_source_record_storage.records_lb = 22650140
mod_source_record_storage.marc_indexers = 98256911 (all records)
mod_source_record_storage.marc_indexers with field_no 010 = 139135
mod_source_record_storage.marc_indexers with field_no 035 = 4272473
mod_inventory_storage.authority = 7402975
mod_inventory_storage.holdings_record = 22027125
mod_inventory_storage.instance = 20986866
mod_inventory_storage.item = 22130108

PTF environment: ncp5

9 m6i.2xlarge EC2 instances located in US East (N. Virginia) us-east-1
2 database instances: one reader and one writer
Name	API Name	Memory	vCPUs	max_connections
R6G Extra Large	db.r6g.xlarge	32 GiB	4	2731
MSK ptf-kakfa-3
- 4 m5.2xlarge brokers in 2 zones
- Apache Kafka version 2.8.0
- EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
Kafka topic partitioning: 2 partitions for DI topics

Module memory and CPU parameters

Module	Version	Task Definition	Running Tasks	CPU (units)	Memory (MiB)	MemoryReservation (MiB)	MaxMetaspaceSize (MiB)	Xmx (MiB)
mod-inventory-storage	26.0.0	10	2	1024	2208	1952	512	1440
mod-inventory	20.0.4	8	2	1024	2880	2592	512	1814
mod-source-record-storage	5.6.5	24	2	2048	4096	3688	512	3076
mod-quick-marc	3.0.0	5	1	128	2288	2176	512	1664
mod-source-record-manager	3.6.2	16	2	1024	4096	3688	512	3076
mod-di-converter-storage	2.0.2	5	2	128	1024	896	128	768
mod-data-import	2.7.1	8	1	256	2048	1844	512	1292
okapi	5.0.1	6	3	1024	1684	1440	512	922
nginx-okapi	2022.03.02	6	2	128	1024	896	-	-
pub-okapi	2022.03.02	6	2	128	1024	896	-	768
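As a rough sanity check of these settings, the Python sketch below verifies that each JVM's maximum heap (Xmx) plus MaxMetaspaceSize fits within the container's MemoryReservation and reports the remaining headroom to the Memory limit. It assumes all memory columns are in MiB, and it is only a heuristic: a JVM also uses memory outside heap and metaspace (thread stacks, code cache, direct buffers). ModuleConfig and check are hypothetical names.

```python
"""Rough check: does each JVM's Xmx + MaxMetaspaceSize fit the container memory settings?"""

from typing import NamedTuple


class ModuleConfig(NamedTuple):
    name: str
    memory: int              # container hard limit (Memory column, assumed MiB)
    memory_reservation: int  # soft limit (MemoryReservation column, assumed MiB)
    max_metaspace: int       # MaxMetaspaceSize (assumed MiB)
    xmx: int                 # maximum heap, Xmx (assumed MiB)


# Values copied from the table above; nginx-okapi and pub-okapi are left out
# because they have no MaxMetaspaceSize value.
MODULES = [
    ModuleConfig("mod-inventory-storage", 2208, 1952, 512, 1440),
    ModuleConfig("mod-inventory", 2880, 2592, 512, 1814),
    ModuleConfig("mod-source-record-storage", 4096, 3688, 512, 3076),
    ModuleConfig("mod-quick-marc", 2288, 2176, 512, 1664),
    ModuleConfig("mod-source-record-manager", 4096, 3688, 512, 3076),
    ModuleConfig("mod-di-converter-storage", 1024, 896, 128, 768),
    ModuleConfig("mod-data-import", 2048, 1844, 512, 1292),
    ModuleConfig("okapi", 1684, 1440, 512, 922),
]


def check(modules):
    """Return per-module heap+metaspace budget and remaining headroom (MiB)."""
    report = {}
    for m in modules:
        budget = m.xmx + m.max_metaspace
        report[m.name] = {
            "budget": budget,
            "within_reservation": budget <= m.memory_reservation,
            "headroom_to_limit": m.memory - budget,
        }
    return report


if __name__ == "__main__":
    for name, r in check(MODULES).items():
        print(f"{name:<28} budget={r['budget']:>5} MiB  "
              f"fits reservation={r['within_reservation']}  "
              f"headroom to limit={r['headroom_to_limit']} MiB")
```

With the values above, every listed JVM fits within its MemoryReservation, and for mod-inventory-storage, mod-quick-marc and mod-di-converter-storage, Xmx + MaxMetaspaceSize equals the reservation exactly.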

Methodology/Approach

JMeter scripts were used to run the DI baseline tests.

There were 5-minute pauses between tests.

Data Import test report (Orchid)
