Data Import test report (Orchid)


Overview

This document contains the results of testing Data Import for MARC Bibliographic records in the Orchid release to detect performance trends (PERF-562).

Summary

  • Duration for Orchid increased almost twofold because of fixes to differences in the database schemas, mostly the addition of trigger functions. For example, 50K MARC Create took 39 min 27 sec on Orchid, compared to 21 min 11 sec on Nolana and 21 min 37 sec on Morning Glory, while on Lotus it took 32 min 28 sec. We can therefore assume the trigger was missing from our database in the previous two releases.
  • Memory utilization increased because modules were restarted (the cluster is shut down every day); no memory leak is suspected for any of the modules, based on the investigation done in the scope of ticket PERF-541.
  • Average CPU usage did not exceed 130% for any module. We observed spikes in the CPU usage of mod-data-import of up to 270% at the beginning of Data Import jobs.
  • DB CPU usage reached approximately 97%.
  • We cannot compare the R/W split-enabled DI test with the first test from 13/06/2023 because different module versions were tested. Between the new module versions, however, there is no significant improvement in DI duration with R/W split enabled; details are in the table below and in the Data Import series of tests (Orchid), which shows the typical range of duration for each DI test. Nevertheless, R/W splitting can decrease the database load on the writer instance, so other processes running alongside data import could be performed faster.

Recommendations and Jiras


Implement a database schema comparison step in each cluster deployment to avoid database misconfiguration.
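A lightweight way to implement such a check is to diff the trigger inventories of a reference schema and a deployed one. The sketch below is a minimal illustration with made-up trigger names; in a real deployment the two sets would be loaded from information_schema.triggers or from `pg_dump --schema-only` output.

```python
# Sketch: detect schema drift between two clusters by diffing their
# trigger inventories. The trigger names below are illustrative only;
# in practice each set would come from information_schema.triggers.

def schema_diff(reference: set, deployed: set) -> dict:
    """Return triggers missing from / unexpected in the deployed schema."""
    return {
        "missing": sorted(reference - deployed),
        "unexpected": sorted(deployed - reference),
    }

reference_triggers = {
    "set_instance_ol_version_trigger",   # hypothetical names
    "audit_instance_trigger",
}
deployed_triggers = {
    "audit_instance_trigger",
}

drift = schema_diff(reference_triggers, deployed_triggers)
print(drift["missing"])   # triggers to restore before the next test run
```

A non-empty "missing" list at deployment time would have caught the absent trigger functions before they skewed two releases of test results.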

Results


Durations in the three September 2023 columns were measured with the new versions of the modules.

| Test | Profile | Orchid, R/W split disabled (07/09/2023) | Orchid, R/W split enabled (07/09/2023) | Orchid, R/W split enabled (08/09/2023) | Orchid, first test (13/06/2023) | Nolana | Morning Glory | Lotus |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5K MARC Create | PTF - Create 2 | 2 min 50 sec | 2 min 23 sec | 2 min 3 sec | 4 min 30 sec | 2 min 8 sec | 2 min 20 sec | 3 min 54 sec |
| 5K MARC Update | PTF - Updates Success - 1 | 2 min 48 sec | 2 min 45 sec | - | 4 min 2 sec | 2 min 10 sec | 3 min 4 sec | 4 min 12 sec |
| 10K MARC Create | PTF - Create 2 | 4 min 43 sec | 5 min 12 sec | 3 min 58 sec | 9 min 25 sec | 4 min 20 sec | 4 min 33 sec | 6 min 45 sec |
| 10K MARC Update | PTF - Updates Success - 1 | 5 min 23 sec | 5 min 23 sec | - | 8 min 10 sec | 4 min 8 sec | 5 min 29 sec | 8 min 4 sec |
| 25K MARC Create | PTF - Create 2 | 11 min 52 sec | 11 min 45 sec | 10 min 5 sec | 22 min 16 sec | 10 min 41 sec | 10 min 55 sec | 16 min 8 sec |
| 25K MARC Update | PTF - Updates Success - 1 | 14 min 12 sec | 14 min 19 sec | - | 19 min 39 sec | 10 min 40 sec | 13 min 37 sec | 19 min 50 sec |
| 50K MARC Create | PTF - Create 2 | 23 min 20 sec | 23 min 36 sec | 20 min 46 sec | 39 min 27 sec | 21 min 11 sec | 21 min 37 sec | 32 min 28 sec |
| 50K MARC Update | PTF - Updates Success - 1 | 27 min 52 sec | 28 min | - | 38 min 30 sec, Completed or Completed with errors (1 item discarded) * | 20 min 57 sec | 26 min 10 sec | 39 min 5 sec |
| 100K MARC Create | PTF - Create 2 | 48 min 46 sec | 49 min 28 sec | 44 min 18 sec | 1 hr 38 min | 42 min 35 sec | 44 min 4 sec | 1 hr 11 min |
| 100K MARC Update | PTF - Updates Success - 1 | 57 min 41 sec | 55 min | - | 1 hr 33 min | 41 min 56 sec | 55 min 33 sec | 1 hr 19 min |
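The "almost twice" regression quoted in the summary can be checked mechanically by converting the table's duration strings to seconds. The two sample values below are taken from the 50K MARC Create row; the parsing helper itself is only an illustration.

```python
import re

def to_seconds(s: str) -> int:
    """Parse durations like '39 min 27 sec', '1 hr 11 min', '2m 8 s'."""
    units = {"hour": 3600, "hr": 3600, "h": 3600,
             "min": 60, "m": 60, "sec": 1, "s": 1}
    total = 0
    # longer unit names must come first in the alternation
    for value, unit in re.findall(r"(\d+)\s*(hour|hr|h|min|m|sec|s)", s):
        total += int(value) * units[unit]
    return total

orchid_first = to_seconds("39 min 27 sec")   # 50K MARC Create, first Orchid test
nolana = to_seconds("21 min 11 sec")         # same profile on Nolana
print(round(orchid_first / nolana, 2))       # ~1.86, i.e. "almost twice"
```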

In the Orchid test with R/W split enabled (07/09/2023), the split was enabled for:

  • mod-data-import
  • mod-source-record-storage
  • mod-source-record-manager
  • mod-di-converter-storage

In the Orchid test with R/W split enabled (08/09/2023), the split was enabled for:

  • mod-data-import
  • mod-source-record-storage
  • mod-source-record-manager
  • mod-di-converter-storage
  • mod-inventory-storage

Two modules affect the R/W split the most, due to the number of read queries they send to the database:

  • mod-inventory-storage (mostly)
  • mod-source-record-manager

All other modules send few or no read queries to the database.
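To illustrate why read-heavy modules benefit the most, here is a minimal sketch of the routing idea behind a R/W split: read-only statements may go to the reader endpoint, while everything else must go to the writer. The endpoints and the classification rule are simplified assumptions for illustration, not the actual FOLIO implementation.

```python
# Sketch of the read/write split idea: route read-only SQL to the reader
# endpoint and everything else to the writer. The hostnames are placeholders.

WRITER = "db-writer.example.internal"
READER = "db-reader.example.internal"

def pick_endpoint(sql: str) -> str:
    """Read-only statements may use the reader replica; writes must not."""
    first = sql.lstrip().split(None, 1)[0].upper()
    # naive rule for the sketch; data-modifying CTEs would need extra care
    return READER if first in ("SELECT", "WITH") else WRITER

for stmt in ("SELECT * FROM instance", "UPDATE item SET _version = 4"):
    print(stmt.split()[0], "->", pick_endpoint(stmt))
```

A module that issues almost no SELECTs gains nothing from this routing, which is why mod-inventory-storage and mod-source-record-manager dominate the effect.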



* io.vertx.core.impl.NoStackTraceThrowable: Current retry number 1 exceeded or equal given number 1 for the Item update for jobExecutionId '8fde78a8-2450-44c7-83ac-c98376a90491'

From ncp5/mod-inventory/ logs:

11:34:53 [] [] [] [] WARN  dateItemEventHandler OL error updating Item - ERROR: Cannot update record 955f8ba1-76ac-4931-8236-1c2fb4775379 because it has been changed (optimistic locking): Stored _version is 5, _version of request is 3 (23F09), status code 409. Retry UpdateItemEventHandler handler...
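The 409 in the log above is optimistic locking at work: an update carries the _version it read, and if the stored _version has moved on, the handler re-reads the record and retries up to a configured limit (1 in this log, hence the discarded item). Below is a minimal in-memory sketch of that pattern; it is not the actual mod-inventory code.

```python
# Sketch of an optimistic-locking update with a bounded retry, mirroring
# the behavior seen in the log: version mismatch -> re-read -> retry.

class ConflictError(Exception):
    pass

def update_item(store: dict, item_id: str, changes: dict,
                expected_version: int, max_retries: int = 1) -> dict:
    for _attempt in range(max_retries + 1):
        current = store[item_id]
        if current["_version"] != expected_version:
            # someone else updated the record first (the HTTP 409 case);
            # re-read the stored version and retry
            expected_version = current["_version"]
            continue
        updated = {**current, **changes, "_version": expected_version + 1}
        store[item_id] = updated
        return updated
    raise ConflictError(f"retry number {max_retries} exceeded")
```

In the logged case the stored _version was 5 while the request carried 3, so the first attempt failed and the handler retried before giving up.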

Memory Utilization

Memory utilization increased because modules were restarted (the cluster is shut down every day); no memory leak is suspected for any of the modules, based on the investigation done in the scope of ticket PERF-541.

MARC BIB CREATE

MARC BIB UPDATE


Service CPU Utilization 

MARC BIB CREATE

Average CPU usage did not exceed 130% for any module. We observed spikes in the CPU usage of mod-data-import of up to 270% at the beginning of the Data Import jobs.


MARC BIB UPDATE


RDS CPU Utilization 

MARC BIB CREATE

DB CPU usage reached approximately 97%.

MARC BIB UPDATE

CPU Utilization for BIB Update 100k records R/W split enabled


RDS Database Connections

MARC BIB CREATE
For the DI Create job: maximum 560 connections.

MARC BIB UPDATE

For the DI Update job: maximum 500 connections.

Average connection usage

Maximum connection usage

BIB Update 100k records R/W split enabled

BIB Update 100k records R/W split enabled

From reader DB instance

From writer DB instance


Appendix

Infrastructure

Records count:

  • mod_source_record_storage.marc_records_lb = 22618121
  • mod_source_record_storage.raw_records_lb = 22650140
  • mod_source_record_storage.records_lb = 22650140
  • mod_source_record_storage.marc_indexers = 98256911 (all records)
  • mod_source_record_storage.marc_indexers with field_no 010 = 139135
  • mod_source_record_storage.marc_indexers with field_no 035 = 4272473
  • mod_inventory_storage.authority = 7402975
  • mod_inventory_storage.holdings_record = 22027125
  • mod_inventory_storage.instance = 20986866
  • mod_inventory_storage.item = 22130108

PTF environment: ncp5

  • m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
  • 2 database instances, one reader and one writer

    | Name | API Name | Memory | vCPUs | max_connections |
    | --- | --- | --- | --- | --- |
    | R6G Extra Large | db.r6g.xlarge | 32 GiB | 4 vCPUs | 2731 |
  • MSK ptf-kakfa-3
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3
  • Kafka topic partitioning: 2 partitions for DI topics
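With only 2 partitions per DI topic, per-job event ordering is preserved by keying events on the jobExecutionId, so all events of one job land on the same partition. The sketch below shows the idea with a plain SHA-1 hash; Kafka's default partitioner actually uses murmur2, so the computed partition numbers here are illustrative only.

```python
# Sketch: hash-based key-to-partition mapping, as used for DI Kafka topics.
# SHA-1 stands in for Kafka's murmur2 to keep the example self-contained.

import hashlib

DI_PARTITIONS = 2   # partitions per DI topic in this environment

def partition_for(job_execution_id: str, partitions: int = DI_PARTITIONS) -> int:
    digest = hashlib.sha1(job_execution_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % partitions

job = "8fde78a8-2450-44c7-83ac-c98376a90491"   # jobExecutionId from the log
# every event keyed with the same job id maps to the same partition
assert partition_for(job) == partition_for(job)
print(partition_for(job))   # 0 or 1
```

The trade-off is that 2 partitions cap consumer parallelism at two consumers per topic per group, which matters when many DI jobs run concurrently.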


Modules memory and CPU parameters

| Module | Version | Task Definition | Running Tasks | CPU | Memory | MemoryReservation | MaxMetaspaceSize | Xmx |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mod-inventory-storage | 26.0.0 | 10 | 2 | 1024 | 2208 | 1952 | 512 | 1440 |
| mod-inventory | 20.0.4 | 8 | 2 | 1024 | 2880 | 2592 | 512 | 1814 |
| mod-source-record-storage | 5.6.5 | 24 | 2 | 2048 | 4096 | 3688 | 512 | 3076 |
| mod-quick-marc | 3.0.0 | 5 | 1 | 128 | 2288 | 2176 | 512 | 1664 |
| mod-source-record-manager | 3.6.2 | 16 | 2 | 1024 | 4096 | 3688 | 512 | 3076 |
| mod-di-converter-storage | 2.0.2 | 5 | 2 | 128 | 1024 | 896 | 128 | 768 |
| mod-data-import | 2.7.1 | 8 | 1 | 256 | 2048 | 1844 | 512 | 1292 |
| okapi | 5.0.1 | 6 | 3 | 1024 | 1684 | 1440 | 512 | 922 |
| nginx-okapi | 2022.03.02 | 6 | 2 | 128 | 1024 | 896 | - | - |
| pub-okapi | 2022.03.02 | 6 | 2 | 128 | 1024 | 896 | - | 768 |

Methodology/Approach

JMeter scripts were used to test the baseline for DI.

  • 5 min pauses between the tests