Data Import with R/W split [Nolana]

Overview

The goal of this test is to assess the performance of Data Import (Create and Update of MARC BIBs, Authorities, MARC Holdings, and Create MARC BIB with CICO) with R/W split enabled on all DI modules except mod-source-record-manager and mod-inventory-storage, and to compare the results with previous ones where R/W split was enabled on all the DI modules.

Ticket: PERF-418


Summary

Test results showed a performance degradation for Data Import with R/W split enabled on all DI and CICO modules except mod-source-record-manager and mod-inventory-storage, compared with the results with R/W split enabled on all the modules. The main findings:

  • The DI process is stable, there are no issues (tests 1-4, marked blue in the response times table);
  • Response time degraded by 10-30% (tests 5-7, marked green in the response times table);
  • DI+CICO tests showed more spikes and longer response times when R/W split is enabled on all the modules except mod-source-record-manager and mod-inventory-storage. This is because mod-inventory-storage is one of the key communicators with the database during CICO and DI.

It can be concluded that the benefit of the R/W split feature decreases significantly when mod-inventory-storage is excluded from it.

Test Runs 

| Test # | Test Scenario | Load generator size (recommended) | Load generator Memory (GiB) (recommended) | Notes |
|---|---|---|---|---|
| 1 | MARC BIB Create (1k, 2k, 5k, 10k, 25k, 50k, 100k) | t3.medium | 3 | ncp4 environment |
| 2 | MARC BIB Update (1k, 2k, 5k, 10k, 25k, 50k, 100k) | | | |
| 3 | MARC Holdings (1k, 5k, 10k, 80k) | | | |
| 4 | MARC Authorities (1k, 5k, 10k, 25k, 50k) | | | |
| 5 | MARC BIB Create + CICO 20 users (25k, 50k) | | | |
| 6 | MARC BIB Create with Kafka partitions changed from 1 to 2/50 (50k) | | | |
| 7 | MARC BIB Update with Kafka partitions changed from 1 to 2/50 (50k) | | | |

Results

DI Response Times

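Response times are given as mm:ss (or h:mm:ss).

| Test Scenario | Configuration | DI data quantity | R/W split enabled on all DI modules* | R/W split enabled on all DI modules except mod-source-record-manager and mod-inventory-storage |
|---|---|---|---|---|
| MARC BIB Create | Low partitions count for Kafka topics (1 per each) | 1k | 0:40 | 0:43 |
| | | 2k | 0:56 | 1:02 |
| | | 5k | 2:08 | 2:28 |
| | | 10k | 4:20 | 4:42 |
| | | 25k | 10:41 | 11:38 |
| | | 50k | 21:11 | 22:45 |
| | | 100k | 42:35 | 47:53 |
| MARC BIB Update | | 1k | 0:35 | 0:40 |
| | | 2k | 0:58 | 1:32 |
| | | 5k | 2:10 | 3:37 |
| | | 10k | 4:08 | 7:02 |
| | | 25k | 10:40 | 17:36 |
| | | 50k | 20:57 | 40:30 (retest: 27:01) |
| | | 100k | 41:56 | 1:10:45 |
| MARC Holdings | | 1k | 0:33 | 0:30 |
| | | 5k | 4:21 | 1:46 |
| | | 10k | 3:25 | 3:08 |
| | | 80k | 21:23 | 24:07 |
| MARC Authorities | | 1k | 0:27 | 0:32 |
| | | 5k | 1:15 | 1:46 |
| | | 10k | 2:31 | 3:12 |
| | | 25k | 7:07 | 7:25 |
| | | 50k | 11:24 | 14:37 |

Additional tests (both baseline and verification were rerun)

| Test Scenario | Configuration | DI data quantity | R/W split enabled on all DI modules* | R/W split enabled on all DI modules except mod-source-record-manager and mod-inventory-storage |
|---|---|---|---|---|
| MARC BIB Create (partitions changed) | Correct partitions count for Kafka topics (2/50 per each) | 25k | 08:25 | 09:24 |
| | | 50k | 16:33 | 19:53 |
| MARC BIB Update (partitions changed) | | 25k | 12:35 | 13:36 |
| | | 50k | 23:25 | 27:25 |
| MARC BIB Create + CICO 20 users (partitions changed) | | 25k | 09:26 | 11:40 |
| | | 50k | 19:00 | 21:57 |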

*Results with R/W split enabled on all DI modules are taken from the previous test reports. Links to the reports can be found in the Previous results section.
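The relative slowdown between the two configurations can be derived directly from the durations in the table. The following is a minimal sketch, not part of the original test tooling: the helper names `to_seconds` and `degradation_pct` are hypothetical, and the sample rows are the 50k results of the "partitions changed" tests.

```python
def to_seconds(duration: str) -> int:
    """Convert 'mm:ss' or 'h:mm:ss' (the formats used in the table) to seconds."""
    seconds = 0
    for part in duration.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def degradation_pct(all_modules: str, partial: str) -> float:
    """Percent increase in duration relative to the run with R/W split on all modules."""
    base = to_seconds(all_modules)
    return (to_seconds(partial) - base) / base * 100


# 50k rows of the additional tests:
# (R/W split on all DI modules, R/W split on all modules except
#  mod-source-record-manager and mod-inventory-storage)
rows = {
    "MARC BIB Create": ("16:33", "19:53"),
    "MARC BIB Update": ("23:25", "27:25"),
    "MARC BIB Create + CICO": ("19:00", "21:57"),
}

for scenario, (full, partial) in rows.items():
    print(f"{scenario}: {degradation_pct(full, partial):.1f}% slower")
```

For these three rows the computed slowdown falls between roughly 15% and 20%, which is within the 10-30% range quoted in the Summary.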

CICO Response times

Response time, average, sec:

| Test Scenario | DI data quantity | Transaction | Configuration | Baseline tests (R/W split is NOT enabled on any modules) | R/W split enabled on all CICO and DI modules | R/W split enabled on all CICO and DI modules except mod-source-record-manager and mod-inventory-storage | Degradation (sec) from having R/W split enabled on all DI modules |
|---|---|---|---|---|---|---|---|
| MARC BIB Create + CICO (20 users) | 25k | Check-in | Correct partitions count for Kafka topics (2/50 per each) | 0.906 | 0.622 | 0.793 | 0.171 |
| | | Check-out | | 1.512 | 0.942 | 1.186 | 0.244 |
| | 50k | Check-in | | 0.875 | 0.537 | 0.743 | 0.206 |
| | | Check-out | | 1.549 | 1.113 | 1.188 | 0.075 |
| (No Data Import) | | Check-in | | 0.353 | | | |
| | | Check-out | | 0.630 | | | |

This table contains the response times of the check-in/check-out workflows with and without data import, and with and without all modules having R/W split enabled. A few notable items:

  • Without any modules having R/W split enabled, there is a large jump in CICO response times when a DI job runs at the same time (from 353 ms to 906 ms for check-in with the 25k job, and from 630 ms to 1549 ms for check-out with the 50k job).
  • When R/W split is enabled on all modules, we see a drop in response time of nearly 300 ms for CI (31% improvement) and nearly 600 ms for CO (38% improvement), based on the test with the 25k DI Create job.
  • When R/W split is enabled on all modules except mod-inventory-storage and mod-source-record-manager, the response time jumps back up to nearly 800 ms for CI and 1.18 s for CO. We lose about 200 ms of the response time gain. In totality, we lose about 26% of the gains for CI and 16% for CO.
  • In summary, there are still some benefits when R/W split is not enabled on all modules, although they are not as large.
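The improvement percentages and the "Degradation" column for the 25k job can be reproduced from the three measured averages. This is a small illustrative sketch, with hypothetical helper names `improvement_pct` and `degradation_sec`; the input values are copied from the 25k rows of the CICO table.

```python
def improvement_pct(baseline: float, with_split: float) -> float:
    """Percent reduction in average response time relative to the baseline."""
    return (baseline - with_split) / baseline * 100


def degradation_sec(full_split: float, partial_split: float) -> float:
    """Seconds lost when two modules are excluded from R/W split."""
    return partial_split - full_split


# 25k DI Create job, average response time in seconds:
# (baseline without R/W split, R/W split on all modules,
#  R/W split on all modules except mod-source-record-manager and mod-inventory-storage)
cico_25k = {
    "Check-in": (0.906, 0.622, 0.793),
    "Check-out": (1.512, 0.942, 1.186),
}

for transaction, (baseline, full, partial) in cico_25k.items():
    print(
        f"{transaction}: {improvement_pct(baseline, full):.0f}% faster with full R/W split, "
        f"{degradation_sec(full, partial):.3f} s slower with partial R/W split"
    )
```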

MARC BIB Create: Instance CPU Utilization

MARC BIB Create: Service CPU Utilization

MARC BIB Create: Memory Utilization

MARC BIB Create: DB CPU Utilization

R/W split enabled on all DI modules except mod-source-record-manager and mod-inventory-storage

ncp4-db-01 - read node - blue upper line

ncp4-db-02 - write node - orange lower line


R/W split enabled on all DI modules*

read node - orange upper line

write node - blue lower line


*Results with R/W split enabled on all DI modules are taken from the previous test reports. Links to the reports can be found in the Previous results section.

MARC BIB Create: DB connections

MARC BIB Update: Instance CPU Utilization

MARC BIB Update: Service CPU Utilization

MARC BIB Update: Memory Utilization

MARC BIB Update: DB CPU Utilization

R/W split enabled on all DI modules except mod-source-record-manager and mod-inventory-storage

ncp4-db-01 - read node - blue upper line

ncp4-db-02 - write node - orange lower line


R/W split enabled on all DI modules*

100k file

read node - orange upper line

write node - blue lower line


*Results with R/W split enabled on all DI modules are taken from the previous test reports. Links to the reports can be found in the Previous results section.

MARC BIB Update: DB connections

MARC Holdings: Instance CPU Utilization

MARC Holdings: Service CPU Utilization

MARC Holdings: Memory Utilization

MARC Holdings: DB CPU Utilization

R/W split enabled on all DI modules except mod-source-record-manager and mod-inventory-storage

ncp4-db-01 - read node - blue upper line

ncp4-db-02 - write node - orange lower line


R/W split enabled on all DI modules*

ncp3-db-02 - read node - orange upper line

ncp3-db-01 - write node - blue lower line


*Results with R/W split enabled on all DI modules are taken from the previous test reports. Links to the reports can be found in the Previous results section.

MARC Holdings: DB connections

MARC Authorities: Instance CPU Utilization

MARC Authorities: Service CPU Utilization

MARC Authorities: Memory Utilization

MARC Authorities: DB CPU Utilization

R/W split enabled on all DI modules except mod-source-record-manager and mod-inventory-storage

ncp4-db-01 - read node - blue upper line

ncp4-db-02 - write node - orange lower line


R/W split enabled on all DI modules*

ncp3-db-02 - read node - orange upper line

ncp3-db-01 - write node - blue lower line


*Results with R/W split enabled on all DI modules are taken from the previous test reports. Links to the reports can be found in the Previous results section.

MARC Authorities: DB connections

MARC BIB Create + CICO: Response time

R/W split enabled on all DI modules except mod-source-record-manager and mod-inventory-storage


R/W split enabled on all DI modules

According to the results, there are more spikes and longer response times when R/W split is enabled on all the modules except mod-source-record-manager and mod-inventory-storage. This is because mod-inventory-storage is one of the key communicators with the database during CICO and DI. When mod-inventory-storage does not benefit from the R/W split feature, the performance of these two workflows suffers as a whole.

MARC BIB Create + CICO: Instance CPU Utilization

MARC BIB Create + CICO: Service CPU Utilization

MARC BIB Create + CICO: Memory Utilization


MARC BIB Create + CICO: DB CPU Utilization

R/W split enabled on all the modules except mod-source-record-manager and mod-inventory-storage


ncp4-db-01 - read node - blue upper line

ncp4-db-02 - write node - orange lower line


R/W split enabled on all the modules

ncp4-db-01 - read node - orange upper line

ncp4-db-02 - write node - blue lower line


According to the results, DB reader instance load is lower when R/W split is enabled on all the modules except mod-source-record-manager and mod-inventory-storage.

MARC BIB Create + CICO: DB connections


Previous results

MARC BIB - Data Import test report (Nolana)

MARC BIB + CICO - Data Import with Check-ins Check-outs Nolana

MARC Authorities - Data Import MARC Authorities (Nolana)

MARC Holdings - Data Import Create MARC holdings records [Nolana]

Appendix

Infrastructure

PTF - environment ncp4

  • 8 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
  • 2 db.r6.xlarge database instances: one writer and one reader
  • MSK ptf-kakfa-3
    • 4 kafka.m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3

Modules memory and CPU parameters:

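| Modules | Version | Task Definition | Running Tasks | CPU | Memory (Soft/Hard limits) | MaxMetaspaceSize | Xmx |
|---|---|---|---|---|---|---|---|
| mod-data-import | 2.6.2 | 1 | 1 | 256 | 1844/2048 | 512 | 1292 |
| mod-data-import-cs | 1.15.2 | 2 | 2 | 128 | 896/1024 | 128 | 768 |
| mod-source-record-storage | 5.5.2 | 1 | 2 | 1024 | 1440/1536 | 512 | 908 |
| mod-source-record-manager | 3.5.6 | 2 | 2 | 1024 | 3688/4096 | 512 | 2048 |
| mod-inventory | 19.0.2 | 2 | 2 | 1024 | 2592/2880 | 512 | 1814 |
| mod-inventory-storage | 25.0.3 | 2 | 2 | 1024 | 1952/2208 | 512 | 1440 |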

Methodology/Approach

  1. Enable R/W split on all DI modules except mod-source-record-manager and mod-inventory-storage.
  2. Change the number of partitions for Kafka topics to 2/50 (for specific tests).
  3. Run the DB deletion and population scripts, then start the CICO test in Jenkins (for specific tests).
  4. Conduct the DI test with a file of the specific size.

Additional information

Links to the Grafana dashboard (results are saved for a limited amount of time):

R/W split enabled on all DI modules except mod-source-record-manager and mod-inventory-storage

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&from=1677851276141&to=1677854700000&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_nolana&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All

R/W split enabled on all DI modules

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_nolana&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1678446223529&to=1678448644316
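Because the dashboard data is retained only for a limited time, it can be useful to record the test window encoded in each link. The sketch below assumes that the `from` and `to` query parameters are Unix epoch timestamps in milliseconds (the usual Grafana convention). The function name `grafana_window` is hypothetical, and the sample URL is the first link above, shortened to the parameters that matter here.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# First dashboard link, shortened for illustration to the orgId/from/to parameters.
PARTIAL_SPLIT_URL = (
    "http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/"
    "jmeter-performance-test-copy?orgId=1&from=1677851276141&to=1677854700000"
)


def grafana_window(url: str) -> tuple[datetime, datetime]:
    """Return the (start, end) of the dashboard time range as UTC datetimes.

    Assumes 'from' and 'to' are epoch milliseconds.
    """
    query = parse_qs(urlparse(url).query)
    start_ms = int(query["from"][0])
    end_ms = int(query["to"][0])
    # Integer millisecond arithmetic avoids float rounding of the timestamps.
    return (
        EPOCH + timedelta(milliseconds=start_ms),
        EPOCH + timedelta(milliseconds=end_ms),
    )


start, end = grafana_window(PARTIAL_SPLIT_URL)
print(f"Test window: {start.isoformat()} to {end.isoformat()} ({end - start})")
```

Under that assumption, the first link covers a window of a little under an hour on 2023-03-03 (UTC).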