Data Import with R/W split [Nolana]
- 1 Overview
- 2 Summary
- 3 Test Runs
- 4 Results
- 4.1 DI Response Times
- 4.2 CICO Response times
- 4.3 MARC BIB Create: Instance CPU Utilization
- 4.4 MARC BIB Create: Service CPU Utilization
- 4.5 MARC BIB Create: Memory Utilization
- 4.6 MARC BIB Create: DB CPU Utilization
- 4.7 MARC BIB Create: DB connections
- 4.8 MARC BIB Update: Instance CPU Utilization
- 4.9 MARC BIB Update: Service CPU Utilization
- 4.10 MARC BIB Update: Memory Utilization
- 4.11 MARC BIB Update: DB CPU Utilization
- 4.12 MARC BIB Update: DB connections
- 4.13 MARC Holdings: Instance CPU Utilization
- 4.14 MARC Holdings: Service CPU Utilization
- 4.15 MARC Holdings: Memory Utilization
- 4.16 MARC Holdings: DB CPU Utilization
- 4.17 MARC Holdings: DB connections
- 4.18 MARC Authorities: Instance CPU Utilization
- 4.19 MARC Authorities: Service CPU Utilization
- 4.20 MARC Authorities: Memory Utilization
- 4.21 MARC Authorities: DB CPU Utilization
- 4.22 MARC Authorities: DB connections
- 4.23 MARC BIB Create + CICO: Response time
- 4.24 MARC BIB Create + CICO: Instance CPU Utilization
- 4.25 MARC BIB Create + CICO: Service CPU Utilization
- 4.26 MARC BIB Create + CICO: Memory Utilization
- 4.27 MARC BIB Create + CICO: DB CPU Utilization
- 4.28 MARC BIB Create + CICO: DB connections
- 5 Previous results
- 6 Appendix
Overview
The goal of the test is to assess the performance of Data Import functionality with R/W split enabled on all DI modules except mod-source-record-manager and mod-inventory-storage. The workflows covered are create and update of MARC BIBs, MARC Authorities, MARC Holdings, and MARC BIB create with CICO. The results are compared with previous ones in which R/W split was enabled on all DI modules.
Ticket: PERF-418: Retest DI with R/W split for all DI modules but mod-srm and mod-inventory-storage (status: Closed)
Summary
Test results showed performance degradation of Data Import functionality with R/W split enabled on all DI and CICO modules except mod-source-record-manager and mod-inventory-storage, compared with the results when R/W split is enabled on all modules. Main findings:
- The DI process is stable and no issues were observed (tests 1-4, marked blue in the response times table);
- Response times degraded by 10-30% (tests 5-7, marked green in the response times table);
- DI+CICO tests showed more spikes and longer response times when R/W split is enabled on all modules except mod-source-record-manager and mod-inventory-storage. This is because mod-inventory-storage is one of the key modules communicating with the database during CICO and DI.
It can be concluded that without R/W split on mod-inventory-storage, the benefit of the R/W split feature decreases significantly.
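The R/W split idea behind these findings can be illustrated with a small sketch. A module with R/W split enabled sends read-only queries to the reader node and everything else to the writer. A module without R/W split sends every query to the writer. This is not how FOLIO modules implement R/W split. The router, the `writer_share` helper, the second module name, and the SQL statements below are all hypothetical and only illustrate the concept.

```python
# Hypothetical illustration of R/W split routing; not FOLIO's implementation.
def route_query(sql: str, rw_split_enabled: bool) -> str:
    """Pick the DB node for one statement: 'reader' or 'writer'."""
    if not rw_split_enabled:
        return "writer"  # a module without R/W split sends every statement to the writer
    text = " ".join(sql.lower().split())
    # Simplification: only plain SELECTs without row locks are treated as read-only.
    if text.startswith("select") and "for update" not in text:
        return "reader"
    return "writer"


def writer_share(workload, split_enabled_by_module) -> float:
    """Fraction of statements in the workload that end up on the writer node."""
    targets = [route_query(sql, split_enabled_by_module[module]) for module, sql in workload]
    return targets.count("writer") / len(targets)


# Hypothetical workload: the statements and "other-di-module" are made up.
workload = [
    ("mod-inventory-storage", "SELECT jsonb FROM instance WHERE id = $1"),
    ("mod-inventory-storage", "INSERT INTO instance (id, jsonb) VALUES ($1, $2)"),
    ("other-di-module", "SELECT jsonb FROM mapping_rules"),
    ("other-di-module", "UPDATE job SET status = $1 WHERE id = $2"),
]
all_modules = {"mod-inventory-storage": True, "other-di-module": True}
without_inventory = {"mod-inventory-storage": False, "other-di-module": True}

print(writer_share(workload, all_modules))        # 0.5
print(writer_share(workload, without_inventory))  # 0.75
```

In this made-up workload, turning off R/W split for one module moves that module's reads onto the writer node, raising the writer's share of statements from half to three quarters.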
Test Runs
| Test # | Test Scenario | Load generator size (recommended) | Load generator memory (GiB) (recommended) | Notes |
|---|---|---|---|---|
| 1 | MARC BIB Create (1k, 2k, 5k, 10k, 25k, 50k, 100k) | t3.medium | 3 | ncp4 environment |
| 2 | MARC BIB Update (1k, 2k, 5k, 10k, 25k, 50k, 100k) | | | |
| 3 | MARC Holdings (1k, 5k, 10k, 80k) | | | |
| 4 | MARC Authorities (1k, 5k, 10k, 25k, 50k) | | | |
| 5 | MARC BIB Create + CICO, 20 users (25k, 50k) | | | |
| 6 | MARC BIB Create with Kafka partitions changed from 1 to 2/50 (50k) | | | |
| 7 | MARC BIB Update with Kafka partitions changed from 1 to 2/50 (50k) | | | |
Results
DI Response Times
| Test Scenario | Configuration | DI data quantity | Response time, R/W split enabled on all DI modules* | Response time, R/W split enabled on all DI modules except mod-source-record-manager and mod-inventory-storage |
|---|---|---|---|---|
| MARC BIB Create | Low partitions count for Kafka topics (1 per topic) | 1k | 0:40 | 0:43 |
| | | 2k | 0:56 | 1:02 |
| | | 5k | 2:08 | 2:28 |
| | | 10k | 4:20 | 4:42 |
| | | 25k | 10:41 | 11:38 |
| | | 50k | 21:11 | 22:45 |
| | | 100k | 42:35 | 47:53 |
| MARC BIB Update | | 1k | 0:35 | 0:40 |
| | | 2k | 0:58 | 1:32 |
| | | 5k | 2:10 | 3:37 |
| | | 10k | 4:08 | 7:02 |
| | | 25k | 10:40 | 17:36 |
| | | 50k | 20:57 | 40:30 (retest: 27:01) |
| | | 100k | 41:56 | 1:10:45 |
| MARC Holdings | | 1k | 0:33 | 0:30 |
| | | 5k | 4:21 | 1:46 |
| | | 10k | 3:25 | 3:08 |
| | | 80k | 21:23 | 24:07 |
| MARC Authorities | | 1k | 0:27 | 0:32 |
| | | 5k | 1:15 | 1:46 |
| | | 10k | 2:31 | 3:12 |
| | | 25k | 7:07 | 7:25 |
| | | 50k | 11:24 | 14:37 |
| Additional tests (both baseline and verification were rerun) | | | | |
| MARC BIB Create (partitions changed) | Correct partitions count for Kafka topics (2/50 per topic) | 25k | 08:25 | 09:24 |
| | | 50k | 16:33 | 19:53 |
| MARC BIB Update (partitions changed) | | 25k | 12:35 | 13:36 |
| | | 50k | 23:25 | 27:25 |
| MARC BIB Create + CICO 20 users (partitions changed) | | 25k | 09:26 | 11:40 |
| | | 50k | 19:00 | 21:57 |
*Results with R/W split enabled on all DI modules are taken from the previous test report. Links to the reports can be found in the Previous results section.
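The degradation percentages in the Summary can be checked against this table by converting the durations to seconds. The sketch below is a minimal illustration. It assumes that in each row the first response-time value is the run with R/W split on all DI modules and the second is the run without R/W split on mod-source-record-manager and mod-inventory-storage. The helper names are hypothetical.

```python
def to_seconds(duration: str) -> int:
    """Convert an 'm:ss' or 'h:mm:ss' duration string into seconds."""
    seconds = 0
    for part in duration.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def degradation_pct(baseline: str, current: str) -> float:
    """Relative increase of `current` over `baseline`, in percent."""
    base, cur = to_seconds(baseline), to_seconds(current)
    return (cur - base) / base * 100


# Additional-test rows from the table above (both runs repeated with corrected partitions).
additional = {
    "MARC BIB Create 25k": ("08:25", "09:24"),
    "MARC BIB Create 50k": ("16:33", "19:53"),
    "MARC BIB Update 25k": ("12:35", "13:36"),
    "MARC BIB Update 50k": ("23:25", "27:25"),
    "MARC BIB Create + CICO 25k": ("09:26", "11:40"),
    "MARC BIB Create + CICO 50k": ("19:00", "21:57"),
}

for name, (base, cur) in additional.items():
    print(f"{name}: {degradation_pct(base, cur):+.1f}%")
```

For the six additional-test rows, this gives increases between about 8% (MARC BIB Update 25k) and 24% (MARC BIB Create + CICO 25k).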
CICO Response times
| Test Scenario | DI data quantity | Transaction | Configuration | Avg response time (s), baseline: R/W split NOT enabled on any modules | Avg response time (s), R/W split enabled on all CICO and DI modules | Avg response time (s), R/W split enabled on all CICO and DI modules except mod-source-record-manager and mod-inventory-storage | Degradation (s) compared with R/W split enabled on all DI modules |
|---|---|---|---|---|---|---|---|
| MARC BIB Create + CICO (20 users) | 25k | Check-in | Correct partitions count for Kafka topics (2/50 per topic) | 0.906 | 0.622 | 0.793 | 0.171 |
| | | Check-out | | 1.512 | 0.942 | 1.186 | 0.244 |
| | 50k | Check-in | | 0.875 | 0.537 | 0.743 | 0.206 |
| | | Check-out | | 1.549 | 1.113 | 1.188 | 0.075 |
| | (No Data Import) | Check-in | | 0.353 | | | |
| | | Check-out | | 0.630 | | | |
This table contains the response times of the check-in/check-out workflows with and without a concurrent Data Import. It also covers runs with and without R/W split enabled on all modules. Notable items:
- Without R/W split enabled on any module, CICO response times jump sharply when a DI job runs at the same time. Check-in rises from 353 ms to 906 ms and check-out from 630 ms to 1549 ms.
- When R/W split is enabled on all modules, response time drops by nearly 300 ms for check-in (31% improvement) and by nearly 600 ms for check-out (38% improvement). These figures are based on the test with the 25k DI Create job.
- When R/W split is enabled on all modules except mod-inventory-storage and mod-srm, response time rises back to nearly 800 ms for check-in and 1.18 s for check-out. This gives back about 200 ms of the gain. In total, about 26% of the gains are lost for check-in and 16% for check-out.
In summary, some benefits remain when R/W split is not enabled on all modules, although they are smaller.
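The CICO percentages quoted above can be reproduced from the 25k rows of the table. The sketch below is only an illustration, and the helper names are hypothetical. The text does not state which reference the lost-gain percentages are measured against. The sketch therefore reports the degradation both relative to the all-modules run and relative to the no-split baseline.

```python
# Average response times (s) for MARC BIB Create + CICO with a 25k DI job, copied from the table.
BASELINE = {"check-in": 0.906, "check-out": 1.512}   # R/W split not enabled on any module
ALL_SPLIT = {"check-in": 0.622, "check-out": 0.942}  # R/W split on all CICO and DI modules
PARTIAL = {"check-in": 0.793, "check-out": 1.186}    # all except mod-srm and mod-inventory-storage


def pct(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`."""
    return part / whole * 100


def summarize(tx: str) -> dict:
    """Gain from full R/W split and loss from excluding two modules, for one transaction."""
    gain = BASELINE[tx] - ALL_SPLIT[tx]  # improvement from R/W split on all modules
    loss = PARTIAL[tx] - ALL_SPLIT[tx]   # degradation when srm and inventory-storage are excluded
    return {
        "gain_ms": round(gain * 1000),
        "improvement_pct": round(pct(gain, BASELINE[tx]), 1),
        "loss_ms": round(loss * 1000),
        "loss_vs_all_split_pct": round(pct(loss, ALL_SPLIT[tx]), 1),
        "loss_vs_baseline_pct": round(pct(loss, BASELINE[tx]), 1),
    }


for tx in ("check-in", "check-out"):
    print(tx, summarize(tx))
```

With these inputs, the improvements come out at about 31% for check-in and 38% for check-out. For check-out, the degradation is about 26% of the all-modules response time and 16% of the baseline.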
MARC BIB Create: Instance CPU Utilization
MARC BIB Create: Service CPU Utilization
MARC BIB Create: Memory Utilization
MARC BIB Create: DB CPU Utilization
R/W split enabled on all DI modules except mod-source-record-manager and mod-inventory-storage
ncp4-db-01 - read node - blue upper line
ncp4-db-02 - write node - orange lower line
R/W split enabled on all DI modules*
read node - orange upper line
write node - blue lower line
*Results with R/W split enabled on all DI modules are taken from the previous test report. Links to the reports can be found in the Previous results section.
MARC BIB Create: DB connections
MARC BIB Update: Instance CPU Utilization
MARC BIB Update: Service CPU Utilization
MARC BIB Update: Memory Utilization
MARC BIB Update: DB CPU Utilization
R/W split enabled on all DI modules except mod-source-record-manager and mod-inventory-storage
ncp4-db-01 - read node - blue upper line
ncp4-db-02 - write node - orange lower line
R/W split enabled on all DI modules*
100k file
read node - orange upper line
write node - blue lower line
*Results with R/W split enabled on all DI modules are taken from the previous test report. Links to the reports can be found in the Previous results section.
MARC BIB Update: DB connections
MARC Holdings: Instance CPU Utilization
MARC Holdings: Service CPU Utilization
MARC Holdings: Memory Utilization
MARC Holdings: DB CPU Utilization
R/W split enabled on all DI modules except mod-source-record-manager and mod-inventory-storage
ncp4-db-01 - read node - blue upper line
ncp4-db-02 - write node - orange lower line
R/W split enabled on all DI modules*
ncp3-db-02 - read node - orange upper line
ncp3-db-01 - write node - blue lower line
*Results with R/W split enabled on all DI modules are taken from the previous test report. Links to the reports can be found in the Previous results section.
MARC Holdings: DB connections
MARC Authorities: Instance CPU Utilization
MARC Authorities: Service CPU Utilization
MARC Authorities: Memory Utilization
MARC Authorities: DB CPU Utilization
R/W split enabled on all DI modules except mod-source-record-manager and mod-inventory-storage
ncp4-db-01 - read node - blue upper line
ncp4-db-02 - write node - orange lower line
R/W split enabled on all DI modules*
ncp3-db-02 - read node - orange upper line
ncp3-db-01 - write node - blue lower line
*Results with R/W split enabled on all DI modules are taken from the previous test report. Links to the reports can be found in the Previous results section.
MARC Authorities: DB connections
MARC BIB Create + CICO: Response time
R/W split enabled on all DI modules except mod-source-record-manager and mod-inventory-storage
R/W split enabled on all DI modules
According to the results, there are more spikes and longer response times when R/W split is enabled on all modules except mod-source-record-manager and mod-inventory-storage. This is because mod-inventory-storage is one of the key modules communicating with the database during CICO and DI. When mod-inventory-storage does not benefit from the R/W split feature, the performance of both workflows suffers as a whole.
MARC BIB Create + CICO: Instance CPU Utilization
MARC BIB Create + CICO: Service CPU Utilization
MARC BIB Create + CICO: Memory Utilization
MARC BIB Create + CICO: DB CPU Utilization
R/W split enabled on all the modules except mod-source-record-manager and mod-inventory-storage
ncp4-db-01 - read node - blue upper line
ncp4-db-02 - write node - orange lower line
R/W split enabled on all the modules
ncp4-db-01 - read node - orange upper line
ncp4-db-02 - write node - blue lower line
According to the results, the load on the DB reader instance is lower when R/W split is enabled on all modules except mod-source-record-manager and mod-inventory-storage. This is presumably because reads from these two modules go to the writer instance instead.
MARC BIB Create + CICO: DB connections
Previous results
MARC BIB - Data Import test report (Nolana)
MARC BIB + CICO - Data Import with Check-ins Check-outs Nolana
MARC Authorities - Data Import MARC Authorities (Nolana)
MARC Holdings - Data Import Create MARC holdings records [Nolana]
Appendix
Infrastructure
PTF environment: ncp4
8 m6i.2xlarge EC2 instances located in US East (N. Virginia) us-east-1
2 db.r6.xlarge database instances: writer and reader
MSK ptf-kakfa-3
4 kafka.m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
auto.create.topics.enable=true
log.retention.minutes=480
default.replication.factor=3
Module memory and CPU parameters:
Modules | Version | Task Definition | Running Tasks | CPU |
|---|