Data Import with R/W split [Nolana]
Overview
The goal of this test is to assess the performance of Data Import functionality (Create and Update of MARC BIBs, Authorities, MARC Holdings, and Create MARC BIB with CICO) with R/W split enabled on all DI modules except mod-source-record-manager and mod-inventory-storage, and to compare the results with previous ones where R/W split was enabled on all DI modules.
Ticket: PERF-418
Summary
Test results showed a performance degradation for Data Import functionality with R/W split enabled on all DI and CICO modules except mod-source-record-manager and mod-inventory-storage, compared with results where R/W split was enabled on all modules. The main findings are:
- The DI process is stable and there are no issues (tests 1-4, marked blue in the response times table);
- Response time degraded by 10-30% (tests 5-7, marked green in the response times table);
- DI+CICO tests showed more spikes and longer response times when R/W split is enabled on all modules except mod-source-record-manager and mod-inventory-storage. This is because mod-inventory-storage is one of the key communicators with the database during CICO and DI.
It can be concluded that when mod-inventory-storage is excluded, the benefit of the R/W split feature decreases significantly.
Test Runs
Test # | Test Scenario | Load generator size (recommended) | Load generator Memory(GiB) (recommended) | Notes |
---|---|---|---|---|
1. | MARC BIB Create (1k, 2k, 5k, 10k, 25k, 50k, 100k) | t3.medium | 3 | ncp4 environment |
2. | MARC BIB Update (1k, 2k, 5k, 10k, 25k, 50k, 100k) | |||
3. | ||||
4. | ||||
5. | MARC BIB Create + CICO 20 users (25k, 50k) | |||
6. | MARC BIB Create with Kafka partitions changed from 1 to 2/50 (50k) | |||
7. | MARC BIB Update with Kafka partitions changed from 1 to 2/50 (50k) |
Results
DI Response Times
Test Scenario | Configuration | DI data quantity | Response time: R/W split enabled on all DI modules* | Response time: R/W split enabled on all DI modules except mod-source-record-manager and mod-inventory-storage |
---|---|---|---|---|
MARC BIB Create | Low partitions count for Kafka topics (1 per each) | 1k | 0:40 | 0:43 |
2k | 0:56 | 1:02 | ||
5k | 2:08 | 2:28 | ||
10k | 4:20 | 4:42 | ||
25k | 10:41 | 11:38 | ||
50k | 21:11 | 22:45 | ||
100k | 42:35 | 47:53 | ||
MARC BIB Update | 1k | 0:35 | 0:40 | |
2k | 0:58 | 1:32 | ||
5k | 2:10 | 3:37 | ||
10k | 4:08 | 7:02 | ||
25k | 10:40 | 17:36 | ||
50k | 20:57 | 40:30 (retest: 27:01) | ||
100k | 41:56 | 1:10:45 | ||
1k | 0:33 | 0:30 | ||
5k | 4:21 | 1:46 | ||
10k | 3:25 | 3:08 | ||
80k | 21:23 | 24:07 | ||
1k | 0:27 | 0:32 | ||
5k | 1:15 | 1:46 | ||
10k | 2:31 | 3:12 | ||
25k | 7:07 | 7:25 | ||
50k | 11:24 | 14:37 | ||
Additional tests (both baseline and verification were rerun) | ||||
MARC BIB Create (partitions changed) | Correct partitions count for Kafka topics (2/50 per each) | 25k | 08:25 | 09:24 |
50k | 16:33 | 19:53 | ||
MARC BIB Update (partitions changed) | 25k | 12:35 | 13:36 | |
50k | 23:25 | 27:25 | ||
MARC BIB Create + CICO 20 users (partitions changed) | 25k | 09:26 | 11:40 | |
50k | 19:00 | 21:57 |
*Results with R/W split enabled on all DI modules are taken from previous test report. Links to the reports can be found in Previous results section.
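The relative degradation between the two configurations can be recomputed directly from the response times in the table. The following is a minimal sketch, assuming durations are written as `m:ss`, `mm:ss`, or `h:mm:ss`; the helper names `to_seconds` and `degradation_pct` are illustrative and not part of any test tooling. The sample rows are copied from the additional tests above.

```python
def to_seconds(duration: str) -> int:
    """Convert 'm:ss', 'mm:ss' or 'h:mm:ss' into a number of seconds."""
    seconds = 0
    for part in duration.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def degradation_pct(all_modules: str, partial: str) -> float:
    """Percent increase of the partial-split time over the all-modules time."""
    base = to_seconds(all_modules)
    return (to_seconds(partial) - base) / base * 100.0


# (scenario, DI data quantity, R/W split on all DI modules, R/W split with two modules excluded)
ADDITIONAL_TESTS = [
    ("MARC BIB Create (partitions changed)", "25k", "08:25", "09:24"),
    ("MARC BIB Create (partitions changed)", "50k", "16:33", "19:53"),
    ("MARC BIB Update (partitions changed)", "25k", "12:35", "13:36"),
    ("MARC BIB Update (partitions changed)", "50k", "23:25", "27:25"),
    ("MARC BIB Create + CICO 20 users (partitions changed)", "25k", "09:26", "11:40"),
    ("MARC BIB Create + CICO 20 users (partitions changed)", "50k", "19:00", "21:57"),
]

for scenario, size, all_modules, partial in ADDITIONAL_TESTS:
    print(f"{scenario}, {size}: {degradation_pct(all_modules, partial):+.1f}%")
```

For example, the 50k MARC BIB Create run with changed partitions goes from 16:33 to 19:53, which is an increase of roughly 20%.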
CICO Response times
Test Scenario | DI data quantity | Transaction | Configuration | Average response time (s), baseline: R/W split NOT enabled on any modules | Average response time (s), R/W split enabled on all CICO and DI modules | Average response time (s), R/W split enabled on all CICO and DI modules except mod-source-record-manager and mod-inventory-storage | Degradation (s) vs. R/W split enabled on all DI modules |
---|---|---|---|---|---|---|---|
MARC BIB Create + CICO (20 users) | 25k | Check-in | Correct partitions count for Kafka topics (2/50 per each) | 0.906 | 0.622 | 0.793 | 0.171 |
MARC BIB Create + CICO (20 users) | 25k | Check-out | Correct partitions count for Kafka topics (2/50 per each) | 1.512 | 0.942 | 1.186 | 0.244 |
MARC BIB Create + CICO (20 users) | 50k | Check-in | Correct partitions count for Kafka topics (2/50 per each) | 0.875 | 0.537 | 0.743 | 0.206 |
MARC BIB Create + CICO (20 users) | 50k | Check-out | Correct partitions count for Kafka topics (2/50 per each) | 1.549 | 1.113 | 1.188 | 0.075 |
(No data import) | | Check-in | | 0.353 | | | |
(No data import) | | Check-out | | 0.630 | | | |
This table contains the response times of check-in/check-out workflows with and without data import, and with and without R/W split enabled on all modules. A few notable items:
- Without R/W split enabled on any module, CICO times jump sharply when a DI job runs at the same time (from 353 ms to as much as 906 ms for check-in and from 630 ms to as much as 1549 ms for check-out).
- When R/W split is enabled on all modules, response time drops by nearly 300 ms for check-in (31% improvement) and nearly 600 ms for check-out (38% improvement), based on the test with the 25k DI Create job.
- When R/W split is enabled on all modules except mod-inventory-storage and mod-source-record-manager, response time goes back up to nearly 800 ms for check-in and 1.18 s for check-out, so about 200 ms of the response time gain is lost. In total, about 26% of the gains are lost for check-in and 16% for check-out.
- In summary, there are still some benefits when R/W split is not enabled on all modules, although they are not as large.
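The improvement figures for the 25k DI Create job can be reproduced from the table values. This is a small sketch, not part of the test tooling; the function names are illustrative, and it only derives the improvement relative to the baseline and the absolute degradation in seconds.

```python
# Average response times in seconds, copied from the 25k rows of the CICO table.
# Tuple order: (baseline without R/W split,
#               R/W split on all modules,
#               R/W split on all modules except the two excluded ones)
CICO_25K = {
    "Check-in": (0.906, 0.622, 0.793),
    "Check-out": (1.512, 0.942, 1.186),
}


def improvement_pct(baseline: float, measured: float) -> float:
    """Percent reduction in response time relative to the baseline."""
    return (baseline - measured) / baseline * 100.0


def summarize(baseline: float, full_split: float, partial_split: float) -> dict:
    """Derive improvement over baseline and degradation between the two setups."""
    return {
        "full_split_improvement_pct": improvement_pct(baseline, full_split),
        "partial_split_improvement_pct": improvement_pct(baseline, partial_split),
        "degradation_s": partial_split - full_split,
    }


for transaction, times in CICO_25K.items():
    result = summarize(*times)
    print(
        f"{transaction}: "
        f"all modules {result['full_split_improvement_pct']:.0f}% faster than baseline, "
        f"partial split {result['partial_split_improvement_pct']:.0f}% faster, "
        f"degradation {result['degradation_s']:.3f} s"
    )
```

The partial-split improvement stays positive for both transactions, which is consistent with the observation that some benefit remains when the two modules are excluded.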
MARC BIB Create: Instance CPU Utilization
MARC BIB Create: Service CPU Utilization
MARC BIB Create: Memory Utilization
MARC BIB Create: DB CPU Utilization
R/W split enabled on all DI modules except mod-source-record-manager and mod-inventory-storage
ncp4-db-01 - read node - blue upper line
ncp4-db-02 - write node - orange lower line
R/W split enabled on all DI modules*
read node - orange upper line
write node - blue lower line
*Results with R/W split enabled on all DI modules are taken from previous test report. Links to the reports can be found in Previous results section.
MARC BIB Create: DB connections
MARC BIB Update: Instance CPU Utilization
MARC BIB Update: Service CPU Utilization
MARC BIB Update: Memory Utilization
MARC BIB Update: DB CPU Utilization
R/W split enabled on all DI modules except mod-source-record-manager and mod-inventory-storage
ncp4-db-01 - read node - blue upper line
ncp4-db-02 - write node - orange lower line
R/W split enabled on all DI modules*
100k file
read node - orange upper line
write node - blue lower line
*Results with R/W split enabled on all DI modules are taken from previous test report. Links to the reports can be found in Previous results section.
MARC BIB Update: DB connections
MARC Holdings : Instance CPU Utilization
MARC Holdings: Service CPU Utilization
MARC Holdings: Memory Utilization
MARC Holdings: DB CPU Utilization
R/W split enabled on all DI modules except mod-source-record-manager and mod-inventory-storage
ncp4-db-01 - read node - blue upper line
ncp4-db-02 - write node - orange lower line
R/W split enabled on all DI modules*
ncp3-db-02 - read node - orange upper line
ncp3-db-01 - write node - blue lower line
*Results with R/W split enabled on all DI modules are taken from previous test report. Links to the reports can be found in Previous results section.
MARC Holdings: DB connections
MARC Authorities : Instance CPU Utilization
MARC Authorities: Service CPU Utilization
MARC Authorities: Memory Utilization
MARC Authorities: DB CPU Utilization
R/W split enabled on all DI modules except mod-source-record-manager and mod-inventory-storage
ncp4-db-01 - read node - blue upper line
ncp4-db-02 - write node - orange lower line
R/W split enabled on all DI modules*
ncp3-db-02 - read node - orange upper line
ncp3-db-01 - write node - blue lower line
*Results with R/W split enabled on all DI modules are taken from previous test report. Links to the reports can be found in Previous results section.
MARC Authorities: DB connections
MARC BIB Create + CICO: Response time
R/W split enabled on all DI modules except mod-source-record-manager and mod-inventory-storage
R/W split enabled on all DI modules
According to the results, there are more spikes and longer response times when R/W split is enabled on all modules except mod-source-record-manager and mod-inventory-storage. This is because mod-inventory-storage is one of the key communicators with the database during CICO and DI. Without mod-inventory-storage benefitting from the R/W split feature, the performance of these two workflows suffers as a whole.
MARC BIB Create + CICO: Instance CPU Utilization
MARC BIB Create + CICO: Service CPU Utilization
MARC BIB Create + CICO: Memory Utilization
MARC BIB Create + CICO: DB CPU Utilization
R/W split enabled on all the modules except mod-source-record-manager and mod-inventory-storage
ncp4-db-01 - read node - blue upper line
ncp4-db-02 - write node - orange lower line
R/W split enabled on all the modules
ncp4-db-01 - read node - orange upper line
ncp4-db-02 - write node - blue lower line
According to the results, DB reader instance load is lower when R/W split is enabled on all the modules except mod-source-record-manager and mod-inventory-storage.
MARC BIB Create + CICO: DB connections
Previous results
MARC BIB - Data Import test report (Nolana)
MARC BIB + CICO - Data Import with Check-ins Check-outs Nolana
MARC Authorities - Data Import MARC Authorities (Nolana)
MARC Holdings - Data Import Create MARC holdings records [Nolana]
Appendix
Infrastructure
PTF environment: ncp4
- 8 m6i.2xlarge EC2 instances located in US East (N. Virginia) us-east-1
- 2 db.r6.xlarge database instances: one writer and one reader
- MSK ptf-kakfa-3
- 4 kafka.m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
Modules memory and CPU parameters:
Modules | Version | Task Definition | Running Tasks | CPU | Memory (Soft/Hard limits) | MaxMetaspaceSize | Xmx |
---|---|---|---|---|---|---|---|
mod-data-import | 2.6.2 | 1 | 1 | 256 | 1844/2048 | 512 | 1292 |
mod-data-import-cs | 1.15.2 | 2 | 2 | 128 | 896/1024 | 128 | 768 |
mod-source-record-storage | 5.5.2 | 1 | 2 | 1024 | 1440/1536 | 512 | 908 |
mod-source-record-manager | 3.5.6 | 2 | 2 | 1024 | 3688/4096 | 512 | 2048 |
mod-inventory | 19.0.2 | 2 | 2 | 1024 | 2592/2880 | 512 | 1814 |
mod-inventory-storage | 25.0.3 | 2 | 2 | 1024 | 1952/2208 | 512 | 1440 |
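As a quick sanity check on the parameters above, the maximum heap plus maximum metaspace of each module can be compared with its hard memory limit. This is a rough sketch under the assumption that all values in the table are in MiB; it ignores other JVM memory areas (thread stacks, direct buffers, code cache), so the headroom it reports is only an upper bound. The `headroom` helper is illustrative.

```python
# (module, soft limit, hard limit, MaxMetaspaceSize, Xmx), copied from the table above.
# Assumption: every value is expressed in MiB.
MODULES = [
    ("mod-data-import", 1844, 2048, 512, 1292),
    ("mod-data-import-cs", 896, 1024, 128, 768),
    ("mod-source-record-storage", 1440, 1536, 512, 908),
    ("mod-source-record-manager", 3688, 4096, 512, 2048),
    ("mod-inventory", 2592, 2880, 512, 1814),
    ("mod-inventory-storage", 1952, 2208, 512, 1440),
]


def headroom(hard_limit: int, metaspace: int, xmx: int) -> int:
    """MiB left under the hard limit after max heap and max metaspace."""
    return hard_limit - (xmx + metaspace)


for name, soft, hard, metaspace, xmx in MODULES:
    print(f"{name}: {headroom(hard, metaspace, xmx)} MiB below the hard limit")
```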
Methodology/Approach
- Enable R/W split on all DI modules except mod-source-record-manager and mod-inventory-storage.
- Change the number of partitions for Kafka topics to 2/50 (for specific tests).
- Run DB deletion and population scripts, then start the CICO test in Jenkins (for specific tests).
- Conduct the DI test with a specific file size.
Additional information
Links to Grafana dashboards (results are retained for a limited amount of time):
R/W split enabled on all DI modules except mod-source-record-manager and mod-inventory-storage
R/W split enabled on all DI modules