Overview
This document contains the results of testing concurrent Data Import with file splitting feature for MARC Bibliographic records in the Poppy release.
The purpose for this test is to define how concurrent DI affect duration of DI jobs on the central tenant and to check possible issues during smoke test with 50k DI Create job running concurrently on all 3 tenants.
Ticket: - PERF-715Getting issue details... STATUS
Summary
Data import duration approximately doubling with 10k and 25k jobs when increasing the number of concurrent jobs on different tenants. This trend is consistent across the central tenant and other tenants.
Smoke test with 50k didn't reveal some issues. Duration for 3 concurrent DI Create jobs was three times higher than one DI on central tenant that only confirm previous statement about concurrency affect.
Maximum average CPU utilization was different during create and update jobs. Top two modules during DI Create jobs in mod-inventory-b - 123%, mod-quick-marc-b - 76%, Update jobs mod-inventory-b - 182%, mod-quick-marc-b - 122%.
Memory consumption was almost the same for DI create and update jobs: Nevertheless it was slightly higher for update jobs in mod-inventory-b - 98%, mod-permission-b - 79%, mod-source-record-storage-b - 73%.
RDS CPU utilization was 97% for all Create jobs and 94% for Update jobs
DB connections were higher during DI Create jobs. With 2 tenants Create jobs - 710, for 3 tenants Create jobs - 870
Top long query for failed job on third tenant during DI Create job with 10k- SELECT jsonb,id FROM fs07000002_mod_inventory_storage.instance_holdings_item_view. Average latency- 386455.99 ms/call
Test Runs
Test # | Scenario | Load level | Comment |
---|---|---|---|
1 | DI MARC Bib Create | 10K, 25K concurrently (with 5 min pause) on 2 and 3 tenants | |
2 | DI MARC Bib Update | 10K, 25K concurrently (with 5 min pause) on 2 and 3 tenants | |
3 | DI MARC Bib Create | 50k concurrently on 3 tenants - smoke test |
Test Results
Data import
As the number of concurrent jobs and file size grow duration of DI jobs grow proportionally.
Smoke tests finished successfully for 3 concurrent DI Create jobs with 50k.
DI Create | # of test | Number of concurrent jobs | Central tenant | Second tenant | Third tenant |
---|---|---|---|---|---|
10k | Baseline | 1 | 00:04:56 | ||
1 | 2 | 00:10:43 | 00:10:37 | ||
2 | 3 | 00:21:12 | 00:21:06 | 00:20:57 * | |
25k | Baseline | 1 | 00:11:24 | ||
3 | 2 | 00:23:44 | 00:23:30 | ||
4 | 3 | 00:37:11 | 00:37:05 | 00:36:58 | |
DI Update | |||||
10k | Baseline | 1 | 00:06:32 | ||
5 | 2 | 00:09:47 | 00:11:26 | ||
6 | 3 | 00:19:08 | 00:19:06 | 00:18:31 | |
25k | Baseline | 1 | 00:15:13 | ||
7 | 2 | 00:30:49 | 00:30:52 | ||
8 | 3 | 00:47:47 | 00:48:17 | 00:47:54 | |
DI Create (smoke test) | |||||
50k | 9 | 1 | 00:22:31 | ||
10 | 3 | 01:12:54 | 01:12:44 | 01:12:35 |
* - Errors occurred only in 10k DI Create jobs running on third tenant during 3 concurrent jobs test. The errors did not reproduce during subsequent tests.
- io.vertx.core.impl.NoStackTraceThrowable: [{"id":"cf64277b-9945-49a1-93c0-007643c46efe","error":"Timeout for DB_HOST:DB_PORT=db.pcp1.folio-eis.us-east-1:5432","holdingId":"bd17bc47-72eb-480b-8a83-e0a1bc16e0f4"}]
- java.lang.NullPointerException: Cannot invoke "org.folio.processing.mapping.defaultmapper.processor.parameters.MappingParameters.getLinkingRules()" because "mappingParameters" is null
Service CPU Utilization
DI Create jobs
DI Update jobs
Service Memory Utilization
DI Create jobs
DI Update jobs
DB CPU Utilization
RDS CPU utilization was 97% for all Create jobs and 94% for Update jobs
Create jobs
Update jobs
DB Connections
DB connections for 2 tenants Create jobs - 710, for 3 tenants Create jobs - 870
DB connections for 2 tenants Create jobs - 630, for 3 tenants Create jobs - 785
DB connections needed for every additional job processing concurrently on different tenant - 150.
Create jobs
Update jobs
DB load
Create jobs
Update jobs
Appendix
Errors & Exceptions
During successfully finished tests exceptions were observed:
Infrastructure
PTF -environment pcp1
- 10 m6i.2xlarge EC2 instances located in US East (N. Virginia)us-east-1
2 database instances, writer/reader
Name Memory GIB vCPUs max_connections db.r6g.xlarge
32 GiB 4 vCPUs 2731 - MSK tenant
- 4 m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
Module | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize | R/W split enabled |
pcp1-pvt | ||||||||||
mod-remote-storage | 10(11)* | 3.0.0 | 2 | 4920 | 4472 | 1024 | 3960 | 512 | 512 | FALSE |
mod-data-import | 18(20)* | 3.0.7 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512 | FALSE |
mod-authtoken | 13(16)* | 2.14.1 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128 | FALSE |
mod-configuration | 9(10)* | 5.9.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE |
mod-users-bl | 9(10)* | 7.6.0 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128 | FALSE |
mod-inventory-storage | 12(15)* | 27.0.3(27.0.4)* | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512 | FALSE |
mod-circulation-storage | 12(14)* | 17.1.3(17.1.7)* | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | FALSE |
mod-source-record-storage | 15(18)* | 5.7.3(5.7.5)* | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | FALSE |
mod-inventory | 11(14)* | 20.1.3(20.1.7)* | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512 | FALSE |
mod-di-converter-storage | 15(18)* | 2.1.2(2.1.5)* | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE |
mod-circulation | 12(14)* | 24.0.8(24.0.11)* | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | FALSE |
mod-pubsub | 11(13)* | 2.11.2(2.11.3)* | 2 | 1536 | 1440 | 1024 | 922 | 384 | 512 | FALSE |
mod-patron-blocks | 9(10)* | 1.9.0 | 2 | 1024 | 896 | 1024 | 768 | 88 | 128 | FALSE |
mod-source-record-manager | 14(17)* | 3.7.4(3.7.8)* | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | FALSE |
mod-quick-marc | 9(11)* | 5.0.0(5.0.1)* | 1 | 2288 | 2176 | 128 | 1664 | 384 | 512 | FALSE |
nginx-okapi | 9 | 2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0 | FALSE |
okapi-b | 11 | 5.1.2 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512 | FALSE |
mod-feesfines | 10(11)* | 19.0.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE |
pub-okapi | 9 | 2023.06.14 | 2 | 1024 | 896 | 128 | 768 | 0 | 0 | FALSE |
* - The newest version was used in this test to compare with the previous test
Methodology/Approach
DI tests were started from UI with 5-minute pauses between the tests.