Data Import in Central and Member Tenants
Overview
This document contains the results of testing Data Import (DI) in central and member tenants with ECS and the DI file splitting feature enabled in the Poppy release.
Tickets: PERF-753 and PERF-754
Summary
- DI create and update jobs perform better with ECS enabled than in the Poppy tests without ECS. See the comparison table for details.
- DI Create job durations in the central and member tenants differ little for files from 1K to 50K records. The 100K-record file was imported 00:01:47 faster in the central tenant.
- DI Update jobs ran faster in the central tenant for files of 5K records and larger; for the 100K-record file the central tenant was about 5 minutes faster than the member tenant.
- Service CPU utilization of mod-inventory in the central tenant did not exceed 130% during DI Update jobs. In the member tenant it reached 160%.
- CPU utilization decreased for the majority of modules in both central and member tenants compared with the non-ECS results. The only module that shows a small increase is mod-inventory-storage (+6%).
- After an additional set of 100K DI create jobs on pcon (run to populate the DB with instances on member tenants), CPU utilization of mod-inventory showed a growth trend, followed by an Out Of Memory issue after a spike to 361%.
- Memory consumption for mod-inventory averaged about 100% in both tenants.
- Memory consumption increased for mod-source-record-storage (20%) and mod-source-record-manager (10%) in both central and member tenants compared with the non-ECS results. It decreased for mod-data-import (15%).
- RDS CPU utilization was 98% during DI MARC Bib Create jobs and 94% during DI MARC Bib Update jobs in the central tenant, and 95% during DI MARC Bib Create jobs and 90% during DI MARC Bib Update jobs in the member tenant.
- OpenSearch Service: CPU utilization 97%, memory consumption 99%.
Recommendations and Jiras
- After an additional set of 100K DI create jobs on pcon (run to populate the DB with instances on member tenants), CPU utilization of mod-inventory showed a growth trend, followed by an Out Of Memory issue after a spike to 361%.
- Jira ticket PERF-764 was created and investigated. The results of the heap dump analysis are attached to it as a comment.
- Consider allocating more CPU units to mod-data-import, since its CPU utilization exceeded 100% during DI Update jobs.
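The CPU recommendation can be turned into a rough number. The following is a minimal sketch, assuming that the modules run as Amazon ECS tasks (as the task definition columns in the Modules table suggest, and not to be confused with the ECS feature under test) and that the service CPU utilization percentage is measured relative to the CPU units reserved for the task. The function name, the 128-unit rounding step, and the 80% target are illustrative assumptions, not values from the tests.

```python
import math


def required_cpu_units(current_units: int,
                       peak_utilization_pct: float,
                       target_pct: float = 100.0,
                       step: int = 128) -> int:
    """Estimate the CPU units needed to keep a peak at or below target_pct.

    Assumes utilization is reported relative to the reserved CPU units,
    so 254% of 256 units means roughly 650 units were in use at the peak.
    The result is rounded up to a multiple of `step` (hypothetical choice).
    """
    used_units = current_units * peak_utilization_pct / 100
    needed_units = used_units * 100 / target_pct
    return math.ceil(needed_units / step) * step


# mod-data-import: 256 CPU units (Modules table), 254% spike in the
# central tenant update job with the 100k file (CPU utilization table).
print(required_cpu_units(256, 254))                 # peak at or below 100%
print(required_cpu_units(256, 254, target_pct=80))  # leave some headroom
```

Because the 254% value is a short spike at the start of the job rather than a sustained load, an estimate like this is an upper bound for sizing rather than a required allocation.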
Test Runs
Test # | Scenario | Load level | Comment |
---|---|---|---|
1 | DI MARC Bib Create | 1K, 5K, 10K, 25K, 50K, 100K consecutively (with 5 min pause) | Central tenant only |
2 | DI MARC Bib Update | 1K, 5K, 10K, 25K, 50K, 100K consecutively (with 5 min pause) | Central tenant only |
3 | DI MARC Bib Create | 1K, 5K, 10K, 25K, 50K, 100K consecutively (with 5 min pause) | Member tenant only |
4 | DI MARC Bib Update | 1K, 5K, 10K, 25K, 50K, 100K consecutively (with 5 min pause) | Member tenant only |
Test Results
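Profile | MARC File | Central tenant, DI duration (hh:mm:ss) | Member tenant, DI duration (hh:mm:ss) |
---|---|---|---|
DI MARC Bib Create (PTF - Create 2) | 1K.mrc | 00:00:38 | 00:00:33 |
DI MARC Bib Create (PTF - Create 2) | 5K.mrc | 00:02:13 | 00:02:02 |
DI MARC Bib Create (PTF - Create 2) | 10K.mrc | 00:03:54 | 00:03:54 |
DI MARC Bib Create (PTF - Create 2) | 25K.mrc | 00:09:44 | 00:10:03 |
DI MARC Bib Create (PTF - Create 2) | 50K.mrc | 00:18:49 | 00:18:50 |
DI MARC Bib Create (PTF - Create 2) | 100K.mrc | 00:37:46 | 00:39:33 |
DI MARC Bib Update (PTF - Updates Success - 1) | 1K.mrc | 00:00:44 | 00:00:33 |
DI MARC Bib Update (PTF - Updates Success - 1) | 5K.mrc | 00:02:26 | 00:02:39 |
DI MARC Bib Update (PTF - Updates Success - 1) | 10K.mrc | 00:04:57 | 00:05:20 |
DI MARC Bib Update (PTF - Updates Success - 1) | 25K.mrc | 00:12:05 | 00:13:21 |
DI MARC Bib Update (PTF - Updates Success - 1) | 50K.mrc | 00:24:27 | 00:26:43 |
DI MARC Bib Update (PTF - Updates Success - 1) | 100K.mrc | 00:49:15 | 00:54:29 |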
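The durations are easier to compare across file sizes when expressed as throughput. The following is a minimal sketch that converts the DI MARC Bib Create durations from the Test Results table into records per second; the helper names are illustrative, and the record counts assume that each file holds exactly the number of records its name suggests.

```python
def to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss duration string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def records_per_second(records: int, hms: str) -> float:
    """Average import throughput for one job."""
    return records / to_seconds(hms)


# (records, central tenant duration, member tenant duration),
# copied from the DI MARC Bib Create rows of the Test Results table.
CREATE_RUNS = [
    (1_000, "00:00:38", "00:00:33"),
    (5_000, "00:02:13", "00:02:02"),
    (10_000, "00:03:54", "00:03:54"),
    (25_000, "00:09:44", "00:10:03"),
    (50_000, "00:18:49", "00:18:50"),
    (100_000, "00:37:46", "00:39:33"),
]

for records, central, member in CREATE_RUNS:
    print(f"{records:>7} records: "
          f"central {records_per_second(records, central):.1f} rec/s, "
          f"member {records_per_second(records, member):.1f} rec/s")
```

The same helpers can be applied to the DI MARC Bib Update rows by swapping in their durations.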
Comparison Table
Profile | MARC File | pcon, central tenant (hh:mm:ss) | pcon, member tenant (hh:mm:ss) | pcp1*, central tenant (hh:mm:ss) | Delta pcon/pcp1, central tenant (hh:mm:ss) |
---|---|---|---|---|---|
DI MARC Bib Create (PTF - Create 2) | 1K.mrc | 00:00:38 | 00:00:33 | 00:00:39 | 00:00:01 |
DI MARC Bib Create (PTF - Create 2) | 5K.mrc | 00:02:13 | 00:02:02 | 00:02:39 | 00:00:26 |
DI MARC Bib Create (PTF - Create 2) | 10K.mrc | 00:03:54 | 00:03:54 | 00:05:00 | 00:01:06 |
DI MARC Bib Create (PTF - Create 2) | 25K.mrc | 00:09:44 | 00:10:03 | 00:11:15 | 00:01:31 |
DI MARC Bib Create (PTF - Create 2) | 50K.mrc | 00:18:49 | 00:18:50 | 00:22:16 | 00:03:27 |
DI MARC Bib Create (PTF - Create 2) | 100K.mrc | 00:37:46 | 00:39:33 | 00:49:58 | 00:12:12 |
DI MARC Bib Update (PTF - Updates Success - 1) | 1K.mrc | 00:00:44 | 00:00:33 | 00:00:34 | 00:00:10 |
DI MARC Bib Update (PTF - Updates Success - 1) | 5K.mrc | 00:02:26 | 00:02:39 | 00:02:28 | 00:00:02 |
DI MARC Bib Update (PTF - Updates Success - 1) | 10K.mrc | 00:04:57 | 00:05:20 | 00:05:31 | 00:00:34 |
DI MARC Bib Update (PTF - Updates Success - 1) | 25K.mrc | 00:12:05 | 00:13:21 | 00:14:50 | 00:02:45 |
DI MARC Bib Update (PTF - Updates Success - 1) | 50K.mrc | 00:24:27 | 00:26:43 | 00:32:53 | 00:08:26 |
DI MARC Bib Update (PTF - Updates Success - 1) | 100K.mrc | 00:49:15 | 00:54:29 | 01:14:39 | 00:25:24 |
* The results of DI without Check-in/Check-out in the Poppy release were taken from the report "Data Import with Check-ins Check-outs (Poppy)".
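The delta column can be recomputed from the two central tenant columns, which also makes the direction of each difference explicit. The following is a minimal sketch with illustrative helper names; a positive result means the pcon run was faster than the pcp1 run. Note that the table appears to list absolute differences: for the 1K update file the pcon run (00:00:44) was 10 seconds slower than the pcp1 run (00:00:34).

```python
def to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss duration string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def compare(pcon_hms: str, pcp1_hms: str) -> tuple[int, float]:
    """Return (seconds saved on pcon, saving as a percentage of the pcp1 time).

    Both values are negative when the pcon run was slower.
    """
    pcon_s, pcp1_s = to_seconds(pcon_hms), to_seconds(pcp1_hms)
    saved = pcp1_s - pcon_s
    return saved, 100 * saved / pcp1_s


# Central tenant durations for the 100K files, from the Comparison Table.
for label, pcon, pcp1 in [
    ("Create 100K", "00:37:46", "00:49:58"),
    ("Update 100K", "00:49:15", "01:14:39"),
    ("Update 1K", "00:00:44", "00:00:34"),
]:
    saved, pct = compare(pcon, pcp1)
    print(f"{label}: {saved:+d} s ({pct:+.1f}%)")
```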
Service CPU Utilization
CPU utilization decreased for the majority of modules in both central and member tenants compared with the non-ECS test results. The only module that shows a small increase is mod-inventory-storage (+6%).
Module | Central, Create Jobs, ECS | Central, Create Jobs, Non-ECS | Central, Update Jobs, ECS | Central, Update Jobs, Non-ECS | Member, Create Jobs, ECS | Member, Create Jobs, Non-ECS | Member, Update Jobs, ECS | Member, Update Jobs, Non-ECS |
---|---|---|---|---|---|---|---|---|
mod-inventory-b | 101% | 125% | 132% | 220% | 112% | 125% | 158% | 220% |
mod-inventory-storage-b | 31% | 25% | 36% | 25% | 36% | 25% | 32% | 25% |
mod-source-record-storage-b | 52% | 60% | 36% | 50% | 48% | 60% | 33% | 50% |
mod-source-record-manager-b | 29% | 35% | 22% | 45% | 29% | 35% | 20% | 45% |
mod-di-converter-storage-b | 68% | 80% | 68% | 90% | 56% | 80% | 52% | 90% |
mod-data-import | 135% | 200% | 254% | 96% (25k file) | 179% | 200% | 161% | 96% (25k file) |
This table shows the average CPU utilization for the 100k-record file in the ECS and Non-ECS tests.
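The claim that only mod-inventory-storage grew can be checked directly against the table. The following is a minimal sketch that computes the percentage-point change per module and scenario; the dictionary layout and function names are illustrative. mod-data-import is left out because its Non-ECS update value comes from a 25k file and is not directly comparable. By this calculation the increase for mod-inventory-storage-b ranges from +6 to +11 points depending on the scenario, with +6 corresponding to central tenant create jobs.

```python
# Average CPU utilization (%), 100k file: scenario -> (ECS, Non-ECS).
CPU_100K = {
    "mod-inventory-b": {
        "central_create": (101, 125), "central_update": (132, 220),
        "member_create": (112, 125), "member_update": (158, 220)},
    "mod-inventory-storage-b": {
        "central_create": (31, 25), "central_update": (36, 25),
        "member_create": (36, 25), "member_update": (32, 25)},
    "mod-source-record-storage-b": {
        "central_create": (52, 60), "central_update": (36, 50),
        "member_create": (48, 60), "member_update": (33, 50)},
    "mod-source-record-manager-b": {
        "central_create": (29, 35), "central_update": (22, 45),
        "member_create": (29, 35), "member_update": (20, 45)},
    "mod-di-converter-storage-b": {
        "central_create": (68, 80), "central_update": (68, 90),
        "member_create": (56, 80), "member_update": (52, 90)},
}


def point_change(module: str, scenario: str) -> int:
    """ECS minus Non-ECS utilization, in percentage points."""
    ecs, non_ecs = CPU_100K[module][scenario]
    return ecs - non_ecs


def modules_that_grew() -> list[str]:
    """Modules whose ECS utilization exceeds Non-ECS in any scenario."""
    return sorted(
        module for module, runs in CPU_100K.items()
        if any(ecs > non_ecs for ecs, non_ecs in runs.values())
    )


print(modules_that_grew())
for scenario in CPU_100K["mod-inventory-storage-b"]:
    print(scenario, point_change("mod-inventory-storage-b", scenario))
```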
Central tenant
During create jobs, the highest CPU utilization was 99%, by mod-inventory with the 100k-record file. During update jobs, mod-inventory reached 130% with the 100k-record file. Spikes in mod-data-import were observed at the very beginning of each job, as expected. The highest spike, 250%, occurred in the update job with the 100k-record file.
Create jobs:
The highest resource utilization, 101%, was observed for mod-inventory. At the end of the test, mod-quick-marc-b (98%) began to consume more than the other modules. This behaviour was observed for all create jobs.
Average for mod-inventory-b - 101%, mod-inventory-storage-b - 31%, mod-source-record-storage-b - 52%, mod-source-record-manager-b - 29%, mod-di-converter-storage-b - 68%, mod-data-import - 135% spike for 100k job.
Non-ECS tests results*: Average for mod-inventory-b - 125%, mod-inventory-storage-b - 25%, mod-source-record-storage-b - 60%, mod-source-record-manager-b - 35%, mod-di-converter-storage-b - 80%, mod-data-import - 200% spike for 100k job and mod-data-import - 86% spike for 25k job.
Update jobs:
The highest resource utilization, 134%, was observed for mod-inventory.
Average for mod-inventory-b - 132%, mod-inventory-storage-b - 36%, mod-source-record-storage-b - 36%, mod-source-record-manager-b - 22%, mod-di-converter-storage-b - 68%, mod-data-import - 254% spike for 100k job and mod-data-import - 85% spike for 25k job.
Non-ECS tests results*: Average for mod-inventory-b - 220%, mod-inventory-storage-b - 25%, mod-source-record-storage-b - 50%, mod-source-record-manager-b - 45%, mod-di-converter-storage-b - 90%, mod-data-import - 96% spike for 25k job.
Member tenant
During create jobs, the highest CPU utilization was 117%, by mod-inventory with the 100k-record file. During update jobs, mod-inventory reached 160% with the 100k-record file. Spikes in mod-data-import were observed at the very beginning of each job, as expected.
Create jobs:
The highest resource utilization, 112%, was observed for mod-inventory. At the end of the test, mod-quick-marc-b (122%) began to consume more than the other modules. This behaviour was observed for all create jobs.
Average for mod-inventory-b - 112%, mod-inventory-storage-b - 36%, mod-source-record-storage-b - 48%, mod-source-record-manager-b - 29%, mod-di-converter-storage-b - 56%, mod-data-import - 179% spike for 100k job and mod-data-import - 70% spike for 25k job.
Non-ECS tests results*: Average for mod-inventory-b - 125%, mod-inventory-storage-b - 25%, mod-source-record-storage-b - 60%, mod-source-record-manager-b - 35%, mod-di-converter-storage-b - 80%, mod-data-import - 200% spike for 100k job.
Update jobs:
The highest resource utilization, 158%, was observed for mod-inventory.
Average for mod-inventory-b - 158%, mod-inventory-storage-b - 32%, mod-source-record-storage-b - 33%, mod-source-record-manager-b - 20%, mod-di-converter-storage-b - 52%, mod-data-import - 161% spike for 100k job and mod-data-import - 73% spike for 25k job.
Non-ECS tests results*: Average for mod-inventory-b - 220%, mod-inventory-storage-b - 25%, mod-source-record-storage-b - 50%, mod-source-record-manager-b - 45%, mod-di-converter-storage-b - 90%, mod-data-import - 96% spike for 25k job.
Memory Utilization
Memory consumption increased for mod-source-record-storage (20%) and mod-source-record-manager (10%) in both central and member tenants compared with the non-ECS test results. It decreased for mod-data-import (15%).
This table shows the average memory consumption in the ECS and Non-ECS tests. No Non-ECS value is available for mod-di-converter-storage-b, as indicated by "-".
Module | ECS, central tenant, Create Jobs | ECS, central tenant, Update Jobs | ECS, member tenant, Create Jobs | ECS, member tenant, Update Jobs | Non-ECS |
---|---|---|---|---|---|
mod-inventory-b | 96% | 96% | 101% | 101% | 90% |
mod-inventory-storage-b | 18% | 21% | 20% | 23% | 18% |
mod-source-record-storage-b | 64% | 74% | 71% | 71% | 46% |
mod-source-record-manager-b | 49% | 49% | 46% | 43% | 38% |
mod-di-converter-storage-b | 32% | 33% | 33% | 33% | - |
mod-data-import | 38% | 38% | 35% | 37% | 53% |
Central tenant
Memory consumption for mod-inventory grew gradually to 96% during create jobs and did not change during update jobs. No memory leaks were detected.
Create jobs:
Average for mod-inventory-b - 96%, mod-inventory-storage-b - 18%, mod-source-record-storage-b - 64%, mod-source-record-manager-b - 49%, mod-di-converter-storage-b - 32%, mod-data-import - 38%.
Update jobs:
Average for mod-inventory-b - 96%, mod-inventory-storage-b - 21%, mod-source-record-storage-b - 74%, mod-source-record-manager-b - 49%, mod-di-converter-storage-b - 33%, mod-data-import - 38%.
Non-ECS tests results*:
Average for mod-inventory-b - 90%, mod-inventory-storage-b - 18%, mod-source-record-storage-b - 46%, mod-source-record-manager-b - 38%, mod-di-converter-storage-b - not provided, mod-data-import - 53%.
Member tenant
Memory consumption for mod-inventory grew gradually to 101% during create jobs and did not change during update jobs. No memory leaks were detected.
Create jobs:
Average for mod-inventory-b - 101%, mod-inventory-storage-b - 20%, mod-source-record-storage-b - 71%, mod-source-record-manager-b - 46%, mod-di-converter-storage-b - 33%, mod-data-import - 35%.
Update jobs:
Average for mod-inventory-b - 101%, mod-inventory-storage-b - 23%, mod-source-record-storage-b - 71%, mod-source-record-manager-b - 43%, mod-di-converter-storage-b - 33%, mod-data-import - 37%.
Non-ECS tests results*:
Average for mod-inventory-b - 90%, mod-inventory-storage-b - 18%, mod-source-record-storage-b - 46%, mod-source-record-manager-b - 38%, mod-di-converter-storage-b - not provided, mod-data-import - 53%.
DB CPU Utilization
Central tenant
RDS CPU utilization was 98% during DI MARC Bib Create jobs and 94% during DI MARC Bib Update jobs.
Member tenant
RDS CPU utilization was 95% during DI MARC Bib Create jobs and 90% during DI MARC Bib Update jobs.
DB Connections
Central tenant
In central tenant DB connections did not exceed 312 for DI MARC Bib Create jobs and 290 for DI MARC Bib Update jobs.
Member tenant
DB connections did not exceed 312 for DI MARC Bib Create jobs and 300 for DI MARC Bib Update jobs.
DB Load
Central tenant
Waits
SQL
Member tenant
Waits
SQL
Top SQL
Central tenant
Member tenant
OpenSearch Service
CPU utilization (Percent)
Central tenant
Member tenant
Maximum memory utilization (Percent)
Central tenant
Member tenant
Indexing Data Rate (operations/min)
Central tenant
Member tenant
Appendix
Methodology/Approach
DI tests were started from UI with 5 min pauses between tests.
Infrastructure
pcon
- mod_source_record_storage.marc_records_lb = 2171059
- mod_source_record_storage.raw_records_lb = 2171059
- mod_source_record_storage.records_lb = 2171059
- mod_source_record_storage.marc_indexers = 160397101
- mod_source_record_storage.marc_indexers with field_no 010 = 1102025
- mod_source_record_storage.marc_indexers with field_no 035 = 6621158
- mod_inventory_storage.authority = 0
- mod_inventory_storage.holdings_record = 392000
- mod_inventory_storage.instance = 1604898
- mod_inventory_storage.item = 392000
pcp1
- mod_source_record_storage.marc_records_lb = 28000638
- mod_source_record_storage.raw_records_lb = 28032783
- mod_source_record_storage.records_lb = 28032783
- mod_source_record_storage.marc_indexers = 611119607
- mod_source_record_storage.marc_indexers with field_no 010 = 1119740
- mod_source_record_storage.marc_indexers with field_no 035 = 22621766
- mod_inventory_storage.authority = 7402975
- mod_inventory_storage.holdings_record = 24165674
- mod_inventory_storage.instance = 24206317
- mod_inventory_storage.item = 25375853
Environment: PCON
Release: Poppy (2023 R2)
- 10 m6i.2xlarge EC2 instances located in US East (N. Virginia)
- 2 db.r6.xlarge database instances, one reader and one writer
- MSK tenant
- 4 brokers
- Apache Kafka version 2.8.0
- EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
Modules
pcon-pvt, Wed Dec 20 09:08:19 UTC 2023
Module | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize | R/W split enabled |
---|---|---|---|---|---|---|---|---|---|---|
mod-search | 2 | mod-search:3.0.3 | 2 | 2592 | 2480 | 2048 | 1440 | 512 | 1024 | FALSE |
mod-data-import | 1 | mod-data-import:3.0.0 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512 | FALSE |
mod-authtoken | 1 | mod-authtoken:2.14.0 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128 | FALSE |
mod-inventory-storage | 1 | mod-inventory-storage:27.0.0 | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512 | FALSE |
mod-source-record-storage | 1 | mod-source-record-storage:5.7.0 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | FALSE |
mod-inventory | 1 | mod-inventory:20.1.0 | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512 | FALSE |
mod-di-converter-storage | 1 | mod-di-converter-storage:2.1.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE |
mod-users | 2 | mod-users:19.2.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE |
mod-source-record-manager | 1 | mod-source-record-manager:3.7.0 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | FALSE |
mod-quick-marc | 1 | mod-quick-marc:5.0.0 | 1 | 2288 | 2176 | 128 | 1664 | 384 | 512 | FALSE |
okapi-b | 2 | okapi:5.1.1 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512 | FALSE |
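The memory percentages reported in the Memory Utilization section can be read against the limits in this table. The following is a minimal sketch, assuming that the limits are in MiB and that the reported memory utilization is relative to the soft limit, which would explain how mod-inventory can average 101% without reaching its hard limit. Both points are assumptions to verify against the monitoring setup, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryLimits:
    hard_mib: int   # Mem Hard Limit column
    soft_mib: int   # Mem Soft limit column
    xmx_mib: int    # Xmx column (JVM max heap)


# Values copied from the Modules table.
LIMITS = {
    "mod-inventory": MemoryLimits(2880, 2592, 1814),
    "mod-source-record-storage": MemoryLimits(5600, 5000, 3500),
    "mod-source-record-manager": MemoryLimits(5600, 5000, 3500),
    "mod-data-import": MemoryLimits(2048, 1844, 1292),
}

# Average memory utilization (%), member tenant, update jobs.
MEMBER_UPDATE_PCT = {
    "mod-inventory": 101,
    "mod-source-record-storage": 71,
    "mod-source-record-manager": 43,
    "mod-data-import": 37,
}


def used_mib(limits: MemoryLimits, utilization_pct: float) -> float:
    """Memory in use, assuming utilization is relative to the soft limit."""
    return limits.soft_mib * utilization_pct / 100


def headroom_mib(limits: MemoryLimits, utilization_pct: float) -> float:
    """Remaining memory before the hard limit would be reached."""
    return limits.hard_mib - used_mib(limits, utilization_pct)


for module, pct in MEMBER_UPDATE_PCT.items():
    limits = LIMITS[module]
    print(f"{module}: ~{used_mib(limits, pct):.0f} MiB used, "
          f"~{headroom_mib(limits, pct):.0f} MiB below the hard limit")
```

Under these assumptions mod-inventory has the least headroom of the four modules, which is in line with it being the module where the Out Of Memory issue was observed.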