...
- DI duration increases with the number of records imported.
- The increase in memory utilization was due to the scheduled cluster shutdown. No memory leak is suspected for DI modules.
- Average CPU utilization of the modules did not exceed 150% for any Create or Update job. The spikes at the beginning, caused by the mod-data-import module, are expected.
- DB CPU usage is close to 95% for all jobs with files larger than 10k records.
- The Poppy release has higher average CPU utilization for DI-related services than Orchid, especially for mod-inventory.
- Errors occurred for the DI create job with the 100k file and for the DI update jobs with the 25k file because of a timeout issue (Opening SQLConnection failed: Timeout).
- No noticeable difference in Data Import duration was observed between tests with ASYNC_PROCESSOR_MAX_WORKERS_COUNT set to 1 and to 5.
Recommendations and Jiras
- Investigate timeout issues. Ticket created: MODINV-924
- Check memory trends for mod-source-record-storage-b and mod-inventory during additional DI tests without the nightly cluster shutdown
- Increase CPU units allocation for mod-inventory, mod-di-converter-storage, mod-quick-marc services
- Use higher DB instance type (scale up from db.r6g.xlarge to db.r6g.2xlarge).
Results
Test # | File | Job profile | Duration Orchid with R/W split enabled (07/09/2023) | Duration Poppy with R/W split enabled | Difference, % / sec | Results |
---|---|---|---|---|---|---|
1 | 1k MARC BIB Create | PTF - Create 2 | 39 sec | Completed | ||
2 | 2k MARC BIB Create | PTF - Create 2 | 1 min 01 sec | Completed | ||
3 | 5k MARC BIB Create | PTF - Create 2 | 2 min 23 sec | 2 min 22 sec | ↓ 0.88% / 1 sec | Completed |
4 | 10k MARC BIB Create | PTF - Create 2 | 5 min 12 sec | 4 min 29 sec | ↓ 18.86% / 43 sec | Completed |
5 | 25k MARC BIB Create | PTF - Create 2 | 11 min 45 sec | 10 min 38 sec | ↓ 11.38% / 67 sec | Completed |
6 | 50k MARC BIB Create | PTF - Create 2 | 23 min 36 sec | 20 min 26 sec | ↓ 15.18% / 190 sec | Completed |
7 | 100k MARC BIB Create | PTF - Create 2 | 49 min 28 sec | 2 hours 46 min | | Cancelled (stopped by user) * |
8 | 1k MARC BIB Update | PTF - Updates Success - 1 | 34 sec | Completed | ||
9 | 2k MARC BIB Update | PTF - Updates Success - 1 | 1 min 09 sec | Completed | ||
10 | 5k MARC BIB Update | PTF - Updates Success - 1 | 2 min 48 sec | 2 min 31 sec | ↓ 6.66% / 17 sec | Completed |
11 | 10k MARC BIB Update | PTF - Updates Success - 1 | 5 min 23 sec | 5 min 13 sec | ↓ 1.84% / 10 sec | Completed |
12 | 25k MARC BIB Update | PTF - Updates Success - 1 | 14 min 12 sec | 12 min 27 sec | ↓ 14% / 105 sec | Completed with errors * |
13 | 25k MARC BIB Update | PTF - Updates Success - 1 | 2 min 15 sec | Completed with errors * | ||
14 | 25k MARC BIB Update | PTF - Updates Success - 1 | 12 min | Cancelled (stopped by user) * |
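The claim that DI duration scales with record count can be checked with a little arithmetic. The sketch below converts the Poppy durations of the completed Create jobs that report both releases (tests 3 to 6) into records per second. The helper name `to_seconds` and the dictionary layout are illustrative assumptions, not part of the test tooling.

```python
import re

# Seconds per unit, matching the duration strings used in the results table.
UNIT_SECONDS = {"hour": 3600, "hours": 3600, "min": 60, "sec": 1}


def to_seconds(text: str) -> int:
    """Convert a duration such as '2 min 22 sec' or '2 hours 46 min' to seconds."""
    parts = re.findall(r"(\d+)\s*(hours?|min|sec)", text)
    return sum(int(value) * UNIT_SECONDS[unit] for value, unit in parts)


# Poppy durations copied from the results table (MARC BIB Create, tests 3-6).
poppy_create = {
    5_000: "2 min 22 sec",
    10_000: "4 min 29 sec",
    25_000: "10 min 38 sec",
    50_000: "20 min 26 sec",
}

# Records imported per second for each file size.
rates = {records: records / to_seconds(d) for records, d in poppy_create.items()}

for records, rate in rates.items():
    print(f"{records:>6} records: {rate:.1f} records/sec")
```

For these four jobs the rate stays between roughly 35 and 41 records per second, which is consistent with duration growing approximately linearly with file size.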
...
Average active sessions (AAS)
MARC BIB CREATE
Top SQL
MARC BIB UPDATE
Top SQL
INSERT INTO fs09000000_mod_source_record_manager.events_processed
INSERT INTO fs09000000_mod_source_record_manager.journal_records
MSK CPU utilization (Percent) OpenSearch
...
Additional tests were analyzed to investigate the "(optimistic locking)" issue.
The issue was found only in create jobs, in the mod-inventory-storage and mod-inventory modules. It occurs in both completed and failed jobs, and it also depends on the files used.
The table shows the number of optimistic locking messages in each test. A hyphen '-' means no test was performed.
Date \ File | 1k | 2k | 5k | 10k | 25k | 50k | 100k (failed) | 250k |
---|---|---|---|---|---|---|---|---|
2023.10.26 (additional with other files) | 9 | - | - | 39 | 148 | 137 | 3393 | - |
2023.10.27 Testing | 1 | No | No | 6 | 9 | 7 | 4 | - |
2023.11.01 (additional with other files) | No | - | - | - | - | - | - | 10 |
Test File Splitting Feature with multiple async workers
mod-data-import can dispatch multiple jobs at the same time; this is configured with the ASYNC_PROCESSOR_MAX_WORKERS_COUNT variable.
No noticeable difference in Data Import duration was observed between tests with ASYNC_PROCESSOR_MAX_WORKERS_COUNT set to 1 and to 5.
Single tenant test results
DI duration 100k | ASYNC_PROCESSOR_MAX_WORKERS_COUNT = 1 | ASYNC_PROCESSOR_MAX_WORKERS_COUNT = 5 |
---|---|---|
tenant fs09000000 | 01:02:12 (49:00) | 00:48:07 |
all 3 tenants concurrently | 02:22:25 | 02:25:51 |
Test with ASYNC_PROCESSOR_MAX_WORKERS_COUNT = 1
3 tenants concurrently
Tenant (ASYNC_PROCESSOR_MAX_WORKERS_COUNT = 1) | Duration | Start | End |
---|---|---|---|
fs09000000 | 01:33:59 | 2024-04-03 10:20:29.877+00 | 2024-04-03 11:54:28.915+00 |
fs07000001 | 02:09:23 | 2024-04-03 10:21:50.416+00 | 2024-04-03 12:31:13.296+00 |
fs07000002 | 02:18:47 | 2024-04-03 10:23:58.926+00 | 2024-04-03 12:42:45.894+00 |
Test with ASYNC_PROCESSOR_MAX_WORKERS_COUNT = 5
3 tenants concurrently
Tenant (ASYNC_PROCESSOR_MAX_WORKERS_COUNT = 5) | Duration | Start | End |
---|---|---|---|
fs09000000 | 01:52:11 | 2024-04-03 14:16:23.22+00 | 2024-04-03 16:08:33.991+00 |
fs07000001 | 02:06:36 | 2024-04-03 14:17:28.516+00 | 2024-04-03 16:24:04.194+00 |
fs07000002 | 02:23:29 | 2024-04-03 14:18:45.622+00 | 2024-04-03 16:42:14.323+00 |
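The per-tenant durations and the overall concurrent duration follow directly from the Start and End timestamps. As a minimal sketch, the code below recomputes them for the run with ASYNC_PROCESSOR_MAX_WORKERS_COUNT = 5. It assumes the `+00` suffix denotes UTC and that the overall duration is measured from the earliest start to the latest end; the helper names `parse_ts` and `hms` are illustrative.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(text: str) -> datetime:
    """Parse a timestamp such as '2024-04-03 14:16:23.22+00' (assumed UTC)."""
    if not text.endswith("+00"):
        raise ValueError(f"expected a '+00' suffix: {text!r}")
    naive = datetime.strptime(text[:-3], "%Y-%m-%d %H:%M:%S.%f")
    return naive.replace(tzinfo=timezone.utc)


def hms(delta: timedelta) -> str:
    """Format a duration as HH:MM:SS, rounded to the nearest second."""
    total = round(delta.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Start and End values copied from the table for 5 workers.
runs = {
    "fs09000000": ("2024-04-03 14:16:23.22+00", "2024-04-03 16:08:33.991+00"),
    "fs07000001": ("2024-04-03 14:17:28.516+00", "2024-04-03 16:24:04.194+00"),
    "fs07000002": ("2024-04-03 14:18:45.622+00", "2024-04-03 16:42:14.323+00"),
}

parsed = {tenant: (parse_ts(s), parse_ts(e)) for tenant, (s, e) in runs.items()}

# Duration of each tenant's job.
per_tenant = {tenant: hms(end - start) for tenant, (start, end) in parsed.items()}

# Wall-clock time from the first job starting to the last job finishing.
overall = hms(max(end for _, end in parsed.values())
              - min(start for start, _ in parsed.values()))

print(per_tenant)
print("all 3 tenants concurrently:", overall)
```

With these inputs the per-tenant values reproduce the Duration column, and the overall value reproduces the 02:25:51 reported for all 3 tenants running concurrently.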
Resource Utilization
A CPU usage increase trend was observed for mod-inventory. A probable reason for the growth is tracked in a Jira ticket.
Appendix
Infrastructure
PTF -environment pcp1
- 10 m6i.2xlarge EC2 instances located in US East (N. Virginia) us-east-1
- 2 database instances, writer/reader

Name | Memory GiB | vCPUs | max_connections |
---|---|---|---|
db.r6g.xlarge | 32 GiB | 4 vCPUs | 2731 |

- MSK tenant
- 4 m5.2xlarge brokers in 2 zones
- Apache Kafka version 2.8.0
- EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
...