Table of Contents

...

  • DI duration increases with the number of records imported.
  • The increase in memory utilization was due to the scheduled cluster shutdown. No memory leak is suspected for DI modules.
  • Average CPU utilization of modules for all Create and Update jobs did not exceed 150%. Spikes at the beginning caused by the mod-data-import module are expected.
  • DB CPU usage is close to 95% for all jobs with files of more than 10k records.
  • The Poppy release has higher average CPU utilization of DI-related services than Orchid, especially mod-inventory.
  • Errors occurred for the DI create job with the 100k file and for DI update jobs with the 25k file because of a timeout issue (Opening SQLConnection failed: Timeout).
  • No noticeable difference in Data Import duration was observed between tests with ASYNC_PROCESSOR_MAX_WORKERS_COUNT set to 1 and to 5.

Recommendations and Jiras

  1. Investigate timeout issues. Ticket created: MODINV-924
  2. Check memory trends for mod-source-record-storage-b and mod-inventory during additional DI tests without the nightly cluster shutdown.
  3. Increase CPU units allocation for the mod-inventory, mod-di-converter-storage, and mod-quick-marc services.
  4. Use a larger DB instance type (scale up from db.r6g.xlarge to db.r6g.2xlarge).

Results

| Test # | File | Profile | Duration, Orchid with R/W split enabled (07/09/2023) | Duration, Poppy with R/W split enabled | Difference, % / sec | Results |
|---|---|---|---|---|---|---|
| 1 | 1k MARC BIB Create | PTF - Create 2 | | 39 sec | | Completed |
| 2 | 2k MARC BIB Create | PTF - Create 2 | | 1 min 01 sec | | Completed |
| 3 | 5k MARC BIB Create | PTF - Create 2 | 2 min 23 sec | 2 min 22 sec | 0.88% / 1 sec | Completed |
| 4 | 10k MARC BIB Create | PTF - Create 2 | 5 min 12 sec | 4 min 29 sec | 18.86% / 43 sec | Completed |
| 5 | 25k MARC BIB Create | PTF - Create 2 | 11 min 45 sec | 10 min 38 sec | 11.38% / 67 sec | Completed |
| 6 | 50k MARC BIB Create | PTF - Create 2 | 23 min 36 sec | 20 min 26 sec | 15.18% / 190 sec | Completed |
| 7 | 100k MARC BIB Create | PTF - Create 2 | 49 min 28 sec | 2 hours 46 min | | Cancelled (stopped by user) * |
| 8 | 1k MARC BIB Update | PTF - Updates Success - 1 | | 34 sec | | Completed |
| 9 | 2k MARC BIB Update | PTF - Updates Success - 1 | | 1 min 09 sec | | Completed |
| 10 | 5k MARC BIB Update | PTF - Updates Success - 1 | 2 min 48 sec | 2 min 31 sec | 6.66% / 17 sec | Completed |
| 11 | 10k MARC BIB Update | PTF - Updates Success - 1 | 5 min 23 sec | 5 min 13 sec | 1.84% / 10 sec | Completed |
| 12 | 25k MARC BIB Update | PTF - Updates Success - 1 | 14 min 12 sec | 12 min 27 sec | 14% / 105 sec | Completed with errors * |
| 13 | 25k MARC BIB Update | PTF - Updates Success - 1 | | 2 min 15 sec | | Completed with errors * |
| 14 | 25k MARC BIB Update | PTF - Updates Success - 1 | | 12 min | | Cancelled (stopped by user) * |
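
The "sec" part of the Difference column can be re-derived from the two duration columns. The following is a minimal sketch, assuming durations are written exactly as in the table ("N hours N min N sec"); the helper names `parse_duration` and `difference_sec` are hypothetical and not part of any FOLIO tooling. The percentage part is not reproduced here because the report does not state which value it is calculated against.

```python
import re

# Seconds per unit, using the unit spellings that appear in the results table.
_UNIT_SECONDS = {"hour": 3600, "hours": 3600, "min": 60, "sec": 1}


def parse_duration(text: str) -> int:
    """Convert a duration such as '2 hours 46 min' or '5 min 12 sec' to seconds."""
    parts = re.findall(r"(\d+)\s*(hours?|min|sec)", text)
    if not parts:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(int(value) * _UNIT_SECONDS[unit] for value, unit in parts)


def difference_sec(orchid: str, poppy: str) -> int:
    """Seconds saved by Poppy relative to Orchid (positive means Poppy is faster)."""
    return parse_duration(orchid) - parse_duration(poppy)


print(difference_sec("5 min 12 sec", "4 min 29 sec"))    # 43
print(difference_sec("23 min 36 sec", "20 min 26 sec"))  # 190
```

Both printed values agree with the seconds reported for tests 4 and 6.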

...

Average active sessions (AAS)

MARC BIB CREATE

Top SQL

MARC BIB UPDATE

Top SQL

INSERT INTO fs09000000_mod_source_record_manager.events_processed 

INSERT INTO fs09000000_mod_source_record_manager.journal_records 

MSK CPU utilization (Percent) OpenSearch

...

io.vertx.pgclient.PgException: ERROR: Cannot update record 511ea771-3dd0-49a7-a14d-c4187c94aff7 because it has been changed (optimistic locking): Stored _version is 2, _version of request is 1 (23F09)


Additional tests were analyzed to investigate the "(optimistic locking)" issue.

The issue was found only in create jobs, in the mod-inventory-storage and mod-inventory modules. It occurs in both completed and failed jobs and also depends on the files used.
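
For readers unfamiliar with the mechanism behind this error, the following is a minimal in-memory sketch of optimistic locking, assuming a store that keeps a `_version` counter per record and rejects any update whose request version is stale. The `InMemoryStore` and `OptimisticLockError` names are hypothetical; this is an illustration of the general technique, not the mod-inventory-storage implementation.

```python
class OptimisticLockError(Exception):
    """Raised when the request carries a stale _version."""


class InMemoryStore:
    def __init__(self):
        # record id -> (stored _version, payload)
        self._records = {}

    def create(self, record_id, payload):
        self._records[record_id] = (1, payload)
        return 1

    def update(self, record_id, payload, request_version):
        stored_version, _ = self._records[record_id]
        # Reject the write if someone else updated the record in between.
        if stored_version != request_version:
            raise OptimisticLockError(
                f"Cannot update record {record_id} because it has been changed "
                f"(optimistic locking): Stored _version is {stored_version}, "
                f"_version of request is {request_version}"
            )
        new_version = stored_version + 1
        self._records[record_id] = (new_version, payload)
        return new_version


store = InMemoryStore()
store.create("rec-1", {"title": "a"})

# Two writers both read the record at _version 1.
store.update("rec-1", {"title": "b"}, request_version=1)  # first writer succeeds
try:
    store.update("rec-1", {"title": "c"}, request_version=1)  # second writer is stale
except OptimisticLockError as err:
    print(err)
```

The second update fails with the same "Stored _version is 2, _version of request is 1" pattern seen in the log line above, which is what happens when two concurrent writers start from the same version of a record.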


The table shows the number of optimistic locking messages in tests. A hyphen '-' means no test was performed.

Date\File1k2k5k10k25k50k100k (failed)250k

2023.10.26

(additional with other files)

9--391481373393-
2023.10.27 Testing1NoNo6974-

2023.11.01

(additional with other files)

No------10


Test File Splitting Feature with multiple async workers

Jira: PERF-705

mod-data-import was developed to dispatch multiple jobs at the same time; this is configured through the ASYNC_PROCESSOR_MAX_WORKERS_COUNT variable.
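
As a conceptual illustration only, the sketch below shows how a worker count read from an environment variable can bound the number of jobs processed concurrently. It assumes a default of 1 when the variable is unset and uses a thread pool; the functions `max_workers_from_env` and `dispatch` are hypothetical and do not reflect how mod-data-import is actually implemented.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def max_workers_from_env(default: int = 1) -> int:
    """Read the worker count from the environment (default of 1 is an assumption)."""
    value = int(os.environ.get("ASYNC_PROCESSOR_MAX_WORKERS_COUNT", default))
    if value < 1:
        raise ValueError("ASYNC_PROCESSOR_MAX_WORKERS_COUNT must be at least 1")
    return value


def dispatch(jobs, handler, workers):
    """Run handler over jobs with at most `workers` jobs in flight at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input jobs.
        return list(pool.map(handler, jobs))


os.environ["ASYNC_PROCESSOR_MAX_WORKERS_COUNT"] = "5"
workers = max_workers_from_env()
print(workers, dispatch(["job-a", "job-b", "job-c"], str.upper, workers))
```

Raising the worker count only helps when the workers themselves are the bottleneck; if a shared resource such as the database is saturated, more workers would not be expected to shorten the total duration.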

No noticeable difference in Data Import duration was observed between tests with ASYNC_PROCESSOR_MAX_WORKERS_COUNT set to 1 and to 5.

Single tenant test results

| DI duration 100k | ASYNC_PROCESSOR_MAX_WORKERS_COUNT = 1 | ASYNC_PROCESSOR_MAX_WORKERS_COUNT = 5 |
|---|---|---|
| tenant fs09000000 | 01:02:12 (49:00) | 00:48:07 |
| all 3 tenants concurrently | 02:22:25 | 02:25:51 |

Test with ASYNC_PROCESSOR_MAX_WORKERS_COUNT = 1

3 tenants concurrently

| Tenant (ASYNC_PROCESSOR_MAX_WORKERS_COUNT = 1) | Duration | Start | End |
|---|---|---|---|
| fs09000000 | 01:33:59 | 2024-04-03 10:20:29.877+00 | 2024-04-03 11:54:28.915+00 |
| fs07000001 | 02:09:23 | 2024-04-03 10:21:50.416+00 | 2024-04-03 12:31:13.296+00 |
| fs07000002 | 02:18:47 | 2024-04-03 10:23:58.926+00 | 2024-04-03 12:42:45.894+00 |


Test with ASYNC_PROCESSOR_MAX_WORKERS_COUNT = 5 

3 tenants concurrently

| Tenant (ASYNC_PROCESSOR_MAX_WORKERS_COUNT = 5) | Duration | Start | End |
|---|---|---|---|
| fs09000000 | 01:52:11 | 2024-04-03 14:16:23.22+00 | 2024-04-03 16:08:33.991+00 |
| fs07000001 | 02:06:36 | 2024-04-03 14:17:28.516+00 | 2024-04-03 16:24:04.194+00 |
| fs07000002 | 02:23:29 | 2024-04-03 14:18:45.622+00 | 2024-04-03 16:42:14.323+00 |
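
The Duration values in the two per-tenant tables can be recomputed from the Start and End timestamps. This is a minimal sketch, assuming every timestamp ends with the hour-only offset "+00" as shown in the tables and that durations are rounded to the nearest second; `parse_ts` and `duration_hms` are hypothetical helper names.

```python
from datetime import datetime

_FORMAT = "%Y-%m-%d %H:%M:%S.%f%z"


def parse_ts(text: str) -> datetime:
    """Parse a timestamp such as '2024-04-03 10:20:29.877+00'."""
    if not text.endswith("+00"):
        raise ValueError(f"expected a '+00' offset: {text!r}")
    # strptime's %z needs '+0000', so pad the hour-only offset with minutes.
    return datetime.strptime(text + "00", _FORMAT)


def duration_hms(start: str, end: str) -> str:
    """Elapsed time between start and end, rounded to the second, as HH:MM:SS."""
    seconds = round((parse_ts(end) - parse_ts(start)).total_seconds())
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# fs09000000 with 1 worker, then with 5 workers
print(duration_hms("2024-04-03 10:20:29.877+00", "2024-04-03 11:54:28.915+00"))  # 01:33:59
print(duration_hms("2024-04-03 14:16:23.22+00", "2024-04-03 16:08:33.991+00"))   # 01:52:11
```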

Resource Utilization



A CPU usage increase trend was observed for mod-inventory. A probable reason for the growth is MODINV-944.



Appendix

Infrastructure

PTF environment pcp1

  • 10 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
  • 2 database instances, writer/reader

    | Name | Memory, GiB | vCPUs | max_connections |
    |---|---|---|---|
    | db.r6g.xlarge | 32 GiB | 4 vCPUs | 2731 |


  • MSK tenant
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3

...