Data Import test report (Poppy)


Overview

This document contains the results of testing Data Import (DI) for MARC Bibliographic records in the Poppy release (PERF-712).

Summary

  • DI duration increases with the number of records imported.
  • The increase in memory utilization was due to the scheduled cluster shutdown. No memory leak is suspected for DI modules.
  • Average CPU utilization of modules for all Create and Update jobs did not exceed 150%. Spikes at the beginning, caused by the mod-data-import module, are expected.
  • DB CPU usage is approximately 95% for all jobs with files of more than 10k records.
  • The Poppy release has higher average CPU utilization of DI-related services than Orchid, especially for mod-inventory.
  • Errors occurred for the DI create job with the 100k file and for DI update jobs with the 25k file because of a timeout issue (Opening SQLConnection failed: Timeout).
  • No significant difference in Data Import duration was observed between tests with ASYNC_PROCESSOR_MAX_WORKERS_COUNT set to 1 and to 5.

Recommendations and Jiras

  1. Investigate the timeout issues. Ticket created: MODINV-924.
  2. Check memory trends for mod-source-record-storage-b and mod-inventory during additional DI tests without the nightly cluster shutdown.
  3. Increase the CPU units allocation for the mod-inventory, mod-di-converter-storage and mod-quick-marc services.
  4. Use a larger DB instance type (scale up from db.r6g.xlarge to db.r6g.2xlarge).

Results

| Test # | File | Profile | Duration, Orchid with R/W split enabled (07/09/2023) | Duration, Poppy | Difference, % / sec | Results |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 1k MARC BIB Create | PTF - Create 2 | | 39 sec | | Completed |
| 2 | 2k MARC BIB Create | PTF - Create 2 | | 1 min 01 sec | | Completed |
| 3 | 5k MARC BIB Create | PTF - Create 2 | 2 min 23 sec | 2 min 22 sec | 0.88% / 1 sec | Completed |
| 4 | 10k MARC BIB Create | PTF - Create 2 | 5 min 12 sec | 4 min 29 sec | 18.86% / 43 sec | Completed |
| 5 | 25k MARC BIB Create | PTF - Create 2 | 11 min 45 sec | 10 min 38 sec | 11.38% / 67 sec | Completed |
| 6 | 50k MARC BIB Create | PTF - Create 2 | 23 min 36 sec | 20 min 26 sec | 15.18% / 190 sec | Completed |
| 7 | 100k MARC BIB Create | PTF - Create 2 | 49 min 28 sec | 2 hours 46 min | | Cancelled (stopped by user) * |
| 8 | 1k MARC BIB Update | PTF - Updates Success - 1 | | 34 sec | | Completed |
| 9 | 2k MARC BIB Update | PTF - Updates Success - 1 | | 1 min 09 sec | | Completed |
| 10 | 5k MARC BIB Update | PTF - Updates Success - 1 | 2 min 48 sec | 2 min 31 sec | 6.66% / 17 sec | Completed |
| 11 | 10k MARC BIB Update | PTF - Updates Success - 1 | 5 min 23 sec | 5 min 13 sec | 1.84% / 10 sec | Completed |
| 12 | 25k MARC BIB Update | PTF - Updates Success - 1 | 14 min 12 sec | 12 min 27 sec | 14% / 105 sec | Completed with errors * |
| 13 | 25k MARC BIB Update | PTF - Updates Success - 1 | | 2 min 15 sec | | Completed with errors * |
| 14 | 25k MARC BIB Update | PTF - Updates Success - 1 | | 12 min | | Cancelled (stopped by user) * |

 * - All jobs that completed with errors or were cancelled showed the same issue in the UI: io.vertx.core.impl.NoStackTraceThrowable: Timeout

Test #14 was stopped manually from the UI. Two tests with 25k MARC BIB Update were carried out to confirm that the 25k update does not work properly and hits the same issue.
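
The seconds part of the Difference column can be reproduced from the two duration strings. The sketch below is illustrative only: the helper names to_seconds and difference are hypothetical, and computing the percentage relative to the Orchid duration is an assumption, since the report does not state which formula was used. The percentages it yields therefore need not match the ones in the table.

```python
import re

# Seconds per unit, keyed by the unit words used in the results table.
UNITS = {"hour": 3600, "min": 60, "sec": 1}

def to_seconds(text: str) -> int:
    """Convert a string such as '2 hours 46 min' or '11 min 45 sec' to seconds."""
    return sum(int(value) * UNITS[unit]
               for value, unit in re.findall(r"(\d+)\s*(hour|min|sec)", text))

def difference(orchid: str, poppy: str) -> tuple[int, float]:
    """Return (seconds saved, percent saved relative to Orchid).

    The percentage basis is an assumption, not the report's stated method.
    """
    before, after = to_seconds(orchid), to_seconds(poppy)
    delta = before - after
    return delta, round(100 * delta / before, 2)

# Test #5 (25k MARC BIB Create): the table lists a difference of 67 sec.
print(difference("11 min 45 sec", "10 min 38 sec")[0])  # 67
```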


Memory Utilization

The increase in memory utilization was due to the scheduled cluster shutdown. No memory leak is suspected for DI modules.

MARC BIB CREATE

Tests #1-7

1k, 2k, 5k, 10k, 25k, 50k, 100k records

MARC BIB UPDATE

Tests #8-14

1k, 2k, 5k, 10k, 25k, 25k, 25k records

Service CPU Utilization 

MARC BIB CREATE

Tests #1-7

1k, 2k, 5k, 10k, 25k, 50k, 100k records

CPU utilization for all modules returned to default levels after all tests. The highest utilization, 170%, was observed for the mod-quick-marc-b module in the 5k DI create job.

Average for mod-inventory-b - 125%, mod-inventory-storage-b - 25%, mod-source-record-storage-b - 60%, mod-source-record-manager-b - 35%, mod-di-converter-storage-b - 80%, mod-data-import - 200% spike for the 100k job.


MARC BIB UPDATE

Tests #8-14

1k, 2k, 5k, 10k, 25k, 25k, 25k records

Average for mod-inventory-b - 220%, mod-inventory-storage-b - 25%, mod-source-record-storage-b - 50%, mod-source-record-manager-b - 45%, mod-di-converter-storage-b - 90%, mod-data-import - 96% spike for the 25k job.

RDS CPU Utilization 

MARC BIB CREATE

Average 95% for DI jobs with more than 10k records

MARC BIB UPDATE

RDS Database Connections

MARC BIB CREATE
Maximum number of connections: 275 for the DI Create job and 260 for the DI Update job.

Average active sessions (AAS)

MARC BIB CREATE

Top SQL

MARC BIB UPDATE

Top SQL

INSERT INTO fs09000000_mod_source_record_manager.events_processed 

INSERT INTO fs09000000_mod_source_record_manager.journal_records 

MSK CPU utilization (Percent) OpenSearch

Utilization is not higher than 20%

CPU (User) usage by broker

Errors

15:58:44 [478677/data-import] [fs09000000] [9eb67301-6f6e-468f-9b1a-6134dc39a684] [] ERROR PostgresClient       Opening SQLConnection failed: Timeout


15:24:19 [] [fs09000000] [] [] ERROR KafkaConsumerWrapper businessHandlerCompletionHandler:: Error while processing a record - id: 3 subscriptionPattern: SubscriptionDefinition(eventType=DI_COMPLETED, subscriptionPattern=pcp1\.Default\.\w{1,}\.DI_COMPLETED) offset: 541469


io.vertx.pgclient.PgException: ERROR: Cannot update record 511ea771-3dd0-49a7-a14d-c4187c94aff7 because it has been changed (optimistic locking): Stored _version is 2, _version of request is 1 (23F09)


Additional tests were analyzed to investigate the "(optimistic locking)" issue.

The issue was found only in create jobs, in the mod-inventory-storage and mod-inventory modules. It occurs in both completed and failed jobs and also depends on the files used.


The table shows the number of optimistic locking messages in each test. A hyphen '-' means no test was performed.

Date\File1k2k5k10k25k50k100k (failed)250k

2023.10.26

(additional with other files)

9--391481373393-
2023.10.27 Testing1NoNo6974-

2023.11.01

(additional with other files)

No------10
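
Counts like those in the table can be obtained by scanning module logs for the marker text that appears in the error above. This is a minimal sketch, assuming the logs are available as plain-text lines. The function name count_optimistic_locking is hypothetical, and this is not necessarily how the numbers in the table were collected.

```python
MARKER = "(optimistic locking)"

def count_optimistic_locking(lines):
    """Return the number of log lines that contain the optimistic locking marker."""
    return sum(1 for line in lines if MARKER in line)

# Sample lines taken from the Errors section of this report (abbreviated).
sample = [
    "ERROR PostgresClient       Opening SQLConnection failed: Timeout",
    "io.vertx.pgclient.PgException: ERROR: Cannot update record "
    "511ea771-3dd0-49a7-a14d-c4187c94aff7 because it has been changed "
    "(optimistic locking): Stored _version is 2, _version of request is 1 (23F09)",
]
print(count_optimistic_locking(sample))  # prints 1
```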


Test File Splitting Feature with multiple async workers

PERF-705

mod-data-import has been developed to be able to dispatch multiple jobs at the same time by configuring the variable ASYNC_PROCESSOR_MAX_WORKERS_COUNT.

No significant difference in Data Import duration was observed between tests with ASYNC_PROCESSOR_MAX_WORKERS_COUNT set to 1 and to 5.

Single tenant test results

| DI duration 100k | ASYNC_PROCESSOR_MAX_WORKERS_COUNT = 1 | ASYNC_PROCESSOR_MAX_WORKERS_COUNT = 5 |
| --- | --- | --- |
| tenant fs09000000 | 01:02:12 (49:00) | 00:48:07 |
| all 3 tenants concurrently | 02:22:25 | 02:25:51 |

Test with ASYNC_PROCESSOR_MAX_WORKERS_COUNT = 1

3 tenants concurrently

| 1 - ASYNC_PROCESSOR_MAX_WORKERS_COUNT | Duration | Start | End |
| --- | --- | --- | --- |
| fs09000000 | 01:33:59 | 2024-04-03 10:20:29.877+00 | 2024-04-03 11:54:28.915+00 |
| fs07000001 | 02:09:23 | 2024-04-03 10:21:50.416+00 | 2024-04-03 12:31:13.296+00 |
| fs07000002 | 02:18:47 | 2024-04-03 10:23:58.926+00 | 2024-04-03 12:42:45.894+00 |

Test with ASYNC_PROCESSOR_MAX_WORKERS_COUNT = 5 

3 tenants concurrently

| 5 - ASYNC_PROCESSOR_MAX_WORKERS_COUNT | Duration | Start | End |
| --- | --- | --- | --- |
| fs09000000 | 01:52:11 | 2024-04-03 14:16:23.22+00 | 2024-04-03 16:08:33.991+00 |
| fs07000001 | 02:06:36 | 2024-04-03 14:17:28.516+00 | 2024-04-03 16:24:04.194+00 |
| fs07000002 | 02:23:29 | 2024-04-03 14:18:45.622+00 | 2024-04-03 16:42:14.323+00 |
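
The Duration values in the two tables above follow from the Start and End timestamps. The sketch below shows one way to derive them, assuming the timestamps always carry a two-digit "+00" offset as in the tables and that durations are rounded to the nearest second. The function names parse_ts and duration are hypothetical.

```python
from datetime import datetime

def parse_ts(text: str) -> datetime:
    """Parse a timestamp such as '2024-04-03 14:16:23.22+00'."""
    # The offset is given as '+00'; pad it to '+0000' so that the %z
    # directive of strptime accepts it. %f accepts 1 to 6 fractional digits.
    return datetime.strptime(text + "00", "%Y-%m-%d %H:%M:%S.%f%z")

def duration(start: str, end: str) -> str:
    """Return the elapsed time as HH:MM:SS, rounded to the nearest second."""
    seconds = round((parse_ts(end) - parse_ts(start)).total_seconds())
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

# Tenant fs09000000 with ASYNC_PROCESSOR_MAX_WORKERS_COUNT = 1
print(duration("2024-04-03 10:20:29.877+00", "2024-04-03 11:54:28.915+00"))  # 01:33:59
```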

Resource Utilization


An increasing CPU usage trend was observed for mod-inventory. A probable reason for the growth is MODINV-944.


Appendix

Infrastructure

PTF - environment pcp1

  • 10 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
  • 2 database instances, writer/reader

    | Name | Memory GiB | vCPUs | max_connections |
    | --- | --- | --- | --- |
    | db.r6g.xlarge | 32 GiB | 4 vCPUs | 2731 |

  • MSK tenant
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3
pcp1-pvt

| Module | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize | R/W split enabled |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mod-inventory-storage-b | 10 | mod-inventory-storage:27.0.0 | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512 | FALSE |
| mod-data-import-b | 11 | mod-data-import:3.0.1 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512 | FALSE |
| mod-source-record-storage-b | 10 | mod-source-record-storage:5.7.0 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | FALSE |
| mod-inventory-b | 9 | mod-inventory:20.1.0 | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512 | FALSE |
| mod-source-record-manager-b | 9 | mod-source-record-manager:3.7.0 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | FALSE |
| mod-di-converter-storage-b | 13 | mod-di-converter-storage:2.1.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE |

Methodology

  1. Prepare files for DI Create job

    • 1K, 2K, 5K, 10K, 25K, 50K, 100K files.
  2. Run DI Create on a single tenant, one file at a time with a delay between jobs, using the PTF - Create 2 profile.
  3. Prepare files for DI Update with the Data export app.
  4. Run DI Update on a single tenant, one file at a time with a delay between jobs, using the prepared files and the PTF - Updates Success - 1 profile.