Data Import test report (Poppy)



Overview

This document contains the results of testing Data Import (DI) for MARC Bibliographic records in the Poppy release. https://folio-org.atlassian.net/browse/PERF-712

Summary

  • DI duration increases with the number of records imported.

  • The increase in memory utilization was due to the scheduled cluster shutdown. No memory leak is suspected for DI modules.

  • Average CPU utilization of modules for all Create and Update jobs did not exceed 150%. Spikes at the start of a job, caused by the mod-data-import module, are expected.

  • DB CPU usage is approximately 95% for all jobs with files of more than 10k records.

  • The Poppy release shows higher average CPU utilization for DI-related services than Orchid, especially for mod-inventory.

  • Errors occurred for the DI create job with the 100k file and for DI update jobs with the 25k file because of a timeout issue (Opening SQLConnection failed: Timeout).

  • No significant difference in Data Import duration was observed between tests with ASYNC_PROCESSOR_MAX_WORKERS_COUNT set to 1 and to 5.

Recommendations and Jiras

  1. Investigate Timeout issues. Ticket created https://folio-org.atlassian.net/browse/MODINV-924

  2. Check memory trends for mod-source-record-storage-b and mod-inventory during additional DI tests without the nightly cluster shutdown

  3. Increase CPU units allocation for mod-inventory, mod-di-converter-storage, mod-quick-marc services

  4. Use a larger DB instance type (scale up from db.r6g.xlarge to db.r6g.2xlarge).

Results

| Test # | Test | Job profile | Duration, Orchid with R/W split enabled (07/09/2023) | Duration, Poppy | Difference, % / sec | Results |
|---|---|---|---|---|---|---|
| 1 | 1k MARC BIB Create | PTF - Create 2 | | 39 sec | | Completed |
| 2 | 2k MARC BIB Create | PTF - Create 2 | | 1 min 01 sec | | Completed |
| 3 | 5k MARC BIB Create | PTF - Create 2 | 2 min 23 sec | 2 min 22 sec | 0.88% / 1 sec | Completed |
| 4 | 10k MARC BIB Create | PTF - Create 2 | 5 min 12 sec | 4 min 29 sec | 18.86% / 43 sec | Completed |
| 5 | 25k MARC BIB Create | PTF - Create 2 | 11 min 45 sec | 10 min 38 sec | 11.38% / 67 sec | Completed |
| 6 | 50k MARC BIB Create | PTF - Create 2 | 23 min 36 sec | 20 min 26 sec | 15.18% / 190 sec | Completed |
| 7 | 100k MARC BIB Create | PTF - Create 2 | 49 min 28 sec | 2 hours 46 min | | Cancelled (stopped by user) * |
| 8 | 1k MARC BIB Update | PTF - Updates Success - 1 | | 34 sec | | Completed |
| 9 | 2k MARC BIB Update | PTF - Updates Success - 1 | | 1 min 09 sec | | Completed |
| 10 | 5k MARC BIB Update | PTF - Updates Success - 1 | 2 min 48 sec | 2 min 31 sec | 6.66% / 17 sec | Completed |
| 11 | 10k MARC BIB Update | PTF - Updates Success - 1 | 5 min 23 sec | 5 min 13 sec | 1.84% / 10 sec | Completed |
| 12 | 25k MARC BIB Update | PTF - Updates Success - 1 | 14 min 12 sec | 12 min 27 sec | 14% / 105 sec | Completed with errors * |
| 13 | 25k MARC BIB Update | PTF - Updates Success - 1 | | 2 min 15 sec | | Completed with errors * |
| 14 | 25k MARC BIB Update | PTF - Updates Success - 1 | | 12 min | | Cancelled (stopped by user) * |

* For all jobs that completed with errors or were cancelled, the UI showed the same issue: io.vertx.core.impl.NoStackTraceThrowable: Timeout

Test #14 was stopped manually from the UI. Two tests with 25k MARC BIB Update were carried out to confirm that the 25k update does not work properly and fails with the same issue.
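The second-level differences in the table can be re-derived from the duration strings. The sketch below is only an illustration: the helper names (`parse_duration`, `summarize`) are hypothetical, and it covers the four Create runs that have both an Orchid and a Poppy duration. It reports the difference in seconds and the Poppy throughput in records per second. It deliberately does not recompute the percentage column, because the report does not state which base that percentage uses.

```python
import re

# Seconds per unit for the duration strings used in the results table.
UNIT_SECONDS = {"hour": 3600, "min": 60, "sec": 1}


def parse_duration(text: str) -> int:
    """Convert strings such as '2 hours 46 min' or '5 min 12 sec' to seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+)\s*(hour|min|sec)", text):
        total += int(value) * UNIT_SECONDS[unit]
    return total


# (records, Orchid duration, Poppy duration), copied from tests #3-6.
CREATE_RUNS = [
    (5_000, "2 min 23 sec", "2 min 22 sec"),
    (10_000, "5 min 12 sec", "4 min 29 sec"),
    (25_000, "11 min 45 sec", "10 min 38 sec"),
    (50_000, "23 min 36 sec", "20 min 26 sec"),
]


def summarize(runs):
    """Return (records, Orchid minus Poppy in seconds, Poppy records/sec)."""
    result = []
    for records, orchid, poppy in runs:
        orchid_s = parse_duration(orchid)
        poppy_s = parse_duration(poppy)
        result.append((records, orchid_s - poppy_s, records / poppy_s))
    return result


if __name__ == "__main__":
    for records, diff_s, throughput in summarize(CREATE_RUNS):
        print(f"{records:>6} records: {diff_s:>3} sec faster, "
              f"{throughput:.1f} records/sec on Poppy")
```

The throughput figure is a convenient way to check the summary claim that duration grows with file size: if throughput stays roughly constant across file sizes, duration scales roughly linearly with the record count.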



Memory Utilization

The increase in memory utilization was due to the scheduled cluster shutdown. No memory leak is suspected for DI modules.

MARC BIB CREATE

Tests #1-7

1k, 2k, 5k, 10k, 25k, 50k, 100k records

MARC BIB UPDATE

Tests #8-14

1k, 2k, 5k, 10k, 25k, 25k, 25k records

Service CPU Utilization 

MARC BIB CREATE

Tests #1-7

1k, 2k, 5k, 10k, 25k, 50k, 100k records

CPU utilization for all modules returned to baseline levels after all tests. The highest utilization, 170%, was observed for the mod-quick-marc-b module during the 5k DI create job.

Average utilization: mod-inventory-b 125%, mod-inventory-storage-b 25%, mod-source-record-storage-b 60%, mod-source-record-manager-b 35%, mod-di-converter-storage-b 80%. mod-data-import showed a 200% spike for the 100k job.



MARC BIB UPDATE

Tests #8-14

1k, 2k, 5k, 10k, 25k, 25k, 25k records

Average utilization: mod-inventory-b 220%, mod-inventory-storage-b 25%, mod-source-record-storage-b 50%, mod-source-record-manager-b 45%, mod-di-converter-storage-b 90%. mod-data-import showed a 96% spike for the 25k job.

RDS CPU Utilization 

MARC BIB CREATE

Average 95% for DI jobs with more than 10k records

MARC BIB UPDATE

RDS Database Connections

MARC BIB CREATE
Maximum number of connections: 275 for DI Create jobs and 260 for DI Update jobs.

Average active sessions (AAS)

MARC BIB CREATE

Top SQL

MARC BIB UPDATE

Top SQL

INSERT INTO fs09000000_mod_source_record_manager.events_processed 

INSERT INTO fs09000000_mod_source_record_manager.journal_records 

MSK CPU utilization (Percent) OpenSearch

Utilization did not exceed 20%.

CPU (User) usage by broker

Errors

15:58:44 [478677/data-import] [fs09000000] [9eb67301-6f6e-468f-9b1a-6134dc39a684] [] ERROR PostgresClient       Opening SQLConnection failed: Timeout



15:24:19 [] [fs09000000] [] [] ERROR KafkaConsumerWrapper businessHandlerCompletionHandler:: Error while processing a record - id: 3 subscriptionPattern: SubscriptionDefinition(eventType=DI_COMPLETED, subscriptionPattern=pcp1\.Default\.\w{1,}\.DI_COMPLETED) offset: 541469



io.vertx.pgclient.PgException: ERROR: Cannot update record 511ea771-3dd0-49a7-a14d-c4187c94aff7 because it has been changed (optimistic locking): Stored _version is 2, _version of request is 1 (23F09)
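The `subscriptionPattern` in the KafkaConsumerWrapper log line above is a regular expression over topic names. As a rough illustration of what it selects, the sketch below applies the same pattern to a few topic names. The topic names are assumptions built from the pattern and the tenant IDs mentioned in this report, not values read from the cluster, and the sketch assumes the pattern must match the whole topic name. Python's `re` is used here; the consumer itself evaluates the pattern with Java's regex engine, where `\w` is ASCII-only by default.

```python
import re

# Pattern copied from the log line (SubscriptionDefinition.subscriptionPattern).
SUBSCRIPTION_PATTERN = r"pcp1\.Default\.\w{1,}\.DI_COMPLETED"

# re.ASCII keeps \w limited to [A-Za-z0-9_], closer to Java's default behaviour.
_pattern = re.compile(SUBSCRIPTION_PATTERN, re.ASCII)


def is_subscribed(topic: str) -> bool:
    """Return True if the whole topic name matches the subscription pattern."""
    return _pattern.fullmatch(topic) is not None


if __name__ == "__main__":
    # Hypothetical topic names, for illustration only.
    for topic in (
        "pcp1.Default.fs09000000.DI_COMPLETED",
        "pcp1.Default.fs07000001.DI_COMPLETED",
        "pcp1.Default.fs09000000.DI_ERROR",
        "pcp1.Default..DI_COMPLETED",
    ):
        print(topic, is_subscribed(topic))
```

Because the tenant segment is `\w{1,}`, a single consumer subscription covers the DI_COMPLETED topic of every tenant in the `pcp1` environment.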



Additional tests were analyzed to investigate the "(optimistic locking)" issue.

The issue was found only in create jobs, in the mod-inventory-storage and mod-inventory modules. It occurs in both completed and failed jobs, and its frequency also depends on the files used.
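For readers unfamiliar with the error text, the following is a minimal, generic sketch of optimistic locking. It is not the mod-inventory-storage implementation; the `RecordStore` class and its methods are hypothetical and exist only to show how two writers that read the same version of a record produce the "Stored _version is 2, _version of request is 1" message seen in the log.

```python
class OptimisticLockError(Exception):
    """Raised when the version sent with an update is stale."""


class RecordStore:
    """In-memory store that keeps a version counter per record."""

    def __init__(self):
        self._rows = {}  # record_id -> (version, data)

    def create(self, record_id, data):
        self._rows[record_id] = (1, dict(data))
        return 1

    def read(self, record_id):
        version, data = self._rows[record_id]
        return version, dict(data)

    def update(self, record_id, data, request_version):
        stored_version, _ = self._rows[record_id]
        # The update is accepted only if the caller saw the latest version.
        if stored_version != request_version:
            raise OptimisticLockError(
                f"Cannot update record {record_id} because it has been changed "
                f"(optimistic locking): Stored _version is {stored_version}, "
                f"_version of request is {request_version}"
            )
        new_version = stored_version + 1
        self._rows[record_id] = (new_version, dict(data))
        return new_version


if __name__ == "__main__":
    store = RecordStore()
    store.create("511ea771", {"title": "A"})

    # Two writers read the same version before either one writes.
    version_a, data_a = store.read("511ea771")
    version_b, data_b = store.read("511ea771")

    store.update("511ea771", {**data_a, "title": "B"}, version_a)  # succeeds
    try:
        store.update("511ea771", {**data_b, "title": "C"}, version_b)
    except OptimisticLockError as error:
        print(error)
```

In this model the second writer has to re-read the record and retry. Whether and how the DI modules retry after such a conflict is not covered by this report.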



The table shows the number of optimistic locking messages in each test. A hyphen '-' means that no test was performed.

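| Date\File | 1k | 2k | 5k | 10k | 25k | 50k | 100k (failed) | 250k |
|---|---|---|---|---|---|---|---|---|
| 2023.10.26 (additional with other files) | 9 | - | - | 39 | 148 | 137 | 3393 | - |
| 2023.10.27 Testing | 1 | No | No | 6 | 9 | 7 | 4 | - |
| 2023.11.01 (additional with other files) | No | - | - | - | - | - | - | 10 |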
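Counts like the ones in this table can be obtained by scanning module logs for the optimistic locking message. The sketch below is one possible way to do that and is not necessarily how the figures above were produced. The function name and the sample lines are illustrative; the two sample lines are shortened copies of the log excerpts in the Errors section.

```python
import re
from collections import Counter

# Matches the message fragment shown in the PgException log line.
OPTIMISTIC_LOCKING_RE = re.compile(
    r"\(optimistic locking\): Stored _version is (\d+), "
    r"_version of request is (\d+)"
)


def count_optimistic_locking(lines):
    """Return (total matches, Counter of (stored, requested) version pairs)."""
    total = 0
    pairs = Counter()
    for line in lines:
        match = OPTIMISTIC_LOCKING_RE.search(line)
        if match:
            total += 1
            pairs[(int(match.group(1)), int(match.group(2)))] += 1
    return total, pairs


SAMPLE_LINES = [
    "15:58:44 ERROR PostgresClient Opening SQLConnection failed: Timeout",
    "io.vertx.pgclient.PgException: ERROR: Cannot update record "
    "511ea771-3dd0-49a7-a14d-c4187c94aff7 because it has been changed "
    "(optimistic locking): Stored _version is 2, _version of request is 1 (23F09)",
]

if __name__ == "__main__":
    total, pairs = count_optimistic_locking(SAMPLE_LINES)
    print(total, dict(pairs))
```

Grouping by version pair makes it easier to see whether the conflicts are mostly first updates of freshly created records (stored 2, requested 1) or repeated collisions on the same record.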



Test File Splitting Feature with multiple async workers

https://folio-org.atlassian.net/browse/PERF-705

mod-data-import can now dispatch multiple jobs at the same time when the variable ASYNC_PROCESSOR_MAX_WORKERS_COUNT is configured.

No significant difference in Data Import duration was observed between tests with ASYNC_PROCESSOR_MAX_WORKERS_COUNT set to 1 and to 5.

Single tenant test results

| DI duration 100k | ASYNC_PROCESSOR_MAX_WORKERS_COUNT = 1 | ASYNC_PROCESSOR_MAX_WORKERS_COUNT = 5 |
|---|---|---|
| tenant fs09000000 | 01:02:12 (49:00) | 00:48:07 |
| all 3 tenants concurrently | 02:22:25 | 02:25:51 |

Test with ASYNC_PROCESSOR_MAX_WORKERS_COUNT = 1

3 tenants concurrently

| Tenant (ASYNC_PROCESSOR_MAX_WORKERS_COUNT = 1) | Duration | Start | End |
|---|---|---|---|
| fs09000000 | 01:33:59 | 2024-04-03 10:20:29.877+00 | 2024-04-03 11:54:28.915+00 |
| fs07000001 | 02:09:23 | 2024-04-03 10:21:50.416+00 | 2024-04-03 12:31:13.296+00 |
| fs07000002 | 02:18:47 | 2024-04-03 10:23:58.926+00 | 2024-04-03 12:42:45.894+00 |

Test with ASYNC_PROCESSOR_MAX_WORKERS_COUNT = 5 

3 tenants concurrently

| Tenant (ASYNC_PROCESSOR_MAX_WORKERS_COUNT = 5) | Duration | Start | End |
|---|---|---|---|
| fs09000000 | 01:52:11 | 2024-04-03 14:16:23.22+00 | 2024-04-03 16:08:33.991+00 |
| fs07000001 | 02:06:36 | 2024-04-03 14:17:28.516+00 | 2024-04-03 16:24:04.194+00 |
| fs07000002 | 02:23:29 | 2024-04-03 14:18:45.622+00 | 2024-04-03 16:42:14.323+00 |
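The per-tenant durations and the overall "all 3 tenants concurrently" figure for the run with 5 workers can be derived from the Start and End timestamps. The sketch below shows one way to do this, assuming all timestamps carry the `+00` (UTC) offset as in the table. The function names are hypothetical, and durations are rounded to the nearest second.

```python
from datetime import datetime, timedelta, timezone

# (tenant, start, end) copied from the ASYNC_PROCESSOR_MAX_WORKERS_COUNT = 5 run.
RUNS = [
    ("fs09000000", "2024-04-03 14:16:23.22+00", "2024-04-03 16:08:33.991+00"),
    ("fs07000001", "2024-04-03 14:17:28.516+00", "2024-04-03 16:24:04.194+00"),
    ("fs07000002", "2024-04-03 14:18:45.622+00", "2024-04-03 16:42:14.323+00"),
]


def parse_ts(text: str) -> datetime:
    """Parse a timestamp such as '2024-04-03 14:16:23.22+00' as UTC."""
    if not text.endswith("+00"):
        raise ValueError(f"expected a '+00' offset: {text!r}")
    naive = datetime.strptime(text[:-3], "%Y-%m-%d %H:%M:%S.%f")
    return naive.replace(tzinfo=timezone.utc)


def duration(start: str, end: str) -> timedelta:
    """Elapsed time between two timestamp strings."""
    return parse_ts(end) - parse_ts(start)


def overall_span(runs) -> timedelta:
    """Time from the earliest start to the latest end across all tenants."""
    starts = [parse_ts(start) for _, start, _ in runs]
    ends = [parse_ts(end) for _, _, end in runs]
    return max(ends) - min(starts)


def format_hms(delta: timedelta) -> str:
    """Format a timedelta as HH:MM:SS, rounded to the nearest second."""
    total = round(delta.total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    for tenant, start, end in RUNS:
        print(tenant, format_hms(duration(start, end)))
    print("all tenants", format_hms(overall_span(RUNS)))
```

The overall figure is taken from the earliest start to the latest end rather than from the longest single job, because the three jobs were started about a minute apart.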

Resource Utilization



A CPU usage increase trend was observed for mod-inventory. A probable reason for the growth is https://folio-org.atlassian.net/browse/MODINV-944



Appendix

Infrastructure

PTF environment: pcp1

  • 10 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1

  • 2 database instances, writer/reader

  • MSK tenant

    • 4 m5.2xlarge brokers in 2 zones

    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true

    • log.retention.minutes=480

    • default.replication.factor=3

| Module | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize | R/W split enabled |
|---|---|---|---|---|---|---|---|---|---|---|
| pcp1-pvt | | | | | | | | | | |
| mod-inventory-storage-b | 10 | mod-inventory-storage:27.0.0 | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512 | FALSE |
| mod-data-import-b | 11 | mod-data-import:3.0.1 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512 | |