[Quesnelia] [ECS] [Data import] Create MARC authority Records

Overview

In the scope of https://folio-org.atlassian.net/browse/PERF-854 [Quesnelia] [ECS] [Data import] Create MARC authority Records
it's needed to run tests to answer questions: 

  • Determine the time it takes to complete Data-import on QCON env

  • Determine the main modules that are involved in the process (if obvious or if known)

  • Test specific settings or items or scenarios:
    - Check-in and Checkout (CICO) is in progress and there are 5 concurrent users,
    - Run DI for 50K-25K-10K-5k-1k on 3 tenants in parallel in the same cluster.

Summary

  • Test 1. Data import of 1K, 5K, 10K, 25K and 50K and Test 2 Data import of 1K, 5K, 10K, 25K and 50K + 5 Virtual users CICO finished successfully without errors or issues. Test 3. MultitenantDI failed to finis jobs in 12 hours for 50K records on 2nd and 3rd tenants.

  • General summary. Data Import MARC Authorities work on Quesnelia ECS like on Quesnelia non-ECS and Poppy. DI duration is about the same and resource utilization has the same behavior as in the previous releases except mod-entitylink service that consume 80% of CPU in comparison to 35% in Poppy.

  • Data-import of 1K, 5K, 10K, 25K and 50K records during test 1 and test 2(without CICO and with CICO) have about the same duration. So no significant impact of CICO(5VU) on DI proccess.

  • On Poppry release environment was stable to process 5 DI jobs in parallel on 3 tenants, on Quesnelia - ECS duration of completed jobs were 4-7 time longer. 50K DI failed to finished on 2nd and 3rd tenants, it was stopped manually after 12 hours.

Recommendations & Jiras (Optional)

Retest after this issue will be fixed https://folio-org.atlassian.net/browse/MODDATAIMP-1064

Test Runs

Important! Before each test, data-import MRC files were prepared using python script ensure creation of MARC Authority records.

Test 1. DI Central tenant 1k-5K-10K-25K-50K. Data-import of MARC authority Records on the central tenant in order 1k-5k-10k-25k-50k using the created job profile KG - Create SRS MARC Authority on nonmatches to 010 $a DUPLICATE for Q

Test 2. DI Central tenant 1k-5K-10K-25K-50K + CI/CO 5VU. Run CICO for 5 concurrent users, duration about 1 hour, rump period 50 seconds. Data-import 1K, 5K, 10K, 25K, and 50K started on the central tenant in parallel with the same profile.

Test 3. Multitenant DI 50K-25K-10K-5K-1K. Run Data-import of MARC authority Records in parallel on the central, second, and third tenants in order from 50k, 25k, 10k, 5k, 1k

using job profile KG - Create SRS MARC Authority on nonmatches to 010 $a DUPLICATE for Q

Test Results and Comparison

Reports for results comparison:

Data Import MARC Authorities [Poppy] - https://folio-org.atlassian.net/wiki/spaces/FOLIJET/pages/1406578

Data Import MARC Authorities [Quesnelia NON-ECS] -

Test 1. DI Central tenant 1k-5K-10K-25K-50K.

Test with 1k, 10k, 25k and 50k records files DI started on one tenant only(qcp1-00), and comparative results between Poppy and Quesnelia.

# of records 

% creates

DI duration 
Morning Glory

DI duration
Nolana

DI duration 
Orchid

DI duration 
Poppy

DI duration 
Quesnelia
[non-ECS], QCP1

DI duration 
Quesnelia
[ECS], QCON

# of records 

% creates

DI duration 
Morning Glory

DI duration
Nolana

DI duration 
Orchid

DI duration 
Poppy

DI duration 
Quesnelia
[non-ECS], QCP1

DI duration 
Quesnelia
[ECS], QCON

1,000

100

24 s

27 s

41 sec

29 sec

22 sec

25 sec

5,000

100

1 min 21 s

1 min 15 s

1min 21s

1 min 38 sec

1 min 19 sec

1 min 23 sec

10,000

100

2 min 32 s

2 min 31 s

2min 53s

2 min 53 sec

2 min 36 sec

2 min 43 sec

25000

100

11 min 14 s

7 min 7 s

5 min 42s

6 min 24 sec

6 min 19 sec

6 min 27 sec

50,000

100

22 min

11 min 24 s

11 min 11s

13 min 48 sec

11 min 59 sec

11 min 45s

Test 2. DI Central tenant 1k-5K-10K-22K-50K + CI/CO 5VU.

Number of records

DI Duration

with CICO

Poppy

DI Duration with CICO
Quesnelia
non-ECS

DI Duration with CICO
Quesnelia
ECS

CI Avg time
(Quesnelia)

CI, Avg time without DI
Quesnelia
ECS

CO time Avg
(Quesnelia)

CO, Avgtime without
DI
Quesnelia
ECS

Number of records

DI Duration

with CICO

Poppy

DI Duration with CICO
Quesnelia
non-ECS

DI Duration with CICO
Quesnelia
ECS

CI Avg time
(Quesnelia)

CI, Avg time without DI
Quesnelia
ECS

CO time Avg
(Quesnelia)

CO, Avgtime without
DI
Quesnelia
ECS

1,000

35 sec

20 sec

21 sec

0.870 sec

 

 

600 ms

 

 

1.361 sec

 

 

1100 ms

 

5,000

1 min 41 sec

1 min 19 sec

1 min 09 sec

0.878 sec

1.772 sec

10,000

3 min 4 sec

2 min 35 se

2 min 17 se

0.955 sec

1.905 sec

25,000

6 min 32 sec

6 min 26 sec

6 min 20 sec

0.970 sec

1.920 sec

50,000

13 min 48 sec

12 min 16 sec

13 min 49 sec

1.040 sec

1.907 sec

Test 3. Multitenant DI 50K-25K-10K-5K-1K.

Test3

Num of records

Tenant 1
QCON-Central
duration

Tenant 2
QCON-College
duration

Tenant 3
QCON-Professional
duration

Test3

Num of records

Tenant 1
QCON-Central
duration

Tenant 2
QCON-College
duration

Tenant 3
QCON-Professional
duration

  1.  

1,000

2 hr 50 min

1 hr 44 min

17 min

  1.  

5,000

2 hr 56 min

1 hr 29 min

18 min

  1.  

10,000

3 hr 07 min

2 hr 28 min

28 min

  1.  

25,000

6 hr 20 min

4 hr 57 min

2 hr 02 min

  1.  

50,000

6 hr 25 min

Stopped after 14 hours

Stopped after 14 hours




Resource utilization

Test 1. DI Central tenant 1k-5K-10K-25K-50K

CPU Utilization.

image-20240620-162558.png

 

Memory utilization.
Memory utilization was stable during the tests. No memory leaks

image-20240610-121737.png

RDS utilization
During 1K DI maximal CPU utilization was about 45%, 5K- 86%, 10K, 25K and 50K - 95%

DB connections

DB connections were changing values from 550 connection to 870

 

Database load 

Top SQL

Open Search metrics
Master node CPU utilization

CPU utilization percentage for the master node reached maximum 35%

Maximum CPU utilization percentage for all data nodes.

Maximum memory usage percentage for all data nodes.

Amazon Managed Streaming



Test 2. DI Central tenant 1k-5K-10K-25K-50K + CI/CO 5VU

CPU Utilization.

Service

CPU Utilization

Service

CPU Utilization

mod-entities-links-b

79.2442138658

mod-source-record-storage-b

44.602495575

mod-inventory-b

42.361438768

nginx-okapi

37.4284422816

mod-di-converter-storage-b

31.5337739652

okapi-b

22.5758662868

mod-source-record-manager-b

20.3682207187

mod-consortia-b

19.3874072694

mod-users-b

15.4544924435

mod-dcb-b

9.34486144886

mod-authtoken-b

7.89295741471

mod-quick-marc-b

7.78387751245



Memory utilization.

Memory utilization was stable during the tests. No memory leaks.

Top 10 Memory consumption services during 50K DI

 

mod-data-export-worker-b

76.5675541393

mod-consortia-b

68.1640625

mod-data-import-b

67.3336948662

mod-dcb-b

65.8766055764

mod-orders-b

60.6285026803

mod-search-b

58.4146859083

mod-source-record-manager-b

52.8405

mod-copycat-b

50.60546875

mod-notes-b

48.9955357143

mod-invoice-b

48.8106115984

mod-calendar-b

48.6264489348


RDS utilization

During 1K DI maximal CPU utilization was about 35%, 5K- 76%, 10K, 25K and 50K - 95%

 

DB connections

 

Database load 

Top SQL

WITH cte AS (SELECT count(*) AS total_count FROM cs00000int_mod_source_record_manager.job_execution WHERE parent_job_id = $1 AND subordination_type in ('CHILD', 'COMPOSITE_CHILD') AND is_deleted = false) SELECT j.*, cte.*, p.total_records_count total, p.succeeded_records_count + p.error_records_count currently_processed FROM cs00000int_mod_source_record_manager.job_execution j LEFT JOIN cs00000int_mod_source_record_manager.job_execution_progress p ON j.id = p.job_execution_id LEFT JOIN cte ON true WHERE parent_job_id = $1 AND subordination_type in ('CHILD', 'COMPOSITE_CHILD') AND is_deleted = false LIMIT $2 OFFSET $3
WITH cte AS (SELECT count(*) AS total_count FROM cs00000int_mod_source_record_manager.job_execution WHERE subordination_type <> 'PARENT_MULTIPLE' AND TRUE AND status IN ('COMMITTED', 'ERROR') AND job_profile_id IN ('4f81d353-c1b9-4f15-bf20-a1de2949ebf9') AND subordination_type NOT IN ('COMPOSITE_PARENT') AND NOT job_profile_hidden AND NOT is_deleted) SELECT j.*, cte.*, p.total_records_count total, p.succeeded_records_count + p.error_records_count currently_processed, (select jsonb_agg(x) composite_data from (select status, count(1) cnt, sum(p1.total_records_count) total_records_count, sum(p1.succeeded_records_count + p1.error_records_count) currently_processed FROM cs00000int_mod_source_record_manager.job_execution j1 LEFT JOIN cs00000int_mod_source_record_manager.job_execution_progress p1 ON j1.id = p1.job_execution_id where j1.parent_job_id = j.id and j1.id != j1.parent_job_id and j1.subordination_type = 'COMPOSITE_CHILD' group by status) x) composite_data FROM cs00000int_mod_source_record_manager.job_execution j LEFT JOIN cs00000int_mod_source_record_manager.job_execution_progress p ON j.id = p.job_execution_id LEFT JOIN cte ON true WHERE subordination_type <> 'PARENT_MULTIPLE' AND TRUE AND status IN ('COMMITTED', 'ERROR') AND job_profile_id IN ('4f81d353-c1b9-4f15-bf20-a1de2949ebf9') AND subordination_type NOT IN ('COMPOSITE_PARENT') AND NOT job_profile_hidden AND NOT is_deleted ORDER BY completed_date desc LIMIT $1 OFFSET $2
WITH cte AS (SELECT count(*) AS total_count FROM cs00000int_mod_source_record_manager.job_execution WHERE subordination_type <> 'PARENT_MULTIPLE' AND TRUE AND status IN ('COMMITTED', 'ERROR', 'CANCELLED') AND job_profile_id NOT IN ('d0ebb7b0-2f0f-11eb-adc1-0242ac120002', '91f9b8d6-d80e-4727-9783-73fb53e3c786') AND file_name NOT IN ('No file name') AND subordination_type NOT IN ('COMPOSITE_PARENT') AND NOT job_profile_hidden AND NOT is_deleted) SELECT j.*, cte.*, p.total_records_count total, p.succeeded_records_count + p.error_records_count currently_processed, (select jsonb_agg(x) composite_data from (select status, count(1) cnt, sum(p1.total_records_count) total_records_count, sum(p1.succeeded_records_count + p1.error_records_count) currently_processed FROM cs00000int_mod_source_record_manager.job_execution j1 LEFT JOIN cs00000int_mod_source_record_manager.job_execution_progress p1 ON j1.id = p1.job_execution_id where j1.parent_job_id = j.id and j1.id != j1.parent_job_id and j1.subordination_type = 'COMPOSITE_CHILD' group by status) x) composite_data FROM cs00000int_mod_source_record_manager.job_execution j LEFT JOIN cs00000int_mod_source_record_manager.job_execution_progress p ON j.id = p.job_execution_id LEFT JOIN cte ON true WHERE subordination_type <> 'PARENT_MULTIPLE' AND TRUE AND status IN ('COMMITTED', 'ERROR', 'CANCELLED') AND job_profile_id NOT IN ('d0ebb7b0-2f0f-11eb-adc1-0242ac120002', '91f9b8d6-d80e-4727-9783-73fb53e3c786') AND file_name NOT IN ('No file name') AND subordination_type NOT IN ('COMPOSITE_PARENT') AND NOT job_profile_hidden AND NOT is_deleted ORDER BY completed_date desc LIMIT $1 OFFSET $2

 

Maximum CPU utilization percentage for all data nodes.

Maximum memory usage percentage for all data nodes.

Amazon Managed Streaming

 

Test 3. Multitenant DI 50K-22K-10K-5K-1K.

Top 10 CPU consumption services

Memory utilization.

Memory utilization was stable during the tests. No memory leaks

RDS utilization
During the test DB CPU utilization grow up to 100% and in 8 hours came back to its “Before test state“

 

DB connections

At he begging of the test number of DB connection growed up from 580 to 1000 and during next 8 hours was going wown to it started value.image-20240611-122923.png

Database load 

 

Top SQL

 

Open Search metrics
Master node CPU utilization

CPU utilization percentage for the master node reached maximum 47%

Maximum CPU utilization percentage for all data nodes.

Maximum memory usage percentage for all data nodes.

Amazon Managed Streaming

 

Appendix

Infrastructure

PTF - environment Quesnelia (QCON)

  • 10 m6i.2xlarge EC2 instances located in US East (N. Virginia)us-east-1 ]

  • 1 instance of db.r6.xlarge database instance: Writer instance

  • OpenSearch

    • domain: fse

    • Number of nodes: 9

    • Version: OpenSearch_2_7_R20240502

  • MSK - tenat

    • 4 kafka.m5.2xlarge brokers in 2 zones

    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true

    • log.retention.minutes=480

    • default.replication.factor=3

    • Kafka consolidated topics enabled


Quesnelia modules memory and CPU parameters

Module

Task Def. Revision

Module Version

Task Count

Mem Hard Limit

Mem Soft limit

CPU units

Xmx

MetaspaceSize

MaxMetaspaceSize

Module

Task Def. Revision

Module Version

Task Count

Mem Hard Limit

Mem Soft limit

CPU units

Xmx

MetaspaceSize

MaxMetaspaceSize

qcon-pvt

 

 

 

 

 

 

 

 

 

pub-okapi

1

pub-okapi:2023.06.14

2

1024

896

128

768

0

0

pub-edge

1

pub-edge:2023.06.14

2

1024

896

128

768

0

0

okapi-b

1

okapi:5.3.0

3

1684

1440

1024

922

384

512

nginx-okapi

1

nginx-okapi:2023.06.14

2

1024

896

128

0

0

0

nginx-edge

1

nginx-edge:2023.06.14

2

1024

896

128

0

0

0

mod-users-bl

1

mod-users-bl:7.7.0

2

1440

1152

512

922

88

128

mod-users

1

mod-users:19.3.1

2

1024

896

128

768

88

128

mod-user-import

1

mod-user-import:3.8.0

2

1024

896

128

768

88

128

mod-template-engine

1

mod-template-engine:1.20.0

2

1024

896

128

768

88

128

mod-tags

1

mod-tags:2.2.0

2

1024

896

128

768

88

128

mod-source-record-storage

1

mod-source-record-storage:5.8.0

2

5600

5000

2048

3500

384

512

mod-source-record-manager

1

mod-source-record-manager:3.8.0

2

5600

5000

2048

3500

384

512

mod-settings

1

mod-settings:1.0.3

2

1024

896

200

768

88

128

mod-service-interaction

1

mod-service-interaction:4.0.1

2

2048

1844

256

1290

384

512

mod-serials-management

1

mod-serials-management:1.0.0

2

2480

2312

128

1792

384

512

mod-sender

1

mod-sender:1.12.0

2

1024

896

128

768

88

128

mod-search

9

mod-search:3.3.0-SNAPSHOT.224

2

2592

2480

2048

1440

512

1024

mod-rtac

1

mod-rtac:3.6.0

2

1024

896

128

768

88

128

mod-remote-storage

1

mod-remote-storage:3.2.0

2

4920

4472

1024

3960

512

512

mod-quick-marc

1

mod-quick-marc:5.1.0

1

2288

2176

128

1664

384

512

mod-pubsub

1

mod-pubsub:2.13.0

2

1536

1440

1024

922

384

512

mod-permissions

2

mod-permissions:6.5.0

2

1684

1544

512

1024

384

512

mod-patron-blocks

1

mod-patron-blocks:1.10.0

2

1024

896

1024

768

88

128

mod-patron

1

mod-patron:6.1.0

2

1024

896

128

768

88

128

mod-password-validator

1

mod-password-validator:3.2.0

2

1440

1298

128

768

384

512

mod-organizations-storage

1

mod-organizations-storage:4.7.0

2

1024

896

128

700

88

128

mod-organizations

1

mod-organizations:1.9.0

2

1024

896

128

700

88

128

mod-orders-storage

1

mod-orders-storage:13.7.0

2

1024

896

512

700

88

128

mod-orders

1

mod-orders:12.8.0

2

2048

1440

1024

1024

384

512

mod-oai-pmh

1

mod-oai-pmh:3.13.0

2

4096

3690

2048

3076

384

512

mod-notify

1

mod-notify:3.2.0

2

1024

896

128

768

88

128

mod-notes

1

mod-notes:5.2.0

2

1024

896

128

952

384

512

mod-ncip

1

mod-ncip:1.14.4

2

1024

896

128

768

88

128

mod-login-saml

1

mod-login-saml:2.8.0

2

1024

896

128

768

88

128

mod-login

1

mod-login:7.11.0

2

1440

1298

1024

768

384

512

mod-lists

1

mod-lists:2.0.0

2

3000

2600

128

2048

384

512

mod-licenses

1

mod-licenses:6.0.0

2

2480

2312

128

1792

384

512

mod-kb-ebsco-java

1

mod-kb-ebsco-java:4.0.0

2

1024

896

128

768

88

128

mod-invoice-storage

1

mod-invoice-storage:5.8.0

2

1872

1536

1024

1024

384

512

mod-invoice

1

mod-invoice:5.8.0

2

1440

1152

512

922

88

128

mod-inventory-update

1

mod-inventory-update:3.3.0

2

1024

896

128

768

88

128

mod-inventory-storage

2

mod-inventory-storage:27.2.0-SNAPSHOT.738

2

4096

3690

2048

3076

384

512

mod-inventory

3

mod-inventory:20.2.0

2

2880

2592

1024

1814

384

512

mod-graphql

1

mod-graphql:1.12.1

2

1024

896

128

768

88

128

mod-gobi

1

mod-gobi:2.8.0

2

1024

896

128

700

88

128

mod-fqm-manager

1

mod-fqm-manager:2.0.1

2

3000

2600

128

2048

384

512

mod-finance-storage

1

mod-finance-storage:8.6.0

2

1024

896

1024

700

88

128

mod-finance

1

mod-finance:4.9.0

2

1024

896

128

700

88

128

mod-feesfines

1

mod-feesfines:19.1.0

2

1024

896

128

768

88

128

mod-event-config

1

mod-event-config:2.7.0

2

1024

896

128

768

88

128

mod-eusage-reports

1

mod-eusage-reports:2.1.1

2

1024

896

128

768

88

128

mod-erm-usage-harvester

1

mod-erm-usage-harvester:4.5.0

2

1024

896

128

768

88

128

mod-erm-usage

1

mod-erm-usage:4.7.0

2

1024

896

128

768

88

128

mod-entities-links

1

mod-entities-links:3.0.0

2

2592

2480

400

1440

0

1024

mod-email

1

mod-email:1.17.0

2

1024

896

128

768

88

128

mod-ebsconet

1

mod-ebsconet:2.2.0

2

1248

1024

128

700

128

256

mod-di-converter-storage

2

mod-di-converter-storage:2.2.2

2

1024

896

128

768

88

128

mod-dcb

1

mod-dcb:1.1.0

2

1024

896

128

768

88

128

mod-data-import

1

mod-data-import:3.1.0

1

2048

1844

256

1292

384

512

mod-data-export-worker

1

mod-data-export-worker:3.2.1

2

3072

2048

1024

2048

384

512

mod-data-export-spring

1

mod-data-export-spring:3.2.0

1

2048

1844

256

1536

384

512

mod-data-export

1

mod-data-export:5.0.0

1

2048

1524

1024

0

0

0

mod-courses

1

mod-courses:1.4.10

2

1024

896

128

768

88

128

mod-copycat

1

mod-copycat:1.6.0

2

1024

512

128

768

88

128

mod-consortia

1

mod-consortia:1.1.0

2

3072

2048

128

2048

512

1024

mod-configuration

1

mod-configuration:5.10.0

2

1024

896

128

768

88

128

mod-circulation-storage

1

mod-circulation-storage:17.2.0

2

2880

2592

1536

1814

384

512

mod-circulation-item

1

mod-circulation-item:1.0.0

2

1024

896

128

0

0

0

mod-circulation

2

mod-circulation:24.2.1

2

2880

2592

1536

1814

384

512

mod-calendar

1

mod-calendar:3.1.0

 

2

1024

896

128

768

88

mod-bulk-operations

1

mod-bulk-operations:2.0.0

2

3072

2600

1024

1536

384

512

mod-batch-print

1

mod-batch-print:1.1.0

2

1024

896

128

768

88

128

mod-authtoken

3

mod-authtoken:2.15.1

2

1440

1152

512

922

88

128

mod-audit

1

mod-audit:2.9.0

2

1024

896

128

768

88

128

mod-agreements

1

mod-agreements:7.0.0

2

1592

1488

128

0

0

0

edge-rtac

1

edge-rtac:2.7.1

2

1024

896

128

768

88

128

edge-patron

1

edge-patron:5.1.0

2

1024

896

256

768

88

128

edge-orders

1

edge-orders:3.0.0

2

1024

896

128

768

88

128

edge-oai-pmh

1

edge-oai-pmh:2.9.0

2

1512

1360

1024

1440

384

512

edge-ncip

1

edge-ncip:1.9.2

2

1024

896

128

768

88

128

edge-fqm

1

edge-fqm:2.0.0

2

1024

896

128

768

88

128

edge-dematic

1

edge-dematic:2.2.0

1

1024

896

128

768

88

128

edge-dcb

1

edge-dcb:1.1.0

2

1024

896

128

768

88

128

edge-courses

1

edge-courses:1.4.0

2

1024

896

128

768

88

128

edge-connexion

1

edge-connexion:1.2.0

2

1024

896

128

768

88

128

edge-caiasoft

1

edge-caiasoft:2.2.0

2

1024

896

128

768

88

128

Methodology/Approach

Before each test files for Marc Aytority creation must created using python script

  • Test 1: Manually tested 1k, 10k, 25k and 50k records files DI started on the central tenant only.

  • Test 2: Manually tested 1k, 10k, 25k and 50k records files DI started on central tenant. In parralel was started Check-in and Checkout (CICO) for 5 concurrent users.

  • Test 3: Manually tested DI started on 3 tenants concurrently. Order for load file without pause between files: 50k, 25k, 10k, 5k, and 1k for order tenants : Tenant 1-3.

    Test results were obtained using SQL query