Data Import test report (Orchid) baseline for ocp3



Overview

This document contains the results of testing Data Import (DI) for MARC Bibliographic records in the Orchid release to establish a baseline for ocp3. Ticket: https://folio-org.atlassian.net/browse/PERF-662

Summary

  • DI duration correlates with the number of records imported (100k records: 32 min; 250k: 1 hour 33 min; 500k: 3 hours 33 min).

  • Multitenant DI completed successfully with up to 9 jobs in parallel. Large jobs start one by one, in order, for each tenant, but are processed in parallel across the 3 tenants. A small DI job (1 record) can finish sooner, out of order.

  • Check-In/Check-Out response times roughly doubled during DI (Check-In from 0.517 s to 1.138 s, Check-Out from 0.796 s to 1.552 s).

  • The increase in memory utilization was due to the scheduled cluster shutdown. No memory leak is suspected for DI modules.

  • Average CPU usage for mod-di-converter-storage in the 500k-record Create test was about 462%; for all other modules it did not exceed 150%. CPU usage of mod-data-import spikes up to 400% at the beginning of Data Import jobs.

  • DB CPU usage reaches approximately 95%.
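
To make the relationship between record count and duration easier to compare, the durations quoted in the summary can be converted to average throughput. The following is a minimal sketch that uses only the three reported durations; the function names are illustrative and not part of any FOLIO tooling.

```python
# Durations reported in the summary for MARC Bib Create jobs: (records, minutes).
REPORTED_RUNS = [
    (100_000, 32),
    (250_000, 93),   # 1 hour 33 min
    (500_000, 213),  # 3 hours 33 min
]


def records_per_second(records: int, minutes: int) -> float:
    """Average import throughput over the whole job."""
    return records / (minutes * 60)


def throughput_table(runs):
    """Return (records, minutes, records per second) rows, rounded to one decimal."""
    return [(r, m, round(records_per_second(r, m), 1)) for r, m in runs]


if __name__ == "__main__":
    for records, minutes, rate in throughput_table(REPORTED_RUNS):
        print(f"{records:>7} records in {minutes:>3} min -> {rate} records/s")
```

With these figures the average throughput falls from about 52 records/s at 100k to about 39 records/s at 500k, so duration grows somewhat faster than linearly with file size.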

Recommendations and Jiras

It is recommended to increase CPU units for mod-di-converter-storage to 512.
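
The arithmetic behind this recommendation can be sketched as follows. The sketch assumes that the reported CPU percentages are relative to the CPU units reserved for the module (128 for mod-di-converter-storage in the appendix table); that interpretation is an assumption and is not stated in the report. The helper names are hypothetical.

```python
def used_cpu_units(reserved_units: int, utilization_pct: float) -> float:
    """CPU units actually consumed, given utilization relative to the reservation."""
    return reserved_units * utilization_pct / 100


def projected_utilization_pct(used_units: float, new_reserved_units: int) -> float:
    """Utilization the same load would show against a different reservation."""
    return used_units / new_reserved_units * 100


current_units = 128      # CPU units for mod-di-converter-storage (appendix table)
avg_utilization = 462.0  # average CPU usage (%) in the 500k-record Create test
recommended_units = 512  # value proposed in the recommendation

used = used_cpu_units(current_units, avg_utilization)
projected = projected_utilization_pct(used, recommended_units)

if __name__ == "__main__":
    print(f"used: {used:.2f} CPU units, projected: {projected:.1f}% of {recommended_units}")
```

Under that assumption, the 500k test consumed roughly 591 CPU units on average, which would correspond to about 115% of a 512-unit reservation for the same load.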

Results

Test # | Scenario | Job profile | Duration (ocp3) | Results
-------|----------|-------------|-----------------|--------
1 | 100K MARC Create | PTF - Create 2 | 32-33 minutes | Completed
1 | 250K MARC Create | PTF - Create 2 | 1 hour 33 min - 1 hour 57 min | Completed
1 | 500K MARC Create | PTF - Create 2 | 3 hours 33 min | Completed
2 | Multitenant MARC Create (100k, 50k, and 1 record) | PTF - Create 2 | 3 hours 1 min | Completed
3 | CI/CO + DI MARC Create (20 users CI/CO, 25k records DI on 3 tenants) | PTF - Create 2 | 24 min | Completed *



 * One record on one tenant may be discarded with the error io.netty.channel.StacklessClosedChannelException.

Test #3: CI/CO with 20 users and DI of 25k records on each of the 3 tenants

Test #3 | CI/CO Response Time with DI | CI/CO Response Time without DI
--------|-----------------------------|-------------------------------
Check-In | 1.138 s | 0.517 s
Check-Out | 1.552 s | 0.796 s
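
The slowdown caused by concurrent DI can be computed directly from the response times in this table. This is a small illustrative sketch; the dictionary layout and function names are assumptions made for the example.

```python
# Average response times in seconds, taken from the Test #3 table.
RESPONSE_TIMES_S = {
    "Check-In": {"with_di": 1.138, "without_di": 0.517},
    "Check-Out": {"with_di": 1.552, "without_di": 0.796},
}


def slowdown(with_di: float, without_di: float) -> float:
    """Ratio of response time under DI load to the response time without DI."""
    return with_di / without_di


def slowdown_report(times: dict) -> dict:
    """Return the slowdown factor per operation, rounded to two decimals."""
    return {
        op: round(slowdown(t["with_di"], t["without_di"]), 2)
        for op, t in times.items()
    }


if __name__ == "__main__":
    for op, factor in slowdown_report(RESPONSE_TIMES_S).items():
        print(f"{op}: {factor}x slower during DI")
```

Check-In comes out at about 2.2x and Check-Out at about 1.95x, which matches the summary's statement that response times roughly double during DI.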

Test #3 | DI Duration with CI/CO | DI Duration without CI/CO*
--------|------------------------|---------------------------
Tenant _1 | 20 min | 14 min (18 min for run 2)
Tenant _2 | 19 min | 16 min (18 min for run 2)
Tenant _3 | 16 min | 16 min (15 min for run 2)

 * The same DI testing approach was used: 3 DI jobs in total on 3 tenants, without CI/CO. The second job starts after the first one reaches 30% completion, and the third job starts on a third tenant after the first job reaches 60% completion. DI file size: 25k records.
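
The staggered start described in this note can be sketched as a simple threshold rule on the first job's progress. The tenant names and the list of progress samples below are hypothetical; in a real run the progress would come from polling the first DI job, which is not shown here.

```python
# Start thresholds expressed as completion (%) of the first job.
START_THRESHOLDS = [
    ("tenant_1", 0),   # first job starts immediately
    ("tenant_2", 30),  # second job starts once the first job reaches 30%
    ("tenant_3", 60),  # third job starts once the first job reaches 60%
]


def jobs_to_start(first_job_progress_pct: float, already_started: set) -> list:
    """Return the tenants whose DI job should be started at this progress value."""
    return [
        tenant
        for tenant, threshold in START_THRESHOLDS
        if first_job_progress_pct >= threshold and tenant not in already_started
    ]


def simulate(progress_samples) -> list:
    """Replay progress samples and record (progress, tenant) for each job start."""
    started = set()
    timeline = []
    for pct in progress_samples:
        for tenant in jobs_to_start(pct, started):
            started.add(tenant)
            timeline.append((pct, tenant))
    return timeline


if __name__ == "__main__":
    print(simulate([0, 10, 30, 45, 60, 100]))
```

Each job is started exactly once, at the first sample where the first job's progress meets its threshold.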

Memory Utilization

The increase in memory utilization was due to the scheduled cluster shutdown. No memory leak is suspected for DI modules.

MARC BIB CREATE

Test#1 100k, 250k, 500k records DI

Test#2 Multitenant  DI (9 concurrent jobs)

Test#3 With CI/CO

Service CPU Utilization 

MARC BIB CREATE

Average CPU usage for mod-di-converter-storage in the 500k-record Create test was about 462%; for all other modules it did not exceed 150%. CPU usage of mod-data-import spikes up to 400% at the beginning of Data Import jobs.

Test#1  250k, 500k records DI

Test#2 Multitenant

Test#3 With CI/CO

Instance CPU Utilization

Test#1  250k, 500k records DI

Test#2 Multitenant DI (9 concurrent jobs)

RDS CPU Utilization 

MARC BIB CREATE

DB CPU usage reaches approximately 95%.

Test#1  250k, 500k records DI

Test#2 Multitenant  DI (9 concurrent jobs)

Test#3 With CI/CO

RDS Database Connections

MARC BIB CREATE
For the DI Create job, the maximum connection count was 520.

Test#1  250k, 500k records DI

Test#2 Multitenant

Test#3 With CI/CO

Appendix

Infrastructure ocp3

Record counts:

  • tenant0_mod_source_record_storage.marc_records_lb = 9674629

  • tenant2_mod_source_record_storage.marc_records_lb = 0

  • tenant3_mod_source_record_storage.marc_records_lb = 0

  • tenant0_mod_source_record_storage.raw_records_lb = 9604805

  • tenant2_mod_source_record_storage.raw_records_lb = 0

  • tenant3_mod_source_record_storage.raw_records_lb = 0

  • tenant0_mod_source_record_storage.records_lb = 9674677

  • tenant2_mod_source_record_storage.records_lb = 0

  • tenant3_mod_source_record_storage.records_lb = 0

  • tenant0_mod_source_record_storage.marc_indexers =  620042011

  • tenant2_mod_source_record_storage.marc_indexers =  0

  • tenant3_mod_source_record_storage.marc_indexers =  0

  • tenant0_mod_source_record_storage.marc_indexers with field_no 010 = 3285833

  • tenant2_mod_source_record_storage.marc_indexers with field_no 010 = 0

  • tenant3_mod_source_record_storage.marc_indexers with field_no 010 = 0

  • tenant0_mod_source_record_storage.marc_indexers with field_no 035 = 19241844

  • tenant2_mod_source_record_storage.marc_indexers with field_no 035 = 0

  • tenant3_mod_source_record_storage.marc_indexers with field_no 035 = 0

  • tenant0_mod_inventory_storage.authority = 4

  • tenant2_mod_inventory_storage.authority = 0

  • tenant3_mod_inventory_storage.authority = 0

  • tenant0_mod_inventory_storage.holdings_record = 9592559

  • tenant2_mod_inventory_storage.holdings_record = 16

  • tenant3_mod_inventory_storage.holdings_record = 16

  • tenant0_mod_inventory_storage.instance = 9976519

  • tenant2_mod_inventory_storage.instance = 32

  • tenant3_mod_inventory_storage.instance = 32 

  • tenant0_mod_inventory_storage.item = 10787893

  • tenant2_mod_inventory_storage.item = 19

  • tenant3_mod_inventory_storage.item = 19

PTF environment ocp3

  • 10 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1

  • 2 database instances: one reader and one writer

  • MSK ptf-kakfa-3

    • 4 m5.2xlarge brokers in 2 zones

    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true

    • log.retention.minutes=480

    • default.replication.factor=3

  • Kafka topic partitioning: 2 partitions for DI topics

Module (ocp3-pvt, Mon Sep 11 09:33:28 UTC 2023) | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft Limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize | R/W split enabled
---|---|---|---|---|---|---|---|---|---|---
mod-remote-storage | 13 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-remote-storage:2.0.3 | 2 | 4920 | 4472 | 1024 | 3960 | 512 | 512 | false
mod-agreements | 8 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-agreements:5.5.2 | 2 | 1592 | 1488 | 128 | 968 | 384 | 512 | false
mod-data-import | 7 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-data-import:2.7.1 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512 | false
mod-search | 30 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-search:2.0.1 | 2 | 2592 | 2480 | 2048 | 1440 | 512 | 1024 | false
mod-authtoken | 7 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-authtoken:2.13.0 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128 | false
mod-configuration | 7 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-configuration:5.9.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false
mod-inventory-storage | 1 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-inventory-storage:26.1.0-SNAPSHOT.665 | 0 | 2208 | 1952 | 1024 | 1440 | 384 | 512 | false
mod-circulation-storage | 15 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-circulation-storage:16.0.1 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | false
mod-source-record-storage | 11 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-source-record-storage:5.6.7 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | false
mod-calendar | 7 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-calendar:2.4.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false
mod-inventory | 12 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-inventory:20.0.6 | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512 | false
mod-circulation | 9 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-circulation:23.5.6 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | false
mod-di-converter-storage | 8 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-di-converter-storage:2.0.5 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false
mod-pubsub | 8 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-pubsub:2.9.1 | 2 | 1536 | 1440 | 1024 | 922 | 384 | 512 | false
mod-users | 8 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-users:19.1.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false

mod-patron-blocks

8