Overview

This document contains the results of testing the Data Import splitting feature for MARC bibliographic records in the Orchid release, to establish the baseline for ocp3.

Jira: PERF-644, PERF-645, PERF-647, PERF-646, PERF-671

Splitting feature documentation: Detailed Release Notes for Data Import Splitting Feature

Summary

  • DI duration correlates with the number of records imported (100k records: 38 min; 250k: 1 hour 32 min; 500k: 3 hours 29 min).
  • Multitenant DI completed successfully with up to 9 jobs in parallel. Big jobs start one by one, in order, for each tenant, but are processed in parallel across the 3 tenants. A small DI job (1 record) can finish sooner, out of order. Check-In/Check-Out response times roughly double during DI.
  • Memory utilization increases because the modules had been restarted beforehand (the daily cluster shutdown process); no memory leak is suspected in any of the modules.
  • Average CPU usage was 144% for mod-inventory and about 107% for mod-di-converter-storage; no other module exceeded 100%. CPU usage of mod-data-import spiked up to 260% at the beginning of the Data Import jobs.
  • DB CPU usage reached approximately 95%.

Recommendations and Jiras

Jira: MODDATAIMP-924

Overview

The Data Import Task Force (DITF) implemented a feature that splits large input MARC files into smaller ones, resulting in smaller jobs, so that big files can be imported, and imported consistently. This document contains the results of performance tests on the feature and an analysis of the feature's performance with respect to the baseline tests. The following Jiras were implemented.

Jira: PERF-644, PERF-645, PERF-647, PERF-646, PERF-671

Summary

  • The file-splitting feature is stable and makes Data Import jobs more robust, even with the current infrastructure configuration. When failures do occur, it is now easier to find the exact failed records and act on them.
    • No jobs got stuck in any of the tests performed.
    • Some partial jobs had errors (see below), but they still completed, so the overall job status is "Completed with errors".
    • Both kinds of import, create and update of MARC BIBs, worked well with the file-splitting feature enabled and with it disabled.
  • There is no performance degradation (jobs did not get slower) on single-tenant imports. On multi-tenant imports, performance is slightly better.
  • DI duration correlates with the number of records imported (100k records: 38 min; 250k: 1 hour 32 min; 500k: 3 hours 29 min).
  • Multitenant DI completed successfully with up to 9 jobs in parallel. Big jobs start one by one, in order, for each tenant, but are processed in parallel across the 3 tenants. A small DI job (1 record) can finish sooner, out of order.
  • No memory leak is suspected in any of the modules.
  • Average CPU usage was 144% for mod-inventory and about 107% for mod-di-converter-storage; no other module exceeded 100%. CPU usage of mod-data-import spiked up to 260% at the beginning of the Data Import jobs. This is a big improvement over the previous version (without file-splitting) for 500K imports, where mod-di-converter-storage's CPU utilization was 462% and other modules were above 100%, up to 150%.
  • DB CPU usage reached approximately 95%.
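
As a rough illustration of how duration scales with file size, the durations quoted in the summary can be turned into an approximate throughput. This is only a back-of-the-envelope sketch: the variable and function names are made up for the example, and the figures are the three durations listed above, nothing more.

```python
# Durations (in minutes) quoted in the summary for 100k, 250k and 500k record imports.
runs = {
    100_000: 38,
    250_000: 1 * 60 + 32,
    500_000: 3 * 60 + 29,
}


def records_per_minute(records: int, minutes: int) -> float:
    """Average import rate for one job."""
    return records / minutes


throughput = {records: records_per_minute(records, minutes)
              for records, minutes in runs.items()}

for records, rate in throughput.items():
    print(f"{records:>7} records: about {rate:,.0f} records/min")
```

The three rates land in the same range, which is what "duration correlates with the number of records" implies.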

Recommendations and Jiras

  1. One record on one tenant could be discarded with the error: io.netty.channel.StacklessClosedChannelException (Jira: MODDATAIMP-748). It reproduces both with and without the splitting feature enabled, in at least 30% of test runs with 500k-record files and in multitenant testing.
  2. During testing of the new Data Import splitting feature, items for update were discarded with the error: io.vertx.core.impl.NoStackTraceThrowable: Cannot get actual Item by id: org.folio.inventory.exceptions.InternalServerErrorException: Access for user 'data-import-system-user' (f3486d35-f7f7-4a69-bcd0-d8e5a35cb292) requires permission: inventory-storage.items.item.get. Less than 1% of records could be discarded because of the missing permission for 'data-import-system-user'. The permission was not added automatically during service deployment. I added the permission manually in the database, and the error no longer occurs (Jira: MODDATAIMP-930).
  3. UI issue: when a job is canceled or completes with errors, its progress bar cannot be removed from the screen (Jira: MODDATAIMP-929).
  4. Usage:
    • Do not set RECORDS_PER_SPLIT_FILE below 1000. The system is stable enough to ingest 1000 records consistently, and smaller values incur more overhead, resulting in longer job durations. CPU utilization of mod-di-converter-storage was 160% with RECORDS_PER_SPLIT_FILE (RPSF) = 500, 180% with RPSF = 1000, 380% with RPSF = 5K, and 433% with RPSF = 10K, so for the 5K or 10K configurations we recommend allocating more CPU to the mod-di-converter-storage service.
    • When toggling the file-splitting feature, the tasks of mod-source-record-storage and mod-source-record-manager need to be restarted.
    • Keep the Kafka brokers' disk size in mind: bigger jobs (up to 500K) can now be run, and consecutive jobs may use up the disk quickly because message retention is currently set to 8 hours. For example, with a 300 GB disk, consecutive jobs of 250K, 500K, and 500K records will exhaust the disk.
  5. More CPU could be allocated to mod-inventory and mod-di-converter-storage.
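
To make the RECORDS_PER_SPLIT_FILE trade-off concrete, the sketch below estimates how many split files (and therefore partial jobs) a single import produces. It assumes that each split file holds at most RECORDS_PER_SPLIT_FILE records and that only the last one may be smaller; the function name is hypothetical and is not part of any FOLIO module.

```python
def split_file_count(total_records: int, records_per_split_file: int) -> int:
    """Estimated number of split files for one import.

    Assumption: every split file holds at most records_per_split_file
    records, so the count is a ceiling division.
    """
    if total_records < 0 or records_per_split_file <= 0:
        raise ValueError("total_records must be >= 0 and records_per_split_file > 0")
    # Ceiling division using integers only.
    return -(-total_records // records_per_split_file)


# Number of split files for a 500K import at the RPSF values used in the tests.
chunks_for_500k = {rpsf: split_file_count(500_000, rpsf)
                   for rpsf in (500, 1_000, 5_000, 10_000)}

for rpsf, chunks in chunks_for_500k.items():
    print(f"RECORDS_PER_SPLIT_FILE={rpsf}: {chunks} split files")
```

Smaller values multiply the number of partial jobs, which is consistent with the extra overhead noted in the usage recommendation.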

Results

Test # | Test parameters | Profile | Splitting Feature Enabled | Results | Splitting Feature Disabled | Results | Before Splitting Feature Deployed | Results
1 | 100K MARC BIB Create | PTF - Create 2 | 37 min - 39 min | Completed | 40 min | Completed | 32-33 minutes | Completed
1 | 250K MARC BIB Create | PTF - Create 2 | 1 hour 32 min | Completed | 1 hour 41 min | Completed | 1 hour 33 min - 1 hour 57 min | Completed
1 | 500K MARC BIB Create | PTF - Create 2 | 3 hours 29 min | Completed* | 3 hours 55 min | Completed | 3 hours 33 min | Completed
2 | Multitenant MARC Create (100k, 50k, and 1 record) | PTF - Create 2 | 2 hours 40 min | Completed* | 2 hours 43 min | Completed* | 3 hours 1 min | Completed
3 | CI/CO + DI MARC BIB Create (20 users CI/CO, 25k records DI on 3 tenants) | PTF - Create 2 | 24 min 18 sec | Completed | 31 min 31 sec | Completed | 24 min | Completed *
4 | 100K MARC BIB Update (Create new file) | PTF - Updates Success - 1 | 58 min 25 sec; 57 min 19 sec | Completed | 1 hour 3 min | Completed | - | -
4 | 250K MARC BIB Update | PTF - Updates Success - 1 | 2 hours 2 min **; 2 hours 12 min | Completed with errors **; Completed | 1 hour 53 min | Completed | - | -
4 | 500K MARC BIB Update | PTF - Updates Success - 1 | 4 hours 43 min; 4 hours 38 minutes | Completed; Completed | 5 hour 59 min | Completed | - | -

 * - One record on one tenant could be discarded with the error: io.netty.channel.StacklessClosedChannelException (Jira: MODDATAIMP-748). It reproduces both with and without the splitting feature, in at least 30% of test runs with 500k-record files and in multitenant testing.

 ** - Up to 10 items were discarded with the error: io.vertx.core.impl.NoStackTraceThrowable: Cannot get actual Item by id: org.folio.inventory.exceptions.InternalServerErrorException: Access for user 'data-import-system-user' (f3486d35-f7f7-4a69-bcd0-d8e5a35cb292) requires permission: inventory-storage.items.item.get. Less than 1% of records could be discarded because of the missing permission for 'data-import-system-user'. The permission was not added automatically during service deployment. I added the permission manually in the database, and the error no longer occurs (Jira: MODDATAIMP-930).


Test 1,2. 100k, 250K, 500k and Multitenant MARC BIB Create

Memory Utilization

Memory utilization increases here because the modules had been restarted beforehand (the daily cluster shutdown process); no memory leak is suspected in the DI modules.

MARC BIB CREATE

Test#1 100k, 250k, 500k records DI

Image Added

Test#2 Multitenant  DI (9 concurrent jobs)
Image Added

Service CPU Utilization 

MARC BIB CREATE

Average CPU usage was 144% for mod-inventory and about 107% for mod-di-converter-storage; no other module exceeded 100%. CPU usage of mod-data-import spiked up to 260% at the beginning of the Data Import jobs.

Test#1 500k records DI

Image Added


Test#2 Multitenant

Image Added


Instance CPU Utilization

Test#1 500k records DI

Image Added

Test#2 Multitenant DI (9 concurrent jobs)

Image Added


RDS CPU Utilization 

MARC BIB CREATE

DB CPU usage reached approximately 95%.

Test#1  500k records DI

Image Added

Test#2 Multitenant  DI (9 concurrent jobs)

Maximal DB CPU usage is about 95%

Image Added


RDS Database Connections

MARC BIB CREATE
For the DI Create job, the maximum connection count was 535.

Test#1  500k records DI

Image Added

Test#2 Multitenant
Image Added


Test 3 With CI/CO 20 users and DI 25k records on each of the 3 tenants Splitting Feature Enabled & 

Splitting Feature Disabled



Operation | Response time without DI (Before Splitting Feature Deployed) | Response time with DI (Before Splitting Feature Deployed) | Response time without DI (Splitting Feature disabled) | Response time with DI (Splitting Feature disabled) | Response time without DI, Average (Splitting Feature enabled) | Response time with DI, Average (Splitting Feature enabled)
Check-In | 0.517s | 1.138s | 0.542s | 1.1s | 0.505s | 1.067s
Check-Out | 0.796s | 1.552s | 0.841s | 1.6s | 0.804s | 1.48s
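
The effect of a running DI on Check-In and Check-Out can be expressed as a slowdown factor (response time with DI divided by response time without DI). The sketch below uses only the averages from the response-time table above; the dictionary layout and function name are illustrative choices, not part of the test tooling.

```python
# Average response times in seconds: (without DI, with DI) per configuration.
response_times = {
    "Check-In": {
        "before splitting feature": (0.517, 1.138),
        "splitting disabled": (0.542, 1.1),
        "splitting enabled": (0.505, 1.067),
    },
    "Check-Out": {
        "before splitting feature": (0.796, 1.552),
        "splitting disabled": (0.841, 1.6),
        "splitting enabled": (0.804, 1.48),
    },
}


def slowdown(without_di: float, with_di: float) -> float:
    """How many times slower an operation is while DI is running."""
    return with_di / without_di


ratios = {
    operation: {config: round(slowdown(*pair), 2) for config, pair in configs.items()}
    for operation, configs in response_times.items()
}

for operation, configs in ratios.items():
    for config, ratio in configs.items():
        print(f"{operation:9} | {config:25} | x{ratio}")
```

In every configuration the factor stays close to 2, with or without the splitting feature.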



Tenant | DI Duration without CI/CO (Before Splitting Feature Deployed) | DI Duration with CI/CO (Before Splitting Feature Deployed) | DI Duration without CI/CO (Splitting Feature disabled) | DI Duration with CI/CO (Splitting Feature disabled) | DI Duration without CI/CO (Splitting Feature enabled) | DI Duration with CI/CO (Splitting Feature enabled)
Tenant _1 | 14 min (18 min for run 2) | 20 min | 27 min 47 sec | 31 min 30 sec | 16 min 18 sec | 16 min 53 sec
Tenant _2 | 16 min (18 min for run 2) | 19 min | 23 min 16 sec | 26 min 22 sec | 20 min 13 sec | 20 min 39 sec
Tenant _3 | 16 min (15 min for run 2) | 16 min | 18 min 40 sec | 20 min 44 sec | 17 min 42 sec | 17 min 54 sec


 * - The same approach was used for DI testing: 3 DI jobs in total on 3 tenants, without CI/CO. The second job was started after the first one reached 30% completion, and the job on the third tenant was started after the first job reached 60% completion. DI file size: 25k

Response time graph
Image Added

With CI/CO 20 users and DI 25k records on each of the 3 tenants Splitting Feature Disabled

ocp3-mod-data-import:12

Image Added

Data Import Robustness Enhancement

25K records. The columns 500, 1K, 5K, and 10K are RECORDS_PER_SPLIT_FILE values.

Number of concurrent tenants | Job profile | 500 | Status | 1K | Status | 5K | Status | 10K | Status | Test with Split disabled | Status
1 Tenant test#1 | PTF - Create 2 | 12 minutes 55 seconds | Completed | 11 minutes 48 seconds | Completed | 09 minutes 21 seconds | Completed | 9 minutes 2 sec | Completed | 10 minutes 35 sec | Completed
1 Tenant test#2 | PTF - Create 2 | 10 minutes 31 seconds | Completed | 09 minutes 32 seconds | Completed | 9 minutes 6 sec | Completed | 9 minutes 14 sec | Completed | 11 minutes 27 sec | Completed
2 Tenants test#1 | PTF - Create 2 | 19 minutes 29 seconds | Completed | 15 minutes 47 seconds | Completed | 16 minutes 15 seconds | Completed | 16 minutes 3 seconds | Completed | 19 minutes 18 sec | Completed
2 Tenants test#2 | PTF - Create 2 | 18 minutes 19 seconds | Completed | 15 minutes 47 seconds | Completed | 16 minutes 11 sec | Completed | 16 min 41 sec | Completed | 20 minutes 33 sec | Completed
3 Tenants test#1 | PTF - Create 2 | 24 minutes 15 seconds | Completed | 25 minutes 47 seconds | Completed | 23 minutes | Completed | 23 minutes 27 seconds | Completed | 30 minutes 2 sec | Completed
3 Tenants test#2 | PTF - Create 2 | 24 minutes 38 seconds | Completed | 23 minutes 28 seconds | Completed | 23 minutes 2 sec | Completed | 23 minutes 26 seconds | Completed | 29 minutes 54 sec | Completed *

* T1 - "00:33:35.1" Error, T2 - "01:23:36.144", T3 - "01:16:26.391". On the first tenant, processing stopped with the error "io.vertx.core.impl.NoStackTraceThrowable: Connection is not active now, current status: CLOSED".

This caused a spike in CPU utilization on Kafka (tenant cluster) of up to 94%.

Image Added

Instance CPU Utilization 

Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 500, 2 runs for each test. The maximal CPU Utilization value is 38%. 

Image Added

Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test. The maximal CPU Utilization value is 37%. 

Image Added

Memory Utilization

Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 500, 2 runs for each test.

Most of the modules were stable during the test, and no memory leak is suspected in the DI modules; only 2 modules increased their memory consumption after the tests began.
Image Added

Memory utilization reached a maximum of 88% for mod-source-record-storage-b and 85% for mod-source-record-manager-b.

Image Added

Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test.

Image Added

Service CPU Utilization 

Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 500, 2 runs for each test.

Image Added

Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test.

Image Added

CPU utilization of  mod-di-converter-storage-b

 Image Added

RDS CPU Utilization 

Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 500, 2 runs for each test. Maximal  CPU Utilization = 95%


Image Added

Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test. Maximal  CPU Utilization = 94%

Image Added

RDS Database Connections

Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 500, 2 runs for each test.

Image Added

Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test. Maximal  CPU Utilization = 94%

Image Added

Retesting DI file-splitting feature on Poppy release

Retest the DI feature to be sure that the new changes have not affected performance negatively.  Retest the DI file-splitting feature for the following scenarios:

Jira: PERF-681

Brief comparison summary


The duration of data import has increased; in particular (diff = Poppy processing time - Orchid processing time):

  • 250K MARC BIB Create, PTF - Create 2: 44 minutes
  • 250K MARC BIB Update, PTF - Updates Success - 1: 45 minutes
  • Multitenant MARC Create (100k, 50k, and 1 record), PTF - Create 2: 1 hour 35 minutes
  • Check-In/Check-Out response times:
    • Check-Out without DI ~ 200ms
    • Check-In without DI ~ 65ms
    • Check-Out with DI ~ 770ms
    • Check-In with DI ~ 330ms

Resource utilization:

  • Service CPU utilization on Poppy is about the same as on Orchid;
  • Memory utilization on Poppy is about the same as on Orchid;
  • RDS CPU utilization was about 96% during all tests on both releases;
  • The number of DB connections was about the same on both releases, ranging from 550 (Test 1.1) to 1200 (Test 1.4).


Test 1.  Single tenant(primary fs09000000): create and update 250K file 

Test # | Test parameters | Profile | Duration (Poppy), Splitting Feature Enabled | Status | Previous results (Orchid), Duration | diff = Poppy processing time - Orchid processing time | Duration (Poppy), Splitting Feature Disabled
1.1 | 250K MARC BIB Create | PTF - Create 2 | 2 hours 16 min | Completed | 1 hour 32 min | 44 minutes | failed
1.2 | 250K MARC BIB Update | PTF - Updates Success - 1 | 3 hours 1 min | Completed | 2 hours 16 min | 45 minutes | failed
1.3 | Multitenant MARC Create (100k, 50k, and 1 record) | PTF - Create 2 | 4 hours 14 min | Completed | 2 hours 40 min | 1 hour 35 minutes | failed
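
The diff column can be recomputed from the two duration columns. The sketch below is a minimal helper, assuming durations are written as in the table ("N hours M min"); the parsing function and variable names are invented for this example. Working from the whole-minute values shown, Test 1.3 comes out at 1 hour 34 minutes rather than the reported 1 hour 35 minutes, which may simply reflect seconds that the table does not show.

```python
import re
from datetime import timedelta


def parse_duration(text: str) -> timedelta:
    """Parse strings such as '2 hours 16 min' or '4 hours 14min'."""
    hours = re.search(r"(\d+)\s*hours?", text)
    minutes = re.search(r"(\d+)\s*min", text)
    return timedelta(
        hours=int(hours.group(1)) if hours else 0,
        minutes=int(minutes.group(1)) if minutes else 0,
    )


# (test, Poppy duration, Orchid duration) with the splitting feature enabled.
results = [
    ("1.1", "2 hours 16 min", "1 hour 32 min"),
    ("1.2", "3 hours 1 min", "2 hours 16 min"),
    ("1.3", "4 hours 14min", "2 hours 40 min"),
]

diff_minutes = []
for test, poppy, orchid in results:
    diff = parse_duration(poppy) - parse_duration(orchid)
    diff_minutes.append(int(diff.total_seconds() // 60))
    print(f"Test {test}: Poppy is slower by {diff_minutes[-1]} min")
```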

On Poppy with the split feature disabled, large files stopped processing. A ticket was created for this problem (Jira: PERF-744).

Test 1.4 With CI/CO 20 users and DI 25k records on each of the 3 tenants 

Splitting Feature enabled

Operation | Orchid: response time without DI (Average) | Orchid: response time with DI (Average) | Poppy: response time without DI (Average) | Poppy: response time with DI (Average) | diff without DI (Poppy - Orchid) | diff with DI (Poppy - Orchid)
Check-Out | 0.804s | 1.48s | 1.03s | 2.26s | 200ms | 770ms
Check-In | 0.505s | 1.067s | 0.570s | 1.4s | 65ms | 330ms



Tenant | Orchid: DI Duration with CI/CO | Poppy: DI Duration with CI/CO
Tenant _1 | 16 min 53 sec | 34 min 55 sec
Tenant _2 | 20 min 39 sec | 27 min 39 sec
Tenant _3 | 17 min 54 sec | 25 min 17 sec


Resource utilization during testing

Test 1.1. Data-import of 250K records file with "PTF - Create 2" job profile

Service CPU Utilization 

There is a sharp CPU spike at the beginning of test 1; we see similar behavior in all of the DI tests. CPU consumption was uniform during the test.

Image Added

Memory Utilization

Memory consumption was not affected: the mod-source-record-manager service increased its memory usage from 45% to 60% during the test, but after the test memory began returning to its pre-test value.

Image Added


RDS CPU Utilization  

Consumption of the database CPU was 97% throughout the test

Image Added

RDS Database Connections

The average number of DB connections during the test was about 550.


Image Added

Test 1.2. Data-import of 250K records file with "PTF - Update" job profile

Service CPU Utilization 

CPU consumption was stable during the test, except for the mod-inventory service: its CPU usage was about 140% at the beginning of the test and about 200% at the end.
Image Added

Memory Utilization

The memory was stable and without memory leaks.
Image Added

RDS CPU Utilization 

Consumption of the database CPU was 97% throughout the test

Image Added

RDS Database Connections

The average number of DB connections during the test was about 550.

Image Added

Test 1.3. Multitenant MARC Create (100k, 50k, and 1 record)

Service CPU Utilization 

CPU consumption was stable during the test. However, in the last hour of the test, the mod-inventory and mod-quick-marc services increased their CPU utilization by 75%.
Image Added

Memory Utilization

The memory was stable and without memory leaks.
Image Added

RDS CPU Utilization 

Consumption of the database CPU was 96% throughout the test

Image Added

RDS Database Connections

The average number of DB connections during the test was about 800.

Image Added


Test 1.4. With CI/CO 20 users and DI 25k records on each of the 3 tenants

Service CPU Utilization 
Image Added

Memory Utilization 

The memory was stable and without memory leaks.

Image Added

RDS CPU Utilization 

Consumption of the database CPU was 96% throughout the test

Image Added

RDS Database Connections

The average number of DB connections during the test changed from 400 to 1200.

Image Added

CI/CO response time graph
Image Added

Retesting DI file-splitting feature on Poppy release with Refresh Token Rotation (RTR) and file-splitting feature

The goal of the tests was to investigate how the file-splitting feature affects Data Import on the Poppy release, and the impact of Refresh Token Rotation (RTR). The tests were performed on the ocp3 (Poppy), pcp1 (Poppy), and ncp5 (Orchid) environments.

Jira: PERF-723
Refresh Token Rotation (RTR)

Brief comparison summary

  • The Refresh Token Rotation configuration does not affect the data import process in any way, whether with a create or an update job profile.
  • In the Poppy release, data import of 250,000 records with the PTF - Create 2 job profile failed, and data import of 50,000 records with the PTF - Updates Success - 1 job profile also failed, in all of the tests except the configuration with FSF=true.
  • Data import is slower on Poppy than on Orchid.
  • As the number of records in the data import file increases, the processing time also increases. Up to 25,000 records, the duration of the data import is approximately the same.
  • In the Poppy release, data import with the file-splitting feature enabled is slower than data import with the file-splitting feature disabled.
  • Data import is approximately 5% faster when the file-splitting feature parameters are absent from the task definition configuration.

Test results

DI tests / Configuration | ncp5 Orchid | ocp3 FSF true without RTR token | *ocp3 FSF false without RTR token | ocp3 FSF deleted without token | ocp3 FSF false AT = RT = 300 | ocp3 FSF false AT = RT = 1000000000 | pcp1 FSF false AT = RT = 10000000 | pcp1 FSF false without token retest*
250k_bib_Create_1.mrc | not tested | not tested | failed | failed | failed | failed | failed | failed
100k_bib_Create.mrc | 00:41:41 | 00:54:32 | 00:54:36 | 00:53:59 | 00:48:56 | 00:54:42.05 | 00:47:17 | 01:01:39
50k_bib_Create.mrc | 00:19:43 | 00:30:40 | 00:25:39 | 00:22:17 | 00:27:05 | 00:30:09 | 00:21:45 | 00:20:46
25k_bib_Create.mrc | 00:10:11 | 00:13:53 | 00:12:46 | 00:10:33 | 00:12:42 | 00:13:25 | 00:11:54 | 00:10:53
10k_bib_Create.mrc | 00:04:19 | 00:07:22 | 00:05:35 | 00:04:38 | not tested | 00:05:33 | 00:04:42 | 00:04:36
5k_bib_Create.mrc | 00:02:35 | 00:04:31 | 00:02:43 | 00:02:55 | not tested | 00:03:07 | 00:02:55 | 00:02:30
1k_bib_Create.mrc | not tested | not tested | not tested | not tested | not tested | not tested | 00:00:54 | not tested
DI-25K-Update.mrc | not tested | not tested | finished successfully | failed | failed | finished successfully | failed | finished successfully

Column with "pcp1 FSF false without token" has testing results on the configuration similar to "ocp3 FSF false without RTR token".

Resource utilization during testing

Service CPU utilization during the Data-import process

The following data import jobs were carried out:
1) 5k_bib_Create 2) 10k_bib_Create 3) 25k_bib_Create 4) 50k_bib_Create 5) 50k_bib_Create 6) 100k_bib_Create 7) 50k_bib_Create 8) 25k_bib_Create 9) 25k_bib_Update 10) 50k_bib_Update (stopped)
CPU utilization was stable during all jobs, apart from spikes at the beginning of each data import job.
 

Image Added

DI test

Image Added


Memory Utilization

Most of the modules were stable during the test, and no memory leak is suspected for DI modules, except mod-inventory-b which consumed about 92% of memory during all DI processes. 
Image Added

RDS CPU Utilization 


Maximal  CPU Utilization = 95%

Image Added

RDS Database Connections

The maximal number of DB connections during the tests was about 580.

Image Added

Database load

Image Added

Top SQL queries

Image Added

Appendix

Infrastructure ocp3  with the "Bugfest" Dataset

Records count :

  • tenant0_mod_source_record_storage.marc_records_lb = 9674629
  • tenant2_mod_source_record_storage.marc_records_lb = 0
  • tenant3_mod_source_record_storage.marc_records_lb = 0
  • tenant0_mod_source_record_storage.raw_records_lb = 9604805
  • tenant2_mod_source_record_storage.raw_records_lb = 0
  • tenant3_mod_source_record_storage.raw_records_lb = 0
  • tenant0_mod_source_record_storage.records_lb = 9674677
  • tenant2_mod_source_record_storage.records_lb = 0
  • tenant3_mod_source_record_storage.records_lb = 0
  • tenant0_mod_source_record_storage.marc_indexers =  620042011
  • tenant2_mod_source_record_storage.marc_indexers =  0
  • tenant3_mod_source_record_storage.marc_indexers =  0
  • tenant0_mod_source_record_storage.marc_indexers with field_no 010 = 3285833
  • tenant2_mod_source_record_storage.marc_indexers with field_no 010 = 0
  • tenant3_mod_source_record_storage.marc_indexers with field_no 010 = 0
  • tenant0_mod_source_record_storage.marc_indexers with field_no 035 = 19241844
  • tenant2_mod_source_record_storage.marc_indexers with field_no 035 = 0
  • tenant3_mod_source_record_storage.marc_indexers with field_no 035 = 0
  • tenant0_mod_inventory_storage.authority = 4
  • tenant2_mod_inventory_storage.authority = 0
  • tenant3_mod_inventory_storage.authority = 0
  • tenant0_mod_inventory_storage.holdings_record = 9592559
  • tenant2_mod_inventory_storage.holdings_record = 16
  • tenant3_mod_inventory_storage.holdings_record = 16
  • tenant0_mod_inventory_storage.instance = 9976519
  • tenant2_mod_inventory_storage.instance = 32
  • tenant3_mod_inventory_storage.instance = 32 
  • tenant0_mod_inventory_storage.item = 10787893
  • tenant2_mod_inventory_storage.item = 19
  • tenant3_mod_inventory_storage.item = 19

PTF -environment ocp3 

  • 10 m6i.2xlarge EC2 instances located in US East (N. Virginia)us-east-1
  • 2 database  instances, one reader, and one writer

    Name | API Name | Memory GiB | vCPUs | max_connections
    R6G Extra Large | db.r6g.xlarge | 32 GiB | 4 vCPUs | 2731


  • MSK ptf-kakfa-3
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3
  • Kafka topics partitioning: - 2 partitions for DI topics

Before Splitting Feature released

ocp3-pvt, Mon Sep 11 09:33:28 UTC 2023

Module | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize | R/W split enabled
mod-remote-storage | 13 | mod-remote-storage:2.0.3 | 2 | 4920 | 4472 | 1024 | 3960 | 512 | 512 | false
mod-agreements | 8 | mod-agreements:5.5.2 | 2 | 1592 | 1488 | 128 | 968 | 384 | 512 | false
mod-data-import | 7 | mod-data-import:2.7.1 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512 | false
mod-search | 30 | mod-search:2.0.1 | 2 | 2592 | 2480 | 2048 | 1440 | 512 | 1024 | false
mod-authtoken | 7 | mod-authtoken:2.13.0 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128 | false
mod-configuration | 7 | mod-configuration:5.9.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false
mod-inventory-storage | 1 | mod-inventory-storage:26.1.0-SNAPSHOT.665 | 0 | 2208 | 1952 | 1024 | 1440 | 384 | 512 | false
mod-circulation-storage | 15 | mod-circulation-storage:16.0.1 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | false
mod-source-record-storage | 11 | mod-source-record-storage:5.6.7 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | false
mod-calendar | 7 | mod-calendar:2.4.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false
mod-inventory | 12 | mod-inventory:20.0.6 | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512 | false
mod-circulation | 9 | mod-circulation:23.5.6 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | false
mod-di-converter-storage | 8 | mod-di-converter-storage:2.0.5 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false
mod-pubsub | 8 | mod-pubsub:2.9.1 | 2 | 1536 | 1440 | 1024 | 922 | 384 | 512 | false
mod-users | 8 | mod-users:19.1.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false
mod-patron-blocks | 8 | mod-patron-blocks:1.8.0 | 2 | 1024 | 896 | 1024 | 768 | 88 | 128 | false
mod-source-record-manager | 9 | mod-source-record-manager:3.6.4 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | false
nginx-edge | 7 | nginx-edge:2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0 | false
mod-quick-marc | 7 | mod-quick-marc:3.0.0 | 1 | 2288 | 2176 | 128 | 1664 | 384 | 512 | false
nginx-okapi | 7 | nginx-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0 | false
okapi-b | 8 | okapi:5.0.1 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512 | false
mod-feesfines | 7 | mod-feesfines:18.2.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false
mod-patron | 7 | mod-patron:5.5.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false
mod-notes | 7 | mod-notes:5.0.1 | 2 | 1024 | 896 | 128 | 952 | 384 | 512 | false
pub-okapi | 7 | pub-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 768 | 0 | 0 | false


Service versions for Splitting Feature test

ocp3-pvt, Mon Sep 25 12:43:06 UTC 2023

Module | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize | R/W split enabled
mod-data-import | 10 | mod-data-import:2.7.2-SNAPSHOT.137 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512 | false
mod-search | 30 | mod-search:2.0.1 | 2 | 2592 | 2480 | 2048 | 1440 | 512 | 1024 | false
mod-configuration | 8 | mod-configuration:5.9.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false
mod-bulk-operations | 7 | mod-bulk-operations:1.0.6 | 2 | 3072 | 2600 | 1024 | 1536 | 384 | 512 | false
mod-inventory-storage | 1 | mod-inventory-storage:26.1.0-SNAPSHOT.665 | 0 | 2208 | 1952 | 1024 | 1440 | 384 | 512 | false
mod-circulation-storage | 15 | mod-circulation-storage:16.0.1 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | false
mod-source-record-storage | 12 | mod-source-record-storage:5.6.7 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | false
mod-calendar | 7 | mod-calendar:2.4.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false
mod-inventory | 12 | mod-inventory:20.0.6 | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512 | false
mod-circulation | 9 | mod-circulation:23.5.6 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | false
mod-di-converter-storage | 8 | mod-di-converter-storage:2.0.5 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false
mod-pubsub | 9 | mod-pubsub:2.9.1 | 2 | 1536 | 1440 | 1024 | 922 | 384 | 512 | false
mod-users | 9 | mod-users:19.1.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false
mod-patron-blocks | 9 | mod-patron-blocks:1.8.0 | 2 | 1024 | 896 | 1024 | 768 | 88 | 128 | false
mod-source-record-manager | 12 | mod-source-record-manager:3.6.5-SNAPSHOT.245 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | false
nginx-edge | 7 | nginx-edge:2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0 | false
mod-quick-marc | 7 | mod-quick-marc:3.0.0 | 1 | 2288 | 2176 | 128 | 1664 | 384 | 512 | false
nginx-okapi | 7 | nginx-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0 | false
okapi-b | 8 | okapi:5.0.1 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512 | false
mod-feesfines | 8 | mod-feesfines:18.2.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false
mod-notes | 7 | mod-notes:5.0.1 | 2 | 1024 | 896 | 128 | 952 | 384 | 512 | false
pub-okapi | 7 | pub-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 768 | 0 | 0 | false

Service versions for retesting the Splitting Feature on the Poppy release

ocp3-pvt, Mon Sep 25 12:43:06 UTC 2023

Module | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize | R/W split enabled
mod-circulation-storage | 16 | mod-circulation-storage:17.1.0 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | FALSE
mod-source-record-storage | 13 | mod-source-record-storage:5.7.0 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | FALSE
mod-calendar | 8 | mod-calendar:2.5.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE
mod-inventory | 13 | mod-inventory:20.1.0 | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512 | FALSE
mod-circulation | 10 | mod-circulation:24.0.0 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | FALSE
mod-di-converter-storage | 9 | mod-di-converter-storage:2.1.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE
mod-pubsub | 10 | mod-pubsub:2.11.0 | 2 | 1536 | 1440 | 1024 | 922 | 384 | 512 | FALSE
mod-users | 10 | mod-users:19.2.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE
mod-patron-blocks | 10 | mod-patron-blocks:1.9.0 | 2 | 1024 | 896 | 1024 | 768 | 88 | 128 | FALSE
mod-source-record-manager | 15 | mod-source-record-manager:3.7.0 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | FALSE
mod-quick-marc | 8 | mod-quick-marc:5.0.0 | 1 | 2288 | 2176 | 128 | 1664 | 384 | 512 | FALSE
nginx-okapi | 8 | nginx-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0 | FALSE
okapi-b | 9 | okapi:5.1.1 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512 | FALSE
mod-feesfines | 9 | mod-feesfines:19.0.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE
mod-notes | 8 | mod-notes:5.1.0 | 2 | 1024 | 896 | 128 | 952 | 384 | 512 | FALSE
pub-okapi | 8 | pub-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 768 | 0 | 0 | FALSE
mod-data-import | 36 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-data-import:3.0.3 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512 | FALSE
mod-search | 31 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-search:3.0.0 | 2 | 2592 | 2480 | 2048 | 1440 | 512 | 1024 | FALSE
mod-configuration | 9 | mod-configuration:5.9.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE
mod-bulk-operations | 8 | mod-bulk-operations:1.1.0 | 2 | 3072 | 2600 | 1024 | 1536 | 384 | 512 | FALSE
edge-ncip | 8 | edge-ncip:1.9.0 | 2 | 1024 | 896 | 128 | 768 | 0 | 0 | FALSE
mod-inventory-storage | 8 | mod-inventory-storage:27.0.028961 | | | | | | | | FALSE


Methodology/Approach

To enable the splitting feature, see Detailed Release Notes for Data Import Splitting Feature.

...

Test 3: Run CICO on one tenant and DI jobs on 3 tenants, including the tenant that runs CICO. Start the second DI job after the first one reaches 30% completion, and start a third job on the third tenant after the first job reaches 60% completion. CICO: 20 users; DI file size: 25K.

Test 4: To determine the optimal value for RECORDS_PER_SPLIT_FILE (500, 1K, 2K, 5K), data-import jobs with the PTF-Create-2 profile were run with 25K records for 1 tenant, and simultaneously for 2 tenants and for 3 tenants.
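
To make the Test 4 parameter space concrete, the sketch below estimates how many chunks a 25K file yields for each tested RECORDS_PER_SPLIT_FILE value. It assumes the file is cut into consecutive pieces of at most RECORDS_PER_SPLIT_FILE records, with a smaller final piece; this is an assumption for illustration, not a statement of how mod-data-import splits files. The names chunk_count and chunk_plan are hypothetical helpers, not part of any FOLIO module.

```python
import math

# RECORDS_PER_SPLIT_FILE values compared in Test 4.
SPLIT_SIZES = [500, 1000, 2000, 5000]
# Records per imported file in Test 4 (25K).
TOTAL_RECORDS = 25_000


def chunk_count(total_records: int, records_per_split_file: int) -> int:
    """Chunks for one file, ASSUMING consecutive pieces of at most
    records_per_split_file records (the last piece may be smaller)."""
    if total_records < 0 or records_per_split_file <= 0:
        raise ValueError("total_records must be >= 0 and split size must be > 0")
    return math.ceil(total_records / records_per_split_file)


def chunk_plan(total_records: int, split_sizes: list[int], tenants: int = 1) -> dict:
    """Map each split size to (chunks per file, chunks across all tenants),
    assuming every tenant imports one file of the same size."""
    plan = {}
    for size in split_sizes:
        per_file = chunk_count(total_records, size)
        plan[size] = (per_file, per_file * tenants)
    return plan


if __name__ == "__main__":
    for tenants in (1, 2, 3):
        for size, (per_file, total) in chunk_plan(TOTAL_RECORDS, SPLIT_SIZES, tenants).items():
            print(f"tenants={tenants} split={size}: {per_file} chunks per file, {total} in total")
```

Under this assumption a 25K file produces 50, 25, 13, and 5 chunks for split sizes of 500, 1K, 2K, and 5K respectively, so smaller split sizes multiply the number of chunk jobs that run concurrently across tenants.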