Overview

This document contains the results of testing the Data Import Splitting Feature for MARC Bibliographic records in the Orchid release, to establish a baseline for the ocp3 environment.

Jira: PERF-644, PERF-645, PERF-647, PERF-646, PERF-671

Splitting feature documentation: Detailed Release Notes for Data Import Splitting Feature

Summary

DI duration correlates with the number of records imported (100k records: 38 min; 250k: 1 hour 32 min; 500k: 3 hours 29 min).
Multitenant DI completed successfully with up to 9 jobs in parallel. Large jobs start one at a time, in order, within each tenant, but are processed in parallel across the 3 tenants. A small DI job (1 record) can finish sooner, out of order. Check-In/Check-Out duration roughly doubles during DI.
Memory utilization increased because the modules had recently been restarted (daily cluster shutdown process); no memory leak is suspected in any module.
Average CPU usage was 144% for mod-inventory and about 107% for mod-di-converter-storage; no other module exceeded 100%. CPU usage of mod-data-import spiked up to 260% at the start of Data Import jobs.
DB CPU usage reached approximately 95%.
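
To make the correlation between duration and record count concrete, the three durations reported above can be converted to a throughput figure. The following is a minimal sketch in Python; the function and variable names are illustrative and not part of any FOLIO tooling.

```python
# Record counts and durations (in minutes) as reported in the Summary.
runs = [
    (100_000, 38),
    (250_000, 1 * 60 + 32),
    (500_000, 3 * 60 + 29),
]


def records_per_minute(records: int, minutes: int) -> float:
    """Average import throughput for one DI job."""
    return records / minutes


# Map each file size to its average throughput.
throughputs = {records: records_per_minute(records, minutes) for records, minutes in runs}

for records, rate in throughputs.items():
    print(f"{records:>7} records: {rate:,.0f} records/min")
```

Under these numbers the throughput stays in a band of roughly 2,400 to 2,700 records per minute, which is consistent with duration growing approximately linearly with file size.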

Recommendations and Jiras

1) One record on one tenant could be discarded with error: io.netty.channel.StacklessClosedChannelException.

Jira: MODDATAIMP-748

This reproduces both with and without the splitting feature, in at least 30% of test runs with 500k-record files and in multitenant testing.

2) During testing of the new Data Import splitting feature, items for update were discarded with the error: io.vertx.core.impl.NoStackTraceThrowable: Cannot get actual Item by id: org.folio.inventory.exceptions.InternalServerErrorException: Access for user 'data-import-system-user' (f3486d35-f7f7-4a69-bcd0-d8e5a35cb292) requires permission: inventory-storage.items.item.get. Fewer than 1% of records can be discarded because of the missing permission for 'data-import-system-user'. The permission was not added automatically during service deployment. After I added the permission manually in the database, the error no longer occurred.

Jira: MODDATAIMP-930

3) UI issue: when a job is canceled or completes with an error, its progress bar cannot be removed from the screen.

Jira: MODDATAIMP-924 / MODDATAIMP-929

Results

Test #	Test	Profile	Splitting Feature Enabled	Results	Splitting Feature Disabled	Results	Before Splitting Feature Released	Results
1	100K MARC Create	PTF - Create 2	37-39 min	Completed	40 min	Completed	32-33 min	Completed
1	250K MARC Create	PTF - Create 2	1 hour 32 min	Completed	1 hour 41 min	Completed	1 hour 33 min - 1 hour 57 min	Completed
1	500K MARC Create	PTF - Create 2	3 hours 29 min	Completed*	3 hours 55 min	Completed	3 hours 33 min	Completed
2	Multitenant MARC Create (100k, 50k, and 1 record)	PTF - Create 2	2 hours 40 min	Completed*	-	-	3 hours 1 min	Completed
3	CI/CO + DI MARC Create (20 users CI/CO, 25k records DI on 3 tenants)	PTF - Create 2	-	-	-	-	24 min	Completed*
4	100K MARC Update (Create new file)	PTF - Updates Success - 1	58 min 25 sec / 57 min 19 sec	Completed	1 hour 3 min	Completed	-	-
4	250K MARC Update	PTF - Updates Success - 1	2 hours 2 min** / 2 hours 12 min	Completed with errors** / Completed	1 hour 53 min	Completed	-	-
4	500K MARC Update	PTF - Updates Success - 1	4 hours 43 min / 4 hours 38 min	Completed / Completed	5 hours 59 min	Completed	-	-
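
The durations in the table can be compared numerically once they are converted to minutes. The sketch below parses the duration strings used in this report and computes how much faster a run with the splitting feature enabled was than the corresponding run with it disabled. It uses only the two MARC Create rows that report a single duration in both columns; the helper names are illustrative.

```python
import re

# Minutes per unit; the pattern below also matches "hours" and "minutes".
_UNIT_MINUTES = {"hour": 60.0, "min": 1.0, "sec": 1.0 / 60.0}


def to_minutes(text: str) -> float:
    """Convert a duration such as '3 hours 29 min' or '58 min 25 sec' to minutes."""
    total = 0.0
    for value, unit in re.findall(r"(\d+)\s*(hour|min|sec)", text):
        total += int(value) * _UNIT_MINUTES[unit]
    return total


def percent_faster(enabled: str, disabled: str) -> float:
    """Relative time saved with splitting enabled, as a percentage of the disabled run."""
    enabled_min = to_minutes(enabled)
    disabled_min = to_minutes(disabled)
    return (disabled_min - enabled_min) / disabled_min * 100.0


# (splitting enabled, splitting disabled) durations taken from the Results table.
create_runs = {
    "250K MARC Create": ("1 hour 32 min", "1 hour 41 min"),
    "500K MARC Create": ("3 hours 29 min", "3 hours 55 min"),
}

for test, (enabled, disabled) in create_runs.items():
    print(f"{test}: {percent_faster(enabled, disabled):.1f}% faster with splitting enabled")
```

For these two rows the difference works out to about 9% and 11% respectively. Rows that report a range or two runs would need a decision on which value to use before they can be compared the same way.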

...

** - up to 10 items were discarded with the error: io.vertx.core.impl.NoStackTraceThrowable: Cannot get actual Item by id: org.folio.inventory.exceptions.InternalServerErrorException: Access for user 'data-import-system-user' (f3486d35-f7f7-4a69-bcd0-d8e5a35cb292) requires permission: inventory-storage.items.item.get. Fewer than 1% of records can be discarded because of the missing permission for 'data-import-system-user'. The permission was not added automatically during service deployment. After I added the permission manually in the database, the error no longer occurred.

Jira: MODDATAIMP-930

...

tenant0_mod_source_record_storage.marc_records_lb = 9674629
tenant2_mod_source_record_storage.marc_records_lb = 0
tenant3_mod_source_record_storage.marc_records_lb = 0
tenant0_mod_source_record_storage.raw_records_lb = 9604805
tenant2_mod_source_record_storage.raw_records_lb = 0
tenant3_mod_source_record_storage.raw_records_lb = 0
tenant0_mod_source_record_storage.records_lb = 9674677
tenant2_mod_source_record_storage.records_lb = 0
tenant3_mod_source_record_storage.records_lb = 0
tenant0_mod_source_record_storage.marc_indexers = 620042011
tenant2_mod_source_record_storage.marc_indexers = 0
tenant3_mod_source_record_storage.marc_indexers = 0
tenant0_mod_source_record_storage.marc_indexers with field_no 010 = 3285833
tenant2_mod_source_record_storage.marc_indexers with field_no 010 = 0
tenant3_mod_source_record_storage.marc_indexers with field_no 010 = 0
tenant0_mod_source_record_storage.marc_indexers with field_no 035 = 19241844
tenant2_mod_source_record_storage.marc_indexers with field_no 035 = 0
tenant3_mod_source_record_storage.marc_indexers with field_no 035 = 0
tenant0_mod_inventory_storage.authority = 4
tenant2_mod_inventory_storage.authority = 0
tenant3_mod_inventory_storage.authority = 0
tenant0_mod_inventory_storage.holdings_record = 9592559
tenant2_mod_inventory_storage.holdings_record = 16
tenant3_mod_inventory_storage.holdings_record = 16
tenant0_mod_inventory_storage.instance = 9976519
tenant2_mod_inventory_storage.instance = 32
tenant3_mod_inventory_storage.instance = 32
tenant0_mod_inventory_storage.item = 10787893
tenant2_mod_inventory_storage.item = 19
tenant3_mod_inventory_storage.item = 19
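
Row counts like those listed above can be collected with one count query per tenant schema and table. The following sketch only builds the SQL text from the schema and table names shown in this report; it does not connect to a database. The `field_no` filter assumes that `marc_indexers` has a text column of that name, which the listing suggests but does not confirm.

```python
from typing import Optional

# Tenants, modules, and tables as they appear in the listing above.
TENANTS = ["tenant0", "tenant2", "tenant3"]
TABLES = {
    "mod_source_record_storage": ["marc_records_lb", "raw_records_lb", "records_lb", "marc_indexers"],
    "mod_inventory_storage": ["authority", "holdings_record", "instance", "item"],
}


def count_query(tenant: str, module: str, table: str, field_no: Optional[str] = None) -> str:
    """Build a row-count query for one tenant schema and table."""
    query = f"SELECT count(*) FROM {tenant}_{module}.{table}"
    if field_no is not None:
        # Assumption: field_no is stored as text, e.g. '010' or '035'.
        query += f" WHERE field_no = '{field_no}'"
    return query + ";"


# One unfiltered count per module, table, and tenant.
queries = [
    count_query(tenant, module, table)
    for module, tables in TABLES.items()
    for table in tables
    for tenant in TENANTS
]

for query in queries:
    print(query)
```

A filtered count, such as the `field_no 010` rows, would be built with `count_query("tenant0", "mod_source_record_storage", "marc_indexers", field_no="010")`.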

PTF environment: ocp3

10 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
2 database instances: one reader and one writer
Name	API Name	Memory GiB	vCPUs	max_connections
R6G Extra Large	db.r6g.xlarge	32 GiB	4 vCPUs	2731
MSK ptf-kakfa-3
- 4 m5.2xlarge brokers in 2 zones
- Apache Kafka version 2.8.0
- EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
Kafka topics partitioning:
- 2 partitions for DI topics

...

Version	Old Version 40	New Version 41
Changes made by	Olga Kondratenko	Olga Kondratenko
Saved on	Sep 27, 2023	Sep 27, 2023
