Overview
This document contains the results of Data Import testing for MARC Bibliographic records on the Quesnelia release [non-ECS]. Ticket: https://folio-org.atlassian.net/jira/people/712020:c7153665-e98d-4df6-a9f4-fe368ae2480f/boards/224?selectedIssue=PERF-836
Summary
Recommendations and Jiras
Results
Test # | Data-import test | Profile | Duration Poppy with R/W split enabled | Duration Quesnelia with R/W split enabled | Difference, % / sec | Results |
---|---|---|---|---|---|---|
1 | 1k MARC BIB Create | PTF - Create 2 | 39 sec | 54 sec | ↓ 15 sec | Completed |
2 | 5k MARC BIB Create | PTF - Create 2 | 2 min 22 sec | 3 min 20 sec | ↓ 1 min 8 sec | Completed |
3 | 10k MARC BIB Create | PTF - Create 2 | 4 min 29 sec | 6 min | ↓ 1 min 31 sec | Completed |
4 | 25k MARC BIB Create | PTF - Create 2 | 10 min 38 sec | 13 min 41 sec | ↓ 3 min 3 sec | Completed |
5 | 50k MARC BIB Create | PTF - Create 2 | 20 min 26 sec | 21 min 59 sec | ↓ 1 min 33 sec | Completed |
6 | 100k MARC BIB Create | PTF - Create 2 | 2 hours 46 min (cancelled) | 40 min 16 sec | | Completed |
7 | 500k MARC BIB Create | PTF - Create 2 | Not tested | 3 hours 27 min | | Completed |
8 | 1k MARC BIB Update | PTF - Updates Success - 1 | | 34 sec | | |
9 | 2k MARC BIB Update | PTF - Updates Success - 1 | | 1 min 09 sec | | |
10 | 5k MARC BIB Update | PTF - Updates Success - 1 | | 2 min 31 sec | ↓ 6.66% / 17 sec | |
11 | 10k MARC BIB Update | PTF - Updates Success - 1 | | 5 min 13 sec | ↓ 1.84% / 10 sec | |
12 | 25k MARC BIB Update | PTF - Updates Success - 1 | | 12 min 27 sec | ↓ 14% / 105 sec | |
13 | 25k MARC BIB Update | PTF - Updates Success - 1 | | 2 min 15 sec | | |
14 | 25k MARC BIB Update | PTF - Updates Success - 1 | | 12 min | | |
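The difference column can be re-derived from the two duration columns. The following is a minimal sketch, not the tooling used for this report: the helper names `parse_duration` and `compare` are hypothetical, and expressing the percentage relative to the Poppy duration is an assumption, since the report does not state which baseline was used.

```python
import re

# Seconds per unit, keyed by the first letter of the unit as spelled in the
# results table ("hours", "min", "minutes", "sec").
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> int:
    """Convert a string such as '10 min 38 sec' or '6 minutes' to seconds."""
    total = 0
    pattern = r"(\d+)\s*(hours?|min(?:utes?)?|sec(?:onds?)?)"
    for value, unit in re.findall(pattern, text):
        total += int(value) * _UNIT_SECONDS[unit[0]]
    return total


def compare(poppy: str, quesnelia: str) -> tuple[int, float]:
    """Return (delta in seconds, delta as a percentage of the Poppy duration).

    A positive delta means the Quesnelia run took longer.
    Assumption: the Poppy duration is the baseline for the percentage.
    """
    base = parse_duration(poppy)
    delta = parse_duration(quesnelia) - base
    return delta, round(100 * delta / base, 1)


# 25k MARC BIB Create, values taken from the table above
delta, pct = compare("10 min 38 sec", "13 min 41 sec")
print(f"{delta} sec ({pct}%)")
```

For the 25k Create row this yields a delta of 183 seconds, which matches the 3 min 3 sec stated in the table.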
Test Runs
MARC BIB CREATE
Tests #1-7 1k, 5k, 10k, 25k, 50k, 100k, 500k records
# | Data-import file | Start time | End time |
---|---|---|---|
1 | 500k_bib_Create.mrc | 2024-04-01 09:56:59.095+00 | 2024-04-01 13:26:19.429+00 |
2 | 100k_bib_Create.mrc | 2024-04-01 09:03:56.04+00 | 2024-04-01 09:44:12.654+00 |
3 | 50k_bib_Create.mrc | 2024-04-01 08:18:58.078+00 | 2024-04-01 08:40:56.215+00 |
4 | 25k_bib_Create.mrc | 2024-04-01 07:58:48.679+00 | 2024-04-01 08:12:30.555+00 |
5 | 10k_bib_Create.mrc | 2024-04-01 07:47:09.388+00 | 2024-04-01 07:53:08.405+00 |
6 | 5k_bib_Create.mrc | 2024-04-01 07:40:32.282+00 | 2024-04-01 07:43:52.674+00 |
7 | 1k_bib_Create.mrc | 2024-04-01 07:38:30.511+00 | 2024-04-01 07:39:24.804+00 |
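Job durations can be cross-checked against the start and end timestamps above. The sketch below is illustrative only: the helper names are hypothetical, and it assumes every timestamp carries the `+00` (UTC) suffix shown in the table.

```python
from datetime import datetime, timedelta


def parse_ts(text: str) -> datetime:
    """Parse a timestamp such as '2024-04-01 09:03:56.04+00'.

    Assumption: all values end with the '+00' (UTC) offset seen in the table,
    so the suffix is stripped and the values are compared as naive datetimes.
    """
    if not text.endswith("+00"):
        raise ValueError(f"unexpected offset in {text!r}")
    return datetime.strptime(text[:-3], "%Y-%m-%d %H:%M:%S.%f")


def job_duration(start: str, end: str) -> timedelta:
    """Elapsed time between the start and end of a data-import job."""
    return parse_ts(end) - parse_ts(start)


def fmt(delta: timedelta) -> str:
    """Render a duration as hours, minutes and whole seconds."""
    total = int(delta.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours} h {minutes} min {seconds} sec"


# 100k_bib_Create.mrc, timestamps taken from the table above
print(fmt(job_duration("2024-04-01 09:03:56.04+00",
                       "2024-04-01 09:44:12.654+00")))
```

For the 100k file this gives 40 min 16 sec, in line with the Results table.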
Service CPU Utilization
MARC BIB CREATE
Tests #1-7
1k, 5k, 10k, 25k, 50k, 100k, 500k records
CPU utilization for all modules returned to default levels after all tests. Averages: mod-inventory-b - 130%, mod-inventory-storage-b - 25%, mod-source-record-storage-b - 40%, mod-source-record-manager-b - 35%, mod-di-converter-storage-b - 70%. mod-data-import spiked to 350% during the 500k job (same behaviour as on Poppy).
MARC BIB UPDATE
Tests #8-14
1k, 2k, 5k, 10k, 25k, 25k, 25k records
Averages: mod-inventory-b - 220%, mod-inventory-storage-b - 25%, mod-source-record-storage-b - 50%, mod-source-record-manager-b - 45%, mod-di-converter-storage-b - 90%. mod-data-import spiked to 96% during the 25k job.
Memory Utilization
No memory leak is suspected for DI modules.
MARC BIB CREATE
Tests #1-7
1k, 5k, 10k, 25k, 50k, 100k, 500k records
MARC BIB UPDATE
Tests #8-14
1k, 2k, 5k, 10k, 25k, 25k, 25k records
RDS CPU Utilization
MARC BIB CREATE
RDS CPU utilization averaged 95% for DI jobs with more than 10k records.
MARC BIB UPDATE
RDS Database Connections
MARC BIB CREATE
The maximum number of database connections was 275 for DI Create jobs and 260 for DI Update jobs.
Average active sessions (AAS)
MARC BIB CREATE
Top SQL
MARC BIB UPDATE
Top SQL
INSERT INTO fs09000000_mod_source_record_manager.events_processed
INSERT INTO fs09000000_mod_source_record_manager.journal_records
MSK CPU utilization (Percent) OpenSearch
Average CPU utilization is about 9%.
CPU (User) usage by broker
Errors
Appendix
Infrastructure
PTF environment: pcp1
10 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
2 database instances, writer/reader
Name | Memory GiB | vCPUs | max_connections |
---|---|---|---|
db.r6g.xlarge | 32 GiB | 4 vCPUs | 2731 |
MSK tenant
4 m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
auto.create.topics.enable=true
log.retention.minutes=480
default.replication.factor=3
Methodology
Prepare files for DI Create job
1K, 2K, 5K, 10K, 25K, 50K, 100K files.
Run DI Create on a single tenant, one file at a time with a delay between jobs, using the PTF - Create 2 profile.
Prepare files for DI Update with Data export app
Run DI Update on a single tenant, one file at a time with a delay between jobs, using the prepared files and the PTF - Updates Success - 1 profile.
Retrieve job durations with the following query:
SELECT (completed_date - started_date) AS duration, *
FROM fs09000000_mod_source_record_manager.job_execution
WHERE subordination_type = 'COMPOSITE_PARENT'
ORDER BY started_date DESC
LIMIT 10;
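The query above is tied to the tenant `fs09000000` through its schema name. The sketch below shows one way the same query could be generated for another tenant. The function `job_duration_query` is hypothetical, and the `<tenant>_mod_source_record_manager` schema pattern is inferred only from the query in this report.

```python
import re


def job_duration_query(tenant_id: str, limit: int = 10) -> str:
    """Build the job-duration query for a given tenant schema.

    Assumption: the schema is named '<tenant_id>_mod_source_record_manager',
    as in the query used for this report.
    """
    # Schema names cannot be bound as query parameters, so the tenant id is
    # validated before being interpolated into the SQL text.
    if not re.fullmatch(r"[a-z0-9_]+", tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT (completed_date - started_date) AS duration, *\n"
        f"FROM {tenant_id}_mod_source_record_manager.job_execution\n"
        "WHERE subordination_type = 'COMPOSITE_PARENT'\n"
        "ORDER BY started_date DESC\n"
        f"LIMIT {int(limit)};"
    )


print(job_duration_query("fs09000000"))
```

The resulting string can be pasted into any PostgreSQL client connected to the environment database.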