Data Import test report QCON (Quesnelia)[ECS]
Overview
This document contains the results of testing Data Import for MARC Bibliographic records in the Quesnelia release [ECS].
Ticket: https://folio-org.atlassian.net/browse/PERF-858 on the QCON environment.
Summary
All Data Import jobs finished successfully without errors.
The PTF - Updates Success - 2 profile (based on the qcp1 profile PTF - Updates Success - 6) was created for the QCON Quesnelia release on tenant cs00000int_0001.
DI duration growth correlates with the number of records imported.
No memory leak is suspected for DI modules.
DB CPU usage is close to 95%, and this holds for all jobs with files of more than 10k records.
Comparison with previous testing results from the Data Import test report (Quesnelia) [non-ECS]:
Data Import create durations are better for smaller files and the same for the 500k-record file.
Data Import update durations are better for smaller files and about 20% slower for the 100k- and 500k-record files.
Service CPU utilization, service memory utilization, and DB CPU utilization show the same trends and values as in the Poppy release.
Results
Test # | Data-import test | Profile | Duration Poppy | Duration Quesnelia (qcp1) | Duration Quesnelia (qcon) | Difference, % | Results
---|---|---|---|---|---|---|---
1 | 1k MARC BIB Create | PTF - Create 2 | 39 sec | 54 sec | 31 sec | -42% | Completed
2 | 5k MARC BIB Create | PTF - Create 2 | 2 min 22 sec | 3 min 20 sec | | | Not tested
3 | 10k MARC BIB Create | PTF - Create 2 | 4 min 29 sec | 6 min | 4 min 14 sec | -29% | Completed
4 | 25k MARC BIB Create | PTF - Create 2 | 10 min 38 sec | 13 min 41 sec | 9 min 41 sec | -29% | Completed
5 | 50k MARC BIB Create | PTF - Create 2 | 20 min 26 sec | 21 min 59 sec | 18 min 18 sec | -16% | Completed
6 | 100k MARC BIB Create | PTF - Create 2 | 2 hrs 46 min (cancelled) | 40 min 16 sec | 38 min 36 sec | -4% | Completed
7 | 500k MARC BIB Create | PTF - Create 2 | Not tested | 3 hrs 27 min | 3 hrs 30 min | +1.84% | Completed
8 | 1k MARC BIB Update | PTF - Updates Success - 6 | 34 sec (PTF - Updates Success - 1) | 1 min 59 sec | 44 sec | -63% | Completed
9 | 2k MARC BIB Update | PTF - Updates Success - 6 | 1 min 09 sec (PTF - Updates Success - 1) | 2 min 43 sec | | | Not tested
10 | 5k MARC BIB Update | PTF - Updates Success - 6 | 2 min 31 sec (PTF - Updates Success - 1) | 7 min 10 sec | | | Not tested
11 | 10k MARC BIB Update | PTF - Updates Success - 6 | 5 min 13 sec (PTF - Updates Success - 1) | 10 min 27 sec | 5 min 59 sec | -42% | Completed
12 | 25k MARC BIB Update | PTF - Updates Success - 6 | 12 min 27 sec (PTF - Updates Success - 1) | 23 min 16 sec | 19 min 52 sec | -14% | Completed
13 | 50k MARC BIB Update | PTF - Updates Success - 6 | Not tested | 40 min 52 sec | 37 min 53 sec | -7% | Completed
14 | 100k MARC BIB Update | PTF - Updates Success - 6 | Not tested | 1 hr 2 min | 1 hr 14 min | +19% | Completed
15 | 500k MARC BIB Update | PTF - Updates Success - 6 | Not tested | 5 hrs 31 min | 6 hrs 39 min | +21% | Completed
Service CPU Utilization
MARC BIB CREATE
Tests #1-7
1k, 10k, 25k, 50k, 100k, 500k records
CPU utilization for all modules returned to baseline values after all tests. Averages: mod-inventory-b - 90%, mod-inventory-storage-b - 27%, mod-source-record-storage-b - 15%, mod-source-record-manager-b - 14%, mod-di-converter-storage-b - 51%; mod-data-import spiked to 350% during the 500k job (same behavior as in the Poppy release).
MARC BIB UPDATE
Tests #8-15
1k, 10k, 25k, 50k, 100k, 500k records
Memory Utilization
No memory leak is suspected for DI modules.
MARC BIB CREATE
Tests #1-7
1k, 10k, 25k, 50k, 100k, 500k records
MARC BIB UPDATE
Tests #8-15
1k, 10k, 25k, 50k, 100k, 500k records
RDS CPU Utilization
MARC BIB CREATE
RDS CPU utilization averaged 90% for DI jobs with more than 10k records, for both Create and Update profiles.
MARC BIB UPDATE
RDS Database Connections
MARC BIB CREATE
DB connections averaged around 1400.
MARC BIB UPDATE
DB connections averaged around 1400.
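For reference, a point-in-time breakdown of these connections can be taken directly from PostgreSQL via the standard pg_stat_activity view; a minimal sketch (not part of the original test harness; assumes the module connection pools set application_name):
-- Connections per module and state at the moment of the query
select application_name, state, count(*) as connections
from pg_stat_activity
group by application_name, state
order by connections desc;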
Average active sessions (AAS)
MARC BIB CREATE
Top SQL
MARC BIB UPDATE
Top SQL
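The Top SQL rankings above come from RDS Performance Insights. A comparable list can be pulled from the database itself with the pg_stat_statements extension, if it is enabled; a sketch assuming PostgreSQL 13+ column names:
-- Top 10 statements by total execution time
select left(query, 80) as query_start, calls, round(total_exec_time) as total_ms, rows
from pg_stat_statements
order by total_exec_time desc
limit 10;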
OpenSearch Service
Cluster status was green during the tests
Master nodes
1. CPU utilization (MasterCPUUtilization)
https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#metricsV2?graph=~(metrics~(~(~'AWS2fES~'MasterCPUUtilization~'DomainName~'fse~'ClientId~'054267740449))~view~'timeSeries~stacked~false~region~'us-east-1~title~'CPU20utilization2028Percent*29~period~60~stat~'Maximum~yAxis~(left~(showUnits~false)))
MARC BIB Create
CPU utilization was 20% on average
MARC BIB Update
CPU utilization was 20% on average
Data nodes
1. CPU utilization (CPUUtilization)
https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#metricsV2?graph=~(metrics~(~(~'AWS2fES~'CPUUtilization~'DomainName~'fse~'ClientId~'054267740449))~view~'timeSeries~stacked~false~region~'us-east-1~title~'CPU20utilization2028Percent*29~period~60~stat~'Maximum~yAxis~(left~(showUnits~false)))
2. Maximum memory utilization (SysMemoryUtilization)
https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#metricsV2?graph=~(metrics~(~(~'AWS2fES~'SysMemoryUtilization~'DomainName~'fse~'ClientId~'054267740449))~view~'timeSeries~stacked~false~region~'us-east-1~title~'Maximum20memory20utilization2028Percent29~period~60~stat~'Maximum~yAxis~(left~(showUnits~false)))
MARC BIB Create
CPU utilization was 99% on average
Maximum memory utilization was 92% on average
MARC BIB Update
CPU utilization was 99% on average
Maximum memory utilization was 94% on average
Managed Streaming for Apache Kafka
CPU (User) usage by broker
MARC BIB Create
MARC BIB Update
Appendix
Infrastructure
11 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
1 db.r6.xlarge database instance (writer)
OpenSearch
domain: fse
Number of nodes: 9
Version: OpenSearch_2_7_R20240502
MSK - tenant
4 kafka.m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker: 300 GiB
auto.create.topics.enable=true
log.retention.minutes=480
default.replication.factor=3
Kafka consolidated topics enabled
Methodology
Pregenerated files (1K, 10K, 25K, 50K, 100K, and 500K records) were used for the DI Create job profile.
Run DI Create on a single tenant (cs00000int_0001), one file at a time with a delay between jobs, using the PTF - Create 2 profile.
Prepare files for DI Update with the Data Export app, using previously imported items.
Run DI Update on a single tenant (cs00000int_0001), one prepared file at a time with a delay between jobs (1K, 10K, 25K, 50K, 100K, and 500K records), using the PTF - Updates Success - 2 profile.
Data Import durations were obtained from the DB using the following SQL query:
-- DI job durations per file, newest first
select file_name, started_date, completed_date, completed_date - started_date as duration, status
from cs00000int_0001_mod_source_record_manager.job_execution
order by started_date desc
limit 1000;
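The same table can be narrowed to successful runs only; a minimal sketch (same table as above, assuming the standard DI success status value COMMITTED):
-- Completed jobs only, newest first
select file_name, completed_date - started_date as duration
from cs00000int_0001_mod_source_record_manager.job_execution
where status = 'COMMITTED'
order by started_date desc;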