PTF - Data Export Test Report (Quesnelia - Eureka)
Overview
- This document contains the results of testing Data Export (MARC BIB) on the Eureka release of Quesnelia FOLIO. The goal is to compare Data Export performance on Eureka and non-Eureka environments and to highlight any observable differences in the KPIs.
- PERF-866
Summary
- Data Export tests finished successfully on the Eureka environment using the "Default instances export job profile" and "srs - holdings and items job profile." Data Export tests were run on Central tenants.
- When comparing QECP1 Eureka with the Quesnelia ECS and NON-ECS environments:
  - Data Export duration for the "Default instances export job profile" was almost the same across environments.
  - Data Export duration for the "srs - holdings and items job profile" degraded significantly: around 2.5 times longer for the 1k and 100k files, and roughly 4.3 to 4.6 times longer for the 500k file (0:38:53 versus 0:08:58 on QCP1 and 0:08:28 on QCON). No clear cause was identified.
- During Tests №1 and №2, we noticed background processes running on the database, so we deactivated mod-fqm-manager. As a result, the repeated test showed a twofold improvement in performance.
- Data Export jobs were getting stuck and returned a [401 Unauthorized] error with the message: "errors [{"type":"UnauthorizedException","code":"authorization_error","message":"Unauthorized"}]". To resolve this, we increased the token lifespan by changing the KC_CONFIG_TTL parameter for mod-login-keycloak from 360s to 3600s and KC_ADMIN_TOKEN_TTL from 410s to 4100s:
{"name": "KC_CONFIG_TTL", "value": "3600s"}
{"name": "KC_ADMIN_TOKEN_TTL", "value": "4100s"}
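The report does not state the root cause beyond the token lifespan. As a rough illustration only, and assuming the 401 errors appeared when a job ran longer than the configured lifetime, the sketch below checks a job duration against a TTL value written in the `<seconds>s` form shown above. The helper names `parse_ttl` and `outlives_token` are hypothetical and are not part of mod-login-keycloak.

```python
from datetime import timedelta


def parse_ttl(value: str) -> timedelta:
    """Parse a TTL written as '<seconds>s', e.g. '3600s' (assumed format)."""
    if not value.endswith("s"):
        raise ValueError(f"unsupported TTL format: {value!r}")
    return timedelta(seconds=int(value[:-1]))


def outlives_token(job_duration: timedelta, ttl: str) -> bool:
    """Return True when the job runs longer than the given TTL."""
    return job_duration > parse_ttl(ttl)


# Longest export in this report: 500k.csv, srs - holdings and items (0:38:53).
longest_job = timedelta(minutes=38, seconds=53)

print(outlives_token(longest_job, "360s"))   # True: 2333 s > 360 s
print(outlives_token(longest_job, "3600s"))  # False: 2333 s < 3600 s
```

Under that assumption, the longest export in this report exceeds the original 360s value but fits within the increased 3600s value.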
Test Results
This table contains Data Export durations (h:mm:ss) for the 2 job profiles on the Central Tenant (fs09000000).
Profile | CSV File | Result Test Set 1 | Result Test Set 2 | Status
---|---|---|---|---
DE MARC Bib (Default instances export job profile) | 1k.csv | 0:00:07 | 0:00:03 | COMPLETED
DE MARC Bib (Default instances export job profile) | 100k.csv | 0:06:03 | 0:02:17 | COMPLETED
DE MARC Bib (Default instances export job profile) | 500k.csv | 0:09:03 | 0:04:25 | COMPLETED
DE MARC Bib (srs - holdings and items) | 1k.csv | 0:00:13 | 0:00:10 | COMPLETED
DE MARC Bib (srs - holdings and items) | 100k.csv | 0:20:25 | 0:12:41 | COMPLETED
DE MARC Bib (srs - holdings and items) | 500k.csv | 0:38:21 | 0:38:53 | COMPLETED
Comparison
This table compares durations across the Quesnelia NON-ECS, ECS, and Eureka environments.
Profile | Number of records | Quesnelia NON-ECS QCP1 (h:mm:ss) | Quesnelia ECS QCON (h:mm:ss) | Quesnelia Eureka QECP1 (h:mm:ss) | DE Duration, DELTA QCP1/QECP1 (h:mm:ss / percent)
---|---|---|---|---|---
DE MARC Bib (Default instances export job profile) | 1k | 0:00:02 | 0:00:05 | 0:00:03 | +0:00:01
DE MARC Bib (Default instances export job profile) | 100k | 0:02:17 | 0:04:24 | 0:02:17 | 0:00:00
DE MARC Bib (Default instances export job profile) | 500k | 0:05:10 | 0:06:17 | 0:04:25 | -0:00:45
DE MARC Bib (srs - holdings and items) | 1k | 0:00:04 | 0:00:05 | 0:00:10 | +0:00:06
DE MARC Bib (srs - holdings and items) | 100k | 0:05:13 | 0:05:58 | 0:12:41 | +0:07:28
DE MARC Bib (srs - holdings and items) | 500k | 0:08:58 | 0:08:28 | 0:38:53 | +0:29:55
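The DELTA column and the slowdown factors quoted in the Summary can be recomputed from the h:mm:ss values. The following is a minimal sketch of that arithmetic, not the tooling used to produce the report; the function names are illustrative.

```python
from datetime import timedelta


def parse_hms(text: str) -> timedelta:
    """Parse an 'h:mm:ss' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_delta(delta: timedelta) -> str:
    """Format a signed timedelta as '+h:mm:ss' or '-h:mm:ss' (no sign for zero)."""
    total = int(delta.total_seconds())
    sign = "+" if total > 0 else "-" if total < 0 else ""
    hours, rest = divmod(abs(total), 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{sign}{hours}:{minutes:02d}:{seconds:02d}"


def compare(baseline: str, candidate: str) -> tuple[str, float]:
    """Return (candidate - baseline, candidate / baseline) for two durations."""
    base, cand = parse_hms(baseline), parse_hms(candidate)
    ratio = cand.total_seconds() / base.total_seconds()
    return format_delta(cand - base), round(ratio, 2)


# srs - holdings and items profile: QCP1 vs QECP1, values from the table.
rows = {
    "1k": ("0:00:04", "0:00:10"),
    "100k": ("0:05:13", "0:12:41"),
    "500k": ("0:08:58", "0:38:53"),
}
for size, (qcp1, qecp1) in rows.items():
    delta, ratio = compare(qcp1, qecp1)
    print(f"{size}: delta {delta}, ratio {ratio}x")
# 1k: delta +0:00:06, ratio 2.5x
# 100k: delta +0:07:28, ratio 2.43x
# 500k: delta +0:29:55, ratio 4.34x
```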
Test №1 - №2
Introduction: The Baseline QECP1 Environment configuration was applied, and CPU=0 was set for all modules.
Objective: The objective of these tests was to collect performance measurements for the data-export process on the central tenant.
Results: During the test, we observed background processes running on the database, so we deactivated mod-fqm-manager. As a result, the repeated test showed a twofold improvement in performance.
Service CPU Utilization
mod-data-export CPU utilization peaked at 28% of instance CPU power.
Service Memory Utilization
Memory utilization of all modules shows a stable trend.
DB CPU Utilization
DB CPU utilization peaked at 68%.
DB Connections
The number of DB connections was 830.
Kafka metrics
OpenSearch Data Nodes metrics
DB load
Top SQL-queries
Test №3 - №4
Introduction: The Baseline QECP1 Environment configuration was applied, and CPU=0 was set for all modules.
Objective: The objective of these tests was to repeat the previous tests after deactivating mod-fqm-manager.
Results: Results were collected for the central tenant without any background processes running on the database.
Instance CPU Utilization
Service CPU Utilization
mod-data-export CPU utilization peaked at 28% of instance CPU power.
Service Memory Utilization
Memory utilization of all modules shows a stable trend.
DB CPU Utilization
DB CPU utilization peaked at 22%.
DB Connections
The number of DB connections was 850.
Kafka metrics
OpenSearch Data Nodes metrics
DB load
Top SQL-queries
Appendix
Infrastructure
PTF - environment Quesnelia [Eureka] (qecp1)
11 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
1 db.r6.xlarge database instance (writer)
OpenSearch
domain: ptf-test
Number of nodes: 7
Version: OpenSearch_2_13_R20240520-P5
MSK - fse-tenant
4 kafka.m7g.xlarge brokers in 2 zones
Apache Kafka version 3.7.x
EBS storage volume per broker 300 GiB
auto.create.topics.enable=true
log.retention.minutes=480
default.replication.factor=3
Methodology/Approach
Data Export test scenarios using the Default instances export job profile and the srs - holdings and items job profile were started from the UI on the Quesnelia (qecp1) Eureka environment.
Test set:
- Test 1: Data Export of the 1k, 100k, and 500k record files was started manually on the central tenant (fs09000000) using the Default instances export job profile.
- Test 2: Data Export of the 1k, 100k, and 500k record files was started manually on the central tenant (fs09000000) using the srs - holdings and items job profile.
- Test 3: Instance Count=0 was set for mod-fqm-manager. Data Export of the 1k, 100k, and 500k record files was started manually on the central tenant (fs09000000) using the Default instances export job profile.
- Test 4: Instance Count=0 was set for mod-fqm-manager. Data Export of the 1k, 100k, and 500k record files was started manually on the central tenant (fs09000000) using the srs - holdings and items job profile.
The following query was used to get the status and time range of export jobs:
select jsonb->'exportedFiles'->0->>'fileName' as fileName,
       job_profile_name,
       exported,
       started_date,
       completed_date,
       completed_date - started_date as duration,
       status
from fs09000000_mod_data_export.job_executions
where started_date > '2024-10-22'
order by started_date desc
limit 10;
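The query returns start and completion timestamps, while the tables in this report list durations as h:mm:ss. The sketch below shows one possible way to do that conversion on the client side. The function name and the sample timestamps are hypothetical and chosen only to illustrate the formatting; they are not taken from the test runs.

```python
from datetime import datetime


def format_duration(started: datetime, completed: datetime) -> str:
    """Return completed - started as 'h:mm:ss', rounded to the nearest second."""
    total = round((completed - started).total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Hypothetical timestamps standing in for started_date and completed_date.
started = datetime(2024, 10, 22, 10, 0, 0, 250000)
completed = datetime(2024, 10, 22, 10, 38, 53, 400000)
print(format_duration(started, completed))  # 0:38:53
```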