Overview
- This document contains the results of testing Data Export (MARC BIB) on the Quesnelia [ECS] release on qcon environment.
- PERF-844
Summary
- Data Import tests finished successfully; only Test №5 had one failed record for Tenant 2 (qcp1-01) when processing the 50k file. DI duration grew in proportion to the number of records in the files.
- Check-in and Check-out with 5 virtual users were performed during DI jobs creating new MARC authority records for non-matches. No issues were observed.
- Data Import in Quesnelia performs faster without CICO than with it.
- Comparing Poppy and Quesnelia releases
- Check-in / Check-out perform better in Quesnelia. Response time during Create jobs improved by 15% on average over a long period of run time.
- DI durations improved by 11%-14% on average.
- During testing, we noticed spikes in the mod-permissions module. To mitigate this issue and prevent system slowdowns, we adjusted the order of loading files, starting with Tenant 3 (qcp1-02), followed by Tenant 2 (qcp1-01), and finally Tenant 1 (qcp1-00).
Test Results
This table contains durations for jobs with 2 job profiles.
Profile | CSV File | College Tenant (cs00000int_0001) Duration (hh:mm:ss) | College Tenant (cs00000int_0001) Status | Central Tenant (cs00000int) Duration (hh:mm:ss) | Central Tenant (cs00000int) Status |
---|---|---|---|---|---|
DE MARC Bib (Default instances export job profile) | 1k.csv | 0:00:02 | COMPLETED | 0:00:05 | COMPLETED |
DE MARC Bib (Default instances export job profile) | 100k.csv | 0:02:39 | COMPLETED | 0:04:24 | COMPLETED |
DE MARC Bib (Default instances export job profile) | 500k.csv | 0:05:21 | COMPLETED | 0:06:17 | COMPLETED |
DE MARC Bib (srs - holdings and items) | 1k.csv | 0:00:05 | COMPLETED | 0:00:05 | COMPLETED |
DE MARC Bib (srs - holdings and items) | 100k.csv | 0:08:15 | COMPLETED | 0:05:58 | COMPLETED |
DE MARC Bib (srs - holdings and items) | 500k.csv | 0:09:22 | COMPLETED | 0:08:28 | COMPLETED |
Comparison
Test №1
Test with 1k, 5k, 10k, 25k and 50k record files; DI started on one tenant only (qcp1-00). Comparative results between Poppy and Quesnelia.
# of records | % creates | File | DI duration | DI duration | DI duration | DI duration (Poppy) | DI duration (Quesnelia, delta vs Poppy) |
---|---|---|---|---|---|---|---|
1,000 | 100 | 1k_marc_authority.mrc | 24 s | 27 s | 41 s | 29 s | 22 s (-24%) |
5,000 | 100 | LC_SUBJ_msplit00000000.mrc | 1 min 21 s | 1 min 15 s | 1 min 21 s | 1 min 38 s | 1 min 19 s (-19%) |
10,000 | 100 | msplit00000000.mrc | 2 min 32 s | 2 min 31 s | 2 min 53 s | 2 min 53 s | 2 min 36 s (-9.8%) |
22,778 (for Poppy test) / 25,000 (for Quesnelia test) | 100 | msplit00000013.mrc | 11 min 14 s | 7 min 7 s | 5 min 42 s | 6 min 24 s | 6 min 19 s (-1.3%) |
50,000 | 100 | 50000_authorityrecords.mrc | 22 min | 11 min 24 s | 11 min 11 s | 13 min 48 s | 11 min 59 s (-13%) |
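For reference, the percent values in the last column are the Quesnelia duration relative to the Poppy duration of the same row. A minimal sketch (PostgreSQL arithmetic, not part of the test runs) that reproduces the -13% figure for the 50,000-record row:
-- Sketch only: Quesnelia (11 min 59 s) vs Poppy (13 min 48 s) for 50,000 records.
SELECT round(
         ((extract(epoch FROM interval '11 minutes 59 seconds')
         - extract(epoch FROM interval '13 minutes 48 seconds')) * 100.0
         / extract(epoch FROM interval '13 minutes 48 seconds'))::numeric,
         1
       ) AS quesnelia_vs_poppy_delta_pct;
-- Returns -13.2, i.e. the "-13%" shown in the table.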
Test №2
Test with CICO (5 concurrent users) and DI with 1K, 5K, 10K, 25K and 50K record files started on one tenant only.
- Comparative baseline Check-In/Check-Out results without Data Import between Poppy and Quesnelia.
Operation | Median time without DI (Poppy) | 95% time without DI (Poppy) | Median time without DI (Quesnelia) | 95% time without DI (Quesnelia) | Avg time without DI (Quesnelia) |
---|---|---|---|---|---|
Check-In | 516 ms | 567 ms | 503 ms (-2.5%) | 593 ms | 511 ms |
Check-Out | 910 ms | 2094 ms | 836 ms (-8%) | 1117 ms (-46%) | 876 ms |
- Comparative Check-In/Check-Out results between the baseline (Quesnelia) and Check-In/Check-Out plus Data Import (Quesnelia).
# of records (Quesnelia) | DI duration with CICO | CI time Avg, s | CI time 95th pct, s | CO time Avg, s | CO time 95th pct, s | CI Avg delta vs baseline | CI 95th pct delta vs baseline | CO Avg delta vs baseline | CO 95th pct delta vs baseline |
---|---|---|---|---|---|---|---|---|---|
1,000 | 20 sec | 0.560 | 0.754 | 1.164 | 1.313 | +9% | +27% | +32% | +17% |
5,000 | 1 min 19 sec | 0.701 | 1.171 | 1.141 | 1.790 | +37% | +97% | +30% | +60% |
10,000 | 2 min 35 sec | 0.723 | 1.024 | 1.179 | 1.494 | +41% | +72% | +34% | +34% |
25,000 | 6 min 26 sec | 0.722 | 1.024 | 1.180 | 1.494 | +41% | +72% | +35% | +34% |
50,000 | 12 min 16 sec | 0.777 | 1.045 | 1.265 | 1.550 | +52% | +76% | +44% | +39% |
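The delta columns are computed against the Quesnelia baseline without DI from the previous table (CI Avg 0.511 s, CO Avg 0.876 s, and the corresponding 95th percentile values). A minimal sketch (PostgreSQL arithmetic, not part of the test runs) reproducing the +9% CI Avg delta for the 1,000-record row:
-- Sketch only: CI Avg with DI (0.560 s) vs the Quesnelia no-DI baseline (0.511 s).
SELECT round(((0.560 - 0.511) * 100.0 / 0.511)::numeric, 1) AS ci_avg_delta_pct;
-- Returns 9.6, i.e. the "+9%" shown in the table.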
- Comparative Data Import and Check-In/Check-Out results between Poppy and Quesnelia.
# of records (Poppy) | DI duration with CICO (Poppy) | CI time Avg, s (Poppy) | CI time 95th pct, s (Poppy) | CO time Avg, s (Poppy) | CO time 95th pct, s (Poppy) | # of records (Quesnelia) | DI duration with CICO (Quesnelia) | CI time Avg, s (Quesnelia) | CI time 95th pct, s (Quesnelia) | CO time Avg, s (Quesnelia) | CO time 95th pct, s (Quesnelia) |
---|---|---|---|---|---|---|---|---|---|---|---|
1,000 | 35 sec | 0.525 | 0.576 | 1.078 | 1.326 | 1,000 | 20 sec | 0.560 (+6%) | 0.754 (+30%) | 1.164 (+8%) | 1.313 (-1%) |
5,000 | 1 min 41 sec | 0.513 | 0.612 | 0.9 | 1.019 | 5,000 | 1 min 19 sec (-21.7%) | 0.701 (+36%) | 1.171 (+91%) | 1.141 (+26%) | 1.790 (+75%) |
10,000 | 3 min 4 sec | 0.581 | 0.685 | 1.016 | 1.321 | 10,000 | 2 min 35 sec (-15.7%) | 0.723 (+24%) | 1.024 (+49%) | 1.179 (+16%) | 1.494 (+13%) |
22,778 | 6 min 32 sec | 0.598 | 1.542 | 1.244 | 1.729 | 25,000 | 6 min 26 sec (-1.5%) | 0.722 (+20%) | 1.024 (-33%) | 1.180 (-5%) | 1.494 (-13%) |
50,000 | 13 min 48 sec | 0.671 | 1.953 | 1.51 | 2.09 | 50,000 | 12 min 16 sec | 0.777 (+15%) | 1.045 (-46%) | 1.265 | 1.550 (-25%) |
Resource utilization for Test #1 and Test #2
Service CPU Utilization
Here we can see that mod-data-export reached 452% CPU at its spike.
Service Memory Utilization
Here we can see that all modules show a stable trend.
DB CPU Utilization
DB CPU spiked at 32%.
DB Connections
The number of DB connections was 1470.
DB load
Top SQL-queries
Resource utilization for Test #3 and Test #4
Service CPU Utilization
Here we can see that mod-data-export reached 336% CPU at its spike.
Service Memory Utilization
Here we can see that all modules show a stable trend.
DB CPU Utilization
DB CPU utilization was 35%.
DB Connections
The number of DB connections was 1377.
DB load
Top SQL-queries
Appendix
Infrastructure
PTF - environment Quesnelia (qcon)
11 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
1 db.r6.xlarge database instance (writer)
OpenSearch
domain: fse
Number of nodes: 9
Version: OpenSearch_2_7_R20240502
MSK - tenant
4 kafka.m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
auto.create.topics.enable=true
log.retention.minutes=480
default.replication.factor=3
Kafka consolidated topics enabled
Additional links and Errors
Test №5 had one failed record for Tenant 2 (qcp1-01) when processing the 50k file.
- 09:55:16 [526300/metadata-provider] [fs07000001] [] [mod-authtoken] ERROR Api Access for user 'folio' (9eb67301-6f6e-468f-9b1a-6134dc39a684) requires permission: metadata-provider.incomingrecords.get
- 09:55:16 [815600/metadata-provider] [fs07000001] [9eb67301-6f6e-468f-9b1a-6134dc39a684] [mod_source_record_manager] ERROR PostgresClient queryAndAnalyze: ERROR: invalid input syntax for type uuid: "undefined" (22P02) - SELECT * FROM get_record_processing_log('3e63f944-40ea-477c-ac21-79bb24780bc5', 'undefined')
- 09:55:16 [526300/metadata-provider] [fs07000001] [] [mod-authtoken] ERROR FilterApi Permission missing in []
We also used a different order of tenants when loading files: we started loading from Tenant 3 (qcp1-02) → Tenant 2 (qcp1-01) → Tenant 1 (qcp1-00) to avoid the problem where mod-permissions spiked and the system got stuck.
CPU utilization when mod-permissions spiked and the system got stuck.
Methodology/Approach
The Data Export test scenarios using the Default instances export job profile and the srs - holdings and items job profile were started from the UI on the Quesnelia (qcon) ECS environment.
Test set
- Test 1: Manually tested Data Export with 1k, 100k and 500k record files, started on the College tenant (cs00000int_0001) only, using the Default instances export job profile.
- Test 2: Manually tested Data Export with 1k, 100k and 500k record files, started on the College tenant (cs00000int_0001) only, using the srs - holdings and items job profile.
- Test 3: Manually tested Data Export with 1k, 100k and 500k record files, started on the central tenant (cs00000int) only, using the Default instances export job profile.
- Test 4: Manually tested Data Export with 1k, 100k and 500k record files, started on the central tenant (cs00000int) only, using the srs - holdings and items job profile.
To get the status and time range of export jobs, the following query was used:
SELECT jsonb->>'status' AS status,
       to_timestamp((jsonb->>'startedDate')::bigint / 1000) AS startedDate,
       to_timestamp((jsonb->>'completedDate')::bigint / 1000) AS completedDate,
       exported_file->>'fileName' AS fileName,
       jsonb->>'jobProfileName' AS jobProfileName,
       (jsonb->>'completedDate')::bigint - (jsonb->>'startedDate')::bigint AS duration_ms,
       to_char(
         (to_timestamp((jsonb->>'completedDate')::bigint / 1000)
        - to_timestamp((jsonb->>'startedDate')::bigint / 1000))::interval,
         'HH24:MI:SS'
       ) AS duration_hhmmss
FROM cs00000int_0001_mod_data_export.job_executions,
     jsonb_array_elements(jsonb->'exportedFiles') AS exported_file
WHERE
  -- (jsonb->>'hrId')::int IN (309, 310, 311, 312, 313, 314) -- Central tenant
  (jsonb->>'hrId')::int IN (266, 267, 268, 269, 270, 271)
ORDER BY jsonb->>'startedDate' DESC
LIMIT 10;
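For the central tenant jobs, presumably the same query can be pointed at the central tenant's schema together with the commented-out hrId list; the schema name below is an assumption derived from the tenant id (cs00000int) and may need adjusting:
-- Assumption: the central tenant's mod-data-export schema follows the same
-- naming pattern as the College tenant's (cs00000int_mod_data_export).
SELECT jsonb->>'status' AS status,
       exported_file->>'fileName' AS fileName,
       jsonb->>'jobProfileName' AS jobProfileName,
       to_char((to_timestamp((jsonb->>'completedDate')::bigint / 1000)
              - to_timestamp((jsonb->>'startedDate')::bigint / 1000))::interval,
               'HH24:MI:SS') AS duration_hhmmss
FROM cs00000int_mod_data_export.job_executions,
     jsonb_array_elements(jsonb->'exportedFiles') AS exported_file
WHERE (jsonb->>'hrId')::int IN (309, 310, 311, 312, 313, 314)  -- Central tenant jobs
ORDER BY jsonb->>'startedDate' DESC
LIMIT 10;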