Overview
- This document contains the results of testing Data Import for MARC Bibliographic records with an update job in the Quesnelia release on qcp1 environments with Kafka consolidated topics and file splitting features enabled on a non-ecs environment.
Summary
- Data import tests finished successfully; only Test №5 had one failed record, for Tenant 2 (qcp1-01), when processing the 50k file. DI duration grew in proportion to the number of records in the files.
- Check-in and Check-out with 5 virtual users were run during the DI jobs that create new MARC authority records for non-matches. No issues were observed.
- Data Import in Quesnelia performs faster without CICO than with it.
- Comparing the Poppy and Quesnelia releases:
  - Check-in / Check-out perform better in Quesnelia. Response time during long-running Create jobs improved by 15% on average.
  - DI durations improved by 11%-14% on average.
- During testing, we noticed spikes in the mod-permissions module. To mitigate this issue and prevent system slowdowns, we adjusted the order of loading files, starting with Tenant 3 (qcp1-02), followed by Tenant 2 (qcp1-01), and finally Tenant 1 (qcp1-00).
Test Results and Comparison
Test №1
DI with 1k, 5k, 10k, 25k and 50k record files started on one tenant only (qcp1-00), with comparative results between Poppy and Quesnelia.
# of records | % creates | File | DI duration Morning Glory | DI duration Nolana | DI duration Orchid | DI duration Poppy | DI duration Quesnelia (delta vs Poppy) |
---|---|---|---|---|---|---|---|
1,000 | 100 | 1k_marc_authority.mrc?api=v2 | 24 sec | 27 sec | 41 sec | 29 sec | 22 sec -24% |
5,000 | 100 | LC_SUBJ_msplit00000000.mrc?api=v2 | 1 min 21 sec | 1 min 15 sec | 1 min 21 sec | 1 min 38 sec | 1 min 19 sec -19% |
10,000 | 100 | msplit00000000.mrc?api=v2 | 2 min 32 sec | 2 min 31 sec | 2 min 53 sec | 2 min 53 sec | 2 min 36 sec -9.8% |
22,778 (for Poppy test), 25,000 (for Quesnelia test) | 100 | msplit00000013.mrc?api=v2 | 11 min 14 sec | 7 min 7 sec | 5 min 42 sec | 6 min 24 sec | 6 min 19 sec -1.3% |
50,000 | 100 | 50000_authorityrecords.mrc?api=v2 | 22 min | 11 min 24 sec | 11 min 11 sec | 13 min 48 sec | 11 min 59 sec -13% |
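The percentages in the Quesnelia column appear to be the relative change in DI duration against the Poppy column. A minimal sketch of that arithmetic, assuming durations are written as in the table (`to_seconds` and `delta_pct` are illustrative helper names, not part of any test tooling):

```python
import re


def to_seconds(text: str) -> int:
    """Convert a duration such as '1 min 38 sec' or '22 sec' to whole seconds."""
    minutes = re.search(r"(\d+)\s*min", text)
    seconds = re.search(r"(\d+)\s*s", text)
    total = int(minutes.group(1)) * 60 if minutes else 0
    total += int(seconds.group(1)) if seconds else 0
    return total


def delta_pct(baseline: str, current: str) -> float:
    """Relative change of `current` against `baseline`, in percent (negative = faster)."""
    base = to_seconds(baseline)
    return round((to_seconds(current) - base) / base * 100, 1)


# Poppy vs Quesnelia durations taken from the table above.
for poppy, quesnelia in [("29 sec", "22 sec"), ("13 min 48 sec", "11 min 59 sec")]:
    print(poppy, "->", quesnelia, delta_pct(poppy, quesnelia), "%")
```

For the 1,000-record row this gives roughly -24%, which matches the value reported in the table.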
Test №2
CICO with 5 concurrent users and DI with 1K, 5K, 10K, 25K and 50K record files started on one tenant only.
- Comparative baseline Check-In/Check-Out results without Data Import between Poppy and Quesnelia.
Operation | CICO, Median time without DI (Poppy) | CICO, 95% time without DI (Poppy) | CICO, Median time without DI (Quesnelia) | CICO, 95% time without DI (Quesnelia) | CICO, Avg time without DI (Quesnelia) |
---|---|---|---|---|---|
Check-In | 516 ms | 567 ms | 503 ms -2.5% | 593 ms | 511 ms |
Check-Out | 910 ms | 2094 ms | 836 ms -8% | 1117 ms -46% | 876 ms |
- Comparative Check-In/Check-Out results between Baseline (Quesnelia) and Check-In/Check-Out plus Data Import (Quesnelia).
# of records (Quesnelia) | DI Duration with CICO | CI time Avg, sec (Quesnelia) | CI time 95th pct, sec (Quesnelia) | CO time Avg, sec (Quesnelia) | CO time 95th pct, sec (Quesnelia) | Baseline CI Avg delta | Baseline CI 95th pct delta | Baseline CO Avg delta | Baseline CO 95th pct delta |
---|---|---|---|---|---|---|---|---|---|
1,000 | 20 sec | 0.560 | 0.754 | 1.164 | 1.313 | +9% | +27% | +32% | +17% |
5,000 | 1 min 19 sec | 0.701 | 1.171 | 1.141 | 1.790 | +37% | +97% | +30% | +60% |
10,000 | 2 min 35 sec | 0.723 | 1.024 | 1.179 | 1.494 | +41% | +72% | +34% | +34% |
25,000 | 6 min 26 sec | 0.722 | 1.024 | 1.180 | 1.494 | +41% | +72% | +35% | +34% |
50,000 | 12 min 16 sec | 0.777 | 1.045 | 1.265 | 1.550 | +52% | +76% | +44% | +39% |
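The baseline deltas above appear to compare each CICO response time measured during DI (in seconds) with the matching Quesnelia baseline without DI (in milliseconds: Avg and 95% columns of the baseline table). A short sketch under that assumption, using the 1,000-record row; `baseline_delta_pct` is a hypothetical helper name:

```python
def baseline_delta_pct(baseline_ms: float, measured_s: float) -> float:
    """Percent change of a response time measured in seconds against a baseline in ms."""
    return (measured_s * 1000 - baseline_ms) / baseline_ms * 100


# Quesnelia baselines without DI, in ms (from the baseline table).
baseline_ms = {"CI avg": 511, "CI 95th": 593, "CO avg": 876, "CO 95th": 1117}
# Response times with a 1,000-record DI job running, in seconds.
with_di_s = {"CI avg": 0.560, "CI 95th": 0.754, "CO avg": 1.164, "CO 95th": 1.313}

for metric, base in baseline_ms.items():
    print(f"{metric}: {baseline_delta_pct(base, with_di_s[metric]):+.1f}%")
```

The computed values agree with the reported +9%, +27%, +32% and +17% to within about one percentage point; the exact rounding used in the table is not stated.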
- Comparative Data Import and Check-In/Check-Out results between Poppy and Quesnelia.
# of records (Poppy) | DI Duration with CICO (Poppy) | CI time Avg, sec (Poppy) | CI time 95th pct, sec (Poppy) | CO time Avg, sec (Poppy) | CO time 95th pct, sec (Poppy) | # of records (Quesnelia) | DI Duration with CICO (Quesnelia) | CI time Avg, sec (Quesnelia) | CI time 95th pct, sec (Quesnelia) | CO time Avg, sec (Quesnelia) | CO time 95th pct, sec (Quesnelia) |
---|---|---|---|---|---|---|---|---|---|---|---|
1,000 | 35 sec | 0.525 | 0.576 | 1.078 | 1.326 | 1,000 | 20 sec | 0.560 +6% | 0.754 +30% | 1.164 +8% | 1.313 -1% |
5,000 | 1 min 41 sec | 0.513 | 0.612 | 0.9 | 1.019 | 5,000 | 1 min 19 sec -21.7% | 0.701 +36% | 1.171 +91% | 1.141 +26% | 1.790 +75% |
10,000 | 3 min 4 sec | 0.581 | 0.685 | 1.016 | 1.321 | 10,000 | 2 min 35 sec -15.7% | 0.723 +24% | 1.024 +49% | 1.179 +16% | 1.494 +13% |
22,778 | 6 min 32 sec | 0.598 | 1.542 | 1.244 | 1.729 | 25,000 | 6 min 26 sec -1.5% | 0.722 +20% | 1.024 -33% | 1.180 -5% | 1.494 -13% |
50,000 | 13 min 48 sec | 0.671 | 1.953 | 1.51 | 2.09 | 50,000 | 12 min 16 sec | 0.777 +15% | 1.045 -46% | 1.265 | 1.550 -25% |
Test №3
Multitenant testing
...
PTF - environment Quesnelia (qcp1)
- 10 db.r6g.xlarge EC2 instances located in US East (N. Virginia), us-east-1
- 1 database instance, writer
Name | Memory GiB | vCPUs |
---|---|---|
db.r6g.xlarge | 32 GiB | 4 vCPUs |
- MSK ptf-mobius-testing2
  - 4 m5.2xlarge brokers in 2 zones
  - Apache Kafka version 2.8.0
  - EBS storage volume per broker 300 GiB
  - auto.create.topics.enable=true
  - log.retention.minutes=480
  - default.replication.factor=23
...
- 09:55:16 [526300/metadata-provider] [fs07000001] [] [mod-authtoken] ERROR Api Access for user 'folio' (9eb67301-6f6e-468f-9b1a-6134dc39a684) requires permission: metadata-provider.incomingrecords.get
- 09:55:16 [815600/metadata-provider] [fs07000001] [9eb67301-6f6e-468f-9b1a-6134dc39a684] [mod_source_record_manager] ERROR PostgresClient queryAndAnalyze: ERROR: invalid input syntax for type uuid: "undefined" (22P02) - SELECT * FROM get_record_processing_log('3e63f944-40ea-477c-ac21-79bb24780bc5', 'undefined')
- 09:55:16 [526300/metadata-provider] [fs07000001] [] [mod-authtoken] ERROR FilterApi Permission missing in []
...
We also changed the tenant order when loading files: we started with Tenant 3 (qcp1-02) → Tenant 2 (qcp1-01) → Tenant 1 (qcp1-00) to avoid the problem where mod-permissions spiked and the system got stuck.
...
CPU utilization when mod-permissions spiked and the system got stuck.
Recommendations & Jiras (Optional)
Link to Jira ticket: https://folio-org.atlassian.net/browse/PERF-801
Methodology/Approach
The DI test scenario used a data import job profile that creates new MARC authority records for non-matches (Job Profile: KG - Create SRS MARC Authority on nonmatches to 010 $a DUBLICATE for Q). Jobs were started from the UI on the Quesnelia (qcp1) environment, a non-ECS environment with the file splitting feature enabled.
...
- The above files are all stored here - MARC Resources
- The 22k file provided in MARC Resources does not work, so the 50k file was split to produce a file with 25k records, which was used instead of the 22k file.
- At the time of the test run, Grafana was not available. As a result, response times for Check-In/Check-Out were parsed manually from a .jtl file, using the start and finish dates of the data import tests. These results were visualized in JMeter using a Listener (Response Times Over Time).
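The manual .jtl parsing described above can be sketched as follows. This is only an illustration, not the procedure actually used: it assumes the .jtl file is in JMeter's CSV format with a header row containing the `timeStamp` (epoch milliseconds), `elapsed` (milliseconds) and `label` columns, and the label names and the nearest-rank percentile method are assumptions.

```python
import csv
import io
import math
from datetime import datetime, timezone


def window_stats(jtl_file, label, start, end):
    """Return (average, 95th percentile) of elapsed ms for `label` between start and end."""
    start_ms = start.timestamp() * 1000
    end_ms = end.timestamp() * 1000
    samples = sorted(
        int(row["elapsed"])
        for row in csv.DictReader(jtl_file)
        if row["label"] == label and start_ms <= int(row["timeStamp"]) <= end_ms
    )
    if not samples:
        return None
    # Nearest-rank percentile; JMeter's own aggregate reports may differ slightly.
    rank = math.ceil(0.95 * len(samples))
    return sum(samples) / len(samples), samples[rank - 1]


# Tiny in-memory stand-in for a real .jtl file (hypothetical values and labels).
sample = io.StringIO(
    "timeStamp,elapsed,label,success\n"
    "1700000000000,500,Check-In,true\n"
    "1700000001000,700,Check-In,true\n"
    "1700000002000,900,Check-Out,true\n"
    "1700000999000,600,Check-In,true\n"
)
di_start = datetime.fromtimestamp(1700000000, tz=timezone.utc)
di_end = datetime.fromtimestamp(1700000010, tz=timezone.utc)
avg, p95 = window_stats(sample, "Check-In", di_start, di_end)
print(avg, p95)
```

With a real file, `jtl_file` would be an open file handle (for example `open(path, newline="")`), and `di_start` / `di_end` would be the start and finish times of the data import job taken from the DI logs.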
Test set
- Test 1: Manual DI of 1k, 5k, 10k, 25k and 50k record files started on one tenant only (qcp1-00).
- Test 2: Manual DI of 1k, 5k, 10k, 25k and 50k record files started on one tenant only (qcp1-00), plus Check-in and Check-out (CICO) with 5 concurrent users.
- Test 3: Manual DI of 1k, 5k, 10k, 25k and 50k record files started on 3 tenants concurrently. Files were loaded without a pause between them in the order 50k, 25k, 10k, 5k and 1k; tenant order: Tenant 3 (qcp1-02), Tenant 2 (qcp1-01), Tenant 1 (qcp1-00).
- Test 4: Manual DI of 1k, 5k, 10k, 25k and 50k record files started on 3 tenants concurrently. Files were loaded with a pause between them in the order 50k, 25k, 10k, 5k and 1k; tenant order: Tenant 3 (qcp1-02), Tenant 1 (qcp1-00), Tenant 2 (qcp1-01).
- Test 5: Manual DI of 1k, 5k, 10k, 25k and 50k record files started on 3 tenants concurrently. Files were loaded without a pause between them in the order 1k, 5k, 10k, 25k and 50k; tenant order: Tenant 3 (qcp1-02), Tenant 2 (qcp1-01), Tenant 1 (qcp1-00).
...