Overview
- This document contains the results of testing Data Import and Check-in/Check-out on the Quesnelia release on the qcp1 environment.
Summary
- Data Import tests finished successfully; only Test №5 had one failed record for Tenant 2 (qcp1-01) when processing 50k files. DI duration grew in proportion to the number of records in the files.
- Check-in and Check-out with 5 virtual users were performed during DI jobs that create new MARC authority records for non-matches; no issues were observed.
- Data Import in Quesnelia performs faster without CICO than with it.
- Comparing the Poppy and Quesnelia releases:
- Check-in / Check-out performs better in Quesnelia. Response time during Create jobs improved by 15% on average over a long working period.
- DI durations improved by 11%-14% on average.
- During testing, we noticed spikes in the mod-permissions module. To mitigate this issue and prevent system slowdowns, we adjusted the order of loading files, starting with Tenant 3 (qcp1-02), followed by Tenant 2 (qcp1-01), and finally Tenant 1 (qcp1-00).
Test Results and Comparison
Test №1
Test with 1k, 10k, 25k and 50k record files; DI started on one tenant only (qcp1-00), with comparative results between Poppy and Quesnelia.
...
% creates
...
File
...
DI duration Morning Glory
...
DI duration Nolana
...
DI duration Orchid
...
DI duration Poppy
...
Test №2
Test with CICO (5 concurrent users) and DI of 1K, 5K, 10K, 25K and 50K record files started on one tenant only.
- Comparative Baseline Check-In/Check-Out results without Data Import between Poppy and Quesnelia.
...
CICO, Median time without DI (Poppy)
...
593 ms (+4.5%)
...
- Comparative Check-In/Check-Out results between Baseline (Quesnelia) and Check-In/Check-Out plus Data Import (Quesnelia).
...
DI Duration with CICO (Quesnelia)
...
20 sec
...
12 min 16 sec
...
1.265
...
- Comparative Data Import and Check-In/Check-Out results between Poppy and Quesnelia.
...
# of records (Poppy)
...
DI Duration with CICO (Poppy)
...
CI time 95th pct (Poppy)
...
CO time Avg (Poppy)
...
CO time 95th pct (Poppy)
...
DI Duration with CICO (Quesnelia)
...
20 sec (-42.8%)
...
12 min 16 sec (-11%)
...
1.265 (-16%)
...
Resource utilization for Test #1
Resource utilization table
Service CPU Utilization
Here we can see that the mod-inventory-b module used 50% CPU and mod-source-record-storage-b used 46% CPU.
Overview
- This document contains the results of testing Data Export (MARC BIB) on the Quesnelia [ECS] release on qcon environment.
Summary
- Data Export tests finished successfully on the qcon environment using the Default instances export job profile and the srs - holdings and items job profile.
- Data Export tests were run on the College and Central tenants, but the results for comparison between releases were taken from the College tenant.
- Comparing with previous testing results for the Poppy and Quesnelia releases:
- Data Export processed all files, including the file with 500k records, without errors in the Quesnelia release.
- Data Export durations improved by 80% on average in the Quesnelia release.
- During testing, we noticed spikes in mod-data-export of up to 593% CPU.
- For Test №5, Data Export was started on the College tenant (cs00000int_0001), Central tenant (cs00000int) and Professional tenant (cs00000int_0002) concurrently using the Default instances export job profile. We observed that the CPU usage of the mod-data-export module was initially at 44% before the test began, spiked to 109% during the test, and remained elevated without returning to the initial state.
Test Results
This table contains durations for Data Export with 2 job profiles.
Profile | CSV File | Tenant College (cs00000int_0001) Result | Tenant College Status | Central Tenant (cs00000int) Result | Central Tenant Status |
---|---|---|---|---|---|
DE MARC Bib (Default instances export job profile) | 1k.csv | 0:00:02 | COMPLETED | 0:00:05 | COMPLETED |
100k.csv | 0:02:39 | COMPLETED | 0:04:24 | COMPLETED | |
500k.csv | 0:05:21 | COMPLETED | 0:06:17 | COMPLETED | |
DE MARC Bib (srs - holdings and items) | 1k.csv | 0:00:05 | COMPLETED | 0:00:05 | COMPLETED |
100k.csv | 0:08:15 | COMPLETED | 0:05:58 | COMPLETED | |
500k.csv | 0:09:22 | COMPLETED | 0:08:28 | COMPLETED |
This table contains durations for Test №5: Data Export run on 3 tenants concurrently.
Tenant | CSV File | Result | Status |
---|---|---|---|
Tenant College (cs00000int_0001) | 500k.csv | 0:10:24 | COMPLETED |
Tenant Professional (cs00000int_0002) | 500k.csv | 0:06:47 | COMPLETED |
Central Tenant (cs00000int) | 500k.csv | 0:07:56 | COMPLETED |
Comparison
This table contains a comparison of durations between the Poppy and Quesnelia releases (a worked example of the delta calculation follows the table).
Profile | CSV File | Orchid Result | Orchid Status | Poppy (1 set) Result | Poppy Status | Quesnelia Tenant College (cs00000int_0001) Result | Quesnelia Status | DE Duration DELTA Poppy/Quesnelia (hh:mm:ss / percent) |
---|---|---|---|---|---|---|---|---|
DE MARC Bib (Default instances export job profile) | 1k.csv | | | 00:00:08 | COMPLETED | 0:00:02 | COMPLETED | -00:00:06 |
 | 100k.csv | | | 00:15:36 | COMPLETED | 0:02:39 | COMPLETED | -00:12:57 |
 | 500k.csv | | | 00:57:25 | FAIL | 0:05:21 | COMPLETED | -00:52:04 |
DE MARC Bib (srs - holdings and items) | 1k.csv | 00:00:27 | COMPLETED | 00:00:29 | COMPLETED | 0:00:05 | COMPLETED | -00:00:24 |
 | 100k.csv | 00:47:51 | COMPLETED | 00:47:23 | COMPLETED | 0:08:15 | COMPLETED | -00:39:08 |
 | 500k.csv | 04:00:26 | COMPLETED | 04:11:09 | FAIL | 0:09:22 | COMPLETED | -04:01:47 |
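As a rough cross-check of the Summary's "80% on average" figure, the per-row improvement can be computed as the relative reduction in duration from Poppy to Quesnelia (the exact averaging method is our assumption): improvement = (Poppy duration - Quesnelia duration) / Poppy duration * 100%. For example, the 100k.csv row of the Default instances export job profile gives (936 s - 159 s) / 936 s ≈ 83%.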
Resource utilization for Test #1 and Test #2
Service CPU Utilization
Here we can see that mod-data-export used 452% CPU at its spike.
Service Memory Utilization
Here we can see that all modules show a stable trend.
DB CPU Utilization
DB CPU spike was 32%.
DB Connections
The number of DB connections was 1470.
DB load
Top SQL-queries
Resource utilization for Test #3 and Test #4
Service CPU Utilization
Here we can see that mod-data-export used 336% CPU at its spike.
Service Memory Utilization
Here we can see that all modules show a stable trend.
DB CPU Utilization
DB CPU on average was 90% and 35%.
DB Connections
The DB connections counts were 1151 and 1377.
DB load
Top SQL-queries
Resource utilization for Test #5
Service CPU Utilization
Here we can see that the mod-inventory-b module and nginx-okapi used 50% CPU.
Service Memory Utilization
Here we can see that all modules show a stable trend.
DB CPU Utilization
DB CPU was 93%.
DB Connections
The number of DB connections was 1580.
DB load
Top SQL-queries
1. INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id) VALUES ($1, $2)
2. INSERT INTO fs09000000_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)
3. UPDATE fs09000000_mod_source_record_manager.job_execution_progress SET succeeded_records_count = succeeded_records_count + $2, error_records_count = error_records_count + $3 WHERE job_execution_id = $1 Returning *
4. insert into "marc_records_lb" ("id", "content") values (cast($1 as uuid), cast($2 as jsonb)) on conflict ("id") do update set "content" = cast($3 as jsonb)
5. WITH input_rows(record_id, authority_id) AS (
       VALUES ($1::uuid,$2::uuid)
   )
   , ins AS (
       INSERT INTO fs09000000_mod_inventory.records_authorities(record_id, authority_id)
       SELECT * FROM input_rows
       ON CONFLICT (record_id) DO UPDATE SET record_id=EXCLUDED.record_id
       RETURNING record_id::uuid, authority_id::uuid
   )
   SELECT record_id, authority_id
   FROM ins
   UNION ALL
   SELECT c.record_id, c.authority_id
   FROM input_rows
   JOIN fs09000000_mod_inventory.records_authorities c USING (record_id);
Appendix
Infrastructure
PTF - environment Quesnelia (qcp1)
...
1 database instance, writer
...
db.r6g.xlarge
...
- 4 m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
...
Quesnelia modules memory and CPU parameters
Additional links and Errors
Test №5 had one failed record for Tenant 2 (qcp1-01) when processing 50k files; a query sketch for locating such a record follows the log excerpt below.
- 09:55:16 [526300/metadata-provider] [fs07000001] [] [mod-authtoken] ERROR Api Access for user 'folio' (9eb67301-6f6e-468f-9b1a-6134dc39a684) requires permission: metadata-provider.incomingrecords.get
- 09:55:16 [815600/metadata-provider] [fs07000001] [9eb67301-6f6e-468f-9b1a-6134dc39a684] [mod_source_record_manager] ERROR PostgresClient queryAndAnalyze: ERROR: invalid input syntax for type uuid: "undefined" (22P02) - SELECT * FROM get_record_processing_log('3e63f944-40ea-477c-ac21-79bb24780bc5', 'undefined')
- 09:55:16 [526300/metadata-provider] [fs07000001] [] [mod-authtoken] ERROR FilterApi Permission missing in []
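To locate the record that failed for a given job execution, a query against the journal_records table could be used (its columns appear in the INSERT statement listed under Top SQL-queries). This is a minimal sketch only: the action_status value 'ERROR' and the use of the fs07000001 tenant schema are assumptions rather than something captured during the run.
-- Sketch: journal entries recorded with an error for the failed DI job execution.
-- job_execution_id comes from the log lines above; action_status = 'ERROR' is an assumption.
SELECT entity_type, entity_hrid, action_type, action_status, error
FROM fs07000001_mod_source_record_manager.journal_records
WHERE job_execution_id = '3e63f944-40ea-477c-ac21-79bb24780bc5'
  AND action_status = 'ERROR';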
We also used a different order of tenants when loading files: we decided to start loading files from Tenant 3 (qcp1-02) → Tenant 2 (qcp1-01) → Tenant 1 (qcp1-00) to avoid the problem where mod-permissions spiked and the system got stuck.
CPU utilization when mod-permissions spiked and the system got stuck.
Recommendations & Jiras (Optional)
Link to Jira ticket: https://folio-org.atlassian.net/browse/PERF-801
Methodology/Approach
DI test scenarios using a data import job profile that creates new MARC authority records for non-matches (Job Profile: KG - Create SRS MARC Authority on nonmatches to 010 $a DUBLICATE for Q) were started from the UI on the Quesnelia (qcp1) env with file splitting features enabled, on a non-ECS environment.
Action for non-matches: Create MARC authority record
- The above files are all stored here - MARC Resources
- The 22k file that was provided from MARC Resources does not work, so the 50k file was split into a file with 25k records, which was used instead of the 22k file.
- At the time of the test run, Grafana was not available. As a result, response times for Check-In/Check-Out were parsed manually from a .jtl file, using the start and finish dates of the data import tests. These results were visualized in JMeter using a Listener (Response Times Over Time).
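As a cross-check of the DI durations reported above, a query similar to the Data Export one in the second appendix could be run against mod-source-record-manager. This is a hedged sketch only: the job_execution table and the status, started_date and completed_date columns are assumptions about the tenant schema, not something verified during this run.
-- Hypothetical sketch: list recent DI job executions and their durations.
-- Table and column names are assumptions about the fs09000000_mod_source_record_manager schema.
SELECT id,
       status,
       started_date,
       completed_date,
       completed_date - started_date AS duration
FROM fs09000000_mod_source_record_manager.job_execution
WHERE completed_date IS NOT NULL
ORDER BY started_date DESC
LIMIT 10;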
Test set
...
Service CPU Utilization
Here we can see that mod-data-export used 593% CPU at its spike.
Service Memory Utilization
We observed that the CPU usage of the mod-data-export module was initially at 44% before the test began. It spiked to 109% during the test and remained elevated without returning to the initial state.
DB CPU Utilization
DB CPU was 50%.
DB Connections
The number of DB connections was 1368.
DB load
Top SQL-queries
Appendix
Infrastructure
PTF - environment Quesnelia (qcon)
11 m6i.2xlarge EC2 instances located in US East (N. Virginia) us-east-1
1 db.r6.xlarge database instance: writer instance
OpenSearch
domain: fse
Number of nodes: 9
Version: OpenSearch_2_7_R20240502
MSK - tenant
4 kafka.m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
auto.create.topics.enable=true
log.retention.minutes=480
default.replication.factor=3
Kafka consolidated topics enabled
Methodology/Approach
Data Export test scenarios using the Default instances export job profile and the srs - holdings and items profile were started from the UI on the Quesnelia (qcon) ECS environment.
Test set
- Test 1: Manually tested Data Export of 1k, 100k and 500k record files, started on the College tenant (cs00000int_0001) only, using the Default instances export job profile.
- Test 2: Manually tested Data Export of 1k, 100k and 500k record files, started on the College tenant (cs00000int_0001) only, using the srs - holdings and items job profile.
- Test 3: Manually tested Data Export of 1k, 100k and 500k record files, started on the Central tenant (cs00000int) only, using the Default instances export job profile.
- Test 4: Manually tested Data Export of 1k, 100k and 500k record files, started on the Central tenant (cs00000int) only, using the srs - holdings and items job profile.
- Test 5: Manually tested Data Export of a 500k record file, started on the College tenant (cs00000int_0001), Central tenant (cs00000int) and Professional tenant (cs00000int_0002) concurrently, using the Default instances export job profile.
To get the status and time range for export jobs, the following query was used:
SELECT
jsonb->>'status' AS status,
to_timestamp((jsonb->>'startedDate')::bigint / 1000) AS startedDate,
to_timestamp((jsonb->>'completedDate')::bigint / 1000) AS completedDate,
exported_file->>'fileName' AS fileName,
jsonb->>'jobProfileName' AS jobProfileName,
(jsonb->>'completedDate')::bigint - (jsonb->>'startedDate')::bigint AS duration_ms,
to_char(
(to_timestamp((jsonb->>'completedDate')::bigint / 1000) - to_timestamp((jsonb->>'startedDate')::bigint / 1000))::interval,
'HH24:MI:SS'
) AS duration_hhmmss
FROM
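-- job_executions holds one row per export job execution; its jsonb column stores the job document read above (College tenant cs00000int_0001 schema)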
cs00000int_0001_mod_data_export.job_executions,
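-- expand the exportedFiles array so each exported file name becomes its own row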
jsonb_array_elements(jsonb->'exportedFiles') AS exported_file
WHERE
-- (jsonb->>'hrId')::int IN (309, 310, 311, 312, 313, 314) -- Central tenant
(jsonb->>'hrId')::int IN (266, 267, 268, 269, 270, 271)
ORDER BY
jsonb->>'startedDate' DESC
LIMIT 10;