PTF - Data Export Test Report (Quesnelia) [ECS]


Overview

  • This document contains the results of testing Data Export (MARC BIB) in the Quesnelia [ECS] release on the qcon environment.

PERF-844

Summary

  • Data Import tests finished successfully; only Test №5 had one failed record, for Tenant 2 (qcp1-01), when processing the 50k file. DI duration grew in proportion to the number of records in the files.
  • Check-in and check-out with 5 virtual users was performed during DI "Create new MARC authority records for non-matches" jobs. No issues were found.
  • Data Import in Quesnelia performs faster without CICO than with it.
  • Comparing the Poppy and Quesnelia releases:
    • Check-in / check-out performs better in Quesnelia: response times during long-running Create jobs improved by 15% on average.
    • DI durations improved by 11%-14% on average.
  • During testing, we noticed spikes in the mod-permissions module. To mitigate this issue and prevent system slowdowns, we adjusted the order of loading files, starting with Tenant 3 (qcp1-02), followed by Tenant 2 (qcp1-01), and finally Tenant 1 (qcp1-00).

Test Results



Comparison

Test №1

DI tests with 1k, 5k, 10k, 25k, and 50k record files, started on one tenant only (qcp1-00), with comparative results between Poppy and Quesnelia (earlier releases shown for reference).

| # of records | % creates | File | DI duration Morning Glory | DI duration Nolana | DI duration Orchid | DI duration Poppy | DI duration Quesnelia |
|---|---|---|---|---|---|---|---|
| 1,000 | 100 | 1k_marc_authority.mrc | 24 s | 27 s | 41 s | 29 s | 22 s (-24%) |
| 5,000 | 100 | LC_SUBJ_msplit00000000.mrc | 1 min 21 s | 1 min 15 s | 1 min 21 s | 1 min 38 s | 1 min 19 s (-19%) |
| 10,000 | 100 | msplit00000000.mrc | 2 min 32 s | 2 min 31 s | 2 min 53 s | 2 min 53 s | 2 min 36 s (-9.8%) |
| 22,778 (Poppy test) / 25,000 (Quesnelia test) | 100 | msplit00000013.mrc | 11 min 14 s | 7 min 7 s | 5 min 42 s | 6 min 24 s | 6 min 19 s (-1.3%) |
| 50,000 | 100 | 50000_authorityrecords.mrc | 22 min | 11 min 24 s | 11 min 11 s | 13 min 48 s | 11 min 59 s (-13%) |
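The percentages under the Quesnelia durations are the relative change versus Poppy. A minimal sanity check of that arithmetic, using the values from the 1,000-record row above (29 s in Poppy, 22 s in Quesnelia):

SQL Query
-- Quesnelia (22 s) vs Poppy (29 s) DI duration for the 1,000-record file
SELECT round((22.0 - 29.0) / 29.0 * 100, 1) AS delta_pct;
-- returns -24.1, shown as -24% in the table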

Test №2

Test with CICO (5 concurrent users) and DI with 1k, 5k, 10k, 25k, and 50k record files, started on one tenant only.

  • Comparative baseline Check-In/Check-Out results without Data Import between Poppy and Quesnelia.

| Operation | Median without DI (Poppy) | 95th pct without DI (Poppy) | Median without DI (Quesnelia) | 95th pct without DI (Quesnelia) | Avg without DI (Quesnelia) |
|---|---|---|---|---|---|
| Check-In | 516 ms | 567 ms | 503 ms (-2.5%) | 593 ms (+4.5%) | 511 ms |
| Check-Out | 910 ms | 2094 ms | 836 ms (-8%) | 1117 ms (-46%) | 876 ms |
  • Comparative Check-In/Check-Out results between the baseline (Quesnelia) and Check-In/Check-Out plus Data Import (Quesnelia).
| # of records | DI duration with CICO | CI time Avg, s | CI time 95th pct, s | CO time Avg, s | CO time 95th pct, s | Baseline CI Avg delta | Baseline CI 95th pct delta | Baseline CO Avg delta | Baseline CO 95th pct delta |
|---|---|---|---|---|---|---|---|---|---|
| 1,000 | 20 sec | 0.560 | 0.754 | 1.164 | 1.313 | +9% | +27% | +32% | +17% |
| 5,000 | 1 min 19 sec | 0.701 | 1.171 | 1.141 | 1.790 | +37% | +97% | +30% | +60% |
| 10,000 | 2 min 35 sec | 0.723 | 1.024 | 1.179 | 1.494 | +41% | +72% | +34% | +34% |
| 25,000 | 6 min 26 sec | 0.722 | 1.024 | 1.180 | 1.494 | +41% | +72% | +35% | +34% |
| 50,000 | 12 min 16 sec | 0.777 | 1.045 | 1.265 | 1.550 | +52% | +76% | +44% | +39% |
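The delta columns compare these with-DI response times against the no-DI Quesnelia baseline in the previous table (511 ms CI Avg, 876 ms CO Avg). A minimal check for the 1,000-record row (the table's +9% suggests the source averages were rounded slightly differently):

SQL Query
-- CI Avg with DI (0.560 s) vs baseline CI Avg (0.511 s)
SELECT round((0.560 - 0.511) / 0.511 * 100, 1) AS ci_avg_delta_pct;
-- returns 9.6, shown as +9% in the table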
  • Comparative Data Import and Check-In/Check-Out results between Poppy and Quesnelia.

| # of records (Poppy) | DI duration with CICO (Poppy) | CI Avg, s (Poppy) | CI 95th pct, s (Poppy) | CO Avg, s (Poppy) | CO 95th pct, s (Poppy) | # of records (Quesnelia) | DI duration with CICO (Quesnelia) | CI Avg, s (Quesnelia) | CI 95th pct, s (Quesnelia) | CO Avg, s (Quesnelia) | CO 95th pct, s (Quesnelia) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1,000 | 35 sec | 0.525 | 0.576 | 1.078 | 1.326 | 1,000 | 20 sec (-42.8%) | 0.560 (+6%) | 0.754 (+30%) | 1.164 (+8%) | 1.313 (-1%) |
| 5,000 | 1 min 41 sec | 0.513 | 0.612 | 0.900 | 1.019 | 5,000 | 1 min 19 sec (-21.7%) | 0.701 (+36%) | 1.171 (+91%) | 1.141 (+26%) | 1.790 (+75%) |
| 10,000 | 3 min 4 sec | 0.581 | 0.685 | 1.016 | 1.321 | 10,000 | 2 min 35 sec (-15.7%) | 0.723 (+24%) | 1.024 (+49%) | 1.179 (+16%) | 1.494 (+13%) |
| 22,778 | 6 min 32 sec | 0.598 | 1.542 | 1.244 | 1.729 | 25,000 | 6 min 26 sec (-1.5%) | 0.722 (+20%) | 1.024 (-33%) | 1.180 (-5%) | 1.494 (-13%) |
| 50,000 | 13 min 48 sec | 0.671 | 1.953 | 1.510 | 2.090 | 50,000 | 12 min 16 sec (-11%) | 0.777 (+15%) | 1.045 (-46%) | 1.265 (-16%) | 1.550 (-25%) |

Resource utilization for Test #1 and Test #2

 Resource utilization table
| Module | CPU | Module | RAM |
|---|---|---|---|
| mod-data-export-b | 452% | mod-data-export-b | 75% |
| mod-inventory-b | 13% | mod-source-record-manager-b | 53% |
| mod-source-record-storage-b | 2.40% | mod-inventory-b | 48% |
| mod-source-record-manager-b | 1.80% | okapi-b | 32% |
| okapi-b | 1.10% | mod-source-record-storage-b | 30% |
| mod-authtoken-b | 0.90% | mod-authtoken-b | 20% |
| mod-users-bl-b | 0.50% | mod-users-bl-b | 19% |
| nginx-okapi | 0.40% | mod-inventory-storage-b | 16% |
| mod-inventory-storage-b | 0.40% | nginx-okapi | 5% |

Service CPU Utilization

Here we can see that mod-data-export spiked at 452% CPU.

Service Memory Utilization

Here we can see that all modules show a stable trend.

DB CPU Utilization

DB CPU spike was 32%.

DB Connections

DB connections peaked at 1470.

DB load

Top SQL-queries


Top 5 SQL statements

1. INSERT INTO job_executions_export_ids (job_execution_id, instance_id) VALUES ($1, $2) ON CONFLICT DO NOTHING
2. INSERT INTO job_executions_export_ids (job_execution_id, instance_id) VALUES ($1, $2) ON CONFLICT DO NOTHING
3. select mre1_0.id,mre1_0.content,mre1_0.external_id,mre1_0.leader_record_status,mre1_0.record_type,mre1_0.state,mre1_0.suppress_discovery from v_marc_records_lb mre1_0 where mre1_0.external_id in ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24,$25,$26,$27,$28,$29,$30,$31,$32,$33,$34,$35,$36,$37,$38,$39,$40,$41,$42,$43,$44,$45,$46,$47,$48,$49,$50,$51,$52,$53,$54,$55,$56,$57,$58,$59,$60,$61,$62,$63,$64,$65,$66,$67,$68,$69,$70,$71,$72,$73,$74,$75,$76,$77,$78,$
4. select iwhe1_0.id,iwhe1_0.hrid from v_instance_hrid iwhe1_0 where iwhe1_0.id in ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24,$25,$26,$27,$28,$29,$30,$31,$32,$33,$34,$35,$36,$37,$38,$39,$40,$41,$42,$43,$44,$45,$46,$47,$48,$49,$50,$51,$52,$53,$54,$55,$56,$57,$58,$59,$60,$61,$62,$63,$64,$65,$66,$67,$68,$69,$70,$71,$72,$73,$74,$75,$76,$77,$78,$79,$80,$81,$82,$83,$84,$85,$86,$87,$88,$89,$90,$91,$92,$93,$94,$95,$96,$97,$98,$99,$100,$101,$102,$103,$104,$105,$1
5. select hre1_0.id,hre1_0.instance_id,hre1_0.jsonb from v_holdings_record hre1_0 where hre1_0.instance_id=$1


Resource utilization for Test #3 and Test #4

 Resource utilization table
| Module | CPU | Module | RAM |
|---|---|---|---|
| mod-data-export-b | 336% | mod-data-export-b | 73% |
| mod-inventory-b | 14% | mod-source-record-manager-b | 53% |
| mod-source-record-storage-b | 2.20% | mod-inventory-b | 46% |
| mod-source-record-manager-b | 1.70% | okapi-b | 33% |
| okapi-b | 0.90% | mod-source-record-storage-b | 30% |
| mod-authtoken-b | 0.80% | mod-users-bl-b | 21% |
| mod-users-bl-b | 0.50% | mod-authtoken-b | 21% |
| mod-inventory-storage-b | 0.30% | mod-inventory-storage-b | 16% |
| nginx-okapi | 0.20% | nginx-okapi | 5% |

Service CPU Utilization

Here we can see that mod-data-export spiked at 336% CPU.

Service Memory Utilization

Here we can see that all modules show a stable trend.

DB CPU Utilization

DB CPU was 35%.

DB Connections

DB connections peaked at 1377.

DB load

Top SQL-queries


Top 5 SQL statements

1. INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id) VALUES ($1, $2)
2. INSERT INTO fs09000000_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)
3. UPDATE fs09000000_mod_source_record_manager.job_execution_progress SET succeeded_records_count = succeeded_records_count + $2, error_records_count = error_records_count + $3 WHERE job_execution_id = $1 Returning *
4. insert into "marc_records_lb" ("id", "content") values (cast($1 as uuid), cast($2 as jsonb)) on conflict ("id") do update set "content" = cast($3 as jsonb)
5.
WITH input_rows(record_id, authority_id) AS (
   VALUES ($1::uuid,$2::uuid)
)
, ins AS (
   INSERT INTO fs09000000_mod_inventory.records_authorities(record_id, authority_id)
   SELECT * FROM input_rows
   ON CONFLICT (record_id) DO UPDATE SET record_id=EXCLUDED.record_id
   RETURNING record_id::uuid, authority_id::uuid
   )
SELECT record_id, authority_id
FROM   ins
UNION  ALL
SELECT c.record_id, c.authority_id
FROM   input_rows
JOIN   fs09000000_mod_inventory.records_authorities c USING (record_id);

Statement 5 is an insert-or-select upsert: it inserts the (record_id, authority_id) pair and, on conflict, returns the already-stored pair instead of failing.


Appendix

Infrastructure

PTF - environment Quesnelia (qcon)

  • 11 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1

  • 1 db.r6.xlarge database instance (writer)

  • OpenSearch

    • domain: fse

    • Number of nodes: 9

    • Version: OpenSearch_2_7_R20240502

  • MSK - tenant

    • 4 kafka.m5.2xlarge brokers in 2 zones

    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true

    • log.retention.minutes=480

    • default.replication.factor=3

    • Kafka consolidated topics enabled


 Quesnelia modules memory and CPU parameters


Additional links and Errors


Test №5 had one failed record for Tenant 2 (qcp1-01) when processing the 50k file:

  • 09:55:16 [526300/metadata-provider] [fs07000001] [] [mod-authtoken] ERROR Api Access for user 'folio' (9eb67301-6f6e-468f-9b1a-6134dc39a684) requires permission: metadata-provider.incomingrecords.get
  • 09:55:16 [815600/metadata-provider] [fs07000001] [9eb67301-6f6e-468f-9b1a-6134dc39a684] [mod_source_record_manager] ERROR PostgresClient queryAndAnalyze: ERROR: invalid input syntax for type uuid: "undefined" (22P02) - SELECT * FROM get_record_processing_log('3e63f944-40ea-477c-ac21-79bb24780bc5', 'undefined')
  • 09:55:16 [526300/metadata-provider] [fs07000001] [] [mod-authtoken] ERROR FilterApi Permission missing in []

We also changed the order in which tenants' files were loaded, starting from Tenant 3 (qcp1-02) → Tenant 2 (qcp1-01) → Tenant 1 (qcp1-00), to avoid the situation where mod-permissions spiked and the system stalled.

CPU utilization when mod-permissions spiked and the system stalled.


Methodology/Approach

Data Export test scenarios using the profiles "Default instances export job profile" and "srs - holdings and items" were started from the UI on the Quesnelia (qcon) ECS environment.

Test set

  • Test 1: Data Export with 1k, 100k, and 500k record files, started manually on one tenant (cs00000int_0001) only, using the Default instances export job profile.
  • Test 2: Data Export with 1k, 100k, and 500k record files, started manually on one tenant (cs00000int_0001) only, using the srs - holdings and items job profile.
  • Test 3: Data Export with 1k, 100k, and 500k record files, started manually on the central tenant (cs00000int) only, using the Default instances export job profile.
  • Test 4: Data Export with 1k, 100k, and 500k record files, started manually on the central tenant (cs00000int) only, using the srs - holdings and items job profile.

To get the status and time range of export jobs, the following query was used:

SQL Query
SELECT 
    jsonb->>'status' AS status,
    to_timestamp((jsonb->>'startedDate')::bigint / 1000) AS startedDate,
    to_timestamp((jsonb->>'completedDate')::bigint / 1000) AS completedDate,
    exported_file->>'fileName' AS fileName,
    jsonb->>'jobProfileName' AS jobProfileName,
    (jsonb->>'completedDate')::bigint - (jsonb->>'startedDate')::bigint AS duration_ms,
    to_char(
        (to_timestamp((jsonb->>'completedDate')::bigint / 1000) - to_timestamp((jsonb->>'startedDate')::bigint / 1000))::interval, 
        'HH24:MI:SS'
    ) AS duration_hhmmss
FROM 
    cs00000int_0001_mod_data_export.job_executions,
    jsonb_array_elements(jsonb->'exportedFiles') AS exported_file
WHERE 
-- 	(jsonb->>'hrId')::int IN (309, 310, 311, 312, 313, 314) -- Central tenant
    (jsonb->>'hrId')::int IN (266, 267, 268, 269, 270, 271)
ORDER BY 
    jsonb->>'startedDate' DESC
LIMIT 10;
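
The commented-out hrId filter above corresponds to jobs started on the central tenant. Assuming the same schema naming convention, a minimal sketch of the central-tenant variant (only the schema and the hrId list change; the other selected columns are omitted here for brevity):

SQL Query
SELECT
    jsonb->>'status' AS status,
    jsonb->>'jobProfileName' AS jobProfileName
FROM
    cs00000int_mod_data_export.job_executions
WHERE
    (jsonb->>'hrId')::int IN (309, 310, 311, 312, 313, 314)
ORDER BY
    jsonb->>'startedDate' DESC
LIMIT 10;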

