PTF - Data Export Test Report (Sunflower) [ECS]

Overview

  • This document contains the results of testing Data Export (MARC BIB) on the Sunflower [ECS] release.

https://folio-org.atlassian.net/browse/PERF-1119

Summary

  • Data Export tests finished successfully on the Sunflower (secon) environment using two job profiles: Default instances export job profile and srs - holdings and items.

  • Data Export tests were executed on the College tenant only.

  • Sunflower release results

    • Data Export:

      • Default instances export job profile

        • File with 1k records     - 7 seconds

        • File with 100k records - 2 minutes 55 seconds

        • File with 500k records - 4 minutes 28 seconds

      • srs - holdings and items

        • File with 1k records     - 8 seconds

        • File with 100k records - 6 minutes 52 seconds

        • File with 500k records - 7 minutes 48 seconds

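  • Comparison of Ramsons (previous results) and Sunflower results: Data Export was faster in Ramsons for the 1k and 100k files. For the 500k file, Sunflower is on par with Ramsons for the Default instances export job profile and faster for srs - holdings and items. Performance depends on file size and job profile. The percentages below are the change in Sunflower duration relative to Ramsons.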

    • Default instances export job profile

      • File with 100k records +52.17%

      • File with 500k records +0.75%

    • srs - holdings and items

      • File with 100k records +31.6%

      • File with 500k records -6.21%

  • mod-data-export had the highest CPU utilization with the 500k records file: 166% with the Default instances export job profile and 120% with srs - holdings and items.

  • Concurrent Data Export testing with the srs - holdings and items job profile revealed slowness on the College DCB tenant. CPU utilization of services is significantly higher in the Sunflower environment than in Ramsons, and the most memory-intensive services differ between the two environments.

Test Runs

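| Profile | Test # | CSV File |
|---|---|---|
| DE MARC Bib (Default instances export job profile) | 1 | 1k.csv |
| DE MARC Bib (Default instances export job profile) | 2 | 100k.csv |
| DE MARC Bib (Default instances export job profile) | 3 | 500k.csv |
| DE MARC Bib (srs - holdings and items) | 4 | 1k.csv |
| DE MARC Bib (srs - holdings and items) | 5 | 100k.csv |
| DE MARC Bib (srs - holdings and items) | 6 | 500k.csv |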

Test Results

This table contains durations for Data Export with 2 job profiles. 

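Tenant College (cs00000int_0001)

| Profile | CSV File | Result (h:mm:ss) | Status |
|---|---|---|---|
| DE MARC Bib (Default instances export job profile) | 1k.csv | 0:00:07 | COMPLETED |
| DE MARC Bib (Default instances export job profile) | 100k.csv | 0:02:55 | COMPLETED |
| DE MARC Bib (Default instances export job profile) | 500k.csv | 0:04:28 | COMPLETED |
| DE MARC Bib (srs - holdings and items) | 1k.csv | 0:00:08 | COMPLETED |
| DE MARC Bib (srs - holdings and items) | 100k.csv | 0:06:52 | COMPLETED |
| DE MARC Bib (srs - holdings and items) | 500k.csv | 0:07:48 | COMPLETED |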
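
The durations above can be turned into an approximate throughput figure, which makes the dependence on file size and job profile easier to see. The following is a minimal Python sketch, assuming each CSV file holds exactly the number of records in its name and that every record was exported; the names `RUNS`, `to_seconds` and `records_per_second` are illustrative and not part of the test tooling.

```python
# Durations copied from the Test Results table (College tenant, Sunflower).
# Record counts are an assumption: 1k.csv = 1,000 records, and so on.
RUNS = [
    ("Default instances export job profile", 1_000, "0:00:07"),
    ("Default instances export job profile", 100_000, "0:02:55"),
    ("Default instances export job profile", 500_000, "0:04:28"),
    ("srs - holdings and items", 1_000, "0:00:08"),
    ("srs - holdings and items", 100_000, "0:06:52"),
    ("srs - holdings and items", 500_000, "0:07:48"),
]


def to_seconds(hms):
    """Convert an h:mm:ss string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def records_per_second(records, duration):
    """Average export rate over the whole job, in records per second."""
    return records / to_seconds(duration)


for profile, records, duration in RUNS:
    rate = records_per_second(records, duration)
    print(f"{profile:<40} {records:>7} records  {rate:8.1f} rec/s")
```

The rate is an average over the whole job, so it includes fixed start-up overhead; this is one likely reason the 1k runs show a much lower rate than the larger files.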

Comparison

This table compares Data Export durations between the Ramsons and Sunflower releases.

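| Profile | CSV File | Ramsons (cs00000int_0001) College tenant, duration (hh:mm:ss) | Sunflower (cs00000int_0001) College tenant, duration (hh:mm:ss) | DE duration delta, Ramsons / Sunflower (percent / time) |
|---|---|---|---|---|
| DE MARC Bib (Default instances export job profile) | 1k.csv | 00:00:02 | 00:00:07 | 250.00% / 5 sec |
| DE MARC Bib (Default instances export job profile) | 100k.csv | 00:01:55 | 00:02:55 | 52.17% / 60 sec |
| DE MARC Bib (Default instances export job profile) | 500k.csv | 00:04:26 | 00:04:28 | 0.75% / 2 sec |
| DE MARC Bib (srs - holdings and items) | 1k.csv | 00:00:07 | 00:00:08 | 14.28% / 1 sec |
| DE MARC Bib (srs - holdings and items) | 100k.csv | 00:05:13 | 00:06:52 | 31.62% / 1 min 39 sec |
| DE MARC Bib (srs - holdings and items) | 500k.csv | 00:08:19 | 00:07:48 | -6.21% / -31 sec |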
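
The delta values in this table are consistent with the formula (Sunflower - Ramsons) / Ramsons, so a positive value means Sunflower was slower. The following is a minimal Python sketch of that calculation, assuming this is the formula used; the function names `to_seconds` and `delta` are illustrative.

```python
def to_seconds(hms):
    """Convert an hh:mm:ss string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def delta(ramsons, sunflower):
    """Return (percent change, difference in seconds) relative to Ramsons.

    Positive values mean the Sunflower run took longer.
    """
    base = to_seconds(ramsons)
    diff = to_seconds(sunflower) - base
    return round(100 * diff / base, 2), diff


# 100k.csv, Default instances export job profile
print(delta("00:01:55", "00:02:55"))
# 500k.csv, srs - holdings and items
print(delta("00:08:19", "00:07:48"))
```

The sketch rounds to two decimals, so the 1k srs - holdings and items row comes out as 14.29% where the table shows 14.28%.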

Resource utilization

 

 

Service CPU Utilization

Maximum CPU utilization was observed with the 500k file: Default instances export job profile - 166%, srs - holdings and items - 120%.

image-20250507-120424.png

Service Memory Utilization

Maximum memory consumption was in mgr-applications - 75%, mod-dcb - 75%, mod-inventory - 67%, mod-scheduler - 62%

image-20250507-122808.png

DB CPU Utilization

Maximum RDS CPU utilization was observed with the 500k file: srs - holdings and items job profile - 31%, Default instances export job profile - 25%.

image-20250507-130218.png

DB Connections

DB connections: 1171 on average. No spikes were observed across file sizes or job profiles.

image-20250507-131233.png

DB load

image-20250507-172139.png

Top SQL-queries

image-20250507-172740.png

#

TOP SQL statements

1

select iwhe1_0.id,iwhe1_0.hrid from v_instance_hrid iwhe1_0 where iwhe1_0.id in ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24,$25,$26,$27,$28,$29,$30,$31,$32,$33,$34,$35,$36,$37,$38,$39,$40,$41,$42,$43,$44,$45,$46,$47,$48,$49,$50,$51,$52,$53,$54,$55,$56,$57,$58,$59,$60,$61,$62,$63,$64,$65,$66,$67,$68,$69,$70,$71,$72,$73,$74,$75,$76,$77,$78,$79,$80,$81,$82,$83,$84,$85,$86,$87,$88,$89,$90,$91,$92,$93,$94,$95,$96,$97,$98,$99,$100,$101,$102,$103,$104,$105,$1

2

select mre1_0.id,mre1_0.content,mre1_0.external_id,mre1_0.generation,mre1_0.leader_record_status,mre1_0.record_type,mre1_0.state,mre1_0.suppress_discovery from v_marc_records_lb mre1_0 where mre1_0.external_id in ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24,$25,$26,$27,$28,$29,$30,$31,$32,$33,$34,$35,$36,$37,$38,$39,$40,$41,$42,$43,$44,$45,$46,$47,$48,$49,$50,$51,$52,$53,$54,$55,$56,$57,$58,$59,$60,$61,$62,$63,$64,$65,$66,$67,$68,$69,$70,$71,$72,$73,$74

3

SELECT id, jsonb, holdings_record_id FROM cs00000int_0001_mod_data_export.v_item WHERE holdings_record_id in ($1)

4

select hre1_0.id,hre1_0.instance_id,hre1_0.jsonb from v_holdings_record hre1_0 where hre1_0.instance_id=$1

5

INSERT INTO job_executions_export_ids (job_execution_id, instance_id) VALUES ($1, $2) ON CONFLICT DO NOTHING

6

COMMIT

7

select eie1_0.id,eie1_0.instance_id,eie1_0.job_execution_id from job_executions_export_ids eie1_0 where eie1_0.job_execution_id=$1 and eie1_0.instance_id>=$2 and eie1_0.instance_id<=$3 order by eie1_0.instance_id offset $4 rows fetch first $5 rows only

Appendix

Infrastructure

 

 

  • secon: 13 r7g.2xlarge EC2 instances located in US East (N. Virginia), us-east-1

  • 1 db.r7g.xlarge database instance: Writer instance

  • MSK fse-test

    • 4 kafka.m7g.xlarge brokers in 2 zones (2 brokers per zone)

      • Apache Kafka version 3.7.x, metadata mode - KRaft

      • EBS storage volume per broker 300 GiB

      • auto.create.topics.enable=true

      • log.retention.minutes=480

      • default.replication.factor=3

      • revision - 26

  • OpenSearch 2.13 ptf-test cluster

    • r7g.2xlarge.search 4 data nodes

    • r6g.large.search 3 dedicated master nodes