PTF - Data Export Test Report (Sunflower) [NON-ECS]

Overview

  • This document contains the results of three rounds of Data Export (MARC BIB) testing on the Sunflower [non-ECS] release. One month after the first test, two additional tests were conducted: in the second test, mod-circulation-storage was enabled; in the third, it was disabled. All test results are summarized below.

PERF-1115: [Sunflower] [non-ECS] [Data Export] MARC BIB

Summary

  • Data Export tests finished successfully (apart from some record duplications) using the Default instances export job profile and the SRS - holdings and items job profile.

  • Comparing with previous results of the Ramsons and Sunflower releases:

    • Data Export processed all files, including the 500k-record file, with only negligible duplication errors in the Sunflower environment.

    • Data Export durations are nearly twice as long as in the Ramsons release when mod-circulation-storage is enabled.

    • With mod-circulation-storage disabled, exports are faster than in runs #1 and #2 but still slower than in Ramsons.

Test Results

This table contains durations for Data Export with two job profiles. 

 

Tenant: fs09000000

Profile | CSV File | Result (hh:mm:ss) | Status
DE MARC Bib (Default instances export job profile) | 1k.csv | 0:00:09 | COMPLETED
DE MARC Bib (Default instances export job profile) | 100k.csv | 0:06:03 | COMPLETED
DE MARC Bib (Default instances export job profile) | 500k.csv | 0:06:46 | COMPLETED
DE MARC Bib (SRS - holdings and items job profile) | 1k.csv | 0:00:17 | COMPLETED
DE MARC Bib (SRS - holdings and items job profile) | 100k.csv | 0:12:20 | COMPLETED
DE MARC Bib (SRS - holdings and items job profile) | 500k.csv | 0:15:13 | COMPLETED

 

Tenant: fs09000000

Profile | CSV File | Result (mod-circulation-storage enabled) | Status | Result (mod-circulation-storage disabled) | Status
DE MARC Bib (Default instances export job profile) | 1k.csv | 0:00:02 | COMPLETED | 0:00:02 | COMPLETED
DE MARC Bib (Default instances export job profile) | 100k.csv | 0:01:44 | COMPLETED | 0:01:39 | COMPLETED
DE MARC Bib (Default instances export job profile) | 500k.csv | 0:06:46 | COMPLETED | 0:05:22 | COMPLETED
DE MARC Bib (SRS - holdings and items job profile) | 1k.csv | 0:00:04 | COMPLETED | 0:00:04 | COMPLETED
DE MARC Bib (SRS - holdings and items job profile) | 100k.csv | 0:12:30 | COMPLETED | 0:11:22 | COMPLETED
DE MARC Bib (SRS - holdings and items job profile) | 500k.csv | 0:16:40 | COMPLETED | 0:12:49 | COMPLETED

Comparison

This table compares durations between the Sunflower and Ramsons releases.

Profile | CSV File | DE Duration Sunflower (hh:mm:ss) | DE Duration Ramsons (hh:mm:ss) | DELTA Sunflower/Ramsons (hh:mm:ss / percent)
DE MARC Bib (Default instances export job profile) | 1k.csv | 0:00:09 | 0:00:03 | +00:00:06
DE MARC Bib (Default instances export job profile) | 100k.csv | 0:06:03 | 0:02:19 | +00:03:44 / +161%
DE MARC Bib (Default instances export job profile) | 500k.csv | 0:07:49 | 0:04:33 | +00:03:16 / +71%
DE MARC Bib (SRS - holdings and items job profile) | 1k.csv | 0:00:17 | 0:00:06 | +00:00:11
DE MARC Bib (SRS - holdings and items job profile) | 100k.csv | 0:12:20 | 0:07:02 | +00:05:18 / +75%
DE MARC Bib (SRS - holdings and items job profile) | 500k.csv | 0:17:45 | 0:08:57 | +00:08:48 / +98%

 

Profile | CSV File | Sunflower with circ-storage (hh:mm:ss) | Sunflower without circ-storage (hh:mm:ss) | Ramsons (hh:mm:ss) | DELTA with circ-storage, Sunflower/Ramsons | DELTA without circ-storage, Sunflower/Ramsons
DE MARC Bib (Default instances export job profile) | 1k.csv | 0:00:02 | 0:00:02 | 0:00:03 | -00:00:01 | -00:00:01
DE MARC Bib (Default instances export job profile) | 100k.csv | 0:01:44 | 0:01:39 | 0:02:19 | -00:00:35 / -25% | -00:00:40 / -28.7%
DE MARC Bib (Default instances export job profile) | 500k.csv | 0:06:46 | 0:05:22 | 0:04:33 | +00:02:13 / +48.7% | +00:00:49 / +17.9%
DE MARC Bib (SRS - holdings and items job profile) | 1k.csv | 0:00:04 | 0:00:04 | 0:00:06 | -00:00:02 | -00:00:02
DE MARC Bib (SRS - holdings and items job profile) | 100k.csv | 0:12:30 | 0:11:22 | 0:07:02 | +00:05:28 / +77.7% | +00:04:20 / +61.6%
DE MARC Bib (SRS - holdings and items job profile) | 500k.csv | 0:16:40 | 0:12:49 | 0:08:57 | +00:07:43 / +86.2% | +00:03:52 / +43%

Resource utilization for Tests

Resource utilization table

Service | CPU | Service | RAM
mod-data-export-b | 88.2% | mod-inventory-b | 78.3%
mod-remote-storage-b | 13.7% | mod-data-export-b | 63%
mod-inn-reach-b | 13.7% | mod-users-keycloak-b | 56.4%
mod-audit-b | 11.3% | mod-audit-b | 55.1%
mod-pubsub-b | 8.2% | mod-pubsub-b | 45.1%
mod-users-keycloak-b | 7.8% | mod-quick-marc-b | 37.6%
mod-inventory-b | 7.7% | mod-inn-reach-b | 31.8%
mod-quick-marc-b | 7.4% | mod-invoice-storage-b | 31.8%
mod-invoice-storage-b | 7.4% | mod-remote-storage-b | 22.2%

Instance CPU Utilization

image-20250502-055819.png
image-20250505-090641.png

Overview

This report compares the CPU utilization patterns observed during data export operations in the Ramsons and Sunflower environments. The purpose is to highlight performance differences and identify potential reasons why the Sunflower environment is approximately 2x slower in completing data exports, particularly for larger datasets.


Key Observations

Ramsons Environment

  • 14:35: A 100k record export (Default profile) caused a minor CPU uptick (~5%).

  • 14:45: A 500k export (Default profile) triggered a sharp spike in CPU usage—peaking around 26% on one instance.

  • 15:10: Another 500k export using the SRS holdings and items profile resulted in a second peak—up to ~35.6%.

  • Performance: The spikes were short-lived, and CPU usage returned to baseline quickly, indicating efficient processing and resource handling.

Sunflower Environment

  • 07:40–08:15: Two 100k exports (one Default, one SRS) led to mild CPU fluctuations.

  • 08:20: A 500k export (Default profile) caused a moderate CPU spike (~21%).

  • 09:15–09:25: A 500k export (SRS profile) resulted in sustained CPU utilization across multiple instances, peaking around 21.5%, but lasting nearly 10 minutes.

  • Performance: CPU load was more prolonged and distributed across instances. The system took significantly longer to complete the exports.


Conclusion: Sunflower Is Slower (~2x) Compared to Ramsons

Although Ramsons shows higher CPU peaks during export jobs, it consistently completes the tasks in shorter time frames. Sunflower, on the other hand, exhibits lower peak CPU but longer sustained utilization, indicating slower job execution. This aligns with the observation that Sunflower is approximately 2x slower in completing large export tasks.

image-20250611-094147.png
image-20250611-094453.png

 

Service CPU Utilization

Here we can see that mod-data-export peaked at 88% CPU.

image-20250501-141747.png
image-20250505-110625.png

Observation Timeframe: 07:30–09:30 UTC
Key Metrics:

  • 100k export (Default and SRS):

    • Both operations push CPU usage above 10%, higher than Ramsons under similar load.

    • Indicates that even smaller exports in Sunflower require more CPU time.

  • 500k export (Default):

    • mod-data-export-b spikes dramatically to ~88%, more than double Ramsons' peak for the same job.

    • The spike is also sustained for ~5–6 minutes, rather than short and sharp.

  • 500k export (SRS):

    • CPU usage hovers between 50–65% across services for over 10 minutes.

    • Clearly shows longer processing time and more sustained system pressure.

Interpretation:
Sunflower shows consistently higher and longer-lasting CPU usage, implying slower processing and less efficient handling of the export workload.


🆚 Comparison Table

Metric | Ramsons | Sunflower
100k (Default & SRS) | Under 10% CPU | Above 10% CPU
500k (Default) | ~42% peak, fast drop | ~88% peak, longer duration
500k (SRS) | ~30–35% for ~5–6 minutes | 50–65% for 10+ minutes
Job Speed | Faster execution | Slower execution (~2x slower)
CPU Pattern | Efficient bursts | Prolonged load
mod-data-export-b Load | Short, high spike | High and sustained


Analysis & Highlights

  • Sunflower is nearly twice as slow as Ramsons during 500k exports, despite showing higher CPU usage — especially evident in the SRS export.

  • 100k export CPU usage is higher in Sunflower, suggesting less optimization even for smaller jobs.

  • mod-data-export-b service in Sunflower hits 88%, which is double Ramsons' peak (42%) during the same 500k Default export. This suggests:

    • Potential CPU starvation or throttling elsewhere.

    • Less efficient threading or job execution settings.

    • Possibly higher contention or background load in Sunflower.

    • In Ramsons, mod-inventory was the second-highest CPU consumer, whereas in the Sunflower environment it was mod-inventory-storage.

image-20250610-104823.png
image-20250610-110258.png

 

Service Memory Utilization

Here we can see that all services show stable memory trends except mod-data-export, which used more memory as the data volume increased.

image-20250502-122302.png
image-20250610-112417.png
image-20250610-112806.png

 

DB CPU Utilization

DB CPU spiked to 77% when the 1k export ran and the first 100k export started. For the 500k exports, DB CPU utilization fluctuated between 50% and 60%.

image-20250610-114108.png
image-20250505-080747.png

This comparison evaluates the performance of the data export feature in terms of database utilization across two environments—Ramsons and Sunflower—using two export profiles: d (Default instances export) and srs (SRS - holdings and items).

Ramsons Environment:

  • For an export of 500K records:

    • The d profile resulted in approximately 32% database utilization.

    • The srs profile showed a higher utilization of around 44%.

  • Observation: The srs profile in Ramsons consumes significantly more database resources than the d profile for the same data volume, indicating it may be less efficient or more resource-intensive.

Sunflower Environment:

  • For 500K records:

    • The d profile reached 48% utilization.

    • The srs profile peaked at 58%.

  • For 100K records:

    • Interestingly, the d profile showed a very high utilization of 74%, which is unusually high for a smaller dataset.

    • The srs profile reported a more moderate 52% utilization.

Additional Observation:

  • At around 8:50 AM in the Sunflower environment, a database utilization of approximately 50% was observed even though no export activity was occurring. This unexpected load may indicate background processes, system overhead, or anomalies in resource management.
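One way to investigate such idle-time load is to list non-idle sessions from PostgreSQL's pg_stat_activity view and filter them by the schema they touch. A hypothetical sketch (the sample rows and schema names below are illustrative, not the actual tooling used in these tests):

```python
# Each tuple mimics a (state, query) pair from PostgreSQL's pg_stat_activity view.
def active_queries(rows, module_substring):
    """Return queries that are not idle and mention the given module schema."""
    return [
        query
        for state, query in rows
        if state != "idle" and module_substring in query.lower()
    ]

# Hypothetical session snapshot taken while no export test was running
sample = [
    ("idle", "SELECT 1"),
    ("active", "SELECT * FROM fs09000000_mod_circulation_storage.loan"),
    ("active", "UPDATE fs09000000_mod_inventory_storage.item SET jsonb = $1"),
]

print(active_queries(sample, "mod_circulation_storage"))
# ['SELECT * FROM fs09000000_mod_circulation_storage.loan']
```

A real check would run `SELECT state, query FROM pg_stat_activity` during an idle window and apply the same filter; any non-empty result points at background activity of the named module.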


Summary:

  • The srs profile consistently consumes more resources than d at the 500K level in both environments.

  • The d profile in the Sunflower environment shows an unusual spike in usage at 100K, potentially pointing to inefficiencies or configuration issues.

  • Idle-time database load in Sunflower also warrants investigation to identify potential bottlenecks or background operations.

image-20250610-120316.png
image-20250610-121633.png

 

DB Connections

DB connections peaked at 1201.

image-20250501-193522.png
image-20250610-122359.png
image-20250610-122505.png

 

Kafka metrics

image-20250501-194344.png
image-20250610-123710.png
image-20250610-124000.png

 

image-20250501-194903.png
image-20250610-125106.png
image-20250610-125243.png

OpenSearch Data Nodes metrics

image-20250501-201005.png
image-20250610-133345.png
image-20250610-133858.png

 

DB load

image-20250501-212518.png

 

image-20250505-084617.png

Comparison Summary

Category | Ramsons (White) | Sunflower (Black)
Database | rcp1-db-restored-cluster-1 (Aurora 16.1) | secp1-db (Aurora 16.8)
Instance Type | db.r6g.xlarge | db.r7g.xlarge
AAS Peaks (Load) | Mostly under 1.5, peak ~3.2 | Often near or above 3, peak ~4.5
vCPU | Lower AAS-to-vCPU ratio | AAS near or beyond max vCPU line at peak
Profiles Used | Default (first half), SRS (second half) | Default & SRS (interleaved)
Export Volume Labels | 1k, 100k, 500k (marked) | 1k, 100k (srs), 100k (d), 500k (d), 500k (srs)
Load Distribution | Sharp spikes during 500k export, recovers | Sustained load under 500k runs
Query Mix | More INSERT INTO job_executions, SELECT | More UPDATE, SELECT, autovacuum, ANALYZE
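The AAS-to-vCPU comparison can be made concrete: both db.r6g.xlarge and db.r7g.xlarge provide 4 vCPUs, so an Average Active Sessions peak at or above 4 means the database is CPU-saturated. A small illustrative check (the vCPU mapping is assumed from the AWS instance sizes, not taken from these test artifacts):

```python
# vCPU counts for the Aurora instance classes used in the two environments
VCPUS = {"db.r6g.xlarge": 4, "db.r7g.xlarge": 4}

def aas_saturation(aas_peak: float, instance_type: str) -> float:
    """Ratio of peak Average Active Sessions to available vCPUs (>= 1.0 means saturated)."""
    return aas_peak / VCPUS[instance_type]

print(aas_saturation(3.2, "db.r6g.xlarge"))  # Ramsons peak: 0.8 -> below saturation
print(aas_saturation(4.5, "db.r7g.xlarge"))  # Sunflower peak: 1.125 -> beyond saturation
```

This matches the chart reading above: Ramsons stays below the max-vCPU line while Sunflower crosses it during the 500k jobs.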


Notable Observations – Why Sunflower May Be Performing Poorly

  1. CPU Saturation

    • In Sunflower, AAS often hits or exceeds max vCPU (dotted line), especially during 500k jobs.

    • In contrast, Ramsons stays well below vCPU saturation, even at similar volumes.

  2. More Update-Heavy Workload

    • Sunflower shows heavier use of UPDATE statements (especially mod_users, mod_inventory_storage, etc.).

    • These are more CPU- and I/O-intensive than SELECT or INSERT alone — contributing to high load.

  3. Frequent Autovacuum/Analyze

    • More autovacuum and ANALYZE entries in Sunflower — indicates table churn or poor vacuum tuning.

    • Autovacuum can clash with workload, causing contention.

  4. Longer Sustained Load

    • Sunflower load is sustained over longer periods vs. the short bursts in Ramsons.

    • Suggests poorer query optimization or longer query runtimes.

  5. Possibly Misaligned JVM/Container Settings

    • Based on prior screenshots, container memory/CPU allocations (soft/hard limits) might not be sufficient or aligned in Sunflower.

    • This could cause GC thrashing or CPU starvation under load.


Conclusion

Ramsons handles both default and SRS profiles more gracefully even at higher volumes.

Sunflower struggles with:

  • CPU saturation,

  • Update-heavy query mix,

  • High autovacuum activity,

  • Longer query runtimes.

 

image-20250610-155541.png
image-20250610-155936.png

 

Top SQL-queries

image-20250502-163504.png
image-20250611-085848.png
image-20250611-090428.png

 

Top applications

image-20250501-212820.png
image-20250611-090155.png
image-20250611-090653.png

 

Appendix

Infrastructure

PTF - environment Sunflower NON-ECS (secp1)

  • secp1: 12 r7g.2xlarge EC2 instances located in US East (N. Virginia), us-east-1

  • 1 db.r7g.xlarge database instance (writer)

  • MSK fse-test

    • 4 kafka.m7g.xlarge brokers in 2 zones (2 brokers per zone)

      • Apache Kafka version 3.7.x, metadata mode - KRaft

      • EBS storage volume per broker 300 GiB

      • auto.create.topics.enable=true

      • log.retention.minutes=480

      • default.replication.factor=3

      • revision - 26

  • OpenSearch 2.13 ptf-test cluster

    • r7g.2xlarge.search 4 data nodes

    • r6g.large.search 3 dedicated master nodes

 

Cluster Resources - secp1-pvt (Tue May 06 09:21:45 UTC 2025)

Module | Task Definition Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft Limit | CPU Units | Xmx | Metaspace Size | Max Metaspace Size | R/W Split Enabled
mod-remote-storage | 1 | mod-remote-storage:3.4.1 | 2 | 4920 | 4472 | 128 | 3960 | 512 | 512 | false

Azimjon Alijonov
June 25, 2025

While investigating together with Martin, we saw that even when no test was being conducted, some queries related to mod-circulation-storage remained active. We therefore ran test #3 with mod-circulation-storage disabled; as the results show, latency is much lower.