PTF - Test Heavy Workflows on Multiple Tenants Concurrently (Part 2)


This performance testing initiative is designed to assess the system's stability and performance degradation when handling multiple concurrent, resource-intensive workflows across an increasing number of tenants. The primary goal is to identify the saturation point at which system resources become critical, leading to unacceptable performance or instability. All tests were performed on the Sunflower Eureka Galileo environment (SEGCON).

This report is a continuation of the performance testing efforts tracked under PERF-1214.

A new report was created to establish a fresh performance baseline following three months of active testing (documented in the MOBIUS Sunflower Performance Testing Report). During this period, the system underwent significant changes:

  • Database Growth: Continuous testing iterations have naturally increased the database volume.

  • Environment Upgrades: The test environment was updated twice, introducing software version differences that require a separate, updated analysis.

This report also includes performance testing results from PERF-1227: https://folio-org.atlassian.net/browse/PERF-1227

Summary

Test 1. Heavy load test

  1. Database resource usage (Database Load / AAS): The most significant finding is that Average Active Sessions (AAS) consistently hovered between 20 and 25, exceeding the Max vCPUs threshold (which is 8 for r7g.2xlarge). The primary wait events were CPU and IO:DataFileRead, indicating the database is heavily constrained by both available compute power and disk read performance under this specific load.

  2. High RDS CPU & Connections: RDS CPU Utilization spiked immediately and remained consistently high at ~85% for the entire 30-minute duration. Concurrently, Database Connections jumped from a baseline of ~1,800 to a sustained peak of ~2,780.

  3. Healthy RDS Memory: RDS Freeable Memory stayed relatively flat at approximately 19.4 GB.

  4. Service-Level CPU Hotspots: While overall service CPU remained low (peaking at a safe ~55%), specific services experienced massive CPU spikes. mod-search was the most resource-intensive (peaking at over 400%), followed by mod-source-record-storage and mod-audit (both reaching ~280% CPU utilization).

  5. Stable Memory Utilization: Despite the high CPU and database load, memory resources remained stable. Service memory usage showed no obvious leaks during the 30-minute window.
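The AAS-versus-vCPU comparison in item 1 can be read as a simple ratio. The sketch below applies it to the figures quoted in this report (20 to 25 AAS against 8 vCPUs in Test 1, under 2 AAS against 16 vCPUs in Test 2). The helper names are hypothetical, and the single 1.0 cut-off is a rule of thumb rather than an AWS-defined threshold.

```python
# Minimal sketch: express Average Active Sessions (AAS) as a multiple of the vCPU count.
# A ratio above 1.0 means more sessions are active than there are vCPUs to run them,
# so some sessions must be waiting (for CPU time or on wait events such as IO:DataFileRead).


def aas_saturation(aas: float, vcpus: int) -> float:
    """Return AAS divided by the number of vCPUs (hypothetical helper)."""
    if vcpus <= 0:
        raise ValueError("vcpus must be positive")
    return aas / vcpus


def classify(ratio: float) -> str:
    """Simple two-band reading: above 1.0 is saturated, otherwise there is headroom."""
    return "saturated" if ratio > 1.0 else "headroom"


# (test run, AAS, vCPU threshold) as quoted in this report.
observations = [
    ("HW_Test_1, low end", 20.0, 8),
    ("HW_Test_1, high end", 25.0, 8),
    ("MT_Concurent test_1, upper bound", 2.0, 16),
]

for name, aas, vcpus in observations:
    ratio = aas_saturation(aas, vcpus)
    print(f"{name}: AAS = {ratio:.2f}x vCPUs ({classify(ratio)})")
```

Under this reading, Test 1 ran at roughly 2.5 to 3.1 times the database's vCPU capacity, while Test 2 stayed far below 1.0.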

Test 2. Multi-Tenant Concurrent Tests

Goal Attainment & SLA Impact:

  • Target: Database CPU < 20%: Met. Database performance was highly stable. RDS CPU Utilization peaked briefly at 13.4% before settling between 4% and 7% for the remainder of the test.

  • Target: API p95 response time < 1s: Not Met. While Search and Inventory operations easily met this goal, Check-In/Check-Out (CICO) and authentication workflows significantly exceeded the 1-second threshold.

  • Target: No errors or timeouts

Database Health: The database was not the bottleneck in this scenario. Average Active Sessions (AAS) remained remarkably low (under 2), well below the maximum vCPU threshold of 16. Database connections scaled up normally, stabilizing at approximately 2,463 connections.

Service-Level CPU Utilization: Despite the overall health of the system, mod-search-b exhibited heavily inflated CPU utilization, peaking at over 766% early in the test and sustaining over 350% during active load, while other services had very low CPU utilization.
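Readings such as the 766% peak for mod-search-b make sense once the denominator is known. Assuming the service CPU graphs follow the Amazon ECS service-level `CPUUtilization` definition (CPU used divided by the CPU units reserved in the task definition), a service whose tasks use idle host CPU beyond their reservation reports more than 100%. The sketch below uses hypothetical reservation and usage values, not mod-search-b's real configuration, and the function name is illustrative.

```python
# Minimal sketch: why an ECS service CPU graph can show values far above 100%.
# Assumption: the graphs follow the Amazon ECS service-level CPUUtilization definition,
#   CPU % = total CPU units used by the service's tasks * 100
#           / (CPU units reserved per task in the task definition * number of tasks).
# Assumes container-level CPU units with no task-level CPU limit, so tasks on EC2
# can use idle host CPU beyond their reservation.
# All numbers below are hypothetical.

CPU_UNITS_PER_VCPU = 1024  # ECS convention: 1024 CPU units = 1 vCPU


def ecs_service_cpu_percent(used_units_total: float, reserved_units_per_task: int, task_count: int) -> float:
    """Service CPU utilization relative to the reserved (not the available) CPU."""
    reserved_total = reserved_units_per_task * task_count
    if reserved_total <= 0:
        raise ValueError("reserved CPU must be positive")
    return used_units_total * 100 / reserved_total


# Two tasks reserving 512 units each (0.5 vCPU apiece) but together using 3 vCPUs.
used = 3 * CPU_UNITS_PER_VCPU
print(ecs_service_cpu_percent(used, reserved_units_per_task=512, task_count=2))  # 300.0
```

Under this interpretation, a very high percentage reflects a small reservation relative to actual demand as much as absolute CPU consumption, so it is worth reading alongside the CPU Units column in the appendix.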

 

Test Runs 

HW_Test_1

To simulate a heavy load, the test workflows were distributed across random tenants.

Test Duration: 30 minutes

  • 8 Tenants: Check-In/Check-Out (CICO);

  • 3 Tenants: Executed Data Import (50K, Profile: PTF-Create-3);

  • 3 Tenants: Bulk Edit for holdings, users, and items with upload and edit operations;

  • 2 Tenants: Executed Data Export workflows with Custom and Default profiles;

  • 2 Tenants: Executed Harvesting workflows (OAI-PMH);

  • 2 Tenants: Executed Refresh Lists workflows.
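The report says only that workflows were distributed across random tenants. One way to reproduce that kind of distribution is sketched below, assuming a hypothetical pool of tenant IDs and that no tenant runs more than one workflow (the report does not say whether tenants overlap). The per-workflow counts come from the list above; the function name `assign_tenants` is illustrative.

```python
# Minimal sketch of a randomized workflow-to-tenant assignment for HW_Test_1.
# Assumptions (not stated in the report): tenant IDs are hypothetical, and each
# tenant runs exactly one workflow, so 8+3+3+2+2+2 = 20 distinct tenants are needed.
import random

WORKFLOW_TENANTS = {
    "CICO": 8,
    "Data Import (PTF-Create-3)": 3,
    "Bulk Edit": 3,
    "Data Export": 2,
    "OAI-PMH harvesting": 2,
    "Refresh Lists": 2,
}


def assign_tenants(tenant_pool, plan, seed=None):
    """Randomly assign distinct tenants to each workflow according to `plan`."""
    needed = sum(plan.values())
    if needed > len(tenant_pool):
        raise ValueError(f"need {needed} tenants, pool has {len(tenant_pool)}")
    rng = random.Random(seed)
    picked = rng.sample(list(tenant_pool), needed)  # distinct tenants in random order
    assignment, start = {}, 0
    for workflow, count in plan.items():
        assignment[workflow] = picked[start:start + count]
        start += count
    return assignment


pool = [f"tenant{i:02d}" for i in range(1, 31)]  # hypothetical 30-tenant pool
for workflow, tenants in assign_tenants(pool, WORKFLOW_TENANTS, seed=42).items():
    print(workflow, tenants)
```

Passing a fixed seed makes a run's tenant mapping reproducible, which helps when comparing results across test iterations.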

MT_Concurent test_1

Multi-tenant concurrent test. To simulate a heavy load, the test workflows were distributed across random tenants.

Test Duration: 30 minutes

  • User authentication (100 concurrent users);

  • Circulation operations (50 checkouts/min);

  • Search operations (200 searches/min);

  • Inventory browsing (100 requests/min; mod-inventory APIs for items, instances, and holdings).
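Translating the target rates into expected sample counts gives a quick sanity check against the results. The sketch below assumes each rate was held constant for the full 30 minutes (ramp-up and ramp-down are ignored) and covers only the rate-based workflows, since authentication is specified as 100 concurrent users rather than a rate. Helper names are illustrative.

```python
# Minimal sketch: expected sampler counts for MT_Concurent test_1 from the target rates.
# Assumption: each rate is a constant-throughput target held for the full 30 minutes.
TEST_DURATION_MIN = 30

TARGET_RATES_PER_MIN = {
    "Circulation (checkouts)": 50,
    "Search": 200,
    "Inventory browsing": 100,
}


def expected_samples(rate_per_min: float, duration_min: float = TEST_DURATION_MIN) -> int:
    """Number of requests a constant-rate load should issue over the test window."""
    return round(rate_per_min * duration_min)


def pacing_seconds(rate_per_min: float) -> float:
    """Average gap between consecutive requests at the given rate."""
    return 60.0 / rate_per_min


for name, rate in TARGET_RATES_PER_MIN.items():
    print(f"{name}: ~{expected_samples(rate)} samples, one every {pacing_seconds(rate):.2f} s")
```

For example, 200 searches/min over 30 minutes implies about 6,000 samples, close to the 5,952 search samples recorded in the results.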

 

Results

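| Workflow | Operation / profile | HW_Test_1 |
|---|---|---|
| Check-In/Check-Out | Check-In | 5.4 sec |
| Check-In/Check-Out | Check-Out | 7 sec |
| Data Import | - | 35 min |
| Data Export | Custom profile | 23 min |
| Data Export | Default profile | 5 min |
| OAI-PMH | - | 0.5 sec |
| Bulk Edit upload | Holdings | 66 sec |
| Bulk Edit upload | Users | 31 sec |
| Bulk Edit upload | Items | 76 sec |
| Bulk Edit edit | Holdings | 250 sec |
| Bulk Edit edit | Users | 57 sec |
| Bulk Edit edit | Items | 112 sec |
| ListApp | - | 4 sec to 120 sec |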

 

MT_Concurent test_1

| Transaction | Number of samples during 30 min | Transactions per min | Average response time |
|---|---|---|---|
| LOGIN | 584 | ~20 | 2.4 sec |
| Check In | 1,478 (50/50 Check In/Check Out distribution) | ~50 | 1.8 sec |
| Check Out | (included above) | (included above) | 2.9 sec |
| Search operations | 5,952 | ~200 | 0.42 sec |
| Inventory browsing | 6,000 | ~200 | 0.03 sec |
| 1. IS_GET_/inventory/instances/{id} | 2,039 | ~67 | 47.51 ms |
| 2. IS_GET_/inventory/items-by-holdings-id | 1,984 | ~66 | 11.15 ms |
| 3. S_GET_/inventory/items/{itemID} | 1,977 | ~65 | 32.77 ms |
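The Inventory browsing row can be reproduced from the three endpoint rows beneath it: 2,039 + 1,984 + 1,977 = 6,000 samples, and the sample-weighted mean of the endpoint averages is about 30.6 ms, which rounds to the 0.03 sec shown. The snippet below is a small check of that arithmetic using only the table's numbers.

```python
# Minimal sketch: derive the "Inventory browsing" summary row from its three endpoints.
# Throughput is samples / test minutes; the combined average is the sample-weighted mean
# of the per-endpoint averages (equivalent to the mean over all individual samples).
TEST_DURATION_MIN = 30

endpoints = [
    # (sampler label as in the table, samples, average response time in ms)
    ("IS_GET_/inventory/instances/{id}", 2039, 47.51),
    ("IS_GET_/inventory/items-by-holdings-id", 1984, 11.15),
    ("S_GET_/inventory/items/{itemID}", 1977, 32.77),
]

total_samples = sum(n for _, n, _ in endpoints)
weighted_avg_ms = sum(n * avg for _, n, avg in endpoints) / total_samples

print(f"samples: {total_samples}")                                     # samples: 6000
print(f"throughput: ~{total_samples / TEST_DURATION_MIN:.0f} per min")  # throughput: ~200 per min
print(f"average: {weighted_avg_ms:.2f} ms")                            # average: 30.63 ms
```

A plain (unweighted) mean of the three averages would give a slightly different figure, so weighting by sample count is what keeps the summary row consistent with the raw data.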

Resource utilization graphs

Response time graphs


HW_Test_1

image-20260226-115558.png

MT_Concurent test_1

image-20260309-102833.png

 

 

This graph shows three main response time (RT) metrics:

  • Green (Average): the arithmetic mean of all response times.

  • Yellow (90th percentile): 90% of requests completed at or below this value.

  • Blue (95th percentile): 95% of requests completed at or below this value; only the slowest 5% of requests exceeded it.
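To make the three plotted statistics concrete, the sketch below computes them for a hypothetical set of response times. It uses the nearest-rank percentile definition, which is an assumption: the tool behind these graphs may interpolate between samples, so results can differ slightly on small datasets.

```python
# Minimal sketch of the three response-time statistics plotted in the graphs.
# Assumption: nearest-rank percentiles; other tools may interpolate differently.
import math
from statistics import mean


def percentile_nearest_rank(samples, p):
    """Smallest value such that at least p% of samples are <= it (nearest-rank method)."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def rt_summary(samples_ms):
    """Average, 90th and 95th percentile of a list of response times."""
    return {
        "average": mean(samples_ms),
        "p90": percentile_nearest_rank(samples_ms, 90),
        "p95": percentile_nearest_rank(samples_ms, 95),
    }


# Hypothetical response times (ms): mostly fast, with a slow tail.
samples = [120] * 90 + [900] * 5 + [2500] * 5
print(rt_summary(samples))
```

Here the average (278 ms) hides the slow tail that p95 (900 ms) exposes, which is why a goal such as the API p95 < 1s target in Test 2 is checked against the 95th percentile rather than the mean.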

EC2 instance CPU utilization graph


HW_Test_1

image-20260226-115927.png

MT_Concurent test_1

image-20260309-113722.png

 

Service CPU utilization graph

 

This graph breaks down CPU consumption by individual microservices.

HW_Test_1

image-20260226-120108.png

MT_Concurent test_1

image-20260309-103134.png

 



Service memory utilization graph

Memory consumption for individual microservices over time.

HW_Test_1

image-20260226-120357.png

MT_Concurent test_1

image-20260309-103321.png

 

 

RDS CPU Utilization

RDS CPU Utilization graph:

HW_Test_1: database CPU usage spiked to ~86% and stayed consistently high (mostly above 85%).

image-20260226-120521.png

 

MT_Concurent test_1

image-20260309-103351.png

 

RDS Database Connections

HW_Test_1

image-20260226-120627.png

MT_Concurent test_1

image-20260309-103518.png

 

Freeable memory

HW_Test_1

image-20260226-120905.png

MT_Concurent test_1

image-20260309-103818.png

 

Database load

HW_Test_1

image-20260226-120959.png
image-20260226-121030.png
image-20260226-121042.png
image-20260226-121120.png

MT_Concurent test_1

image-20260309-103920.png
image-20260309-103944.png
image-20260309-104023.png

Additional section (SQL queries)

We also collected the longest-running SQL queries:

image-20260227-120001.png

The full set of 1,000 queries is provided in a CSV file.
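The report does not describe the CSV layout. A sketch for triaging such an export is shown below, assuming pg_stat_statements-style columns named `query`, `calls`, and `total_exec_time` (the names used by pg_stat_statements on PostgreSQL 13 and later); the actual file may use different headers, and `top_queries` is a hypothetical helper.

```python
# Minimal sketch: rank exported SQL statements by total execution time.
# Assumptions: the CSV has a header row with columns "query", "calls", and
# "total_exec_time" (milliseconds, as in a pg_stat_statements export).
import csv
import io


def top_queries(csv_text, limit=10):
    """Return (total_ms, calls, mean_ms, query) tuples sorted by total time, descending."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        total = float(row["total_exec_time"])
        calls = int(row["calls"])
        mean_ms = total / calls if calls else 0.0
        rows.append((total, calls, mean_ms, row["query"]))
    rows.sort(key=lambda r: r[0], reverse=True)
    return rows[:limit]


# Hypothetical two-row sample standing in for the real export.
sample = """query,calls,total_exec_time
SELECT * FROM item WHERE id = $1,1000,2500.0
SELECT count(*) FROM instance,10,9000.0
"""
for total, calls, mean_ms, query in top_queries(sample):
    print(f"{total:>10.1f} ms  {calls:>6} calls  {mean_ms:>8.2f} ms/call  {query}")
```

Sorting by total time surfaces statements that dominate database load overall, while the per-call mean separates individually slow queries from cheap but very frequent ones.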

Appendix

Infrastructure

  1. PTF environment: SEGCON, 204 services
    10x r7g.2xlarge EC2 instances located in US East (N. Virginia), us-east-1

| Module | Task Definition Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft Limit | CPU Units | Xmx | Metaspace Size | Max Metaspace Size | R/W Split Enabled |
|---|---|---|---|---|---|---|---|---|---|---|
| mod-remote-storage | 52 | mod-remote-storage:3.4.3 | 2 | 4920 | 4472 | 64 | 3960 | 512 | 512 | false |
| mod-remote-storage - Sidecar 1 | N/A | /folio-module-sidecar:3.1.0-SNAPSHOT.551.nb | N/A | 1024 | 512 | 64 | 256 | 0 | 96 | false |
| mod-finance-storage | 55 | /mod-finance-storage:8.8.4 | 2 | 1024 | 896 | 128 | 700 | 88 | 128 | false |
| mod-finance-storage - Sidecar 1 | N/A | /folio-module-sidecar:3.1.0-SNAPSHOT.551.nb | N/A | 1024 | 512 | 128 | 256 | 0 | 96 | false |
| mod-ebsconet | 46 | /mod-ebsconet:2.4.0 | 2 | 1248 | 1024 | 64 | 700 | 128 | 256 | false |
| mod-ebsconet - Sidecar 1 | N/A | /folio-module-sidecar:3.1.0-SNAPSHOT.551.nb | N/A | 1024 | 512 | 64 | 256 | 0 | 96 | false |
| mod-mosaic | 13 | /mod-mosaic:1.0.0 | 2 | 1024 | 896 | 64 | 768 | 88 | 128 | false |
| mod-mosaic - Sidecar 1 | N/A | /folio-module-sidecar:3.1.0-SNAPSHOT.551.nb | N/A | 1024 | 512 | 64 | 256 | 0 | 96 | false |
| mod-consortia-keycloak | 52 | /mod-consortia-keycloak:1.7.3 | 2 | 5136 | 4776 | 512 | 4416 | 384 | 512 | false |
| mod-consortia-keycloak - Sidecar 1 | N/A | /folio-module-sidecar:3.1.0-SNAPSHOT.551.nb | N/A | 1024 | 512 | 64 | 256 | 0 | 96 | false |
| mod-tags | 49 | /mod-tags:2.4.1 | 2 | 1024 | 896 | 64 | 768 | 88 | 128 | false |

mod-tags - Sidecar 1

N/A

/folio-module-sidecar:3.1.0-SNAPSHOT.551.nb

N/A

1024

512