PTF - Test Heavy Workflows on multiple tenants concurrently, Part 2
This performance testing initiative is designed to assess the system's stability and performance degradation when handling multiple concurrent, resource-intensive workflows across an increasing number of tenants. The primary goal is to identify the saturation point where system resources become critical, leading to unacceptable performance or instability. All tests were performed on the Sunflower Eureka Galileo environment (SEGCON).
This report is a continuation of the performance testing efforts tracked under PERF-1214.
A new report was created to establish a fresh performance baseline following three months of active testing (documented in the MOBIUS Sunflower Performance Testing Report). During this period, the system underwent significant changes:
Database Growth: Continuous testing iterations resulted in a naturally increased database volume.
Environment Upgrades: The test environment was updated twice, introducing software version differences that require a separate, updated analysis.
This report also includes performance testing results from: https://folio-org.atlassian.net/browse/PERF-1227
Summary
Test 1. Heavy load test
Database resource usage (Database Load / AAS): The most significant finding is that Average Active Sessions (AAS) consistently hovered between 20 and 25, exceeding the Max vCPUs threshold (which is 8 for r7g.2xlarge). The primary wait events were CPU and IO:DataFileRead, indicating the database is heavily constrained by both available compute power and disk read performance under this specific load.
High RDS CPU & Connections: RDS CPU Utilization immediately spiked and remained consistently high at ~85% for the entire 30-minute duration. Concurrently, Database Connections jumped from a baseline of ~1.8K to a sustained peak of ~2,780.
RDS Freeable Memory remained healthy and relatively flat at approximately 19.4 GB.
Service-Level CPU Hotspots: While overall service CPU remained low (peaking at a safe ~55%), specific services experienced massive CPU spikes. mod-search was the most resource-intensive (peaking over 400%), followed closely by mod-source-record-storage and mod-audit (both reaching ~280% CPU utilization).
Stable Memory Utilization: Despite the high CPU and database load, memory resources remained stable. Service memory usage showed no obvious leaks during the 30-minute window.
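The AAS finding for Test 1 can be expressed as a simple ratio of database load to available vCPUs. The following is a minimal sketch using only the figures reported above (AAS of 20 to 25, 8 vCPUs); the function name `db_load_ratio` is hypothetical and is not part of any monitoring tool.

```python
def db_load_ratio(aas: float, vcpus: int) -> float:
    """Return Average Active Sessions as a multiple of the vCPU count.

    A ratio above 1.0 means more sessions are active on average
    than there are vCPUs to serve them.
    """
    if vcpus <= 0:
        raise ValueError("vcpus must be positive")
    return aas / vcpus


# Figures reported for HW_Test_1 in the summary above.
VCPUS = 8
low = db_load_ratio(20, VCPUS)
high = db_load_ratio(25, VCPUS)

print(f"AAS was {low:.3f}x to {high:.3f}x the vCPU count")
```

With the reported values, the database load sits at roughly 2.5 to 3.1 times the vCPU count, which is consistent with the CPU-bound wait events described above.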
Test 2. Multi-Tenants Concurrent Tests
Goal Attainment & SLA Impact:
Target: Database CPU < 20%: Met. Database performance was highly stable. RDS CPU Utilization peaked briefly at 13.4% before settling between 4% and 7% for the remainder of the test.
Target: API p95 response time < 1s: Not Met. While Search and Inventory operations easily met this goal, CICO and authentication workflows significantly exceeded the 1-second threshold.
Target: No errors or timeouts
Database Health: The database was not the bottleneck in this scenario. Average Active Sessions (AAS) remained remarkably low (under 2), well below the maximum vCPU threshold of 16. Database connections scaled up normally, stabilizing at approximately 2,463 connections.
Service-Level CPU Utilization: Despite the overall health of the system, mod-search-b exhibited heavily inflated CPU utilization, peaking over 766% early in the test and sustaining over 350% during active load, while the other services had very low CPU utilization.
Test Runs
HW_Test_1
To simulate a heavy load, the test workflows were distributed across random tenants.
Test Duration: 30 minutes
8 Tenants: Check-In/Check-Out (CICO);
3 Tenants: Executed Data Import (50K, Profile: PTF-Create-3);
3 Tenants: Bulk Edit for holdings, users, and items with upload and edit operations;
2 Tenants: Executed Data Export workflows with Custom and Default profiles;
2 Tenants: Executed Harvesting workflows (OAI-PMH);
2 Tenants: Executed Refresh Lists workflows.
MT_Concurent test_1
Multi-Tenants Concurrent Tests. To simulate a heavy load, the test workflows were distributed across random tenants.
Test Duration: 30 minutes
User authentication (100 concurrent users);
Circulation operations (50 checkouts/min);
Search operations (200 searches/min);
Inventory browsing (100 requests/min) (API: mod-inventory [item, instance, holdings]).
Results
| Scenario | Variant | HW_Test_1 |
|---|---|---|
| Check | In | 5.4 sec |
| Check | Out | 7 sec |
| Data Import | | 35 min |
| Data export | Custom profile | 23 min |
| Data export | Default profile | 5 min |
| OAI-PMH | | 0.5 sec |
| Bulk Edit Upload | holdings | 66 sec |
| Bulk Edit Upload | users | 31 sec |
| Bulk Edit Upload | items | 76 sec |
| Bulk Edit edit | holdings | 250 sec |
| Bulk Edit edit | users | 57 sec |
| Bulk Edit edit | items | 112 sec |
| ListApp | | 4 sec - 120 sec |
| Transaction | Number of samples during 30 min | Transactions per min | Average response time |
|---|---|---|---|
| LOGIN | 584 | ~20 | 2.4 sec |
| Check In | 1478 (50/50 distribution) | ~50 | 1.8 sec |
| Check Out | | | 2.9 sec |
| Search operations | 5952 | ~200 | 0.42 sec |
| Inventory browsing | 6000 | ~200 | 0.03 sec |
| 1. IS_GET_/inventory/instances/{id} | 2039 | ~67 | 47.51 ms |
| 2. IS_GET_/inventory/items-by-holdings-id | 1984 | ~66 | 11.15 ms |
| 3. S_GET_/inventory/items/{itemID} | 1977 | ~65 | 32.77 ms |
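The "Transactions per min" column can be reproduced from the sample counts and the 30-minute test duration. The sketch below uses the counts from the table above; treating the 1478 figure as the combined Check In and Check Out count is an assumption based on the "50/50 distribution" note, and the names `sample_counts` and `per_minute` are illustrative only.

```python
TEST_DURATION_MIN = 30  # test duration stated for MT_Concurent test_1

# Sample counts copied from the results table.
# Assumption: 1478 is the combined Check In + Check Out count.
sample_counts = {
    "LOGIN": 584,
    "Check In + Check Out": 1478,
    "Search operations": 5952,
    "Inventory browsing": 6000,
}


def per_minute(samples: int, duration_min: float = TEST_DURATION_MIN) -> float:
    """Average number of samples per minute over the test duration."""
    if duration_min <= 0:
        raise ValueError("duration_min must be positive")
    return samples / duration_min


rates = {name: per_minute(count) for name, count in sample_counts.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} per min")
```

The computed values round to the approximate rates shown in the table (about 20, 50, 200, and 200 per minute).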
Resource utilization graphs
Response time graphs
HW_Test_1
MT_Concurent test_1
This graph contains 3 main RT metrics:
Green (Average): The arithmetic mean of all response times combined.
Yellow (90 Percentile): 90% of requests were faster than this value.
Blue (95 Percentile): 95% of requests were faster than this value (shows the worst 5% of outliers).
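To illustrate how the three metrics differ, the sketch below computes the average, 90th percentile, and 95th percentile for a small set of response times. The sample values are hypothetical and are not taken from the test runs. The nearest-rank percentile method used here is one common definition; the reporting tool behind the graphs may use a different interpolation.

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical response times in seconds (illustrative only).
samples = [0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.5, 0.6, 0.8, 0.9,
           1.0, 1.1, 1.2, 1.4, 1.5, 1.8, 2.0, 2.5, 3.0, 6.0]

avg = mean(samples)
p90 = percentile(samples, 90)
p95 = percentile(samples, 95)

print(f"average={avg:.2f}s p90={p90}s p95={p95}s")
```

In this illustrative data set the average is about 1.32 s while the 95th percentile is 3.0 s, which shows why a target defined on p95 can be missed even when the average looks moderate.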
EC2 instance CPU utilization graph
HW_Test_1
MT_Concurent test_1
Service CPU utilization graph
This graph breaks down CPU consumption by individual microservices during HW_Test_1.
MT_Concurent test_1
Service memory utilization graph
Memory consumption for individual microservices over time.
HW_Test_1
MT_Concurent test_1
RDS CPU Utilization
RDS CPU Utilization graph:
HW_Test_1. The database CPU usage spiked to ~86% and stayed consistently high (mostly above 85%).
MT_Concurent test_1
RDS Database Connections
HW_Test_1
MT_Concurent test_1
Freeable memory
HW_Test_1
MT_Concurent test_1
Database load
HW_Test_1
MT_Concurent test_1
Additional section (SQL queries)
We also collected the longest-running SQL queries.
In addition, 1000 queries are provided in a CSV file.
Appendix
Infrastructure
PTF environment segcon - 204 services
10x r7g.2xlarge EC2 instances located in US East (N. Virginia), us-east-1