PTF-Test Hybrid DB Deployment on multiple tenants [Galileo environment]. Provisioned primary DB db.r7g.2xlarge paired with Dedicated Offload DB db.r8g.2xlarge. Part 3


 

Overview

This report details the performance evaluation of a Dual-Provisioned Database architecture within a multi-tenant FOLIO environment (Galileo release). The tested architecture features a Provisioned Primary DB (db.r7g.2xlarge) paired with a Dedicated Offload DB (db.r8g.2xlarge). Seven I/O-heavy modules (mod-source-record-manager, mod-source-record-storage, mod-inventory-storage, mod-orders-storage, mod-data-export, mod-lists, and mod-fqm-manager) were routed to the Dedicated Offload DB. The objective of this testing phase is to determine whether a dual-provisioned hardware baseline can stabilize heavy Data Import and Data Export workloads without degrading real-time Circulation (CICO) and ListApp workflows across 60 tenants.
Ticket: https://folio-org.atlassian.net/browse/PERF-1334

 

Summary:

Test 3 applied the standard load profile. The Dual-Provisioned setup outperformed the previous Serverless baseline: Data Import (50K) matched its optimal 18-minute benchmark, Data Export (Custom profile) completed in 17 minutes, and CICO response times dropped by about 1 second. This indicates that the db.r8g.2xlarge instance can handle the concurrent load of all 7 heavy modules under normal operating conditions.

Test 4 applied a 3x load multiplier to stress-test the architecture's breaking point. CICO achieved its fastest times yet (Check-In at 2.40 seconds; Check-Out at 3.50 seconds), but Data Import execution degraded to 57 minutes. Based on the Performance Insights graph for segcon-perf1334-2nddb (Dedicated Offload DB, db.r8g.2xlarge), the primary bottleneck is the IO:DataFileRead wait event, which dominated throughout the test and accounted for approximately 70–80% of Average Active Sessions (AAS). CPU utilization remained negligible throughout, confirming that compute was not the limiting factor. The degradation observed in Test 4, particularly the Data Import regression (18 min → 57 min), is consistent with this I/O saturation pattern on the Offload DB.

Test 5 applied a vertically scaled Dedicated Offload DB (db.r8g.4xlarge) under the high load profile (60 tenants) to determine whether doubling the instance size could mitigate the bottlenecks observed in Test 4. Despite the increased CPU and memory, vertical scaling yielded no meaningful performance recovery: CICO remained stable (Check-In at 3.12 seconds; Check-Out at 4.37 seconds), Data Import took 59 minutes, and ListApp did not fail but its latency reached 8 minutes. This indicates that adding CPU and memory to the Offload DB does not resolve the degradation.
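To put the Data Import regression in proportion, the arithmetic can be sketched as follows. The durations are the values quoted in the summary above; the helper name `slowdown` is only illustrative.

```python
# Data Import (50K, Create) durations in minutes, as quoted in the summary.
data_import_min = {"Test 3": 18, "Test 4": 57, "Test 5": 59}


def slowdown(baseline: float, observed: float) -> float:
    """Return how many times longer `observed` is than `baseline`."""
    return observed / baseline


# Test 4 (high load) compared with Test 3 (standard load).
test4_vs_test3 = slowdown(data_import_min["Test 3"], data_import_min["Test 4"])
# Test 5 (db.r8g.4xlarge) compared with Test 4 (db.r8g.2xlarge), same load.
test5_vs_test4 = slowdown(data_import_min["Test 4"], data_import_min["Test 5"])

print(f"Test 4 vs Test 3: {test4_vs_test3:.2f}x")
print(f"Test 5 vs Test 4: {test5_vs_test4:.2f}x")
```

With these figures, Data Import ran a little over three times longer in Test 4 than in Test 3, and doubling the Offload DB instance size in Test 5 left the duration essentially unchanged.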

 

Recommendations & Jiras

Test Runs 

The first two tests (Test 1 and Test 2), shown in the test results table, are from previous testing. Links to the reports:

Test 1 - Report
Test 2 - Report

Test 1. Baseline: db.r7g.4xlarge.

Test 2. Baseline: db.r7g.2xlarge + Serverless (SL) 5-64 ACU.

Test 3.

  • Architecture: Provisioned primary DB db.r7g.2xlarge paired with Dedicated Offload DB db.r8g.2xlarge.

  • Load model: Test scenario 1: Medium load.

  • Routing: 7 modules (srm, srs, inv-strg, ord-strg, data-export, lists, and fqm-manager) routed to Dedicated Offload DB db.r8g.2xlarge.

  • Number of tenants: 60;

Test 4.

  • Architecture: Provisioned primary DB db.r7g.2xlarge paired with Dedicated Offload DB db.r8g.2xlarge.

  • Load model: Test scenario 2: High load.

  • Routing: 7 modules (srm, srs, inv-strg, ord-strg, data-export, lists, and fqm-manager) routed to Dedicated Offload DB db.r8g.2xlarge.

  • Number of tenants: 60;

Test 5.

  • Architecture: Provisioned primary DB db.r7g.2xlarge paired with Dedicated Offload DB db.r8g.4xlarge.

  • Load model: Test scenario 2: High load.

  • Routing: 7 modules (srm, srs, inv-strg, ord-strg, data-export, lists, and fqm-manager) routed to Dedicated Offload DB db.r8g.4xlarge.

  • Number of tenants: 60;
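The routing rule shared by Tests 3 to 5 can be written down as a simple lookup. This is a minimal sketch for illustration only: the report does not describe how routing is actually configured, and the names `OFFLOAD_MODULES`, `target_db`, and the labels "offload" and "primary" are hypothetical. The module list itself is the one given in the Overview.

```python
# Modules routed to the Dedicated Offload DB, as listed in the Overview.
OFFLOAD_MODULES = frozenset({
    "mod-source-record-manager",
    "mod-source-record-storage",
    "mod-inventory-storage",
    "mod-orders-storage",
    "mod-data-export",
    "mod-lists",
    "mod-fqm-manager",
})


def target_db(module: str) -> str:
    """Return 'offload' for the seven routed modules, 'primary' otherwise.

    The return labels are hypothetical placeholders, not real hostnames.
    """
    return "offload" if module in OFFLOAD_MODULES else "primary"


print(target_db("mod-lists"))
# Any module outside the list stays on the primary DB; mod-users is just an example.
print(target_db("mod-users"))
```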

Test scenarios:

Test scenario 1: Medium load

  • 8 Tenants: Check-In/Check-Out (CICO);

  • 3 Tenants: Executed Data Import (50K, Profile: PTF-Create-3);

  • 3 Tenants: Bulk Edit for holdings, users, and items with upload and edit operations;

  • 2 Tenants: Executed Data Export workflows with Custom and Default profiles;

  • 2 Tenants: Executed Harvesting workflows (OAI-PMH);

  • 2 Tenants: Executed Refresh Lists workflows.

Test scenario 2: High load (x3)

  • 24 Tenants: Check-In/Check-Out (CICO);

  • 9 Tenants: Executed Data Import (50K, Profile: PTF-Create-3);

  • 9 Tenants: Bulk Edit for holdings, users, and items with upload and edit operations;

  • 6 Tenants: Executed Data Export workflows with Custom and Default profiles;

  • 6 Tenants: Executed Harvesting workflows (OAI-PMH);

  • 2 Tenants: Executed Refresh Lists workflows.
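As a quick cross-check of the two load models, the tenant counts listed above can be totalled and compared per workflow. This is a small sketch; the dictionary keys are shorthand labels chosen here, and the counts are copied from the two lists.

```python
# Tenants per workflow, copied from the two scenario lists above.
medium = {
    "CICO": 8,
    "Data Import": 3,
    "Bulk Edit": 3,
    "Data Export": 2,
    "OAI-PMH": 2,
    "Refresh Lists": 2,
}
high = {
    "CICO": 24,
    "Data Import": 9,
    "Bulk Edit": 9,
    "Data Export": 6,
    "OAI-PMH": 6,
    "Refresh Lists": 2,
}

# Number of tenants actively running a workflow in each scenario.
active_medium = sum(medium.values())
active_high = sum(high.values())

# Per-workflow multiplier between the high and medium scenarios.
multipliers = {name: high[name] / medium[name] for name in medium}

print(active_medium, active_high)
print(multipliers)
```

As listed, every workflow is tripled in the high-load scenario except Refresh Lists, which stays at 2 tenants.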

Results

| Scenario | Baseline r7g 4xl | Baseline r7g 2xl + SL 5-64ACU | Test 3 | Test 4 | Test 5 | 1 VU |
|---|---|---|---|---|---|---|
| Check-In | 5.4 sec | 4.2 sec | 3.02 sec | 2.40 sec | 3.12 sec | 3.3 sec |
| Check-Out | 7 sec | 5.89 sec | 4.67 sec | 3.50 sec | 4.37 sec | 4.1 sec |
| Data Import, 50K - Create | 35 min | 26 min | 18 min | 57 min | 59 min | 19 min 35 sec |
| Data Export, Custom profile | 23 min | 17 min | 17 min | 9 min | 9 min | 8.6 min |
| Data Export, Default profile | 5 min | 4.2 min | 5 min | 4.6 min | 3.8 min | 4.37 min |
| OAI-PMH | 0.5 sec | 0.456 sec | 0.34 sec | 0.536 sec | 0.55 sec | 0.372 sec |
| Bulk Edit Upload, holdings | 66 sec | 95 sec | 71 sec | 96 sec | 93 sec | 66 sec |
| Bulk Edit Upload, users | 31 sec | 40 sec | 36 sec | 53 sec | 45 sec | 32 sec |
| Bulk Edit Upload, items | 76 sec | 120 sec | 95 sec | 152 sec | 174 sec | 231 sec |
| Bulk Edit edit, holdings | 250 sec | 192 sec | 212 sec | 234 sec | 227 sec | 141 sec |
| Bulk Edit edit, users | 57 sec | 39 sec | 42 sec | 42 sec | 44 sec | 37 sec |
| Bulk Edit edit, items | 112 sec | 84 sec | 97 sec | 96 sec | 104 sec | 53 sec |
| ListApp | 4 sec - 120 sec | 3 sec - 9 min | 3 sec - 4 min | failed | 14 sec - 8 min | |

The table below compares ListApp results for each list without load and under load.

| List ID | Tenant | Query | Number of returned records | Without load | Under load |
|---|---|---|---|---|---|
| 9acbffb2-599b-4c04-8d64-d40350d6193e | cs00000001_0002 | (items.updated_date > 12/26/2025) | 754000 | 1.85 min | 2.66 sec |
| ebf47a4e-924b-4f18-9f37-cd8a0a9e00c3 | cs00000001_0002 | (loans.checkout_date > 01/07/2024) AND (users.active == True) | 0 | 4 sec | 6.2 sec |
| 02bbd876-f3bd-4602-8366-e9a052dd376b | cs00000001_0003 | (mtypes.name != 2 Dimensional Graphic) | 162061 | 14 sec | 21.5 sec |
| 63b6433e-d6f3-4807-962b-3dd153fd2fe4 | cs00000001_0003 | (loans.status_name == Open) AND (users.active == True) | 0 | 3 sec | 3.81 sec |
| 1d115c40-dd74-46e9-a9a7-ca5e0e66fa7d | cs00000001_0004 | (items.effective_call_number starts with G) | 8790 | 4 sec | 7 sec |
| ed1fe639-d9d7-4622-b929-79cc086e9019 | cs00000001_0004 | (loclibrary.name != ROCK Greenlease Library) | 1210345 | 26 sec | 45 sec |
| 6fcea6a4-298d-4116-b8c5-9d3581f8a460 | cs00000001_0005 | (users.created_date > 01/08/2021) AND (users.active == True) | 31000 | 5 sec | 6.18 sec |
| ab7fab61-b68d-47b4-97ba-03ea2e68f61b | cs00000001_0005 | (items.status_name in [Aged to lost, Available, Awaiting pickup, Awaiting delivery, Checked out, Claimed returned, Declared lost, In process, In process (non-requestable), In transit, Intellectual item, Long missing, Lost and paid]) | 154000 | 10 sec | 20.3 sec |
| 16d8469f-92b9-4673-9c33-45931b92e00d | cs00000001_0006 | (mtypes.name == 2 Dimensional Graphic) AND (instances.title starts with A) | 25300 | 2.4 min | 4.97 sec |
| 571df569-79c1-421c-afc0-032a558d23ee | cs00000001_0006 | (holdings.call_number starts with A) | 5890 | 3 sec | 3 sec |
| 09602d2e-6187-4917-8704-edf8ed81b895 | cs00000001_0007 | (holdings.call_number is null/empty true) AND (effective_location.name == DCB) AND (holdings.updated_at > 01/08/2025) AND (permanent_location.name == DCB) AND (holdings.hrid starts with mbtsho000003) | 61500 | 11 sec | 13.9 sec |
| d6f8c5be-1ed5-49a3-8246-87cfa11a3941 | cs00000001_0007 | (instance.shared == Local) AND (instance.title contains a) AND (instance.hrid != mbtsin00000) | 1200000 | 5.45 min | 6.3 min |
| 82e54134-e2c3-4d51-bfdb-b3fd029c9799 | cs00000001_0007 | (loans.action is null/empty false) AND (instance.title starts with B) AND (users.active == True) AND (instance.instance_type_name == text) | 63 | 7 sec | 9 sec |
| ff422ca4-26b7-4f5b-b8ce-6f9c993843f0 | cs00000001_0008 | (loclibrary.name in [DCB, MWSU Library]) AND (holdings.updated_at < 01/08/2026) AND (holdings.hrid starts with mwsuho000002) | 99863 | 13 sec | 16.6 sec |
| bf3102c6-9fb7-4f6a-97c0-4ce2bfbfd9bd | cs00000001_0008 | (users.created_date is null/empty false) AND (groups.group in [zEBSCO Support Group, staff, undergrad, faculty, graduate]) AND (users.barcode contains 1) | 994 | 3 sec | 2.4 sec |
| e6fd6c2d-fd91-4e71-bedf-ec3d0ca4f7d3 | cs00000001_0008 | (items.status_name is null/empty false) AND (loans.due_date < 01/08/2026) AND (users.barcode contains 1) AND (users.barcode contains 2) AND (users.barcode contains 3) AND (instance.shared in [Shared, Local]) | 1002 | 3 sec | 4 sec |
| 7a11b736-12f8-4077-8756-20cfe02b729f | cs00000001_0009 | (instance.discovery_suppress == True) AND (date_type.name is null/empty true) | 10000 | 5.91 min | 7.1 min |
| 303273ff-53aa-48e2-965c-adaa1275f089 | cs00000001_0009 | (items.item_level_call_number is null/empty true) AND (mtypes.name in [3 Dimensional Object, Accessories, 2 Dimensional Graphic]) | 887472 | 2.2 min | 3.76 min |
| 12237a75-227c-4168-9d1c-b702dbf0d2bd | cs00000001_0010 | (instance.instance_type_name in [cartographic dataset, cartographic image, cartographic tactile image]) | 31000 | 34 sec | 42 sec |
| f473462f-002e-42d3-93b2-6e122257428b | cs00000001_0010 | (instance.updated_at < 01/07/2026) AND (instance.title contains Charles) | 104113 | 28 sec | 12 ms (failed) |
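Because the ListApp timings mix seconds and minutes, comparing the two load columns is easier after converting everything to seconds. The following is a minimal sketch of that conversion and of a per-list load ratio; the helper names `to_seconds` and `load_ratio` are illustrative, and the three sample rows are taken from the table above.

```python
import re

# Seconds per unit for the units that appear in the ListApp table.
_UNIT_SECONDS = {"ms": 0.001, "sec": 1.0, "min": 60.0}


def to_seconds(text: str) -> float:
    """Parse a duration such as '1.85 min', '6,2 sec' or '4sec' into seconds."""
    match = re.fullmatch(r"\s*(\d+(?:[.,]\d+)?)\s*(ms|sec|min)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    # Accept both decimal comma and decimal point.
    value = float(match.group(1).replace(",", "."))
    return value * _UNIT_SECONDS[match.group(2)]


def load_ratio(without_load: str, under_load: str) -> float:
    """Return the under-load duration divided by the no-load duration."""
    return to_seconds(under_load) / to_seconds(without_load)


# (returned records, without load, under load), copied from the table.
rows = [
    ("1210345", "26 sec", "45 sec"),
    ("1200000", "5.45 min", "6.3 min"),
    ("754000", "1.85 min", "2.66 sec"),
]
for records, idle, loaded in rows:
    print(records, round(load_ratio(idle, loaded), 2))
```

A ratio well below 1, as in the third sample row, means the list was reported as much faster under load than without it. Such rows may be worth rechecking against the raw results; the report does not say whether this reflects a unit mix-up or a genuine measurement.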

Service CPU utilization

Because all test iterations used the same JMeter script, the Service CPU utilization profiles were highly consistent across runs. To avoid redundancy, only representative Service CPU graphs are included below.
Test 3.

image-20260407-103440.png

Test 4

image-20260407-120758.png

 

 

Service memory utilization

image-20260407-103608.png

Database resource utilization