...
- All tests were run successfully and without errors on the Mobius performance testing cluster (mcpt). More than 100 tests with various environment configurations were conducted. The tests and configurations that showed the best results for two database instance classes, db.r6g.4xlarge and db.r6g.8xlarge, are presented.
- During testing, Amazon OpenSearch Service was found to have insufficient resources, so the data node instance type was increased to r6g.2xlarge.search.
- To improve performance, the AWS task count was adjusted for certain services, which increased the memory load on the database. To avoid OOM (Out of Memory) errors, configurations were changed in both the DB instance parameter group and the DB cluster parameter group. Specifically, the shared_buffers parameter was modified to SUM({DBInstanceClassMemory/24076},-50003).
- To avoid "remaining connection slots are reserved for non-replication superuser and rds_superuser connections" errors, configurations were changed in both the DB instance parameter group and the DB cluster parameter group. max_connections was modified to LEAST({DBInstanceClassMemory/9531392},9000).
- For mod-authtoken, CPU was changed from 512 to 1024 and the task count was increased to 6 (default: 2). For okapi and nginx-okapi, CPU was changed from 1024 to 1536.
- No sign of memory leaks was observed in any module during master-script tests; memory shows a stable trend. For tests with a high load of single record create and update (62 virtual users without pauses), an OOM issue in mod-inventory was reported (MODINV-1023).
- No modules show a growing CPU trend or any spikes.
Average response time differs and depends on environment configurations.
- The "Receiving: downloading" workflow: by design, it is processed by the frontend and sends a very large number of frequent requests to the database. When 70% of tenants run it simultaneously, it significantly degrades the performance of all other workflows. Results are therefore provided in tables for tests conducted both with and without this workflow. With this workflow, database CPU usage increases by more than 25% and reaches 99%.
- Database connection count did not exceed 5700
- During the high-load test, which exported 800,000 records concurrently for 62 tenants, an issue with mod-data-export occurred. It will be resolved and tested in the next release, Quesnelia (PERF-927, PERF-888).
- The test with fiscal year rollover for 60 tenants shows severe performance degradation for data import and data export workflows due to high database CPU usage (99.8%).
Recommendations
Database parameter group configuration:
parameter | changed value | default value | Comment |
---|---|---|---|
shared_buffers | SUM({DBInstanceClassMemory/24076},-50003) | SUM({DBInstanceClassMemory/12038},-50003) | Increases freeable db memory from 25% to 50% and decreases cache memory limit from 75% to 50% |
max_connections | LEAST({DBInstanceClassMemory/9531392},9000) | LEAST({DBInstanceClassMemory/9531392},5000) | Increases maximal connection limit from 5000 to 9000 connections |
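The two formulas in the table can be evaluated by hand to see what they yield for a given instance class. The following is a minimal sketch, not the RDS implementation. It assumes that `DBInstanceClassMemory` is expressed in bytes, that the division is an integer division, and that `shared_buffers` is counted in 8 KiB pages. It also uses the nominal instance memory (128 GiB or 256 GiB), whereas the real `DBInstanceClassMemory` is lower because part of the memory is reserved for the operating system and RDS processes, so actual values will differ somewhat. The function names are hypothetical.

```python
GIB = 1024 ** 3
PAGE_BYTES = 8192  # assumption: shared_buffers is counted in 8 KiB pages


def shared_buffers_pages(instance_memory_bytes: int, divisor: int) -> int:
    """Evaluate SUM({DBInstanceClassMemory/divisor}, -50003)."""
    return instance_memory_bytes // divisor - 50003


def shared_buffers_fraction(instance_memory_bytes: int, divisor: int) -> float:
    """Share of instance memory that the resulting shared_buffers would occupy."""
    pages = shared_buffers_pages(instance_memory_bytes, divisor)
    return pages * PAGE_BYTES / instance_memory_bytes


def max_connections(instance_memory_bytes: int, cap: int) -> int:
    """Evaluate LEAST({DBInstanceClassMemory/9531392}, cap)."""
    return min(instance_memory_bytes // 9531392, cap)


if __name__ == "__main__":
    # Nominal memory sizes; the real DBInstanceClassMemory is somewhat lower.
    for name, memory in (("db.r6g.4xlarge", 128 * GIB), ("db.r6g.8xlarge", 256 * GIB)):
        default_share = shared_buffers_fraction(memory, 12038)
        changed_share = shared_buffers_fraction(memory, 24076)
        print(f"{name}: shared_buffers share default={default_share:.2f}, "
              f"changed={changed_share:.2f}")
        print(f"{name}: max_connections default={max_connections(memory, 5000)}, "
              f"changed={max_connections(memory, 9000)}")
```

Under these assumptions, doubling the divisor roughly halves the memory given to `shared_buffers`, and on both instance classes the memory-based term of `max_connections` exceeds the cap, so the cap (5000 or 9000) is the effective limit.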
Amazon OpenSearch Service
data node instance type - r6g.2xlarge.search
Services
mod-authtoken CPU - 1024, 6 tasks
okapi CPU - 1536
nginx-okapi CPU - 1536
For database size db.r6g.4xlarge
adjust task count to 6 for services:
- mod-authtoken
adjust task count to 4 for services:
- mod-permissions
- mod-search
- mod-patron
- mod-inventory
- mod-inventory-storage
- mod-circulation
- mod-circulation-storage
- mod-order
- mod-order-storage
- mod-invoice
- mod-invoice-storage
- mod-users
For database size db.r6g.8xlarge
adjust task count to 6 for services:
- mod-authtoken
- mod-search
adjust task count to 4 for services:
- mod-permissions
- mod-patron
- mod-inventory
- mod-inventory-storage
- mod-circulation
- mod-circulation-storage
- mod-order
- mod-order-storage
- mod-invoice
- mod-invoice-storage
- mod-organization
- mod-organization-storage
- mod-users
- mod-finance
- mod-finance-storage
- mod-configuration
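The per-database-size task counts above can be captured as data, which makes it easier to compare the two recommendations or to feed them into deployment tooling. The following is a minimal sketch: the service names are copied as written in the lists above, while the names `TASK_COUNTS`, `desired_task_count`, and `scaling_plan` are hypothetical. Treating 2 as the default task count for every unlisted service is an assumption based on the "2 default" remark for mod-authtoken.

```python
# Assumption: services not listed in the recommendations keep 2 tasks.
DEFAULT_TASK_COUNT = 2

# Services recommended to run with 4 tasks for both database sizes.
COMMON_FOUR = (
    "mod-permissions", "mod-patron", "mod-inventory", "mod-inventory-storage",
    "mod-circulation", "mod-circulation-storage", "mod-order",
    "mod-order-storage", "mod-invoice", "mod-invoice-storage", "mod-users",
)

# Recommended task count -> services, per database instance class.
TASK_COUNTS = {
    "db.r6g.4xlarge": {
        6: ("mod-authtoken",),
        4: COMMON_FOUR + ("mod-search",),
    },
    "db.r6g.8xlarge": {
        6: ("mod-authtoken", "mod-search"),
        4: COMMON_FOUR + (
            "mod-organization", "mod-organization-storage",
            "mod-finance", "mod-finance-storage", "mod-configuration",
        ),
    },
}


def desired_task_count(db_class: str, service: str) -> int:
    """Return the recommended task count for a service, or the default."""
    for count, services in TASK_COUNTS[db_class].items():
        if service in services:
            return count
    return DEFAULT_TASK_COUNT


def scaling_plan(db_class: str) -> dict[str, int]:
    """Map every explicitly listed service to its recommended task count."""
    return {
        service: count
        for count, services in TASK_COUNTS[db_class].items()
        for service in services
    }


if __name__ == "__main__":
    for service, count in sorted(scaling_plan("db.r6g.8xlarge").items()):
        print(f"{service}: {count}")
```

For example, `desired_task_count("db.r6g.4xlarge", "mod-search")` returns 4, while the same call for db.r6g.8xlarge returns 6, which is the only service whose recommendation changes upward between the two sizes.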
"Receiving: downloading" workflow optimization
Test Runs
Test # | Workflows | Test Conditions | Results |
---|---|---|---|
1. | All workflows | 2 tasks for all of the modules (except the modules with 1 task by requirements). Database r6g.4xlarge, shared_buffers parameter SUM({DBInstanceClassMemory/24076},-50003), Amazon OpenSearch Service r6g.2xlarge.search | Completed |
2. | All workflows | Adjusted task count to 4 for services: mod-permissions, mod-search, mod-patron, mod-inventory, mod-inventory-storage, mod-circulation, mod-circulation-storage, mod-order, mod-order-storage, mod-invoice, mod-invoice-storage, mod-users and task count 6 for mod-authtoken. Database r6g.4xlarge, shared_buffers parameter SUM({DBInstanceClassMemory/24076},-50003), Amazon OpenSearch Service r6g.2xlarge.search, mod-authtoken CPU - 1024, okapi CPU-1536, nginx-okapi CPU-1536 | Completed |
3. | Without ERW: Exporting Receiving Information to CSV | Adjusted task count to 4 for services: mod-permissions, mod-search, mod-patron, mod-inventory, mod-inventory-storage, mod-circulation, mod-circulation-storage, mod-order, mod-order-storage, mod-invoice, mod-invoice-storage, mod-users and task count 6 for mod-authtoken. Database r6g.4xlarge, shared_buffers parameter SUM({DBInstanceClassMemory/24076},-50003), Amazon OpenSearch Service r6g.2xlarge.search, mod-authtoken CPU - 1024, okapi CPU-1536, nginx-okapi CPU-1536 | Completed |
4. | All workflows | Adjusted task count to 4 for services: mod-permissions, mod-search, mod-patron, mod-inventory, mod-inventory-storage, mod-circulation, mod-circulation-storage, mod-order, mod-order-storage, mod-invoice, mod-invoice-storage, mod-users, mod-organization, mod-organization-storage, mod-finance, mod-finance-storage, mod-configuration and task count 6 for mod-authtoken, mod-search. Database r6g.8xlarge, shared_buffers parameter SUM({DBInstanceClassMemory/24076},-50003), Amazon OpenSearch Service r6g.2xlarge.search, mod-authtoken CPU - 1024, okapi CPU-1536, nginx-okapi CPU-1536 | Completed |
5. | Without ERW: Exporting Receiving Information to CSV | Adjusted task count to 4 for services: mod-permissions, mod-search, mod-patron, mod-inventory, mod-inventory-storage, mod-circulation, mod-circulation-storage, mod-order, mod-order-storage, mod-invoice, mod-invoice-storage, mod-users, mod-organization, mod-organization-storage, mod-finance, mod-finance-storage, mod-configuration and task count 6 for mod-authtoken, mod-search. Database r6g.8xlarge, shared_buffers parameter SUM({DBInstanceClassMemory/24076},-50003), Amazon OpenSearch Service r6g.2xlarge.search, mod-authtoken CPU - 1024, okapi CPU-1536, nginx-okapi CPU-1536 | Completed |
6. | All workflows | Adjusted task count to 4 for services: mod-permissions, mod-search, mod-patron, mod-inventory, mod-inventory-storage, mod-circulation, mod-circulation-storage, mod-order, mod-order-storage, mod-invoice, mod-invoice-storage, mod-users and task count 6 for mod-authtoken. Database r6g.4xlarge, shared_buffers parameter SUM({DBInstanceClassMemory/24076},-50003), Amazon OpenSearch Service r6g.2xlarge.search, mod-authtoken CPU - 1024, okapi CPU-1536, nginx-okapi CPU-1536 | Completed |
7. | All workflows, high-load | Adjusted task count to 4 for services: mod-permissions, mod-search, mod-patron, mod-inventory, mod-inventory-storage, mod-circulation, mod-circulation-storage, mod-order, mod-order-storage, mod-invoice, mod-invoice-storage, mod-users and task count 6 for mod-authtoken. Database r6g.4xlarge, shared_buffers parameter SUM({DBInstanceClassMemory/24076},-50003), Amazon OpenSearch Service r6g.2xlarge.search, mod-authtoken CPU - 1024, okapi CPU-1536, nginx-okapi CPU-1536 | Completed with errors from data export |
8. | All no-acquisition workflows + Fiscal year rollover 30 tenants | Adjusted task count to 4 for services: mod-permissions, mod-search, mod-patron, mod-inventory, mod-inventory-storage, mod-circulation, mod-circulation-storage, mod-order, mod-order-storage, mod-invoice, mod-invoice-storage, mod-users and task count 6 for mod-authtoken. Database r6g.4xlarge, shared_buffers parameter SUM({DBInstanceClassMemory/24076},-50003), Amazon OpenSearch Service r6g.2xlarge.search, mod-authtoken CPU - 1024, okapi CPU-1536, nginx-okapi CPU-1536 | Completed |
9. | All no-acquisition workflows + Fiscal year rollover 60 tenants | Adjusted task count to 4 for services: mod-permissions, mod-search, mod-patron, mod-inventory, mod-inventory-storage, mod-circulation, mod-circulation-storage, mod-order, mod-order-storage, mod-invoice, mod-invoice-storage, mod-users and task count 6 for mod-authtoken. Database r6g.4xlarge, shared_buffers parameter SUM({DBInstanceClassMemory/24076},-50003), Amazon OpenSearch Service r6g.2xlarge.search, mod-authtoken CPU - 1024, okapi CPU-1536, nginx-okapi CPU-1536 | Completed with errors from data import |
...
Test №8 Fiscal year rollover (30 tenants)
For reference, a test with a single fiscal year rollover and no other workflows was performed on tenant cs00000001_0001; its duration was 1:24:52.
FYR | 30 tenants | ordersRolloverStatus | overallRolloverStatus | financialRolloverStatus | budgetsClosingRolloverStatus |
---|---|---|---|---|---|
Average FYR time | 3:21:51 | | | | |
cs00000001_0002 | 2:54:30 | Success | Success | Success | Success |
cs00000001_0003 | 3:37:00 | Success | Success | Success | Success |
cs00000001_0004 | 2:53:22 | Success | Success | Success | Success |
cs00000001_0005 | 3:45:51 | Success | Success | Success | Success |
cs00000001_0006 | 2:55:25 | Success | Success | Success | Success |
cs00000001_0007 | 0:00:00 * | Not Started | Not Started | Not Started | Not Started |
cs00000001_0008 | 3:36:40 | Success | Success | Success | Success |
cs00000001_0009 | 2:56:24 | Success | Success | Success | Success |
cs00000001_0010 | 2:57:10 | Success | Success | Success | Success |
cs00000001_0011 | 3:35:20 | Success | Success | Success | Success |
cs00000001_0012 | 3:37:20 | Success | Success | Success | Success |
cs00000001_0013 | 3:34:23 | Success | Success | Success | Success |
cs00000001_0014 | 3:48:25 | Success | Success | Success | Success |
cs00000001_0015 | 3:47:25 | Success | Success | Success | Success |
cs00000001_0016 | 3:47:25 | Success | Success | Success | Success |
cs00000001_0017 | 3:36:27 | Success | Success | Success | Success |
cs00000001_0018 | 3:01:12 | Success | Success | Success | Success |
cs00000001_0019 | 3:46:49 | Success | Success | Success | Success |
cs00000001_0021 | 2:58:25 | Success | Success | Success | Success |
cs00000001_0022 | 3:44:58 | Success | Success | Success | Success |
cs00000001_0023 | 3:00:47 | Success | Success | Success | Success |
cs00000001_0024 | 2:58:20 | Success | Success | Success | Success |
cs00000001_0025 | 3:45:31 | Success | Success | Success | Success |
cs00000001_0026 | 2:59:31 | Success | Success | Success | Success |
cs00000001_0027 | 3:44:23 | Success | Success | Success | Success |
cs00000001_0028 | 3:33:36 | Success | Success | Success | Success |
cs00000001_0029 | 2:59:34 | Success | Success | Success | Success |
cs00000001_0030 | 2:57:58 | Success | Success | Success | Success |
cs00000001_0031 | 3:32:05 | Success | Success | Success | Success |
cs00000001_0058 | 3:07:29 | Success | Success | Success | Success |
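The reported average FYR time can be cross-checked from the per-tenant durations. The sketch below assumes the average is a plain arithmetic mean over the 29 tenants whose rollover actually ran, that is, excluding cs00000001_0007 (Not Started). The helper names `parse_hms` and `format_hms` are hypothetical.

```python
from statistics import mean


def parse_hms(text: str) -> int:
    """Convert an 'H:MM:SS' duration into seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_hms(total_seconds: float) -> str:
    """Convert seconds into 'H:MM:SS', rounding to the nearest second."""
    hours, rest = divmod(round(total_seconds), 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Durations from the 30-tenant table, excluding cs00000001_0007 (Not Started).
FYR_30_DURATIONS = [
    "2:54:30", "3:37:00", "2:53:22", "3:45:51", "2:55:25", "3:36:40",
    "2:56:24", "2:57:10", "3:35:20", "3:37:20", "3:34:23", "3:48:25",
    "3:47:25", "3:47:25", "3:36:27", "3:01:12", "3:46:49", "2:58:25",
    "3:44:58", "3:00:47", "2:58:20", "3:45:31", "2:59:31", "3:44:23",
    "3:33:36", "2:59:34", "2:57:58", "3:32:05", "3:07:29",
]

average = format_hms(mean(parse_hms(d) for d in FYR_30_DURATIONS))
print(average)  # prints 3:21:51
```

Under that assumption the result matches the 3:21:51 shown in the table, which suggests the tenant that did not start was left out of the average rather than counted as zero.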
Test №9 Fiscal year rollover (60 tenants)
...
...
\* For tenant cs00000001_0007, FYR was not started properly (an operator error during the manual start). The database was restored from a snapshot for the next test with 60 tenants, in which FYR finished successfully for all tenants.
Workflow | FYR 30 | errors |
---|---|---|
CICO_TC_Check-In Controller | 2254 | 0 |
CICO_TC_Check-Out Controller | 4004 | 0 |
CSI_TC:Share local instance | 15921 | 0 |
DE_Exporting MARC Bib records custom workflow | 49195 | 0 |
DE_Exporting MARC Bib records workflow | 55158 | 0 |
EVA_TC: View Account | 451 | 0.02% |
ILR_TC: Create ILR | 1473 | 0 |
MSF_TC: mod search by auth query | 498 | 0 |
MSF_TC: mod search by boolean query | 150 | 0 |
MSF_TC: mod search by contributors | 366 | 0 |
MSF_TC: mod search by filter query | 262 | 0 |
MSF_TC: mod search by keyword query | 261 | 0 |
MSF_TC: mod search by subject query | 425 | 0 |
MSF_TC: mod search by title query | 1027 | 0 |
OPIH_/oai/records | 1387 | 0 |
RTAC_TC: edge-rtac | 2176 | 0 |
SDIC_Single Record Import (Create) | 21684 | 0 |
SDIU_Single Record Import (Update) | 45174 | 0 |
ULR_TC: Users loan Renewal Transaction | 3078 | 0 |
TOTAL | 309 | 0 |
Test №9 Fiscal year rollover (60 tenants)
FYR | 60 tenants | ordersRolloverStatus | overallRolloverStatus | financialRolloverStatus | budgetsClosingRolloverStatus |
---|---|---|---|---|---|
Average FYR time | 6:26:33 | | | | |
cs00000001_0002 | 6:20:54 | Success | Success | Success | Success |
cs00000001_0003 | 6:28:36 | Success | Success | Success | Success |
cs00000001_0004 | 5:16:45 | Success | Success | Success | Success |
cs00000001_0005 | 6:09:05 | Success | Success | Success | Success |
cs00000001_0006 | 6:30:16 | Success | Success | Success | Success |
cs00000001_0007 | 5:19:39 | Success | Success | Success | Success |
cs00000001_0008 | 6:14:02 | Success | Success | Success | Success |
cs00000001_0009 | 6:32:53 | Success | Success | Success | Success |
cs00000001_0010 | 6:16:17 | Success | Success | Success | Success |
cs00000001_0011 | 6:25:58 | Success | Success | Success | Success |
cs00000001_0012 | 6:48:33 | Success | Success | Success | Success |
cs00000001_0013 | 6:47:37 | Success | Success | Success | Success |
cs00000001_0014 | 6:49:36 | Success | Success | Success | Success |
cs00000001_0015 | 6:46:31 | Success | Success | Success | Success |
cs00000001_0016 | 6:49:23 | Success | Success | Success | Success |
cs00000001_0017 | 6:51:49 | Success | Success | Success | Success |
cs00000001_0018 | 6:31:28 | Success | Success | Success | Success |
cs00000001_0019 | 6:29:45 | Success | Success | Success | Success |
cs00000001_0021 | 5:52:25 | Success | Success | Success | Success |
cs00000001_0022 | 6:49:04 | Success | Success | Success | Success |
cs00000001_0023 | 6:50:44 | Success | Success | Success | Success |
cs00000001_0024 | 5:51:44 | Success | Success | Success | Success |
cs00000001_0025 | 5:51:54 | Success | Success | Success | Success |
cs00000001_0026 | 6:54:53 | Success | Success | Success | Success |
cs00000001_0027 | 6:30:05 | Success | Success | Success | Success |
cs00000001_0028 | 6:54:06 | Success | Success | Success | Success |
cs00000001_0029 | 6:54:57 | Success | Success | Success | Success |
cs00000001_0030 | 6:29:03 | Success | Success | Success | Success |
cs00000001_0031 | 5:52:42 | Success | Success | Success | Success |
cs00000001_0032 | 6:46:42 | Success | Success | Success | Success |
cs00000001_0033 | 6:46:17 | Success | Success | Success | Success |
cs00000001_0034 | 6:53:40 | Success | Success | Success | Success |
cs00000001_0035 | 6:27:10 | Success | Success | Success | Success |
cs00000001_0036 | 6:52:56 | Success | Success | Success | Success |
cs00000001_0037 | 5:53:08 | Success | Success | Success | Success |
cs00000001_0038 | 6:51:47 | Success | Success | Success | Success |
cs00000001_0039 | 6:43:38 | Success | Success | Success | Success |
cs00000001_0040 | 6:50:57 | Success | Success | Success | Success |
cs00000001_0041 | 6:50:25 | Success | Success | Success | Success |
cs00000001_0042 | 5:51:07 | Success | Success | Success | Success |
cs00000001_0043 | 6:49:32 | Success | Success | Success | Success |
cs00000001_0044 | 6:24:14 | Success | Success | Success | Success |
cs00000001_0045 | 5:50:48 | Success | Success | Success | Success |
cs00000001_0046 | 6:24:16 | Success | Success | Success | Success |
cs00000001_0047 | 6:48:19 | Success | Success | Success | Success |
cs00000001_0048 | 5:49:26 | Success | Success | Success | Success |
cs00000001_0049 | 5:49:01 | Success | Success | Success | Success |
cs00000001_0050 | 6:39:57 | Success | Success | Success | Success |
cs00000001_0051 | 6:22:53 | Success | Success | Success | Success |
cs00000001_0052 | 6:21:09 | Success | Success | Success | Success |
cs00000001_0053 | 6:37:53 | Success | Success | Success | Success |
cs00000001_0054 | 6:37:42 | Success | Success | Success | Success |
cs00000001_0055 | 6:21:08 | Success | Success | Success | Success |
cs00000001_0056 | 6:36:46 | Success | Success | Success | Success |
cs00000001_0057 | 6:19:53 | Success | Success | Success | Success |
cs00000001_0058 | 6:43:31 | Success | Success | Success | Success |
cs00000001_0059 | 6:35:42 | Success | Success | Success | Success |
cs00000001_0060 | 6:42:28 | Success | Success | Success | Success |
cs00000001_0061 | 5:44:17 | Success | Success | Success | Success |
cs00000001_0062 | 5:45:11 | Success | Success | Success | Success |
...
PTF - environment mcpt
- 14 m6g.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
- 1 database instance (writer), one of:

Name | Memory | vCPUs |
---|---|---|
db.r6g.4xlarge | 128 GiB | 16 |
db.r6g.8xlarge | 256 GiB | 32 |

- Amazon OpenSearch Service: ptf-test
  - Data nodes
    - Instance type: r6g.2xlarge.search
    - Number of nodes: 4
  - Dedicated master nodes
    - Instance type: r6g.large.search
    - Number of nodes: 3
- MSK ptf-mobius-testing2
  - 2 m5.2xlarge brokers in 2 zones
  - Apache Kafka version 2.8.0
  - EBS storage volume per broker 300 GiB
  - auto.create.topics.enable=true
  - log.retention.minutes=480
  - default.replication.factor=2
...
Modules memory and CPU parameters:
db.r6g.4xlarge
...
MOBIUS Tests: scenarios were started by carrier.io or from a load generator instance in the same AWS region (us-east-1).
The database contains about five times more generated data for large tenants than for small tenants. The table below displays the average count of generated instances and items for both small and large tenants.
tenant | instances | items |
---|---|---|
Small | 84754 | 103137 |
Large | 416176 | 543428 |
The attached file contains full info about generated data per tenant.
Large tenant IDs: 0042, 0024, 0051, 0006
...
At the initial stage of testing, there were numerous errors from various workflows, most notably those involving search. While investigating the cause of the errors, it was found that there was insufficient CPU capacity (100% utilization) on the data nodes of the OpenSearch cluster. Initially, testing was done with r6g.large.search, then increased to r6g.4xlarge.search, but it was too large and only utilized 40% CPU. Therefore, it was downsized to r6g.2xlarge.search, with the number of containers (tasks) increased to 4 for mod-search.
However, after the search issue was resolved, database problems such as OOM (Out of Memory) errors persisted. Two options were tested to address them: increasing the database size to db.r6g.8xlarge, and adjusting the shared_buffers configuration to SUM({DBInstanceClassMemory/24076},-50003) on db.r6g.4xlarge. Both tests showed promising results; however, in both cases the logs contained errors caused by an insufficient number of database connections ("remaining connection slots are reserved for non-replication superuser and rds_superuser connections"). To address this, the max_connections configuration was set to LEAST({DBInstanceClassMemory/9531392},9000) for both database sizes, raising the upper limit of database connections to 9000.
All of these measures helped to eliminate errors, but response times remained high, especially for check-in and check-out. To improve response time, we increased the number of containers for mod-authtoken to 4, which yielded very good results. Additionally, we improved response times for all circulation workflows by setting 4 containers for each of the following: mod-permissions, mod-patron, mod-inventory, mod-inventory-storage, mod-circulation, mod-circulation-storage.