Overview

This page investigates Aurora Serverless performance by comparing the db.r6g.xlarge, db.r6g.8xlarge and Aurora Serverless v2 DB instance types under load: Data Import (DI) with Check-in Check-out (CICO) running in the background.

Summary

  • The environment can handle the load with all compared DB instance types.
  • No significant changes in CICO response times were observed between the db.r6g.xlarge and serverless instance types.
  • For bigger files Aurora Serverless performs even better than db.r6g.xlarge.
  • The Serverless v2 (32 - 128 ACUs) configuration performs much better than (0.5 - 128 ACUs) due to its higher minimum capacity, and its performance is closer to 8xlarge. To cut costs, however, it is better to use (0.5 - 128 ACUs) for the DB reader instance role.
  • Aurora Serverless RDS CPU didn't exceed 25% for any file size. Test duration grows more slowly than file size, which is attributed to the larger number of ACUs allocated for bigger files.
  • To check the hypothesis that raising the Deployments and tasks parameter from 2 to 4 in mod-inventory, mod-inventory-storage, mod-circulation and mod-circulation-storage on the Aurora Serverless configuration decreases test duration, additional DI tests without CICO were carried out. No dependency was found with that number of tests. It is considered that 5 or more DI operations are needed to get relevant data.

Results

The table shows test results for the different database instance types. RDS CPU utilization for db.r6g.xlarge is near its maximum, and test duration grows in proportion to file size.

After the database was switched to Aurora Serverless, RDS CPU didn't exceed 25% for any file size, and test duration grew more slowly than file size, which is attributed to the larger number of ACUs allocated for bigger files.
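One way to check how duration scales with file size is to normalize each run to seconds per 1,000 records. The sketch below does this for the DI-only durations reported in the results for db.r6g.xlarge and Serverless v2 (0.5 - 128 ACUs). The helper names are illustrative and not part of any test tooling used for this report.

```python
def to_seconds(hhmmss: str) -> int:
    """Convert an HH:MM:SS duration string to seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_per_1k(records: int, duration: str) -> float:
    """Seconds spent per 1,000 imported records."""
    return to_seconds(duration) / (records / 1000)


# DI-only durations copied from the results table (no CICO in background).
DI_DURATIONS = {
    "db.r6g.xlarge": {
        10_000: "00:09:59", 25_000: "00:18:19", 50_000: "00:37:05",
    },
    "Serverless v2 (0.5 - 128 ACUs)": {
        10_000: "00:10:07", 25_000: "00:13:43", 50_000: "00:22:57",
    },
}

if __name__ == "__main__":
    for db, runs in DI_DURATIONS.items():
        for records, duration in runs.items():
            rate = seconds_per_1k(records, duration)
            print(f"{db}: {records} records -> {rate:.1f} s per 1k records")
```

With these inputs db.r6g.xlarge stays at roughly 44 s per 1k records for the 25k and 50k files, while the serverless configuration drops from about 61 s per 1k records (10k file) to about 28 s (50k file).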

DI CICO

Results

Total results

Create
Job profile: Default - Create instance and SRS MARC Bib

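DB instance types: 8xlarge, db.r6g.xlarge, Serverless v2 (0.5 - 128 ACUs), Serverless v2 (32 - 128 ACUs). CPU is RDS max CPU utilization (%).

| # | Test | Users | File - Records | Duration (CICO) | 8xlarge CPU | 8xlarge Duration | db.r6g.xlarge CPU | db.r6g.xlarge Duration | Serverless v2 (0.5 - 128 ACUs) CPU | Serverless v2 (0.5 - 128 ACUs) Duration | Serverless v2 (0.5 - 128 ACUs) ACUs | Serverless v2 (32 - 128 ACUs) CPU | Serverless v2 (32 - 128 ACUs) Duration |
| 1 | DI | | 10k | | 37, 27 | 00:05:15, 00:03:21 | 96 | 00:09:59 | 17 | 00:10:07 | | 16 | 00:07:17 |
| | | | 25k | | 45, 30 | 00:10:04, 00:08:08 | 96 | 00:18:19 | 24 | 00:13:43 | | 22 | 00:11:44 |
| | | | 50k | | 30 | 00:15:54 | 93 | 00:37:05 | 25 | 00:22:57 | | 24 | 00:20:01 |
| 2 | CICO + DI | 20 | 10k | 90 min | 39 | 00:04:32 | 94 | 00:08:08 | 19 | 00:09:12 | | | |
| | | | 25k | | 47 | 00:09:01 | 96 | 00:19:21 | 26 | 00:14:30 | | | |
| 3 | CICO DI Create (JP: PTF - Create 2) | 20 | 10k | 90 min | | | 94 | 00:09:56 | 14 | 00:13:22 | 19 | | |
| | | | 25k | | | | 94 | 00:21:06 | 24 | 00:23:49 | 25 | | |
| | CICO DI Update (JP: PTF - Updates Success - 1) | 20 | 10k | 90 min | | | 70 | 00:12:31 | 12 | 00:17:44 | 12 | | |
| | | | 25k | | | | 70 | 00:29:12 | 12 | 00:31:35 | 13 | | |
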

RDS CPU Utilization

DI and DI+CICO


RDS

8xlarge (test date: 2023-05-25): CPU starts with spikes at the beginning of the tests and returns to normal after they finish.

xlarge (test date: 2023-05-29): CPU was at its maximum for the xlarge database instance type, but this didn't affect DI in any way, and it ran successfully.

serverless (test date: 2023-05-30): CPU was stable and did not exceed 25%.

Service

8xlarge: Data imports during CICO. The services were stable and returned to their normal state after the tests.

xlarge: The CICO background process didn't affect DI, which worked as expected.

serverless: The services worked stably.


CICO Results

Running tests for CICO (PERF-593)

db.r6g.xlarge used more DB connections than any other DB instance type. The summary table shows better response times for the runs with 20 users, and no significant differences between DB instance types. High latency was observed in all tests.

Testing results for CICO

Test date: 2023-06-02

LG: us-west-2a

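| # | Test | DB instance type | Users | Duration (CICO) | RDS max CPU utilization (%) | ACUs | DB connections |
| 1 | CICO | db.r6g.xlarge | 8 | 30 min | 16 | | 460 |
| | | db.r6g.xlarge | 20 | 30 min | 21 | | 430 |
| | | db.r6g.8xlarge | 8 | 30 min | 2 | | 364 |
| | | db.r6g.8xlarge | 20 | 30 min | 2.5 | | 378 |
| | | Serverless v2 (0.5 - 128 ACUs) | 8 | 30 min | 2.5 | 7.5 | 380 |
| | | Serverless v2 (0.5 - 128 ACUs) | 20 | 30 min | 4.7 | 6.2 | 396 |
| | | Serverless v2 (32 - 128 ACUs) | 8 | 30 min | 1.5 | 32 | 380 |
| | | Serverless v2 (32 - 128 ACUs) | 20 | 30 min | 2 | 32 | 380 |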

CICO Graphs


[Graphs: Response Times Over Time, Throughput, RDS CPU utilization and Service CPU utilization, each for 8 users and 20 users, on db.r6g.xlarge, db.r6g.8xlarge, Serverless v2 (0.5 - 128 ACUs) and Serverless v2 (32 - 128 ACUs)]

Summary table for CICO



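| DB instance type | Requests | Users | % KO | 75th pct | 95th pct | Average | Latency |
| db.r6g.xlarge | Check-In Controller | 8 | 0 | 2.878 | 3.114 | 2.785 | 2.16 |
| db.r6g.xlarge | Check-In Controller | 20 | 0 | 2.889 | 3.116 | 2.784 | 2.118 |
| db.r6g.xlarge | Check-Out Controller | 8 | 9.173 | 4.103 | 4.526 | 3.948 | 3.212 |
| db.r6g.xlarge | Check-Out Controller | 20 | 13.786 | 4.061 | 4.422 | 3.862 | 3.079 |
| db.r6g.8xlarge | Check-In Controller | 8 | 0 | 2.946 | 3.203 | 2.849 | 2.17 |
| db.r6g.8xlarge | Check-In Controller | 20 | 0 | 2.914 | 3.121 | 2.805 | 2.107 |
| db.r6g.8xlarge | Check-Out Controller | 8 | 10.419 | 4.178 | 4.565 | 3.973 | 3.239 |
| db.r6g.8xlarge | Check-Out Controller | 20 | 13.683 | 4.075 | 4.434 | 3.875 | 3.112 |
| Serverless v2 (0.5 - 128 ACUs) | Check-In Controller | 8 | 0 | 3.088 | 3.372 | 2.99 | 2.361 |
| Serverless v2 (0.5 - 128 ACUs) | Check-In Controller | 20 | 0 | 2.971 | 3.214 | 2.86 | 2.24 |
| Serverless v2 (0.5 - 128 ACUs) | Check-Out Controller | 8 | 9.255 | 4.465 | 4.862 | 4.268 | 3.453 |
| Serverless v2 (0.5 - 128 ACUs) | Check-Out Controller | 20 | 13.099 | 4.236 | 4.696 | 4.039 | 3.291 |
| Serverless v2 (32 - 128 ACUs) | Check-In Controller | 8 | 0 | 2.972 | 3.238 | 2.86 | 2.212 |
| Serverless v2 (32 - 128 ACUs) | Check-In Controller | 20 | 0 | 2.933 | 3.149 | 2.825 | 2.135 |
| Serverless v2 (32 - 128 ACUs) | Check-Out Controller | 8 | 10.545 | 4.191 | 4.652 | 3.998 | 3.274 |
| Serverless v2 (32 - 128 ACUs) | Check-Out Controller | 20 | 13.477 | 4.106 | 4.525 | 3.915 | 3.174 |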



Compare

Comparison table for response times during 10k and 25k Data Import

Serverless response times improve relative to db.r6g.xlarge for bigger files during DI. Delta shows the difference in %; a positive delta means Serverless is faster.


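10k DI

| Requests | db.r6g.xlarge 75th pct | db.r6g.xlarge 95th pct | db.r6g.xlarge Average | Serverless 75th pct | Serverless 95th pct | Serverless Average | delta, 75% | delta, 95% |
| Check-In Controller | 3.218 | 3.71 | 3.138 | 3.347 | 3.867 | 3.118 | -4.01 | -4.23 |
| Check-Out Controller | 4.989 | 6.361 | 4.834 | 5.006 | 5.986 | 4.602 | -0.34 | 5.90 |

25k DI

| Requests | db.r6g.xlarge 75th pct | db.r6g.xlarge 95th pct | db.r6g.xlarge Average | Serverless 75th pct | Serverless 95th pct | Serverless Average | delta, 75% | delta, 95% |
| Check-In Controller | 3.249 | 3.665 | 3.076 | 3.134 | 3.398 | 2.99 | 3.54 | 7.29 |
| Check-Out Controller | 5.246 | 6.298 | 4.666 | 4.719 | 5.19 | 4.333 | 10.05 | 17.59 |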
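The report does not state how the delta columns are calculated. The reported values are consistent with the relative difference between the db.r6g.xlarge and Serverless response times, taken as a percentage of the db.r6g.xlarge value. A minimal sketch under that assumption (the function name is illustrative):

```python
def delta_pct(xlarge: float, serverless: float) -> float:
    """Relative difference in %, positive when serverless is faster.

    Assumed formula, inferred from the reported numbers:
    (xlarge - serverless) / xlarge * 100, rounded to two decimals.
    """
    return round((xlarge - serverless) / xlarge * 100, 2)


if __name__ == "__main__":
    # Check-Out Controller, 25k DI, 75th and 95th percentiles.
    print(delta_pct(5.246, 4.719))
    print(delta_pct(6.298, 5.19))
```

Applied to the 75th and 95th percentile values above, this reproduces the deltas in both comparison tables, for example -4.01 for Check-In Controller at the 75th percentile during the 10k DI.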


Average Active Sessions for DI with 50k file

To capture additional data from Performance Insights during DI with a 50K file (PERF-602), three DI operations were carried out for different DB instance types.

[Graphs: Average Active Sessions for Serverless v2 (0.5 - 128 ACUs), db.r6g.8xlarge and db.r6g.xlarge]


Example of growing ACUs for data import 

Aurora Capacity Units

serverless

Test date: 2023-05-31

ACUs grow in accordance with load and gradually scale down once the load is gone.


Response times for all DB configurations

Error rate correlates with DI file size: it grows with bigger files. During the 25K DI the lowest error rate was with Serverless. All errors are in the Check-Out Controller, for POST_circulation/check-out-by-barcode (Submit_barcode_checkout)_POST_422.

db.r6g.8xlarge

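| Requests | Period | 75th pct | 95th pct | Average |
| Check-In Controller | All | 2.901 | 3.103 | 2.792 |
| Check-In Controller | Before 10K DI | 2.866 | 3.128 | 2.772 |
| Check-In Controller | During 10K DI | 2.936 | 3.232 | 2.827 |
| Check-In Controller | During 25K DI | 2.933 | 3.138 | 2.815 |
| Check-In Controller | After 25K DI | 2.893 | 3.064 | 2.764 |
| Check-Out Controller | All | 4.255 | 4.767 | 3.956 |
| Check-Out Controller | Before 10K DI | 4.212 | 4.6 | 4.017 |
| Check-Out Controller | During 10K DI | 4.333 | 4.728 | 4.088 |
| Check-Out Controller | During 25K DI | 4.352 | 4.787 | 4.065 |
| Check-Out Controller | After 25K DI | 4.259 | 4.731 | 3.902 |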


db.r6g.xlarge

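| Requests | Period | % KO | 75th pct | 95th pct | Average | Latency |
| Check-In Controller | All | 0 | 3.053 | 3.472 | 2.942 | 2.506 |
| Check-In Controller | Before 10K DI | 0 | 2.904 | 3.2 | 2.837 | 2.199 |
| Check-In Controller | During 10K DI | 0 | 3.218 | 3.71 | 3.138 | 2.726 |
| Check-In Controller | During 25K DI | 0 | 3.249 | 3.67 | 3.076 | 2.672 |
| Check-In Controller | After 25K DI | 0 | 2.952 | 3.17 | 2.856 | 2.242 |
| Check-Out Controller | All | 43.379 | 4.656 | 5.824 | 4.284 | 4.343 |
| Check-Out Controller | Before 10K DI | 9.188 | 4.322 | 4.9 | 4.205 | 3.474 |
| Check-Out Controller | During 10K DI | 16.061 | 4.989 | 6.36 | 4.834 | 4.914 |
| Check-Out Controller | During 25K DI | 36.691 | 5.246 | 6.3 | 4.666 | 4.841 |
| Check-Out Controller | After 25K DI | 67.369 | 4.271 | 4.83 | 3.935 | 3.427 |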


Serverless

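| Requests | Period | % KO | 75th pct | 95th pct | Average | Latency |
| Check-In Controller | Before 10K DI | 0 | 2.992 | 3.315 | 2.888 | 2.33 |
| Check-In Controller | During 10K DI | 0 | 3.347 | 3.867 | 3.118 | 2.854 |
| Check-In Controller | During 25K DI | 0 | 3.134 | 3.398 | 2.99 | 2.45 |
| Check-In Controller | After 25K DI | 0 | 2.961 | 3.164 | 2.85 | 2.237 |
| Check-Out Controller | Before 10K DI | 13.753 | 4.382 | 4.923 | 4.176 | 3.481 |
| Check-Out Controller | During 10K DI | 15.459 | 5.006 | 5.986 | 4.602 | 4.506 |
| Check-Out Controller | During 25K DI | 27.453 | 4.719 | 5.19 | 4.333 | 3.786 |
| Check-Out Controller | After 25K DI | 61.16 | 4.351 | 4.892 | 3.984 | 3.461 |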

* Due to the high error rate, it was decided to retest CICO DI with new job profiles for Create and Update (PTF - Create 2, PTF - Updates Success - 1).

CICO DI Create + Update


[Graphs for Serverless and db.r6g.xlarge: Response Times Over Time (Create, Update), RDS CPU utilization, Service CPU utilization, ACUs]


CICO response times

For Aurora Serverless, response times grew immediately after DI started and then decreased smoothly during execution (PTF - Create 2 job profile).

For the xlarge DB instance type, CPU utilization stayed stable at about 15% during CICO, rose rapidly to 93% once DI with the 10k file started, and stayed at that level for the whole DI.

Serverless v2 (0.5 - 128 ACUs)

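| Job | Requests | Period | 75th pct | 95th pct | Average | Latency_avg |
| Create | Check-In Controller | Before 10k | 2.928 | 3.171 | 2.855 | 1.851 |
| Create | Check-In Controller | During 10k | 3.391 | 3.999 | 3.248 | 2.242 |
| Create | Check-In Controller | During 25k | 3.156 | 3.427 | 3.06 | 2.07 |
| Create | Check-Out Controller | Before 10k | 4.198 | 5.012 | 4.106 | 2.788 |
| Create | Check-Out Controller | During 10k | 4.82 | 5.672 | 4.642 | 3.311 |
| Create | Check-Out Controller | During 25k | 4.53 | 4.93 | 4.407 | 3.085 |
| Update | Check-In Controller | Before 10k | 2.93 | 3.09 | 2.807 | 1.823 |
| Update | Check-In Controller | During 10k | 2.966 | 3.152 | 2.882 | 1.883 |
| Update | Check-In Controller | During 25k | 3.048 | 3.256 | 2.951 | 1.948 |
| Update | Check-Out Controller | Before 10k | 4.176 | 4.97 | 4.152 | 2.841 |
| Update | Check-Out Controller | During 10k | 4.23 | 4.46 | 4.134 | 2.823 |
| Update | Check-Out Controller | During 25k | 4.42 | 5.012 | 4.327 | 2.997 |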

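db.r6g.xlarge

| Job | Requests | Period | 75th pct | 95th pct | Average | Latency_avg |
| Create | Check-In Controller | Before 10k | 2.764 | 2.867 | 2.786 | 1.788 |
| Create | Check-In Controller | During 10k | 3.204 | 3.461 | 3.077 | 2.08 |
| Create | Check-In Controller | During 25k | 3.318 | 3.606 | 3.176 | 2.178 |
| Create | Check-Out Controller | Before 10k | 4.02 | 4.155 | 4.045 | 2.74 |
| Create | Check-Out Controller | During 10k | 4.628 | 4.976 | 4.466 | 3.148 |
| Create | Check-Out Controller | During 25k | 4.861 | 5.181 | 4.672 | 3.341 |
| Update | Check-In Controller | Before 10k | 2.816 | 3.078 | 2.74 | 1.757 |
| Update | Check-In Controller | During 10k | 2.825 | 2.928 | 2.837 | 1.848 |
| Update | Check-In Controller | During 25k | 2.853 | 2.952 | 2.868 | 1.873 |
| Update | Check-Out Controller | Before 10k | 4.06 | 4.252 | 3.943 | 2.632 |
| Update | Check-Out Controller | During 10k | 4.077 | 4.202 | 4.097 | 2.78 |
| Update | Check-Out Controller | During 25k | 4.126 | 4.243 | 4.154 | 2.839 |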


Appendix

Links to Grafana

Test date: 2023-05-25 - 2023-05-31

Baseline xlarge

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685351205171&to=1685356817553

Baseline 8xlarge

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685025858811&to=1685033029740

Aurora Serverless

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685436750832&to=1685442470092


Test date: 2023-06-02 - 2023-06-06

db.r6g.xlarge

8 users: 

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685692747425&to=1685694623603

20 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685695312772&to=1685697366883

db.r6g.8xlarge

8 users: 

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685700612764&to=1685702445076

20 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685702803775&to=1685704908814

Serverless v2 (0.5 - 128 ACUs)

8 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1686043433681&to=1686045340051

20 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1686045911070&to=1686048158943

Serverless v2 (32 - 128 ACUs)

8 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685710370012&to=1685712636325

20 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685713535200&to=1685715506600


Test date: 2023-06-13

Serverless v2 (0.5 - 128 ACUs) CICO DI Create + Update

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-with-average-latency?orgId=1&from=1686673259569&to=1686675910907&var-percentile=95&var-test_type=baseline&var-test=oasl_fixed1&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All

db.r6g.xlarge 

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-with-average-latency?orgId=1&from=1686746379062&to=1686758375536&var-percentile=95&var-test_type=baseline&var-test=oasl_fixed1&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All

Configuration

DI

Version of modules:
Source Record Manager Module (mod-source-record-manager-3.6.2)
Source Record Storage Module (mod-source-record-storage-5.6.5)
Inventory Module (mod-inventory-20.0.4)
Inventory Storage Module (mod-inventory-storage-26.0.0)
Inventory Update Module (mod-inventory-update-3.0.1)
Data Import Module (mod-data-import-2.7.1)
quickMARC (mod-quick-marc-3.0.0)


CICO

Version of modules:

Okapi (okapi-5.0.1)

users (mod-users-19.1.1)

Remote storage API module (mod-remote-storage-2.0.2)

Pubsub (mod-pubsub-2.9.1)

Patron Blocks Module (mod-patron-blocks-1.8.0)

Inventory Storage Module (mod-inventory-storage-26.0.0)

Inventory Module (mod-inventory-20.0.4)

feesfines (mod-feesfines-18.2.1)

Configuration (mod-configuration-5.9.1)

Circulation Storage Module (mod-circulation-storage-16.0.0)

Circulation Module (mod-circulation-23.5.4)

authtoken (mod-authtoken-2.13.0)


Environment

  • UI endpoint: https://aurora-serverless-test.int.aws.folio.org/
  • Okapi endpoint: https://okapi-aurora-serverless-test.int.aws.folio.org/
  • Environment is configured to use shared MSK and ES
  • Created in INT account us-west-2 region, cluster name oasl, created with snapshot of Cornell Test environment.

    Modules versions: Orchid-GA.3
    Task count: HA – okapi x3, mod-data-import, mod-data-export, mod-quick-marc, mod-data-export-spring x1, all other modules x2
    OpenSearch: fse - shared domain (6 r6g.large.search datanodes)
    MSK: dedicated cluster - total 4 brokers (kafka.m5.large)
    RDS Configuration 1: db.r6g.8xlarge instance, Aurora PostgreSQL 13.9
    RDS Configuration 2: db.r6g.xlarge instance, Aurora PostgreSQL 13.9 
    RDS Configuration 3: Aurora Serverless, min ACU: 0.5, max ACU: 128 
    RDS Configuration 4: Aurora Serverless, min ACU: 32, max ACU: 128
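For reference, the two serverless capacity ranges above could be applied to an Aurora cluster with boto3 roughly as follows. This is a sketch, not the procedure used for these tests: the cluster identifier is a placeholder, the 0.5 and 128 bounds are simply the limits used in this report, and the AWS call itself needs boto3 and valid credentials.

```python
# ACU bounds taken from the ranges used in this report (assumption:
# they are treated here as the allowed minimum and maximum).
REPORT_MIN_ACU = 0.5
REPORT_MAX_ACU = 128


def serverless_v2_scaling(min_acu: float, max_acu: float) -> dict:
    """Build the ServerlessV2ScalingConfiguration payload for an ACU range."""
    if not REPORT_MIN_ACU <= min_acu <= max_acu <= REPORT_MAX_ACU:
        raise ValueError(f"invalid ACU range: {min_acu} - {max_acu}")
    return {"MinCapacity": min_acu, "MaxCapacity": max_acu}


def apply_scaling(cluster_id: str, min_acu: float, max_acu: float) -> dict:
    """Apply the ACU range to a cluster (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    rds = boto3.client("rds", region_name="us-west-2")
    return rds.modify_db_cluster(
        DBClusterIdentifier=cluster_id,  # placeholder name, see usage below
        ServerlessV2ScalingConfiguration=serverless_v2_scaling(min_acu, max_acu),
        ApplyImmediately=True,
    )


if __name__ == "__main__":
    print(serverless_v2_scaling(0.5, 128))  # RDS Configuration 3
    print(serverless_v2_scaling(32, 128))   # RDS Configuration 4
    # apply_scaling("example-aurora-cluster", 32, 128)  # hypothetical cluster
```

Raising the minimum to 32 ACUs keeps capacity pre-provisioned at the cost of a higher baseline, which matches the trade-off noted in the summary.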