Overview

This page investigates Aurora Serverless performance by comparing the xlarge, 8xlarge, and Aurora Serverless DB instance types under load: Data Import (DI) running on the oasl-pvt cluster with a Check-in Check-out (CICO) script running in the background.

Summary

  • The environment can handle the load with all compared DB instance types.
  • No significant changes in CICO response times were observed between the db.r6g.xlarge and serverless instance types.
  • With Aurora Serverless, DI duration is better for larger DI files.
  • The Serverless v2 (32 - 128 ACUs) configuration performs much better from the start than (0.5 - 128 ACUs) because of its higher starting capacity, and its performance is closer to 8xlarge. To cut costs, however, it is better to use (0.5 - 128 ACUs) for the DB reader instance role.
  • Aurora Serverless RDS CPU didn't exceed 25% for any file size. Compared with db.r6g.xlarge, test duration tends to decrease for bigger files because more ACUs are allocated.
  • A set of tests was carried out in accordance with the conditions described in PERF-578. With the db.r6g.xlarge instance type, RDS CPU utilization was stable but high, peaking at 96%. With Serverless v2, RDS CPU utilization didn't exceed 26% under the highest load, yet the test duration decreased for the 25k DI job only. The duration decreased because the Aurora Capacity Units (ACUs) grew during the first 10k data import and didn't scale down instantly: they stayed at the same level for some time and then scaled down to the default level once there was no load.
  • In addition, while running tests for CICO (PERF-593), it was observed that xlarge used more DB connections than any other DB instance type. The results in the summary table show better response times over time for runs with 20 users and no significant changes between DB instance types. High latency was observed for all tests.
  • To capture additional data from Performance Insights during DI with a 50k file (PERF-602), three DI operations were carried out for different DB instance types. All snapshots are located in the Average Active Sessions table.
  • To check the hypothesis that, on the Aurora Serverless configuration, changing the task count (Deployments and tasks parameter) from 2 to 4 for mod-inventory, mod-inventory-storage, mod-circulation, and mod-circulation-storage can decrease test duration, additional DI tests without CICO were carried out. No dependency was found with that number of tests. It is considered that 5 or more DI operations are needed to get relevant data.

  • The duration of DI without CICO didn't change after the task count for mod-inventory, mod-inventory-storage, mod-circulation, and mod-circulation-storage was increased to 4.

Results

The table includes test results from runs on different database instance types. RDS CPU utilization for db.r6g.xlarge reaches its maximum values, and test duration grows in proportion to file size.

After the database was switched to Aurora Serverless, RDS CPU didn't exceed 25% for any file size. Compared with db.r6g.xlarge, test duration tends to decrease for bigger files because more ACUs are allocated.

DI CICO Total results

Create
Job profile: Default - Create instance and SRS MARC Bib (rows marked JP use the job profile named in the Test column)

# | Test | Users | File - Records | Duration (CICO) | RDS db.r6g.8xlarge: Max CPU utilization | RDS db.r6g.8xlarge: Duration | RDS db.r6g.xlarge: Max CPU utilization | RDS db.r6g.xlarge: Duration | Serverless v2 (0.5 - 128 ACUs): Max CPU utilization | Serverless v2 (0.5 - 128 ACUs): Duration | Serverless v2 (0.5 - 128 ACUs): ACUs | Serverless v2 (32 - 128 ACUs): Max CPU utilization | Serverless v2 (32 - 128 ACUs): Duration
1 | DI Create | | 10k | | 27 | 00:03:21 | 96 | 00:09:59 | 17 | 00:10:07 | | 16 | 00:07:17 ↓ 28%
  | | | 25k | | 30 | 00:08:08 | 96 | 00:18:19 | 24 | 00:13:43 | | 22 | 00:11:44 ↓ 15%
  | | | 50k | | 30 | 00:15:54 | 93 | 00:37:05 | 25 | 00:22:57 | | 24 | 00:20:01 ↓ 11%
2 | CICO + DI Create | 20 | 10k | 90 min | 39 | 00:04:32 | 94 | 00:08:08 | 19 | 00:09:12 | | |
  | | | 25k | | 47 | 00:09:01 | 96 | 00:19:21 | 26 | 00:14:30 | | |
3 | CICO DI Create (JP: PTF - Create 2) | 20 | 10k | 90 min | | | 94 | 00:09:56 | 14 | 00:13:22 | 19 | |
  | | | 25k | | | | 94 | 00:21:06 | 24 | 00:23:49 | 25 | |
  | CICO DI Update (JP: PTF - Updates Success - 1) | 20 | 10k | 90 min | | | 70 | 00:12:31 | 12 | 00:17:44 | 12 | |
  | | | 25k | | | | 70 | 00:29:12 | 12 | 00:31:35 | 13 | |
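The percentage marks next to the Serverless v2 (32 - 128 ACUs) durations can be reproduced with a small calculation. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the arrow compares each duration against the Serverless v2 (0.5 - 128 ACUs) duration for the same file size, which matches the 10k figure.

```python
def to_seconds(hms):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def percent_decrease(baseline, candidate):
    """Percentage by which `candidate` is shorter than `baseline`."""
    base = to_seconds(baseline)
    return (base - to_seconds(candidate)) / base * 100


# 10k DI: Serverless v2 (0.5 - 128 ACUs) took 00:10:07,
# Serverless v2 (32 - 128 ACUs) took 00:07:17.
print(round(percent_decrease("00:10:07", "00:07:17")))  # 28
```

The same helper can be pointed at any two duration cells of the table, for example to compare db.r6g.xlarge with a serverless configuration.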


RDS CPU Utilization

RDS

8xlarge (test date: 2023-05-25): CPU starts with spikes at the beginning of the tests and returns to normal after they finish.

xlarge (test date: 2023-05-29): CPU was at its maximum, but this didn't affect DI in any way, and it ran successfully.

serverless (test date: 2023-05-30): CPU was stable and didn't exceed 25%.

Service

8xlarge: Data imports during CICO. The services worked stably and returned to their normal state after the tests.

xlarge: The CICO background process didn't affect DI, and it worked as expected.

serverless: Stable work of services.

CICO

Additional set of tests in accordance with PERF-593

Testing results for CICO

While running tests for CICO, it was observed that xlarge used more DB connections than any other DB instance type. The results in the summary table show better response times over time for runs with 20 users and no significant changes between DB instance types. High latency was observed for all tests.

Test date: 2023-06-02

LG: us-west-2a

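# | Test | Users | Duration (CICO) | RDS (db.r6g.xlarge): RDS max CPU utilization | RDS (db.r6g.xlarge): DB connections | RDS (db.r6g.8xlarge): RDS max CPU utilization | RDS (db.r6g.8xlarge): DB connections | Serverless v2 (0.5 - 128 ACUs): RDS max CPU utilization | Serverless v2 (0.5 - 128 ACUs): ACUs | Serverless v2 (0.5 - 128 ACUs): DB connections | Serverless v2 (32 - 128 ACUs): RDS max CPU utilization | Serverless v2 (32 - 128 ACUs): ACUs | Serverless v2 (32 - 128 ACUs): DB connections
1 | CICO | 8 | 30 min | 16 | 460 | 2 | 364 | 2.5 | 7.5 | 380 | 1.5 | 32 | 380
  | | 20 | 30 min | 21 | 430 | 2.5 | 378 | 4.7 | 6.2 | 396 | 2 | 32 | 380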

CICO Graphs


[Graphs: Response Times Over Time, Throughput, RDS CPU utilization, and Service CPU utilization, each for 8 users and 20 users, on db.r6g.xlarge, db.r6g.8xlarge, Serverless v2 (0.5 - 128 ACUs), and Serverless v2 (32 - 128 ACUs)]

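Summary table for CICO

DB instance | Requests | Users | % KO | 75th pct | 95th pct | Average | Latency
db.r6g.xlarge | Check-In Controller | 8 | 0 | 2.878 | 3.114 | 2.785 | 2.16
db.r6g.xlarge | Check-In Controller | 20 | 0 | 2.889 | 3.116 | 2.784 | 2.118
db.r6g.xlarge | Check-Out Controller | 8 | 9.173 | 4.103 | 4.526 | 3.948 | 3.212
db.r6g.xlarge | Check-Out Controller | 20 | 13.786 | 4.061 | 4.422 | 3.862 | 3.079
db.r6g.8xlarge | Check-In Controller | 8 | 0 | 2.946 | 3.203 | 2.849 | 2.17
db.r6g.8xlarge | Check-In Controller | 20 | 0 | 2.914 | 3.121 | 2.805 | 2.107
db.r6g.8xlarge | Check-Out Controller | 8 | 10.419 | 4.178 | 4.565 | 3.973 | 3.239
db.r6g.8xlarge | Check-Out Controller | 20 | 13.683 | 4.075 | 4.434 | 3.875 | 3.112
Serverless v2 (0.5 - 128 ACUs) | Check-In Controller | 8 | 0 | 3.088 | 3.372 | 2.99 | 2.361
Serverless v2 (0.5 - 128 ACUs) | Check-In Controller | 20 | 0 | 2.971 | 3.214 | 2.86 | 2.24
Serverless v2 (0.5 - 128 ACUs) | Check-Out Controller | 8 | 9.255 | 4.465 | 4.862 | 4.268 | 3.453
Serverless v2 (0.5 - 128 ACUs) | Check-Out Controller | 20 | 13.099 | 4.236 | 4.696 | 4.039 | 3.291
Serverless v2 (32 - 128 ACUs) | Check-In Controller | 8 | 0 | 2.972 | 3.238 | 2.86 | 2.212
Serverless v2 (32 - 128 ACUs) | Check-In Controller | 20 | 0 | 2.933 | 3.149 | 2.825 | 2.135
Serverless v2 (32 - 128 ACUs) | Check-Out Controller | 8 | 10.545 | 4.191 | 4.652 | 3.998 | 3.274
Serverless v2 (32 - 128 ACUs) | Check-Out Controller | 20 | 13.477 | 4.106 | 4.525 | 3.915 | 3.174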



Compare

Comparison table for response times during 10k and 25k Data Import

During DI, Serverless response times improve relative to db.r6g.xlarge for bigger files. Delta shows the difference in %.


DI file | Requests | RDS (db.r6g.xlarge): 75th pct | RDS (db.r6g.xlarge): 95th pct | RDS (db.r6g.xlarge): Average | Serverless: 75th pct | Serverless: 95th pct | Serverless: Average | delta, 75% | delta, 95%
10k DI | Check-In Controller | 3.218 | 3.71 | 3.138 | 3.347 | 3.867 | 3.118 | -4.01 | -4.23
10k DI | Check-Out Controller | 4.989 | 6.361 | 4.834 | 5.006 | 5.986 | 4.602 | -0.34 | 5.90
25k DI | Check-In Controller | 3.249 | 3.665 | 3.076 | 3.134 | 3.398 | 2.99 | 3.54 | 7.29
25k DI | Check-Out Controller | 5.246 | 6.298 | 4.666 | 4.719 | 5.19 | 4.333 | 10.05 | 17.59
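The delta columns are consistent with taking the db.r6g.xlarge value as the baseline, so a positive delta means Serverless was faster. A minimal sketch of that calculation follows; the function name is hypothetical, and the formula is inferred from the table values rather than stated on this page.

```python
def delta_percent(xlarge, serverless):
    """Relative difference between two response times, in percent.

    Positive values mean the serverless response time is lower
    than the db.r6g.xlarge one.
    """
    return round((xlarge - serverless) / xlarge * 100, 2)


# Check-In Controller, 75th percentile
print(delta_percent(3.218, 3.347))  # -4.01 (10k DI)
print(delta_percent(3.249, 3.134))  # 3.54 (25k DI)
```

Applying the same function to the 95th percentile values for Check-Out Controller during 25k DI (6.298 and 5.19) gives 17.59, the largest improvement in the table.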


Average Active Sessions for DI with 50k file

To capture additional data from Performance Insights during DI with a 50k file (PERF-602), three DI operations were carried out for different DB instance types.

[Images: Average Active Sessions for Serverless v2 (0.5 - 128 ACUs), RDS (db.r6g.8xlarge), and db.r6g.xlarge]


Example of growing ACUs for data import 

Aurora Capacity Units

serverless

Test date: 2023-05-31

ACUs grow in accordance with load and gradually scale down once the load stops.


Response times for all DB configurations

Error rate correlates with DI file size: it grows with bigger files. The lowest error rate during 25k DI was with Serverless. All errors are in Check-Out Controller for POST_circulation/check-out-by-barcode (Submit_barcode_checkout)_POST_422.

RDS db.r6g.8xlarge

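Requests | Phase | 75th pct | 95th pct | Average
Check-In Controller | All | 2.901 | 3.103 | 2.792
Check-In Controller | Before 10K DI | 2.866 | 3.128 | 2.772
Check-In Controller | During 10K DI | 2.936 | 3.232 | 2.827
Check-In Controller | During 25K DI | 2.933 | 3.138 | 2.815
Check-In Controller | After 25K DI | 2.893 | 3.064 | 2.764
Check-Out Controller | All | 4.255 | 4.767 | 3.956
Check-Out Controller | Before 10K DI | 4.212 | 4.6 | 4.017
Check-Out Controller | During 10K DI | 4.333 | 4.728 | 4.088
Check-Out Controller | During 25K DI | 4.352 | 4.787 | 4.065
Check-Out Controller | After 25K DI | 4.259 | 4.731 | 3.902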


RDS

db.r6g.xlarge

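Requests | Phase | % KO | 75th pct | 95th pct | Average | Latency
Check-In Controller | All | 0 | 3.053 | 3.472 | 2.942 | 2.506
Check-In Controller | Before 10K DI | 0 | 2.904 | 3.2 | 2.837 | 2.199
Check-In Controller | During 10K DI | 0 | 3.218 | 3.71 | 3.138 | 2.726
Check-In Controller | During 25K DI | 0 | 3.249 | 3.67 | 3.076 | 2.672
Check-In Controller | After 25K DI | 0 | 2.952 | 3.17 | 2.856 | 2.242
Check-Out Controller | All | 43.379 | 4.656 | 5.824 | 4.284 | 4.343
Check-Out Controller | Before 10K DI | 9.188 | 4.322 | 4.9 | 4.205 | 3.474
Check-Out Controller | During 10K DI | 16.061 | 4.989 | 6.36 | 4.834 | 4.914
Check-Out Controller | During 25K DI | 36.691 | 5.246 | 6.3 | 4.666 | 4.841
Check-Out Controller | After 25K DI | 67.369 | 4.271 | 4.83 | 3.935 | 3.427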


Serverless

Requests | Phase | % KO | 75th pct | 95th pct | Average | Latency
Check-In Controller | Before 10K DI | 0 | 2.992 | 3.315 | 2.888 | 2.33
Check-In Controller | During 10K DI | 0 | 3.347 | 3.867 | 3.118 | 2.854
Check-In Controller | During 25K DI | 0 | 3.134 | 3.398 | 2.99 | 2.45
Check-In Controller | After 25K DI | 0 | 2.961 | 3.164 | 2.85 | 2.237
Check-Out Controller | Before 10K DI | 13.753 | 4.382 | 4.923 | 4.176 | 3.481
Check-Out Controller | During 10K DI | 15.459 | 5.006 | 5.986 | 4.602 | 4.506
Check-Out Controller | During 25K DI | 27.453 | 4.719 | 5.19 | 4.333 | 3.786
Check-Out Controller | After 25K DI | 61.16 | 4.351 | 4.892 | 3.984 | 3.461


Due to the high error rate, a new set of CICO DI tests was carried out with new job profiles for Create and Update (PTF - Create 2, PTF - Updates Success - 1).

CICO DI Create + Update


[Graphs for Serverless and db.r6g.xlarge: Response Times Over Time (Create, Update), RDS CPU utilization, and Service CPU utilization; ACUs for Serverless]


CICO response times

For Aurora Serverless, response times grew immediately after DI started and then decreased smoothly during execution (PTF - Create 2 job profile).

For the xlarge DB instance type, CPU utilization during CICO stayed stable at about 15%. After DI with the 10k file started, it rapidly rose to 93% and stayed at this level for the whole DI process.

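Serverless v2 (0.5 - 128 ACUs)

Job | Requests | Phase | 75th pct | 95th pct | Average | Latency_avg
Create | Check-In Controller | Before 10k | 2.928 | 3.171 | 2.855 | 1.851
Create | Check-In Controller | During 10k | 3.391 | 3.999 | 3.248 | 2.242
Create | Check-In Controller | During 25k | 3.156 | 3.427 | 3.06 | 2.07
Create | Check-Out Controller | Before 10k | 4.198 | 5.012 | 4.106 | 2.788
Create | Check-Out Controller | During 10k | 4.82 | 5.672 | 4.642 | 3.311
Create | Check-Out Controller | During 25k | 4.53 | 4.93 | 4.407 | 3.085
Update | Check-In Controller | Before 10k | 2.93 | 3.09 | 2.807 | 1.823
Update | Check-In Controller | During 10k | 2.966 | 3.152 | 2.882 | 1.883
Update | Check-In Controller | During 25k | 3.048 | 3.256 | 2.951 | 1.948
Update | Check-Out Controller | Before 10k | 4.176 | 4.97 | 4.152 | 2.841
Update | Check-Out Controller | During 10k | 4.23 | 4.46 | 4.134 | 2.823
Update | Check-Out Controller | During 25k | 4.42 | 5.012 | 4.327 | 2.997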

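RDS (db.r6g.xlarge)

Job | Requests | Phase | 75th pct | 95th pct | Average | Latency_avg
Create | Check-In Controller | Before 10k | 2.764 | 2.867 | 2.786 | 1.788
Create | Check-In Controller | During 10k | 3.204 | 3.461 | 3.077 | 2.08
Create | Check-In Controller | During 25k | 3.318 | 3.606 | 3.176 | 2.178
Create | Check-Out Controller | Before 10k | 4.02 | 4.155 | 4.045 | 2.74
Create | Check-Out Controller | During 10k | 4.628 | 4.976 | 4.466 | 3.148
Create | Check-Out Controller | During 25k | 4.861 | 5.181 | 4.672 | 3.341
Update | Check-In Controller | Before 10k | 2.816 | 3.078 | 2.74 | 1.757
Update | Check-In Controller | During 10k | 2.825 | 2.928 | 2.837 | 1.848
Update | Check-In Controller | During 25k | 2.853 | 2.952 | 2.868 | 1.873
Update | Check-Out Controller | Before 10k | 4.06 | 4.252 | 3.943 | 2.632
Update | Check-Out Controller | During 10k | 4.077 | 4.202 | 4.097 | 2.78
Update | Check-Out Controller | During 25k | 4.126 | 4.243 | 4.154 | 2.839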


Appendix

Folio release: Orchid

Resource usage: R/W split disabled for all modules

Links to Grafana

Test date: 2023-05-25 - 2023-05-31

Baseline xlarge

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685351205171&to=1685356817553

Baseline 8xlarge

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685025858811&to=1685033029740

Aurora Serverless

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685436750832&to=1685442470092


Test date: 2023-06-02 - 2023-06-06

db.r6g.xlarge

8 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685692747425&to=1685694623603

20 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685695312772&to=1685697366883

db.r6g.8xlarge

8 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685700612764&to=1685702445076

20 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685702803775&to=1685704908814

Serverless v2 (0.5 - 128 ACUs)

8 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1686043433681&to=1686045340051

20 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1686045911070&to=1686048158943

Serverless v2 (32 - 128 ACUs)

8 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685710370012&to=1685712636325

20 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685713535200&to=1685715506600


Test date: 2023-06-13

CICO DI Create + Update

Serverless v2 (0.5 - 128 ACUs)

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-with-average-latency?orgId=1&from=1686673259569&to=1686675910907&var-percentile=95&var-test_type=baseline&var-test=oasl_checkInCheckOut_fixed1&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All

db.r6g.xlarge

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-with-average-latency?orgId=1&from=1686746379062&to=1686758375536&var-percentile=95&var-test_type=baseline&var-test=oasl_checkInCheckOut_fixed1&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All
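The `from` and `to` query parameters in the Grafana links are Unix timestamps in milliseconds, so each link also records when a test ran and for how long. The following sketch, with a hypothetical helper name and a shortened copy of the "Baseline xlarge" link, shows one way to decode them. It is an illustration only and was not part of the test tooling.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def grafana_window(url):
    """Return (start, end) of a dashboard link's time range as UTC datetimes."""
    query = parse_qs(urlparse(url).query)
    start_ms = int(query["from"][0])
    end_ms = int(query["to"][0])
    # Integer millisecond arithmetic avoids float rounding of the timestamps.
    return (EPOCH + timedelta(milliseconds=start_ms),
            EPOCH + timedelta(milliseconds=end_ms))


# Shortened form of the "Baseline xlarge" link; only from/to matter here.
url = ("http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/"
       "jmeter-performance-test-copy?orgId=1"
       "&from=1685351205171&to=1685356817553")
start, end = grafana_window(url)
print("start (UTC):", start.isoformat())
print("duration:", end - start)
```

For this link the window starts on 2023-05-29 (UTC) and lasts about 93 minutes, which lines up with the xlarge test date and the 90 min CICO duration reported above.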

Configuration

DI

Version of modules:
Source Record Manager Module (mod-source-record-manager-3.6.2)
Source Record Storage Module (mod-source-record-storage-5.6.5)
Inventory Module (mod-inventory-20.0.4)
Inventory Storage Module (mod-inventory-storage-26.0.0)
Inventory Update Module (mod-inventory-update-3.0.1)
Data Import Module (mod-data-import-2.7.1)
quickMARC (mod-quick-marc-3.0.0)


CICO

Version of modules:

Okapi (okapi-5.0.1)

users (mod-users-19.1.1)

Remote storage API module (mod-remote-storage-2.0.2)

Pubsub (mod-pubsub-2.9.1)

Patron Blocks Module (mod-patron-blocks-1.8.0)

Inventory Storage Module (mod-inventory-storage-26.0.0)

Inventory Module (mod-inventory-20.0.4)

feesfines (mod-feesfines-18.2.1)

Configuration (mod-configuration-5.9.1)

Circulation Storage Module (mod-circulation-storage-16.0.0)

Circulation Module (mod-circulation-23.5.4)

authtoken (mod-authtoken-2.13.0)


Environment

  • UI endpoint: https://aurora-serverless-test.int.aws.folio.org/
  • Okapi endpoint: https://okapi-aurora-serverless-test.int.aws.folio.org/
  • Environment is configured to use shared MSK and ES
  • Created in INT account us-west-2 region, cluster name oasl, created with snapshot of Cornell Test environment.

    Modules versions: Orchid-GA.3
    Task count: HA – okapi x3, mod-data-import, mod-data-export, mod-quick-marc, mod-data-export-spring x1, all other modules x2
    OpenSearch: fse - shared domain (6 r6g.large.search datanodes)
    MSK: dedicated cluster - total 4 brokers (kafka.m5.large)
    RDS Configuration 1: db.r6g.8xlarge instance, Aurora PostgreSQL 13.9
    RDS Configuration 2: db.r6g.xlarge instance, Aurora PostgreSQL 13.9 
    RDS Configuration 3: Aurora Serverless, min ACU: 0.5, max ACU: 128 
    RDS Configuration 4: Aurora Serverless, min ACU: 32, max ACU: 128
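
RDS Configurations 3 and 4 differ only in their minimum capacity. The sketch below shows how such an ACU range could be expressed as request parameters for the RDS `ModifyDBCluster` API (for example through boto3's `modify_db_cluster`). The helper name and the cluster identifier are hypothetical, and nothing here is taken from the actual environment setup.

```python
def serverless_v2_scaling(cluster_id, min_acu, max_acu):
    """Build ModifyDBCluster parameters for a Serverless v2 ACU range."""
    for value in (min_acu, max_acu):
        # Serverless v2 capacity is specified in half-ACU steps.
        if value < 0 or (value * 2) % 1 != 0:
            raise ValueError(f"invalid ACU value: {value}")
    if min_acu > max_acu:
        raise ValueError("min_acu must not exceed max_acu")
    return {
        "DBClusterIdentifier": cluster_id,
        "ServerlessV2ScalingConfiguration": {
            "MinCapacity": min_acu,
            "MaxCapacity": max_acu,
        },
    }


# "oasl-cluster" is a placeholder identifier, not the real cluster name.
config_3 = serverless_v2_scaling("oasl-cluster", 0.5, 128)
config_4 = serverless_v2_scaling("oasl-cluster", 32, 128)
print(config_4["ServerlessV2ScalingConfiguration"])

# With AWS credentials and boto3 installed, the parameters could be applied as:
#   import boto3
#   boto3.client("rds").modify_db_cluster(**config_4)
```

Keeping the two ranges as data makes it easy to switch a test cluster between the cheaper (0.5 - 128 ACUs) and the faster-starting (32 - 128 ACUs) configuration between runs.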