<Draft>Data Import on Aurora Serverless

Overview

This page investigates Aurora Serverless performance by comparing it with a database provisioned on the db.r6g.xlarge instance type. Data Import (DI) was run on the oasl-pvt cluster with the Check-in Check-out (CICO) script running in the background.

Summary

  • The environment handled the load with every DB instance type compared.
  • The Serverless v2 (32 - 128 ACUs) configuration performs much better than Serverless v2 (0.5 - 128 ACUs) because of its higher minimum capacity, and its performance is closer to 8xlarge. To cut costs, however, (0.5 - 128 ACUs) is the better choice for the DB reader instance role.
  • With Aurora Serverless, RDS CPU did not exceed 25% for any file size. Compared with db.r6g.xlarge, test duration tends to be shorter for larger files because more ACUs are allocated.
  • A set of tests was carried out under the conditions described in PERF-578. With db.r6g.xlarge, RDS CPU utilization was stable but high, peaking at 96%. With Serverless v2, RDS CPU utilization did not exceed 26% under the highest load, yet the test duration decreased for the 25k DI job only. The duration decreased because the Aurora Capacity Units (ACUs) grew during the first 10k data import and did not scale down instantly: they stayed at the same level for some time and scaled down to the default level only once the load was gone.
  • In addition, the CICO tests for PERF-593 showed that xlarge used more DB connections than any other DB instance type. The summary table shows better response times for runs with 20 users and no significant difference between DB instance types. High latency was observed in all tests.
  • To capture additional Performance Insights data during DI with a 50k file (PERF-602), three DI operations were carried out on different DB instance types. All snapshots are in the Average Active Sessions table.
  • To check the hypothesis that changing the Deployments and tasks parameter from 2 to 4 in mod-inventory, mod-inventory-storage, mod-circulation and mod-circulation-storage shortens the test duration on Aurora Serverless, additional DI runs without CICO were carried out. No dependency was found with that number of tests. Five or more DI operations are considered necessary to get relevant data.

Results

The table includes test results from runs on different database instance types. For db.r6g.xlarge, RDS CPU utilization reaches its maximum values and test duration grows with file size, roughly in proportion.

After the database was switched to Aurora Serverless, RDS CPU did not exceed 25% for any file size. Compared with db.r6g.xlarge, test duration tends to be shorter for larger files because more ACUs are allocated.
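One way to check how duration scales with file size is to convert each duration into records per second; a constant rate would mean strictly proportional growth. The sketch below does this for the db.r6g.xlarge DI durations reported in the results table. The helper names are illustrative and not part of any test tooling used for this page.

```python
def duration_to_seconds(hhmmss: str) -> int:
    """Convert an HH:MM:SS duration string into seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def records_per_second(records: int, hhmmss: str) -> float:
    """Average import rate for a DI job of `records` records."""
    return records / duration_to_seconds(hhmmss)


# DI durations for db.r6g.xlarge, copied from the results table on this page.
xlarge_runs = {10_000: "00:09:59", 25_000: "00:18:19", 50_000: "00:37:05"}

for records, duration in xlarge_runs.items():
    rate = records_per_second(records, duration)
    print(f"{records} records in {duration}: {rate:.1f} records/s")
```

The same helpers can be applied to the Serverless durations to compare import rates across configurations.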

DI CICO Results

Create
Job profile: Default - Create instance and SRS MARC Bib

CPU % = RDS max CPU utilization; DB conn. = DB connections.

                                                    | 8xlarge                       | db.r6g.xlarge    | Serverless v2 (0.5 - 128 ACUs)     | Serverless v2 (32 - 128 ACUs)
# | Test           | Users | File | Duration (CICO) | CPU %   | Duration            | CPU % | Duration | CPU % | Duration | ACUs | DB conn. | CPU % | Duration
1 | DI             |       | 10k  |                 | 37 / 27 | 00:05:15 / 00:03:21 | 96    | 00:09:59 | 17    | 00:10:07 |      |          | 16    | 00:07:17
  |                |       | 25k  |                 | 45 / 30 | 00:10:04 / 00:08:08 | 96    | 00:18:19 | 24    | 00:13:43 |      |          | 22    | 00:11:44
  |                |       | 50k  |                 | 30      | 00:15:54            | 93    | 00:37:05 | 25    | 00:22:57 |      |          | 24    | 00:20:01
2 | CICO + DI      | 20    | 10k  | 90 min          | 39      | 00:04:32            | 94    | 00:08:08 | 19    | 00:09:12 |      |          |       |
  |                |       | 25k  |                 | 47      | 00:09:01            | 96    | 00:19:21 | 26    | 00:14:30 |      |          |       |
3 | CICO DI Create | 20    | 10k  | 90 min          |         |                     |       |          |       | 00:13:22 |      |          |       |
  |                |       | 25k  |                 |         |                     |       |          |       | 00:23:49 |      |          |       |
  | CICO DI Update | 20    | 10k  | 60 min          |         |                     |       |          | 12    | 00:17:44 | 12   | 440      |       |
  |                |       | 25k  |                 |         |                     |       |          | 12    | 00:31:35 | 13   | 440      |       |

CPU Utilization DI and DI+CICO

RDS

  • 8xlarge (test date: 2023-05-25): CPU spikes at the beginning of the tests and returns to normal after they finish.
  • xlarge (test date: 2023-05-29): CPU reached its maximum, but this did not affect DI, which ran successfully.
  • serverless (test date: 2023-05-30): CPU was stable and did not exceed 25%.

Service

  • 8xlarge: Data imports ran during CICO. The services were stable and returned to their normal state after the tests.
  • xlarge: The CICO background process did not affect DI, which worked as expected.
  • serverless: The services were stable.

CICO Results

Additional set of tests in accordance with PERF-593.

Testing results for CICO

Test date: 2023-06-02

LG: us-west-2a

CPU % = RDS max CPU utilization; DB conn. = DB connections.

                                     | db.r6g.xlarge    | db.r6g.8xlarge   | Serverless v2 (0.5 - 128 ACUs) | Serverless v2 (32 - 128 ACUs)
# | Test | Users | Duration (CICO) | CPU % | DB conn. | CPU % | DB conn. | CPU % | ACUs | DB conn.        | CPU % | ACUs | DB conn.
1 | CICO | 8     | 30 min          | 16    | 460      | 2     | 364      | 2.5   | 7.5  | 380             | 1.5   | 32   | 380
  |      | 20    | 30 min          | 21    | 430      | 2.5   | 378      | 4.7   | 6.2  | 396             | 2     | 32   | 380

CICO graphs (images not preserved): Response Times Over Time, Throughput, RDS CPU utilization and Service CPU utilization, each for 8 users and 20 users on db.r6g.xlarge, db.r6g.8xlarge, Serverless v2 (0.5 - 128 ACUs) and Serverless v2 (32 - 128 ACUs).
Summary table for CICO

DB configuration               | Request              | Users | % KO   | 75th pct | 95th pct | Average | Latency
db.r6g.xlarge                  | Check-In Controller  | 8     | 0      | 2.878    | 3.114    | 2.785   | 2.16
                               |                      | 20    | 0      | 2.889    | 3.116    | 2.784   | 2.118
                               | Check-Out Controller | 8     | 9.173  | 4.103    | 4.526    | 3.948   | 3.212
                               |                      | 20    | 13.786 | 4.061    | 4.422    | 3.862   | 3.079
db.r6g.8xlarge                 | Check-In Controller  | 8     | 0      | 2.946    | 3.203    | 2.849   | 2.17
                               |                      | 20    | 0      | 2.914    | 3.121    | 2.805   | 2.107
                               | Check-Out Controller | 8     | 10.419 | 4.178    | 4.565    | 3.973   | 3.239
                               |                      | 20    | 13.683 | 4.075    | 4.434    | 3.875   | 3.112
Serverless v2 (0.5 - 128 ACUs) | Check-In Controller  | 8     | 0      | 3.088    | 3.372    | 2.99    | 2.361
                               |                      | 20    | 0      | 2.971    | 3.214    | 2.86    | 2.24
                               | Check-Out Controller | 8     | 9.255  | 4.465    | 4.862    | 4.268   | 3.453
                               |                      | 20    | 13.099 | 4.236    | 4.696    | 4.039   | 3.291
Serverless v2 (32 - 128 ACUs)  | Check-In Controller  | 8     | 0      | 2.972    | 3.238    | 2.86    | 2.212
                               |                      | 20    | 0      | 2.933    | 3.149    | 2.825   | 2.135
                               | Check-Out Controller | 8     | 10.545 | 4.191    | 4.652    | 3.998   | 3.274
                               |                      | 20    | 13.477 | 4.106    | 4.525    | 3.915   | 3.174
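For readers who want to recompute the 75th pct, 95th pct and Average figures from raw response times, the sketch below shows one common definition, the nearest-rank percentile. This is an assumption for illustration only: the reporting tool behind the table may use a different percentile method (for example interpolation), and the function names and sample values are hypothetical.

```python
def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it.

    `pct` is an integer in 1..100.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    # ceil(pct / 100 * n) computed with integers to avoid float rounding issues
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize(samples):
    """Return the three statistics used in the summary tables."""
    return {
        "75th pct": percentile_nearest_rank(samples, 75),
        "95th pct": percentile_nearest_rank(samples, 95),
        "Average": sum(samples) / len(samples),
    }


# Hypothetical response times, not measurements from these tests.
example_times = [float(value) for value in range(1, 21)]
print(summarize(example_times))
```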


Compare table for response times during 10k and 25k Data Import

Relative to db.r6g.xlarge, Serverless response times improve for the larger file during DI. Delta is the difference in % between db.r6g.xlarge and Serverless; a positive delta means Serverless was faster.

Request              | File   | Configuration | 75th pct | 95th pct | Average | delta, 75% | delta, 95%
Check-In Controller  | 10k DI | db.r6g.xlarge | 3.218    | 3.71     | 3.138   |            |
                     |        | Serverless    | 3.347    | 3.867    | 3.118   | -4.01      | -4.23
                     | 25k DI | db.r6g.xlarge | 3.249    | 3.665    | 3.076   |            |
                     |        | Serverless    | 3.134    | 3.398    | 2.99    | 3.54       | 7.29
Check-Out Controller | 10k DI | db.r6g.xlarge | 4.989    | 6.361    | 4.834   |            |
                     |        | Serverless    | 5.006    | 5.986    | 4.602   | -0.34      | 5.90
                     | 25k DI | db.r6g.xlarge | 5.246    | 6.298    | 4.666   |            |
                     |        | Serverless    | 4.719    | 5.19     | 4.333   | 10.05      | 17.59
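The delta values in the comparison above are consistent with a relative difference taken against db.r6g.xlarge, that is (xlarge - serverless) / xlarge * 100. The page does not state the formula, so the sketch below is an inference from the reported numbers; the function name is illustrative.

```python
def delta_percent(baseline: float, candidate: float) -> float:
    """Relative improvement of `candidate` over `baseline`, in percent.

    Positive means the candidate (Serverless) was faster than the baseline (db.r6g.xlarge).
    """
    return round((baseline - candidate) / baseline * 100, 2)


# 75th and 95th percentile values for Check-Out Controller during the 25k DI,
# copied from the comparison table.
print(delta_percent(5.246, 4.719))  # delta, 75%
print(delta_percent(6.298, 5.19))   # delta, 95%
```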

Average Active Sessions for DI with 50k file

Performance Insights snapshots (images not preserved) for Serverless v2 (0.5 - 128 ACUs), db.r6g.8xlarge and db.r6g.xlarge.

Example of growing ACUs for data import

Aurora Capacity Units graph (image not preserved), serverless, test date: 2023-05-31.

ACUs grow in accordance with the load and scale down gradually once the load is gone.

Response times for all DB configurations

  • Error rate correlates with DI file size: it grows with bigger files. During the 25k DI, the lowest error rate was with Serverless. All errors are in Check-Out Controller for POST_circulation/check-out-by-barcode (Submit_barcode_checkout)_POST_422.

db.r6g.8xlarge

Request              | Phase         | 75th pct | 95th pct | Average
Check-In Controller  | All           | 2.901    | 3.103    | 2.792
                     | Before 10K DI | 2.866    | 3.128    | 2.772
                     | During 10K DI | 2.936    | 3.232    | 2.827
                     | During 25K DI | 2.933    | 3.138    | 2.815
                     | After 25K DI  | 2.893    | 3.064    | 2.764
Check-Out Controller | All           | 4.255    | 4.767    | 3.956
                     | Before 10K DI | 4.212    | 4.6      | 4.017
                     | During 10K DI | 4.333    | 4.728    | 4.088
                     | During 25K DI | 4.352    | 4.787    | 4.065
                     | After 25K DI  | 4.259    | 4.731    | 3.902

db.r6g.xlarge

Request              | Phase         | % KO   | 75th pct | 95th pct | Average | Latency
Check-In Controller  | All           | 0      | 3.053    | 3.472    | 2.942   | 2.506
                     | Before 10K DI | 0      | 2.904    | 3.2      | 2.837   | 2.199
                     | During 10K DI | 0      | 3.218    | 3.71     | 3.138   | 2.726
                     | During 25K DI | 0      | 3.249    | 3.67     | 3.076   | 2.672
                     | After 25K DI  | 0      | 2.952    | 3.17     | 2.856   | 2.242
Check-Out Controller | All           | 43.379 | 4.656    | 5.824    | 4.284   | 4.343
                     | Before 10K DI | 9.188  | 4.322    | 4.9      | 4.205   | 3.474
                     | During 10K DI | 16.061 | 4.989    | 6.36     | 4.834   | 4.914
                     | During 25K DI | 36.691 | 5.246    | 6.3      | 4.666   | 4.841
                     | After 25K DI  | 67.369 | 4.271    | 4.83     | 3.935   | 3.427

Serverless

Request              | Phase         | % KO   | 75th pct | 95th pct | Average | Latency
Check-In Controller  | Before 10K DI | 0      | 2.992    | 3.315    | 2.888   | 2.33
                     | During 10K DI | 0      | 3.347    | 3.867    | 3.118   | 2.854
                     | During 25K DI | 0      | 3.134    | 3.398    | 2.99    | 2.45
                     | After 25K DI  | 0      | 2.961    | 3.164    | 2.85    | 2.237
Check-Out Controller | Before 10K DI | 13.753 | 4.382    | 4.923    | 4.176   | 3.481
                     | During 10K DI | 15.459 | 5.006    | 5.986    | 4.602   | 4.506
                     | During 25K DI | 27.453 | 4.719    | 5.19     | 4.333   | 3.786
                     | After 25K DI  | 61.16  | 4.351    | 4.892    | 3.984   | 3.461
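The % KO column can be read as the share of failed samples per request. As a rough sketch, assuming raw results are available as a JMeter-style CSV with `label` and `success` columns (an assumption; the figures on this page come from the Grafana dashboards linked below), it could be recomputed like this. The sample rows are invented for illustration and are not measurements from these tests.

```python
import csv
import io
from collections import defaultdict


def ko_percent_by_label(csv_text: str) -> dict:
    """Percentage of failed samples (% KO) per request label."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["label"]] += 1
        if row["success"].strip().lower() != "true":
            failures[row["label"]] += 1
    return {
        label: round(100 * failures[label] / totals[label], 3)
        for label in totals
    }


# Hypothetical sample data in JMeter CSV layout (subset of columns).
SAMPLE_RESULTS = """timeStamp,elapsed,label,responseCode,success
1000,2800,Check-In Controller,200,true
2000,2900,Check-In Controller,200,true
3000,4100,Check-Out Controller,200,true
4000,4300,Check-Out Controller,422,false
5000,4000,Check-Out Controller,200,true
6000,4200,Check-Out Controller,200,true
"""

print(ko_percent_by_label(SAMPLE_RESULTS))
```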

CICO DI UPDATE

Response times during 10k and 25k Data Import for Create and Update on Serverless v2 (0.5 - 128 ACUs)

Update graphs (images not preserved): Response Times Over Time, Throughput, RDS CPU utilization, Service CPU utilization, ACUs.

Serverless

Request              | Phase      | % KO   | 75th pct | 95th pct | Average | Latency_avg
Check-In Controller  | Before 10k | 0      | 2.941    | 3.115    | 2.823   | 1.84
                     | During 10k | 0      | 3.041    | 3.267    | 2.925   | 1.95
                     | During 25k | 0      | 3.066    | 3.266    | 2.931   | 1.966
Check-Out Controller | Before 10k | 10.39  | 4.182    | 4.507    | 3.989   | 2.69
                     | During 10k | 16.068 | 4.337    | 4.745    | 4.087   | 2.795
                     | During 25k | 48.571 | 4.438    | 4.913    | 4.029   | 2.751

Links to Grafana

Test date: 2023-05-25 - 2023-05-31

Baseline xlarge

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685351205171&to=1685356817553

Baseline 8xlarge

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685025858811&to=1685033029740

Aurora Serverless

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685436750832&to=1685442470092


Test date: 2023-06-02 - 2023-06-06

db.r6g.xlarge

8 users: 

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685692747425&to=1685694623603

20 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685695312772&to=1685697366883

db.r6g.8xlarge

8 users: 

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685700612764&to=1685702445076

20 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685702803775&to=1685704908814

Serverless v2 (0.5 - 128 ACUs)

8 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1686043433681&to=1686045340051

20 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1686045911070&to=1686048158943

Serverless v2 (32 - 128 ACUs)

8 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685710370012&to=1685712636325

20 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685713535200&to=1685715506600


Test date: 2023-06-09 - 2023-06-12

Serverless v2 (0.5 - 128 ACUs) Update DI

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-with-average-latency?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1686564502053&to=1686568701735
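The `from` and `to` query parameters in the Grafana links above are absolute timestamps in epoch milliseconds, so the time window of each test can be recovered from the link itself. The sketch below decodes the "Baseline xlarge" link; the function name is illustrative.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse


def grafana_time_window(url: str):
    """Return (start, end) of a Grafana link as timezone-aware UTC datetimes."""
    query = parse_qs(urlparse(url).query)
    start_ms = int(query["from"][0])
    end_ms = int(query["to"][0])
    start = datetime.fromtimestamp(start_ms / 1000, tz=timezone.utc)
    end = datetime.fromtimestamp(end_ms / 1000, tz=timezone.utc)
    return start, end


# "Baseline xlarge" link copied from this page.
BASELINE_XLARGE_URL = (
    "http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/"
    "jmeter-performance-test-copy?orgId=1&var-percentile=95"
    "&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid"
    "&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750"
    "&var-db_name=jmeter&var-sampler_type=All&var-Request=All"
    "&from=1685351205171&to=1685356817553"
)

start, end = grafana_time_window(BASELINE_XLARGE_URL)
minutes = (end - start).total_seconds() / 60
print(f"{start.isoformat()} to {end.isoformat()} ({minutes:.1f} min)")
```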

Configuration

DI

Version of modules:
Source Record Manager Module (mod-source-record-manager-3.6.2)
Source Record Storage Module (mod-source-record-storage-5.6.5)
Inventory Module (mod-inventory-20.0.4)
Inventory Storage Module (mod-inventory-storage-26.0.0)
Inventory Update Module (mod-inventory-update-3.0.1)
Data Import Module (mod-data-import-2.7.1)
quickMARC (mod-quick-marc-3.0.0)

CICO

Version of modules:

Okapi (okapi-5.0.1)

users (mod-users-19.1.1)

Remote storage API module (mod-remote-storage-2.0.2)

Pubsub (mod-pubsub-2.9.1)

Patron Blocks Module (mod-patron-blocks-1.8.0)

Inventory Storage Module (mod-inventory-storage-26.0.0)

Inventory Module (mod-inventory-20.0.4)

feesfines (mod-feesfines-18.2.1)

Configuration (mod-configuration-5.9.1)

Circulation Storage Module (mod-circulation-storage-16.0.0)

Circulation Module (mod-circulation-23.5.4)

authtoken (mod-authtoken-2.13.0)

Environment:

  • UI endpoint: https://aurora-serverless-test.int.aws.folio.org/
  • Okapi endpoint: https://okapi-aurora-serverless-test.int.aws.folio.org/
  • Environment is configured to use shared MSK and ES
  • Created in INT account us-west-2 region, cluster name oasl, created with snapshot of Cornell Test environment.

    Modules versions: Orchid-GA.3
    Task count: HA – okapi x3, mod-data-import, mod-data-export, mod-quick-marc, mod-data-export-spring x1, all other modules x2
    OpenSearch: fse - shared domain (6 r6g.large.search datanodes)
    MSK: dedicated cluster - total 4 brokers (kafka.m5.large)
    RDS Configuration 1: db.r6g.8xlarge instance, Aurora PostgreSQL 13.9
    RDS Configuration 2: db.r6g.xlarge instance, Aurora PostgreSQL 13.9 
    RDS Configuration 3: Aurora Serverless, min ACU: 0.5, max ACU: 128 
    RDS Configuration 4: Aurora Serverless, min ACU: 32, max ACU: 128
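Switching between RDS Configuration 3 and 4 amounts to changing the minimum ACU setting of the Serverless v2 cluster. The page does not say how the change was applied; the sketch below shows one possible way, using the boto3 RDS `modify_db_cluster` call with `ServerlessV2ScalingConfiguration`. The cluster identifier is a hypothetical placeholder, and the helper names are illustrative.

```python
def serverless_v2_scaling_request(cluster_id: str, min_acu: float, max_acu: float) -> dict:
    """Build the keyword arguments for an RDS modify_db_cluster call."""
    if min_acu <= 0 or max_acu < min_acu:
        raise ValueError("expected 0 < min_acu <= max_acu")
    return {
        "DBClusterIdentifier": cluster_id,
        "ServerlessV2ScalingConfiguration": {
            "MinCapacity": min_acu,
            "MaxCapacity": max_acu,
        },
        "ApplyImmediately": True,
    }


def apply_scaling(request: dict, region: str = "us-west-2"):
    """Send the request to AWS. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without AWS access

    rds = boto3.client("rds", region_name=region)
    return rds.modify_db_cluster(**request)


# "example-aurora-cluster" is a placeholder, not the real cluster identifier.
config_3 = serverless_v2_scaling_request("example-aurora-cluster", 0.5, 128)
config_4 = serverless_v2_scaling_request("example-aurora-cluster", 32, 128)
print(config_3["ServerlessV2ScalingConfiguration"])
print(config_4["ServerlessV2ScalingConfiguration"])
```

Keeping the request builder separate from the AWS call makes it possible to review the intended capacity range before anything is changed on the cluster.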
