Overview

This page investigates Aurora serverless performance by comparing it with the db.r6g.xlarge and db.r6g.8xlarge DB instance types. Data Import (DI) was run on the oasl-pvt cluster with the Check-in Check-out (CICO) script running in the background.

Summary

  • The environment can handle the load with all compared DB instance types.
  • No significant difference in CICO response times was observed between the db.r6g.xlarge and serverless instance types.
  • Aurora serverless performs better than db.r6g.xlarge for bigger files.
  • The Serverless v2 (32 - 128 ACUs) configuration performs much better than (0.5 - 128 ACUs) because of its higher minimum capacity, and its performance is closer to 8xlarge. To cut costs, however, it is better to use (0.5 - 128 ACUs) for the DB reader instance role.
  • Aurora serverless RDS CPU did not exceed 25% for any file size. Relative to db.r6g.xlarge, test duration tends to decrease for bigger files because more ACUs are allocated.
  • A set of tests was carried out in accordance with the conditions described in PERF-578. With db.r6g.xlarge, RDS CPU utilization was stable but high, peaking at 96%. With Serverless v2, RDS CPU utilization did not exceed 26% under the highest load, yet the test duration decreased only for the 25k DI job. The duration decreased because the Aurora Capacity Units (ACUs) grew during the first 10k data import and did not scale down instantly: they stayed at the same level for some time and then scaled down to the default level once there was no load.
  • In addition, the CICO tests run for PERF-593 showed that xlarge used more DB connections than any other DB instance type. The results in the summary table show better response times over time for runs with 20 users, and no significant differences between DB instance types. High latency was observed in all tests.
  • To capture additional data from Performance Insights during DI with a 50k file (PERF-602), three DI operations were carried out on different DB instance types. All snapshots are in the Average Active Sessions table.

    • To check the hypothesis that changing the Deployments and tasks parameter from 2 to 4 for mod-inventory, mod-inventory-storage, mod-circulation and mod-circulation-storage in the Aurora serverless configuration decreases test duration, additional DI tests without CICO were carried out. No dependency was found with this number of tests. Five or more DI operations are considered necessary to get relevant data.

    Results

    The table shows test results for the different database instance types. RDS CPU utilization for db.r6g.xlarge reaches its maximum values, and test duration grows in proportion to file size.

    After the database was switched to Aurora serverless, RDS CPU did not exceed 25% for any file size. Relative to db.r6g.xlarge, test duration tends to decrease for bigger files because more ACUs are allocated.

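    DI CICO Results

    Create
    Job profile: Default - Create instance and SRS MARC Bib

    CPU = RDS max CPU utilization (%). Duration = hh:mm:ss. SL 0.5 = Serverless v2 (0.5 - 128 ACUs). SL 32 = Serverless v2 (32 - 128 ACUs). "-" = no value reported.

    | # | Test | Users | File - Records | Duration (CICO) | db.r6g.8xlarge CPU | db.r6g.8xlarge Duration | db.r6g.xlarge CPU | db.r6g.xlarge Duration | SL 0.5 CPU | SL 0.5 Duration | SL 0.5 ACUs | SL 32 CPU | SL 32 Duration |
    |---|---|---|---|---|---|---|---|---|---|---|---|---|---|
    | 1 | DI | - | 10k | - | 27 | 00:03:21 | 96 | 00:09:59 | 17 | 00:10:07 | - | 16 | 00:07:17 |
    | 1 | DI | - | 25k | - | 30 | 00:08:08 | 96 | 00:18:19 | 24 | 00:13:43 | - | 22 | 00:11:44 |
    | 1 | DI | - | 50k | - | 30 | 00:15:54 | 93 | 00:37:05 | 25 | 00:22:57 | - | 24 | 00:20:01 |
    | 2 | CICO + DI | 20 | 10k | 90 min | 39 | 00:04:32 | 94 | 00:08:08 | 19 | 00:09:12 | - | - | - |
    | 2 | CICO + DI | 20 | 25k | 90 min | 47 | 00:09:01 | 96 | 00:19:21 | 26 | 00:14:30 | - | - | - |
    | 3 | CICO DI Create | 20 | 10k | 90 min | - | - | 94 | 00:09:56 | 14 | 00:13:22 | 19 | - | - |
    | 3 | CICO DI Create | 20 | 25k | 90 min | - | - | 94 | 00:21:06 | 24 | 00:23:49 | 25 | - | - |
    |   | CICO DI Update | 20 | 10k | 90 min | - | - | 70 | 00:12:31 | 12 | 00:17:44 | 12 | - | - |
    |   | CICO DI Update | 20 | 25k | 90 min | - | - | 70 | 00:29:12 | 12 | 00:31:35 | 13 | - | - |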
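The claim that duration grows in proportion to file size on db.r6g.xlarge, while serverless gets relatively faster on bigger files, can be checked by converting the DI-only durations into records per second. The following is a minimal sketch, not part of the original test tooling; the helper names `to_seconds` and `records_per_second` are illustrative, and the durations are copied from the DI rows of the results table.

```python
def to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss duration string into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def records_per_second(records: int, hms: str) -> float:
    """Average import throughput for one DI job."""
    return records / to_seconds(hms)


# DI-only durations copied from the results table (test 1).
DI_DURATIONS = {
    "db.r6g.xlarge": {10_000: "00:09:59", 25_000: "00:18:19", 50_000: "00:37:05"},
    "Serverless v2 (0.5 - 128 ACUs)": {
        10_000: "00:10:07",
        25_000: "00:13:43",
        50_000: "00:22:57",
    },
}

for db_type, runs in DI_DURATIONS.items():
    for records, duration in runs.items():
        rate = records_per_second(records, duration)
        print(f"{db_type:32} {records:>6} records  {rate:5.1f} rec/s")
```

A roughly constant rate across file sizes indicates proportional growth of duration; a rising rate indicates that bigger files are imported relatively faster.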


    CPU Utilization DI and DI+CICO

    RDS

    • 8xlarge (test date: 2023-05-25): CPU spikes at the beginning of the tests and returns to normal after they finish.
    • xlarge (test date: 2023-05-29): CPU was at its maximum, but this did not affect DI in any way, and it ran successfully.
    • serverless (test date: 2023-05-30): CPU was stable and did not exceed 25%.

    Service

    • 8xlarge: Data imports during CICO. The services were stable and returned to their normal state after the tests.
    • xlarge: The CICO background process did not affect DI, which worked as expected.
    • serverless: The services were stable.

    CICO Results

    An additional set of CICO tests was run in accordance with PERF-593. xlarge used more DB connections than any other DB instance type. The results in the summary table show better response times over time for runs with 20 users, and no significant differences between DB instance types. High latency was observed in all tests.

    Testing results for CICO

    Test date: 2023-06-02

    LG: us-west-2a

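    CPU = RDS max CPU utilization (%). Conn = DB connections. SL 0.5 = Serverless v2 (0.5 - 128 ACUs). SL 32 = Serverless v2 (32 - 128 ACUs).

    | # | Test | Users | Duration (CICO) | db.r6g.xlarge CPU | db.r6g.xlarge Conn | db.r6g.8xlarge CPU | db.r6g.8xlarge Conn | SL 0.5 CPU | SL 0.5 ACUs | SL 0.5 Conn | SL 32 CPU | SL 32 ACUs | SL 32 Conn |
    |---|---|---|---|---|---|---|---|---|---|---|---|---|---|
    | 1 | CICO | 8 | 30 min | 16 | 460 | 2 | 364 | 2.5 | 7.5 | 380 | 1.5 | 32 | 380 |
    | 1 | CICO | 20 | 30 min | 21 | 430 | 2.5 | 378 | 4.7 | 6.2 | 396 | 2 | 32 | 380 |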

    CICO

    [Charts for db.r6g.xlarge, db.r6g.8xlarge, Serverless v2 (0.5 - 128 ACUs) and Serverless v2 (32 - 128 ACUs), each at 8 users and 20 users: Response Times Over Time, Throughput, RDS CPU utilization, Service CPU utilization]

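    Summary table for CICO

    | DB instance type | Requests | Users | % KO | 75th pct | 95th pct | Average | Latency |
    |---|---|---|---|---|---|---|---|
    | db.r6g.xlarge | Check-In Controller | 8 | 0 | 2.878 | 3.114 | 2.785 | 2.16 |
    | db.r6g.xlarge | Check-In Controller | 20 | 0 | 2.889 | 3.116 | 2.784 | 2.118 |
    | db.r6g.xlarge | Check-Out Controller | 8 | 9.173 | 4.103 | 4.526 | 3.948 | 3.212 |
    | db.r6g.xlarge | Check-Out Controller | 20 | 13.786 | 4.061 | 4.422 | 3.862 | 3.079 |
    | db.r6g.8xlarge | Check-In Controller | 8 | 0 | 2.946 | 3.203 | 2.849 | 2.17 |
    | db.r6g.8xlarge | Check-In Controller | 20 | 0 | 2.914 | 3.121 | 2.805 | 2.107 |
    | db.r6g.8xlarge | Check-Out Controller | 8 | 10.419 | 4.178 | 4.565 | 3.973 | 3.239 |
    | db.r6g.8xlarge | Check-Out Controller | 20 | 13.683 | 4.075 | 4.434 | 3.875 | 3.112 |
    | Serverless v2 (0.5 - 128 ACUs) | Check-In Controller | 8 | 0 | 3.088 | 3.372 | 2.99 | 2.361 |
    | Serverless v2 (0.5 - 128 ACUs) | Check-In Controller | 20 | 0 | 2.971 | 3.214 | 2.86 | 2.24 |
    | Serverless v2 (0.5 - 128 ACUs) | Check-Out Controller | 8 | 9.255 | 4.465 | 4.862 | 4.268 | 3.453 |
    | Serverless v2 (0.5 - 128 ACUs) | Check-Out Controller | 20 | 13.099 | 4.236 | 4.696 | 4.039 | 3.291 |
    | Serverless v2 (32 - 128 ACUs) | Check-In Controller | 8 | 0 | 2.972 | 3.238 | 2.86 | 2.212 |
    | Serverless v2 (32 - 128 ACUs) | Check-In Controller | 20 | 0 | 2.933 | 3.149 | 2.825 | 2.135 |
    | Serverless v2 (32 - 128 ACUs) | Check-Out Controller | 8 | 10.545 | 4.191 | 4.652 | 3.998 | 3.274 |
    | Serverless v2 (32 - 128 ACUs) | Check-Out Controller | 20 | 13.477 | 4.106 | 4.525 | 3.915 | 3.174 |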



    Compare table for response times during 10k and 25k Data Import

    Relative to db.r6g.xlarge, Serverless response times improve for the bigger file during DI. Delta shows the difference in %; a positive delta means Serverless was faster than db.r6g.xlarge.

    | Requests | DI file | db.r6g.xlarge 75th pct | db.r6g.xlarge 95th pct | db.r6g.xlarge Average | Serverless 75th pct | Serverless 95th pct | Serverless Average | delta, 75% | delta, 95% |
    |---|---|---|---|---|---|---|---|---|---|
    | Check-In Controller | 10k | 3.218 | 3.71 | 3.138 | 3.347 | 3.867 | 3.118 | -4.01 | -4.23 |
    | Check-In Controller | 25k | 3.249 | 3.665 | 3.076 | 3.134 | 3.398 | 2.99 | 3.54 | 7.29 |
    | Check-Out Controller | 10k | 4.989 | 6.361 | 4.834 | 5.006 | 5.986 | 4.602 | -0.34 | 5.90 |
    | Check-Out Controller | 25k | 5.246 | 6.298 | 4.666 | 4.719 | 5.19 | 4.333 | 10.05 | 17.59 |
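The delta values in the comparison table are consistent with the relative difference (xlarge - serverless) / xlarge × 100, rounded to two decimals. The page does not state this formula; it is inferred from the numbers. A minimal sketch, with `delta_pct` as an illustrative name:

```python
def delta_pct(xlarge: float, serverless: float) -> float:
    """Relative difference in %, positive when serverless is faster."""
    return round((xlarge - serverless) / xlarge * 100, 2)


# (request, DI file, percentile) -> (db.r6g.xlarge, Serverless), from the table
RESPONSE_TIMES = {
    ("Check-In Controller", "10k", "75th"): (3.218, 3.347),
    ("Check-In Controller", "25k", "95th"): (3.665, 3.398),
    ("Check-Out Controller", "10k", "75th"): (4.989, 5.006),
    ("Check-Out Controller", "25k", "95th"): (6.298, 5.19),
}

for key, (xlarge, serverless) in RESPONSE_TIMES.items():
    print(key, delta_pct(xlarge, serverless))
```

Negative values therefore mean that Serverless was slower than db.r6g.xlarge for that request and percentile.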


    Average Active Sessions for DI with 50k file

    To capture additional data from Performance Insights during DI with a 50k file (PERF-602), three DI operations were carried out on different DB instance types.

    [Average Active Sessions snapshots: Serverless v2 (0.5 - 128 ACUs), db.r6g.8xlarge, db.r6g.xlarge]


    Example of growing ACUs for data import 

    Aurora Capacity Units

    serverless

    Test date: 2023-05-31

    ACUs grow in accordance with the load and scale down gradually once the load stops.
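One way to reproduce an ACU chart like this outside the console is to read the `ServerlessDatabaseCapacity` metric from CloudWatch. This is a minimal sketch under assumptions: it uses boto3, the cluster identifier `example-aurora-cluster` and the time window are hypothetical placeholders, and the page does not say how the original chart was produced.

```python
from datetime import datetime, timezone


def acu_metric_request(cluster_id: str, start: datetime, end: datetime,
                       period_s: int = 60) -> dict:
    """Build the CloudWatch request for the ACU metric of one Aurora cluster."""
    return {
        "Namespace": "AWS/RDS",
        "MetricName": "ServerlessDatabaseCapacity",
        "Dimensions": [{"Name": "DBClusterIdentifier", "Value": cluster_id}],
        "StartTime": start,
        "EndTime": end,
        "Period": period_s,
        "Statistics": ["Maximum"],
    }


def fetch_acus(cluster_id: str, start: datetime, end: datetime,
               region: str = "us-west-2") -> list:
    """Return (timestamp, max ACUs) pairs sorted by time. Needs AWS credentials."""
    import boto3  # imported lazily so the request builder works without boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    response = cloudwatch.get_metric_statistics(
        **acu_metric_request(cluster_id, start, end)
    )
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], p["Maximum"]) for p in points]


# Hypothetical window on the test date; the real test times are not given here.
window_start = datetime(2023, 5, 31, 10, 0, tzinfo=timezone.utc)
window_end = datetime(2023, 5, 31, 12, 0, tzinfo=timezone.utc)
request = acu_metric_request("example-aurora-cluster", window_start, window_end)
print(request["MetricName"], request["Period"])
```

Calling `fetch_acus("example-aurora-cluster", window_start, window_end)` with valid credentials would return the series to plot against the DI start and end times.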


    Response times for all DB configurations

    Error rate correlates with DI file size: it grows with bigger files. During the 25k DI, the lowest error rate was with Serverless. All errors are in Check-Out Controller for POST_circulation/check-out-by-barcode (Submit_barcode_checkout)_POST_422.

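    db.r6g.8xlarge

    | Requests | Phase | 75th pct | 95th pct | Average |
    |---|---|---|---|---|
    | Check-In Controller | All | 2.901 | 3.103 | 2.792 |
    | Check-In Controller | Before 10K DI | 2.866 | 3.128 | 2.772 |
    | Check-In Controller | During 10K DI | 2.936 | 3.232 | 2.827 |
    | Check-In Controller | During 25K DI | 2.933 | 3.138 | 2.815 |
    | Check-In Controller | After 25K DI | 2.893 | 3.064 | 2.764 |
    | Check-Out Controller | All | 4.255 | 4.767 | 3.956 |
    | Check-Out Controller | Before 10K DI | 4.212 | 4.6 | 4.017 |
    | Check-Out Controller | During 10K DI | 4.333 | 4.728 | 4.088 |
    | Check-Out Controller | During 25K DI | 4.352 | 4.787 | 4.065 |
    | Check-Out Controller | After 25K DI | 4.259 | 4.731 | 3.902 |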


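    db.r6g.xlarge

    | Requests | Phase | % KO | 75th pct | 95th pct | Average | Latency |
    |---|---|---|---|---|---|---|
    | Check-In Controller | All | 0 | 3.053 | 3.472 | 2.942 | 2.506 |
    | Check-In Controller | Before 10K DI | 0 | 2.904 | 3.2 | 2.837 | 2.199 |
    | Check-In Controller | During 10K DI | 0 | 3.218 | 3.71 | 3.138 | 2.726 |
    | Check-In Controller | During 25K DI | 0 | 3.249 | 3.67 | 3.076 | 2.672 |
    | Check-In Controller | After 25K DI | 0 | 2.952 | 3.17 | 2.856 | 2.242 |
    | Check-Out Controller | All | 43.379 | 4.656 | 5.824 | 4.284 | 4.343 |
    | Check-Out Controller | Before 10K DI | 9.188 | 4.322 | 4.9 | 4.205 | 3.474 |
    | Check-Out Controller | During 10K DI | 16.061 | 4.989 | 6.36 | 4.834 | 4.914 |
    | Check-Out Controller | During 25K DI | 36.691 | 5.246 | 6.3 | 4.666 | 4.841 |
    | Check-Out Controller | After 25K DI | 67.369 | 4.271 | 4.83 | 3.935 | 3.427 |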


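    Serverless

    | Requests | Phase | % KO | 75th pct | 95th pct | Average | Latency |
    |---|---|---|---|---|---|---|
    | Check-In Controller | Before 10K DI | 0 | 2.992 | 3.315 | 2.888 | 2.33 |
    | Check-In Controller | During 10K DI | 0 | 3.347 | 3.867 | 3.118 | 2.854 |
    | Check-In Controller | During 25K DI | 0 | 3.134 | 3.398 | 2.99 | 2.45 |
    | Check-In Controller | After 25K DI | 0 | 2.961 | 3.164 | 2.85 | 2.237 |
    | Check-Out Controller | Before 10K DI | 13.753 | 4.382 | 4.923 | 4.176 | 3.481 |
    | Check-Out Controller | During 10K DI | 15.459 | 5.006 | 5.986 | 4.602 | 4.506 |
    | Check-Out Controller | During 25K DI | 27.453 | 4.719 | 5.19 | 4.333 | 3.786 |
    | Check-Out Controller | After 25K DI | 61.16 | 4.351 | 4.892 | 3.984 | 3.461 |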

    * Due to the high error rate, it was decided to retest CICO DI with new job profiles for Create and Update (PTF - Create 2, PTF - Updates Success - 1).

    CICO DI Create + Update

    [Charts for Serverless and db.r6g.xlarge, each for Create and Update: Response Times Over Time, RDS CPU utilization, Service CPU utilization, ACUs]

    CICO response times

    For Aurora serverless, response time grew immediately after the DI started and then decreased smoothly during execution (PTF - Create 2 job profile).

    For the xlarge DB instance type, CPU utilization stayed stable at about 15% during CICO, then rose rapidly to 93% once the DI with the 10k file started and stayed at that level for the whole DI process.

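    Serverless v2 (0.5 - 128 ACUs)

    | Job | Requests | Phase | 75th pct | 95th pct | Average | Latency_avg |
    |---|---|---|---|---|---|---|
    | Create | Check-In Controller | Before 10k | 2.928 | 3.171 | 2.855 | 1.851 |
    | Create | Check-In Controller | During 10k | 3.391 | 3.999 | 3.248 | 2.242 |
    | Create | Check-In Controller | During 25k | 3.156 | 3.427 | 3.06 | 2.07 |
    | Create | Check-Out Controller | Before 10k | 4.198 | 5.012 | 4.106 | 2.788 |
    | Create | Check-Out Controller | During 10k | 4.82 | 5.672 | 4.642 | 3.311 |
    | Create | Check-Out Controller | During 25k | 4.53 | 4.93 | 4.407 | 3.085 |
    | Update | Check-In Controller | Before 10k | 2.93 | 3.09 | 2.807 | 1.823 |
    | Update | Check-In Controller | During 10k | 2.966 | 3.152 | 2.882 | 1.883 |
    | Update | Check-In Controller | During 25k | 3.048 | 3.256 | 2.951 | 1.948 |
    | Update | Check-Out Controller | Before 10k | 4.176 | 4.97 | 4.152 | 2.841 |
    | Update | Check-Out Controller | During 10k | 4.23 | 4.46 | 4.134 | 2.823 |
    | Update | Check-Out Controller | During 25k | 4.42 | 5.012 | 4.327 | 2.997 |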

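    db.r6g.xlarge

    | Job | Requests | Phase | 75th pct | 95th pct | Average | Latency_avg |
    |---|---|---|---|---|---|---|
    | Create | Check-In Controller | Before 10k | 2.764 | 2.867 | 2.786 | 1.788 |
    | Create | Check-In Controller | During 10k | 3.204 | 3.461 | 3.077 | 2.08 |
    | Create | Check-In Controller | During 25k | 3.318 | 3.606 | 3.176 | 2.178 |
    | Create | Check-Out Controller | Before 10k | 4.02 | 4.155 | 4.045 | 2.74 |
    | Create | Check-Out Controller | During 10k | 4.628 | 4.976 | 4.466 | 3.148 |
    | Create | Check-Out Controller | During 25k | 4.861 | 5.181 | 4.672 | 3.341 |
    | Update | Check-In Controller | Before 10k | 2.816 | 3.078 | 2.74 | 1.757 |
    | Update | Check-In Controller | During 10k | 2.825 | 2.928 | 2.837 | 1.848 |
    | Update | Check-In Controller | During 25k | 2.853 | 2.952 | 2.868 | 1.873 |
    | Update | Check-Out Controller | Before 10k | 4.06 | 4.252 | 3.943 | 2.632 |
    | Update | Check-Out Controller | During 10k | 4.077 | 4.202 | 4.097 | 2.78 |
    | Update | Check-Out Controller | During 25k | 4.126 | 4.243 | 4.154 | 2.839 |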


    Links to Grafana

    Test date: 2023-05-25 - 2023-05-31

    Baseline xlarge

    http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685351205171&to=1685356817553

    Baseline 8xlarge

    http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685025858811&to=1685033029740

    Aurora Serverless

    http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685436750832&to=1685442470092


    Test date: 2023-06-02 - 2023-06-06

    db.r6g.xlarge

    8 users: 

    http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685692747425&to=1685694623603

    20 users:

    http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685695312772&to=1685697366883

    db.r6g.8xlarge

    8 users: 

    http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685700612764&to=1685702445076

    20 users:

    http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685702803775&to=1685704908814

    Serverless v2 (0.5 - 128 ACUs)

    8 users:

    http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1686043433681&to=1686045340051

    20 users:

    http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1686045911070&to=1686048158943

    Serverless v2 (32 - 128 ACUs)

    8 users:

    http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685710370012&to=1685712636325

    20 users:

    http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685713535200&to=1685715506600


    Test date: 2023-06-13

    Serverless v2 (0.5 - 128 ACUs) CICO DI Create + Update

    http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-with-average-latency?orgId=1&from=1686673259569&to=1686675910907&var-percentile=95&var-test_type=baseline&var-test=oasl_fixed1&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All

    db.r6g.xlarge 

    http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-with-average-latency?orgId=1&from=1686746379062&to=16867486445671686758375536&var-percentile=95&var-test_type=baseline&var-test=oasl_fixed1&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All
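The `from` and `to` query parameters in these Grafana links are epoch timestamps in milliseconds, so the test window can be recovered from a link. A minimal sketch using only the Python standard library; `grafana_window` is an illustrative helper name, and the URL is the "Baseline xlarge" link from this page.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse

BASELINE_XLARGE_URL = (
    "http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/"
    "jmeter-performance-test-copy?orgId=1&var-percentile=95"
    "&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid"
    "&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750"
    "&var-db_name=jmeter&var-sampler_type=All&var-Request=All"
    "&from=1685351205171&to=1685356817553"
)


def grafana_window(url: str) -> tuple:
    """Return the (start, end) of a Grafana link as UTC datetimes."""
    query = parse_qs(urlparse(url).query)
    start_ms = int(query["from"][0])
    end_ms = int(query["to"][0])
    start = datetime.fromtimestamp(start_ms / 1000, tz=timezone.utc)
    end = datetime.fromtimestamp(end_ms / 1000, tz=timezone.utc)
    return start, end


start, end = grafana_window(BASELINE_XLARGE_URL)
minutes = round((end - start).total_seconds() / 60)
print(start.date().isoformat(), f"{minutes} min")
```

For the link above this gives a window on 2023-05-29 of about 94 minutes, which matches the xlarge test date and the 90 min CICO duration reported in the results.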

    Configuration

    DI

    Version of modules:
    Source Record Manager Module (mod-source-record-manager-3.6.2)
    Source Record Storage Module (mod-source-record-storage-5.6.5)
    Inventory Module (mod-inventory-20.0.4)
    Inventory Storage Module (mod-inventory-storage-26.0.0)
    Inventory Update Module (mod-inventory-update-3.0.1)
    Data Import Module (mod-data-import-2.7.1)
    quickMARC (mod-quick-marc-3.0.0)


    CICO

    Version of modules:

    Okapi (okapi-5.0.1)

    users (mod-users-19.1.1)

    Remote storage API module (mod-remote-storage-2.0.2)

    Pubsub (mod-pubsub-2.9.1)

    Patron Blocks Module (mod-patron-blocks-1.8.0)

    Inventory Storage Module (mod-inventory-storage-26.0.0)

    Inventory Module (mod-inventory-20.0.4)

    feesfines (mod-feesfines-18.2.1)

    Configuration (mod-configuration-5.9.1)

    Circulation Storage Module (mod-circulation-storage-16.0.0)

    Circulation Module (mod-circulation-23.5.4)

    authtoken (mod-authtoken-2.13.0)


    Environment


    • UI endpoint: https://aurora-serverless-test.int.aws.folio.org/
    • Okapi endpoint: https://okapi-aurora-serverless-test.int.aws.folio.org/
    • Environment is configured to use shared MSK and ES
    • Created in INT account us-west-2 region, cluster name oasl, created with snapshot of Cornell Test environment.

      Modules versions: Orchid-GA.3
      Task count: HA – okapi x3, mod-data-import, mod-data-export, mod-quick-marc, mod-data-export-spring x1, all other modules x2
      OpenSearch: fse - shared domain (6 r6g.large.search datanodes)
      MSK: dedicated cluster - total 4 brokers (kafka.m5.large)
      RDS Configuration 1: db.r6g.8xlarge instance, Aurora PostgreSQL 13.9
      RDS Configuration 2: db.r6g.xlarge instance, Aurora PostgreSQL 13.9 
      RDS Configuration 3: Aurora Serverless, min ACU: 0.5, max ACU: 128 
      RDS Configuration 4: Aurora Serverless, min ACU: 32, max ACU: 128
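
The two serverless configurations differ only in their minimum capacity. As an illustration of how such a range could be applied with boto3, the sketch below builds the arguments for `modify_db_cluster`. This is an assumption about tooling, not the procedure used for these tests; the cluster identifier `example-aurora-cluster` is hypothetical, and the half-ACU check reflects the 0.5 ACU increments of Aurora Serverless v2.

```python
def scaling_request(cluster_id: str, min_acu: float, max_acu: float) -> dict:
    """Build modify_db_cluster arguments for a Serverless v2 capacity range."""
    if min_acu > max_acu:
        raise ValueError("min ACU must not exceed max ACU")
    for value in (min_acu, max_acu):
        if value * 2 != int(value * 2):
            raise ValueError("ACU values are set in 0.5 increments")
    return {
        "DBClusterIdentifier": cluster_id,
        "ServerlessV2ScalingConfiguration": {
            "MinCapacity": min_acu,
            "MaxCapacity": max_acu,
        },
        "ApplyImmediately": True,
    }


def apply_scaling(request: dict, region: str = "us-west-2") -> None:
    """Send the change to RDS. Needs AWS credentials and permissions."""
    import boto3  # imported lazily so scaling_request works without boto3

    boto3.client("rds", region_name=region).modify_db_cluster(**request)


# RDS Configuration 3 and 4 from the list above, on a hypothetical cluster.
config_3 = scaling_request("example-aurora-cluster", 0.5, 128)
config_4 = scaling_request("example-aurora-cluster", 32, 128)
print(config_3["ServerlessV2ScalingConfiguration"])
print(config_4["ServerlessV2ScalingConfiguration"])
```

Keeping the request builder separate from the API call makes the capacity range easy to review before anything is changed on a real cluster.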