PTF - DI testing for Cornell (mod-srs, mod-srm connections)

Overview

This workflow checks the performance of Data Import (DI) for Cornell. The tests mimic Cornell's load with an increased number of DB connections in mod-srs and mod-srm. Ticket: https://folio-org.atlassian.net/browse/PERF-555

 

These tests were run in the PTF environment on the cptf2 cluster.

The following changes were made to the environment based on Cornell's load requirements:

1. Allow up to 500 DB connections for mod-srm and mod-srs.

2. Run the database in turn as db.r6g.xlarge, db.r6g.2xlarge, db.r6g.4xlarge, db.r6g.8xlarge, Serverless v2 (0.5 - 128 ACUs), Serverless v2 (32 - 128 ACUs), Serverless v2 (16 - 96 ACUs), and Serverless v2 (8 - 96 ACUs).
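
To put the connection change in perspective, the worst-case number of database connections can be estimated from the per-module limit and the task count. The sketch below is only illustrative: it assumes the connection limit applies per task (container), which this report does not state, and it takes the task count of 2 for both modules from the module table in the Appendix. The `Module` class and `worst_case_connections` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    task_count: int  # number of running tasks (from the Appendix module table)
    pool_size: int   # max DB connections; ASSUMED to apply per task


def worst_case_connections(modules):
    """Upper bound on DB connections if every task fills its pool."""
    return sum(m.task_count * m.pool_size for m in modules)


baseline = [
    Module("mod-source-record-storage", task_count=2, pool_size=30),
    Module("mod-source-record-manager", task_count=2, pool_size=30),
]
increased = [
    Module("mod-source-record-storage", task_count=2, pool_size=500),
    Module("mod-source-record-manager", task_count=2, pool_size=500),
]

print(worst_case_connections(baseline))   # 2*30 + 2*30
print(worst_case_connections(increased))  # 2*500 + 2*500
```

Under these assumptions the upper bound grows from 120 to 2000 connections for the two modules alone, before counting any other module in the environment.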

Summary


Data Import with the profile "EBSCO ebooks new and updated" puts a very high load on the database.

Increasing the number of connections for mod-srs and mod-srm to 500 (or even to 100) severely increases database load and resource utilization and causes database memory usage errors.

One reason is the sheer volume of data: for example, the table mod_source_record_storage.marc_indexers has 1,028,022,472 records.

With 30 connections, Data Import with the profile "EBSCO ebooks new and updated" completed successfully on db.r6g.2xlarge, db.r6g.4xlarge, and db.r6g.8xlarge. Job duration decreases as the instance size increases (50, 29, and 18 min, respectively).

Test Runs

| Aurora PostgreSQL Changes | DI duration with 30 mod-srs & mod-srm connections | Status | DI duration with 500 mod-srs & mod-srm connections | ACU | Average ACUs per test | Mem (GiB) | Price per hour $ | Baseline price per month $ | Additional expenses |
|---|---|---|---|---|---|---|---|---|---|
| Serverless v2 (0.5 - 128 ACUs) | 1 hour 3 min | Completed with errors | stopped due to DB error | 0.5 | 84 | 1 | 0.09 (per 0.5 ACU) | 64.80 | + Price per additionally used ACUs |
| Serverless v2 (32 - 128 ACUs) | 34 min | Completed | 20 min hanging, 0% completed, DB error | 32 | 99 | 64 | 0.18 (per 1 ACU) | 4,147.20 | + Price per additionally used ACUs |
| db.r6g.8xlarge | 18 min | Completed | stopped due to DB error | - | - | 256 | 3.597 | 2,589.84 | No additional expenses |
| db.r6g.4xlarge | 29 min | Completed | stopped due to DB error | - | - | 128 | 1.798 | 1,294.56 | No additional expenses |
| db.r6g.2xlarge | 50 min | Completed | stopped due to DB error | - | - | 64 | 0.889 | 640.00 | No additional expenses |
| db.r6g.xlarge | 2 hours, 20% done | Could be completed with errors due to DB error | stopped due to DB error | - | - | 32 | 0.45 | 324.00 | No additional expenses |
| Serverless v2 (16 - 96 ACUs) | 45 min | Completed | - | 16 | 63 | 32 | 0.18 (per 1 ACU) | 2,073.60 | + Price per additionally used ACUs |
| Serverless v2 (8 - 96 ACUs) | 36 min (25 min when started right after the first one) | Completed | - | 8 | 83 | 16 | 0.18 (per 1 ACU) | 1,036.80 | + Price per additionally used ACUs |

Note that the database uses approximately 4 ACUs on average when the environment is idle. Every job, test, or other activity adds to the ACU utilization.

Explanation of the table above:

$0.18 per ACU-hour is the approximate average price for Serverless v2.

The price depends on the AWS region: $0.12 for US East (N. Virginia), $0.25 for South America (Sao Paulo).

When the DB is already scaled up, DI duration decreases: for example, from 36 to 25 min for Serverless v2 (8 - 96 ACUs) when the second job was started right after the first one.

For easier calculation of the database price, use https://calculator.aws/#/addService/AuroraPostgreSQL. Enter the average ACU utilization per month to get more accurate results.
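
The baseline figures in the table can be reproduced with simple arithmetic. The sketch below assumes a 720-hour month (30 days x 24 h); that value is inferred from the table, not stated in it, and the function names are hypothetical. For Serverless v2 the baseline uses the minimum ACU setting; ACUs consumed above the minimum come on top.

```python
HOURS_PER_MONTH = 720  # ASSUMPTION: 30 days x 24 h, inferred from the table values


def baseline_monthly_price(price_per_hour, units=1.0):
    """Monthly price in USD for a fixed hourly rate.

    For provisioned instances, units is 1.
    For Serverless v2, units is the minimum ACU setting.
    """
    return round(price_per_hour * units * HOURS_PER_MONTH, 2)


def run_cost(avg_acus, duration_min, price_per_acu_hour=0.18):
    """Approximate cost in USD of a single test run on Serverless v2."""
    return round(avg_acus * price_per_acu_hour * duration_min / 60, 2)


# Provisioned: db.r6g.8xlarge at $3.597 per hour (table lists 2,589.84)
print(baseline_monthly_price(3.597))

# Serverless v2 (32 - 128 ACUs): 32 ACUs at $0.18 per ACU-hour (table lists 4,147.20)
print(baseline_monthly_price(0.18, units=32))

# One 36-minute DI run on Serverless v2 (8 - 96 ACUs) averaging 83 ACUs
print(run_cost(avg_acus=83, duration_min=36))
```

The per-run estimate uses the "Average ACUs per test" column and the approximate $0.18 rate, so it is a rough figure; actual billing depends on the region and on the ACU usage over time.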

Observations

Import jobs cannot be run in parallel; all jobs are run one by one.

RDS Resource utilization

Serverless v2 (8 - 96 ACUs and 16 - 96 ACUs)

The gaps in these graphs correspond to the DB issues that occurred when the connections for mod-srs and mod-srm were increased to 500.


Appendix

 

Infrastructure

Record counts:

  • mod_source_record_storage.marc_indexers = 1028022472

PTF environment: cptf2

  • 6 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1

  • 1 database instance (writer only)

  • MSK ptf-kakfa-3

    • 4 m5.2xlarge brokers in 2 zones

    • Apache Kafka version 2.8.0

    • EBS storage volume per broker: 300 GiB

    • auto.create.topics.enable=true

    • log.retention.minutes=480

    • default.replication.factor=3

    • 2 partitions per DI topic


| Module (cptf2-pvt) | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft Limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize | R/W split enabled |
|---|---|---|---|---|---|---|---|---|---|---|
| mod-data-import | 5 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-data-import:2.7.1 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512 | false |
| mod-authtoken | 5 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-authtoken:2.13.0 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128 | false |
| mod-inventory-update | 5 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-inventory-update:3.0.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
| mod-configuration | 5 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-configuration:5.9.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
| mod-inventory-storage | 5 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-inventory-storage:26.0.0 | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512 | false |
| mod-source-record-storage | 18-22 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-source-record-storage:5.6.10 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | false |
| mod-inventory | 7 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-inventory:20.0.6 | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512 | false |
| mod-di-converter-storage | 6 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-di-converter-storage:2.0.5 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
| mod-source-record-manager | 14-21 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-source-record-manager:3.6.4 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | false |
| nginx-edge | 10 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/nginx-edge:2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0 | false |
| mod-quick-marc | 5 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-quick-marc:3.0.0 | 1 | 2288 | 2176 | 128 | 1664 | 384 | 512 | false |
| nginx-okapi | 10 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/nginx-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0 | false |
| okapi-b | 12 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/okapi:5.0.1 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512 | false |
| pub-okapi | 10 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/pub-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 768 | 0 | 0 | false |

mod-data-import-converter-storage