PTF - DI testing for Cornell (mod-srs, mod-srm connections)
Overview
- In this workflow, we check the performance of Data Import (DI) for Cornell. The testing mimics Cornell's load with an increased number of DB connections in mod-srs and mod-srm. Ticket: PERF-555
These tests were run in the PTF environment, in the cptf2 cluster.
The following changes were made to the environment based on Cornell's load requirements:
1. Allow up to 500 DB connections for mod-srm and mod-srs.
2. Run the DB in each of the following configurations: db.r6g.xlarge, db.r6g.2xlarge, db.r6g.4xlarge, db.r6g.8xlarge, Serverless v2 (0.5 - 128 ACUs), Serverless v2 (32 - 128 ACUs), Serverless v2 (16 - 96 ACUs), Serverless v2 (8 - 96 ACUs).
Summary
Data Import with the profile "EBSCO ebooks new and updated" causes a very high load on the database.
Increasing the number of connections for mod-srs & mod-srm to 500 (or even to 100) severely increases database load and resource utilization.
One reason is the sheer size of the database: for example, the table mod_source_record_storage.marc_indexers holds 1 028 022 472 records.
Data import with the profile "EBSCO ebooks new and updated" ran successfully with the database sizes db.r6g.2xlarge, db.r6g.4xlarge, and db.r6g.8xlarge. Job duration decreases as the database size increases.
Increasing the number of connections for mod-srs & mod-srm to 500 (or even to 100) also causes database memory usage errors.
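As a rough illustration of the duration trend, the sketch below computes the speedup between consecutive instance sizes. The durations are the ones reported in the Test Runs table for 30 mod-srs & mod-srm connections; the variable and function names are illustrative only.

```python
# DI durations in minutes with 30 connections, taken from the Test Runs table.
durations_min = {
    "db.r6g.2xlarge": 50,
    "db.r6g.4xlarge": 29,
    "db.r6g.8xlarge": 18,
}


def speedups(durations):
    """Return the speedup of each instance size relative to the previous one."""
    names = list(durations)
    return {
        f"{prev} -> {curr}": round(durations[prev] / durations[curr], 2)
        for prev, curr in zip(names, names[1:])
    }


for step, factor in speedups(durations_min).items():
    print(f"{step}: {factor}x faster")
```

In these runs each doubling of the instance size shortens the job by roughly 1.6 to 1.7 times, so the gain is less than proportional to the instance size.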
Test Runs
Aurora PostgreSQL Changes | DI duration with 30 mod-srs & mod-srm connections | Status | DI duration with 500 mod-srs & mod-srm connections | Minimum ACU | Average ACUs per test | Memory (GiB) | Price per hour ($) | Baseline price per month ($) | Additional expenses |
Serverless v2 (0.5 - 128 ACUs) | 1 hour 3 min | Completed with errors | stopped due to DB error | 0.5 | 84 | 1 | 0.09 (per 0.5 ACU) | 64.80 | + price per additionally used ACUs |
Serverless v2 (32 - 128 ACUs) | 34 min | Completed | hung for 20 min at 0% completed, DB error | 32 | 99 | 64 | 0.18 (per 1 ACU) | 4,147.20 | + price per additionally used ACUs |
db.r6g.8xlarge | 18 min | Completed | stopped due to DB error | - | - | 256 | 3.597 | 2,589.84 | No additional expenses |
db.r6g.4xlarge | 29 min | Completed | stopped due to DB error | - | - | 128 | 1.798 | 1,294.56 | No additional expenses |
db.r6g.2xlarge | 50 min | Completed | stopped due to DB error | - | - | 64 | 0.889 | 640.00 | No additional expenses |
db.r6g.xlarge | 2 hours - 20% done | Could be completed with errors due to DB error | stopped due to DB error | - | - | 32 | 0.45 | 324.00 | No additional expenses |
Serverless v2 (16 - 96 ACUs) | 45 min | Completed | - | 16 | 63 | 32 | 0.18 (per 1 ACU) | 2,073.60 | + price per additionally used ACUs |
Serverless v2 (8 - 96 ACUs) | 36 min (25 min when started right after the first one) | Completed | - | 8 | 83 | 16 | 0.18 (per 1 ACU) | 1,036.80 | + price per additionally used ACUs |
* Note that the average ACU utilization for the database is approximately 4 ACUs when there is no activity in the environment. Each job, test, or other activity consumes additional ACUs.
Explanation of the table above:
$0.18 per ACU-hour is the approximate average price for Serverless v2.
The price depends on the AWS region: $0.12 for US East (N. Virginia), $0.25 for South America (Sao Paulo).
- The scaling rate for an Aurora Serverless v2 DB instance depends on its current capacity. The higher the current capacity (minimal ACU), the faster it can scale up. If you need the DB instance to quickly scale up to a very high capacity, consider setting the minimum capacity to a value where the scaling rate meets your requirement. https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2.setting-capacity.html
When the DB is already scaled up, DI duration decreases (for example, from 36 to 25 min for Serverless v2 (8 - 96 ACUs) when a job is started right after the first one).
- Although setting the minimum capacity to the lowest permissible value (0.5 ACU) is attractive because it gives the smallest baseline price per month, it makes scaling the capacity more challenging. More info: https://dev.to/aws-builders/a-guide-on-selecting-amazon-aurora-serverless-and-provisioned-database-cluster-1e1h
For easier calculation of the database price, use https://calculator.aws/#/addService/AuroraPostgreSQL. Enter the average ACU utilization per month to get more accurate results.
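The price columns in the table can be reproduced with simple arithmetic. The sketch below assumes a 720-hour month (30 days), which is consistent with the baseline figures in the table, and uses the approximate $0.18 per ACU-hour rate quoted above. The function names are illustrative, and the per-run figure covers compute only (no storage, I/O, or idle time).

```python
HOURS_PER_MONTH = 720        # assumption: 30 days x 24 hours
PRICE_PER_ACU_HOUR = 0.18    # approximate average Serverless v2 rate from this report


def baseline_monthly_price(min_acu, price_per_acu_hour=PRICE_PER_ACU_HOUR):
    """Monthly cost if the cluster never scales above its minimum capacity."""
    return round(min_acu * price_per_acu_hour * HOURS_PER_MONTH, 2)


def run_cost(avg_acu, duration_min, price_per_acu_hour=PRICE_PER_ACU_HOUR):
    """Approximate compute cost of one DI run at the given average ACU usage."""
    return round(avg_acu * price_per_acu_hour * duration_min / 60, 2)


# Baseline for Serverless v2 (32 - 128 ACUs) and Serverless v2 (8 - 96 ACUs).
print(baseline_monthly_price(32))
print(baseline_monthly_price(8))

# One DI run on Serverless v2 (8 - 96 ACUs): 83 average ACUs for 36 minutes.
print(run_cost(83, 36))
```

For a monthly estimate, the same formula applies with the average ACU utilization over the whole month in place of the minimum ACU value.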
Observations
Import jobs cannot be run in parallel. All jobs are run one by one.
RDS Resource utilization
Serverless v2 (8,16 - 96 ACUs)
On these graphs, the gaps correspond to DB issues that occurred when the connections for mod-srs and mod-srm were increased to 500.
Appendix
Infrastructure
Records count:
- mod_source_record_storage.marc_indexers = 1028022472
PTF environment: cptf2
- 6 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
1 database instance (one writer only)
- MSK ptf-kakfa-3
- 4 m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
- 2 partitions per DI topic