...
- In this workflow, we are checking the performance of Data Import for Cornell. This testing is done to mimic Cornell load with increased DB connections in mod-srs and mod-srm.
Jira ticket: PERF-555
...
One of the reasons is the size of the database: for example, the table mod_source_record_storage.marc_indexers contains 1,028,022,472 records.
Data import with the profile "EBSCO ebooks new and updated" ran successfully on the database instance classes db.r6g.2xlarge, db.r6g.4xlarge, and db.r6g.8xlarge. As the database instance size increases, the data import job duration decreases.
...
On these graphs, we can observe gaps that correspond to DB issues after the number of connections for mod-srs and mod-srm was increased to 500.
Appendix
Infrastructure
Records count:
- mod_source_record_storage.marc_indexers = 1028022472
PTF environment cptf2
- 6 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
1 database instance, one writer only
- MSK ptf-kakfa-3
- 4 m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
- 2 partitions per DI topic
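As a rough reading aid for the MSK settings above, the following Python sketch derives a few quantities they imply: the retention window in hours, the number of partition replicas per DI topic, and whether the replication factor fits within the broker count. The `summarize` helper and the dictionary layout are hypothetical and exist only for illustration; the values are copied from the list above.

```python
# Values copied from the MSK settings listed above.
BROKERS = 4
SETTINGS = {
    "auto.create.topics.enable": "true",
    "log.retention.minutes": "480",
    "default.replication.factor": "3",
}
PARTITIONS_PER_DI_TOPIC = 2


def summarize(brokers, settings, partitions):
    """Derive a few quantities implied by the listed settings (illustrative helper)."""
    retention_hours = int(settings["log.retention.minutes"]) / 60
    replication = int(settings["default.replication.factor"])
    return {
        # How long messages are kept before they become eligible for deletion.
        "retention_hours": retention_hours,
        # Each partition is stored `replication` times across the brokers.
        "replicas_per_topic": partitions * replication,
        # A replication factor larger than the broker count cannot be satisfied.
        "replication_fits_brokers": replication <= brokers,
    }


summary = summarize(BROKERS, SETTINGS, PARTITIONS_PER_DI_TOPIC)
print(summary)
```

Under these settings, each DI topic keeps messages for 8 hours and is stored as 6 partition replicas spread over the 4 brokers.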
...