PTF - Performance Instance Resources Optimization - QCP1
- 1 Overview
- 2 Summary
- 3 Test Runs
- 4 Test Results
- 5 Comparison
- 6 Test №1
- 7 Test №2
- 8 Test №3-4
- 9 Test №5-6-7
- 10 Test №8-9
- 10.1 Instance CPU Utilization
- 10.2 Service CPU Utilization
- 10.3 Service Memory Utilization
- 10.4 Kafka metrics
- 10.5 OpenSearch metrics
- 10.6 DB CPU Utilization
- 10.7 DB Connections
- 10.8 DB load
- 10.9 Top SQL-queries
- 11 Test №10
- 11.1 Instance CPU Utilization
- 11.2 Service CPU Utilization
- 11.3 Service Memory Utilization
- 11.4 Kafka metrics
- 11.5 OpenSearch metrics
- 11.6 DB CPU Utilization
- 11.7 DB Connections
- 11.8 DB load
- 11.9 Top SQL-queries
- 12 Test №11
- 12.1 Instance CPU Utilization
- 12.2 Service CPU Utilization
- 12.3 Service Memory Utilization
- 12.4 Kafka metrics
- 12.5 OpenSearch metrics
- 12.6 DB CPU Utilization
- 12.7 DB Connections
- 12.8 DB load
- 12.9 Top SQL-queries
- 13 Test №12
- 13.1 Instance CPU Utilization
- 13.2 Service CPU Utilization
- 13.3 Service Memory Utilization
- 13.4 Kafka metrics
- 13.5 OpenSearch metrics
- 13.6 DB CPU Utilization
- 13.7 DB Connections
- 13.8 DB load
- 13.9 Top SQL-queries
- 14 Appendix
- 14.1 Infrastructure
- 15 Methodology/Approach
Overview
The primary objective of testing was to evaluate the performance of the Baseline MCPT Environment configuration while attempting to optimize costs by adjusting instance types and reducing the number of instances. The tests were designed to compare the performance outcomes across different configurations, including variations in instance types and counts within multiple Auto Scaling Groups (ASGs). By systematically modifying these variables, the goal was to maintain or improve the performance observed in the baseline configuration while achieving cost efficiency.
https://folio-org.atlassian.net/browse/PERF-962
Summary
Through a series of experiments involving different placement strategies, instance types, and total instance counts, we found that the performance remained consistent when using these configurations:
- three c7g.large instances dedicated to the okapi service alongside five r7g.2xlarge instances for all other services, with the CPU parameter set to 2 for all services.
- five r7g.2xlarge instances for all services, with the CPU parameter set to 2 for all services.
The optimized environment configurations offer a 20-40% cost reduction compared to the existing setup, making them a more economical option without compromising performance.
Configurations with three c7g.large instances for the okapi service and five r7g.2xlarge instances for all other services show the best performance across all experiments.
In fact, some workflows perform better with this new setup than on the current infrastructure.
CPU utilization at the EC2 level is now better: around 30-60%, compared with under 20% previously.
AWS Configuration Costs
Cluster | Instance Type | Cost per Instance per Month | Number of Instances | Total Cost per Cluster |
|---|---|---|---|---|
QCP1 | m6g.2xlarge | $221.76 | 10 | $2,217.60 |
MCPT | m6g.2xlarge | $221.76 | 14 | $3,104.64 |
Optimized Infrastructure (two ASGs) | c7g.large | $52.20 | 3 | $1,698.84 |
| r7g.2xlarge | $308.45 | 5 | |
Optimized Infrastructure (one ASG) | r7g.2xlarge | $308.45 | 5 | $1,542.25 |
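The totals in the table above are the per-instance monthly price multiplied by the instance count, summed over the instance types in a cluster. The sketch below reproduces that arithmetic. The names `MONTHLY_PRICE` and `cluster_cost` are illustrative and not part of any tooling used in the tests, and the prices are copied from the table rather than fetched from AWS.

```python
from decimal import Decimal

# Per-instance monthly prices, copied from the cost table (not queried from AWS).
MONTHLY_PRICE = {
    "m6g.2xlarge": Decimal("221.76"),
    "c7g.large": Decimal("52.20"),
    "r7g.2xlarge": Decimal("308.45"),
}


def cluster_cost(instances):
    """Return the monthly cost of a cluster given {instance_type: count}."""
    return sum(
        (MONTHLY_PRICE[itype] * count for itype, count in instances.items()),
        Decimal("0"),
    )


qcp1 = cluster_cost({"m6g.2xlarge": 10})
mcpt = cluster_cost({"m6g.2xlarge": 14})
two_asg = cluster_cost({"c7g.large": 3, "r7g.2xlarge": 5})
one_asg = cluster_cost({"r7g.2xlarge": 5})

for name, cost in [("QCP1", qcp1), ("MCPT", mcpt),
                   ("two ASGs", two_asg), ("one ASG", one_asg)]:
    print(f"{name}: ${cost:,.2f}")
```

Computed from the rounded per-instance prices, the two-ASG total can differ from the table value by one cent. This is presumably because the table was derived from unrounded prices, which is an assumption.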
Cost Comparison (Before vs After)
Cluster | Previous Total Cost | New Total Cost | Percentage Saving |
|---|---|---|---|
QCP1 | $2,217.60 | $1,698.84 | 23.39% |
MCPT | $3,104.64 | $1,698.84 | 45.28% |
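The percentage saving in the table above is the cost difference divided by the previous cost. A minimal sketch of that calculation, using the totals from the table; the function name `saving_percent` is illustrative:

```python
def saving_percent(previous_cost, new_cost):
    """Percentage saved when moving from previous_cost to new_cost."""
    return round((previous_cost - new_cost) / previous_cost * 100, 2)


# Totals taken from the cost comparison table.
qcp1_saving = saving_percent(2217.60, 1698.84)
mcpt_saving = saving_percent(3104.64, 1698.84)

print(f"QCP1: {qcp1_saving}%")  # 23.39%
print(f"MCPT: {mcpt_saving}%")  # 45.28%
```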
Test Runs
Test # | Description | Status |
|---|---|---|
Test 1 | Instance type: m6g.2xlarge. Instance count: 10. | Completed |
Test 2 | Instance type: m6g.2xlarge. Instance count: 10 (repeat of Test 1). | Completed |
Test 3 | Two Auto Scaling groups: 3 c7g.large instances for the okapi service and 5 m6g.2xlarge instances for all other services. | Completed |
Test 4 | Two Auto Scaling groups: 3 c7g.large instances for the okapi service and 5 m6g.2xlarge instances for all other services (repeat of Test 3). | Completed |
Test 5 | CPU=2 set for all modules. Two Auto Scaling groups: 3 c7g.large instances, all for the okapi service, and 5 r7g.xlarge instances for all other modules. | Completed |
Test 6 | CPU=2 set for all modules. Two Auto Scaling groups: 3 c7g.large instances, all for the okapi service, and 5 r7g.xlarge instances for all other modules (repeat of Test 5). | Completed |
Test 7 | CPU=2 set for all modules except mod-search (CPU=2048). Two Auto Scaling groups: 3 c7g.large instances, all for the okapi service, and 5 r7g.xlarge instances for all other modules. | Completed |
Test 8 | CPU=2 set for all modules. One Auto Scaling group: 5 c7g.large instances for all services. | |
Test 9 | CPU=2 set for all modules. One Auto Scaling group: 5 c7g.large instances for all services (repeat of Test 8). | |
Test 10 | CPU=2 set for all modules except mod-search (CPU=2048). One Auto Scaling group: 5 c7g.large instances for all services. | |
Test 11 | CPU=2 set for all modules except mod-search (CPU=2048). One Auto Scaling group: 5 c7g.large instances for all services (repeat of Test 10). | |
Test 12 | CPU=2 set for all modules except mod-search (CPU=2048). One Auto Scaling group: 5 c7g.large instances for all services (repeat of Test 11). | |
Test Results
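This table lists total durations (h:mm:ss) for the Data Import and Data Export workflows, and the average response time and error rate for each of the other workflows.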
Workflows | Test 1 Average response time | Test 1 Errors | Test 2 Average response time | Test 2 Errors | Test 3 Average response time | Test 3 Errors | Test 4 Average response time | Test 4 Errors | Test 5 Average response time | Test 5 Errors | Test 6 Average response time | Test 6 Errors | Test 7 Average response time | Test 7 Errors | Test 8 Average response time | Test 8 Errors | Test 9 Average response time | Test 9 Errors | Test 10 Average response time | Test 10 Errors | Test 11 Average response time | Test 11 Errors | Test 12 Average response time | Test 12 Errors |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
DATA IMPORT | 0:52:03 | | 0:44:55 | | 0:46:07 | | 0:47:09 | | 0:51:41 | | 0:58:35 | | 1:00:03 | | 0:43:53 | | 0:47:06 | | 0:55:30 | | 0:45:46 | | 0:44:09 | |
DATA EXPORT | 0:58:11 | | 0:44:43 | | 0:47:41 | | 0:50:32 | | 0:38:59 | | 0:45:53 | | 0:48:26 | not finished for main | 0:45:41 | | 0:48:49 | | 0:56:49 | | 0:42:16 | | 0:44:26 | |
CICO_TC_Check-In Controller | 1163 | 0% | 948 | 0% | 932 | 0% | 958 | 0% | 849 | 0% | 895 | 0% | 940 | 0% | 912 | 0% | 993 | 0% | 1176 | 0% | 892 | 0% | 967 | 0% |
CICO_TC_Check-Out Controller | 1697 | 0% | 1481 | 0% | 1408 | 0% | 1428 | 0% | 1318 | 0% | 1318 | 0% | 1367 | 0% | 1345 | 0% | 1445 | 0% | 1675 | 0% | 1371 | 0% | 1467 | 0% |
DE_Exporting MARC Bib records workflow | 2528 | 0% | 3818 | 0% | 3675 | 0% | 2830 | 0% | 1918 | 0% | 2223 | 0% | 1865 | 0% | 3363 | 0% | 2420 | 0% | 1872 | 0% | 3398 | 0% | 5033 | 0% |
ILR_TC: Create ILR | 1023 | 0% | 874 | |||||||||||||||||||||