Overview
The goal of the test is to assess the performance of CICO (check-in/check-out) and data import scenarios with a decreased number of partitions in Kafka topics.
The topics setup is listed in the Appendix.
Ticket: PERF-400
Summary
- Comparison of the load test results showed no significant degradation in response times with the decreased number of partitions: CICO 95th percentile response times changed by 0% to 7%, and data import durations changed by -9% to +13%.
- Resource consumption of the server, database and Kafka instances also did not change with the decreased number of partitions.
Test Runs
Test # | Test Conditions | Duration | Load generator size (recommended) | Load generator memory, GiB (recommended) | Notes |
---|---|---|---|---|---|
1. | Check-in/Check-out with 8, 20, 25 users | 30 min | t3.medium | 3 | |
2. | Data import with 5K, 25K, 50K, 100K Create imports | x | | | |
Results
Check In, Check Out
Response Times (CICO)
10/50 Partitions
2 Partitions
Response time comparison CICO + DI
Transaction | Response time, 95th percentile: 10/50 Partitions | Response time, 95th percentile: 2 Partitions | Degradation, s | Degradation, % |
---|---|---|---|---|
Check-in Controller, 8 users | 0.555 s | 0.576 s | 0.021 s | 4% |
Check-out Controller, 8 users | 0.899 s | 0.897 s | -0.002 s | 0% |
Check-in Controller, 20 users | 0.554 s | 0.592 s | 0.038 s | 7% |
Check-out Controller, 20 users | 0.852 s | 0.860 s | 0.008 s | 1% |
Check-in Controller, 25 users | 0.553 s | 0.589 s | 0.036 s | 7% |
Check-out Controller, 25 users | 0.834 s | 0.894 s | 0.060 s | 7% |
Data import 5K | 2 min 8 s | 2 min 24 s | 16 s | 13% |
Data import 25K | 10 min 41 s | 11 min 27 s | 46 s | 7% |
Data import 50K | 21 min 11 s | 19 min 16 s | -115 s | -9% |
Data import 100K | 42 min 35 s | 40 min 24 s | -131 s | -5% |
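The degradation columns can be derived from the two measurements in each row. The sketch below assumes that degradation is the 2-partition value minus the 10/50-partition value, and that the percentage is taken relative to the 10/50-partition baseline. This convention reproduces the figures in the table, but the function name is illustrative only.

```python
from typing import Tuple


def degradation(baseline_s: float, verification_s: float) -> Tuple[float, float]:
    """Return (absolute degradation in seconds, relative degradation in %).

    Assumption: the baseline is the 10/50-partition run and the
    verification is the 2-partition run; negative values mean improvement.
    """
    delta = verification_s - baseline_s
    percent = delta / baseline_s * 100
    return round(delta, 3), round(percent, 1)


# Check-in Controller, 8 users: 0.555 s -> 0.576 s
print(degradation(0.555, 0.576))

# Data import 50K: 21 min 11 s -> 19 min 16 s, converted to seconds
print(degradation(21 * 60 + 11, 19 * 60 + 16))
```

A negative result, as in the 50K data import row, means the run with 2 partitions was faster than the baseline.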
Instance CPU Utilization (CICO)
10/50 Partitions
2 Partitions
Service CPU Utilization (CICO)
10/50 Partitions
2 Partitions
Memory Utilization (CICO)
10/50 Partitions
mod-inventory-storage memory usage increased from 57 to 65 during the test. This behaviour was also reproduced for the tests with 2 partitions.
2 Partitions
RDS CPU Utilization (CICO)
10/50 Partitions
2 Partitions
CPU utilization was 5% higher in the 25-user test, but this was not reproduced on retesting, so it may have been caused by external factors.
RDS DB connections (CICO)
10/50 Partitions
2 Partitions
Kafka CPU load (CICO)
10/50 Partitions
2 Partitions
Database Load (CICO)
10/50 Partitions
2 Partitions
Data Import
Instance CPU Utilization (DI)
10/50 Partitions
2 Partitions
Service CPU Utilization (DI)
10/50 Partitions
2 Partitions
Memory Utilization (DI)
10/50 Partitions
2 Partitions
RDS CPU Utilization (DI)
10/50 Partitions
2 Partitions
RDS DB connections (DI)
2 Partitions
Kafka CPU (DI)
10/50 Partitions
2 Partitions
Database Load (DI)
2 Partitions
Bulk Edits
Jobs Duration comparison
Transaction | Job duration: 10/50 Partitions | Job duration: 2 Partitions | Degradation, s | Degradation, % |
---|---|---|---|---|
Users 1000 records | 43 s | 44 s | 1 s | 2% |
Users 2500 records | 1 min 49 s | 1 min 45 s | -4 s | -4% |
Items 1000 records | 3 min 8 s | 2 min 49 s | -19 s | -10% |
Items 10k records | 22 min 44 s | 19 min 13 s | -3 min 31 s | -15% |
Holdings 1000 records | 1 min 52 s | 1 min 51 s | -1 s | -0.8% |
Holdings 10k records | 11 min 14 s | 10 min 46 s | -28 s | -4% |
Holdings 10k + Items 10k + Users 2500 | 10 min 39 s / 19 min 10 s / 1 min 42 s | 10 min 31 s / 18 min 56 s / 1 min 40 s | -8 s / -14 s / -2 s | -1% / -1% / -2% |
Holdings 1000 + Items 1000 + Users 1000 | 1 min 47 s / 2 min 43 s / 42 s | 1 min 44 s / 2 min 41 s / 41 s | -3 s / -2 s / -1 s | -3% / -1% / -2% |
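Job durations in this table are written as text, so they need to be converted to seconds before they can be compared. The sketch below shows one way to do that. The helper name and the accepted unit spellings (`min`, `m`, `s`) are assumptions based on how durations appear in this report.

```python
import re

# Seconds per unit for the spellings that appear in this report.
_UNIT_SECONDS = {"min": 60, "m": 60, "s": 1}


def parse_duration(text: str) -> int:
    """Convert a duration such as '1 min 49 s' or '44s' to whole seconds."""
    parts = re.findall(r"(\d+)\s*(min|m|s)\b", text)
    if not parts:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(int(value) * _UNIT_SECONDS[unit] for value, unit in parts)


baseline = parse_duration("22 min 44 s")      # Items 10k, 10/50 partitions
verification = parse_duration("19 min 13 s")  # Items 10k, 2 partitions
delta = verification - baseline

# Absolute change in seconds and relative change in percent.
print(delta, round(delta / baseline * 100))
```

Working in whole seconds keeps the comparison exact and avoids mixing minute and second components by hand.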
Instance CPU Utilization (Bulk Edit)
Service CPU Utilization (Bulk Edit)
Memory Utilization (Bulk Edit)
RDS CPU Utilization (Bulk Edit)
RDS DB connections (Bulk Edit)
Kafka CPU (Bulk Edit)
Database Load (Bulk Edit)
Appendix
Infrastructure
PTF - environment ncp3
- 10 m6i.2xlarge EC2 instances located in US East (N. Virginia) us-east-1
- 2 db.r6.xlarge database instances: writer and reader
- MSK ptf-kakfa-3 [kafka configurations]
- 4 kafka.m5.2xlarge brokers in 2 zones
- Apache Kafka version 2.8.0
- EBS storage volume per broker: 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
Modules memory and CPU parameters:
Modules | Version | Task Definition | Running Tasks | CPU | Memory (Soft/Hard limits) | MaxMetaspaceSize | Xmx |
---|---|---|---|---|---|---|---|
mod-data-import | 2.6.2 | 4 | 1 | 256 | 1844/2048 | 512 | 1292 |
mod-data-import-cs | 1.15.1 | 1 | 2 | 128 | 896/1024 | 128 | 768 |
mod-source-record-storage | 5.5.2 | 4 | 2 | 1024 | 1440/1536 | 512 | 908 |
mod-source-record-manager | 3.5.6 | 4 | 2 | 1024 | 3688/4096 | 512 | 2048 |
mod-inventory | 19.0.2 | 7 | 2 | 1024 | 2592/2880 | 512 | 1814 |
mod-inventory-storage | 25.0.3 | 3 | 2 | 1024 | 1952/2208 | 512 | 1440 |
mod-quick-marc | 2.5.0 | 3 | 1 | 128 | 2176/2288 | 512 | 1664 |
okapi | 4.14.7 | 1 | 3 | 1024 | 1440/1684 | 512 | 922 |
mod-feesfines | 18.1.1 | 3 | 2 | 128 | 896/1024 | 128 | 768 |
mod-patron-blocks | 1.7.1 | 4 | 2 | 1024 | 896/1024 | 128 | 768 |
mod-pubsub | 2.7.0 | 4 | 2 | 1024 | 1440/1536 | 512 | 922 |
mod-authtoken | 2.12.0 | 3 | 2 | 512 | 1152/1440 | 128 | 922 |
mod-circulation-storage | 15.0.2 | 3 | 2 | 1024 | 1440/1536 | 512 | 896 |
mod-circulation | 23.3.2 | 3 | 2 | 1024 | 896/1024 | 128 | 768 |
mod-configuration | 5.9.0 | 3 | 2 | 128 | 896/1024 | 128 | 768 |
mod-users | 19.0.0 | 4 | 2 | 128 | 896/1024 | 128 | 768 |
mod-remote-storage | 1.7.1 | 3 | 2 | 128 | 1692/1872 | 512 | 1178 |
Topics setup
Topic | Partitions number: Baseline | Partitions number: Verification |
---|---|---|
ncp3.fs09000000.circulation.check-in | 10 | 2 |
ncp3.fs09000000.circulation.loan | 10 | 2 |
ncp3.fs09000000.circulation.request | 10 | 2 |
ncp3.fs09000000.data-export.job.command | 50 | 2 |
ncp3.fs09000000.data-export.job.update | 50 | 2 |
ncp3.fs09000000.inventory.async-migration | 50 | 2 |
ncp3.fs09000000.inventory.authority | 50 | 2 |
ncp3.fs09000000.inventory.bound-with | 50 | 2 |
ncp3.fs09000000.inventory.holdings-record | 50 | 2 |
ncp3.fs09000000.inventory.instance | 50 | 2 |
ncp3.fs09000000.inventory.instance-contribution | 50 | 2 |
ncp3.fs09000000.inventory.item | 50 | 2 |
ncp3.fs09000000.search.instance-contributor | 50 | 2 |
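This report does not describe how the partition count was changed. Kafka allows the partition count of an existing topic to be increased but not decreased, so moving from 10/50 to 2 partitions means recreating the topics. The sketch below only prints `kafka-topics.sh` commands for review and does not run them. The broker address is a hypothetical placeholder, and the replication factor is assumed to match `default.replication.factor=3` from the cluster settings.

```python
import shlex

# Hypothetical broker address; replace with the real MSK bootstrap string.
BOOTSTRAP = "broker.example:9092"
# Assumption: matches default.replication.factor=3 from the cluster settings.
REPLICATION_FACTOR = 3


def recreate_commands(topic: str, partitions: int) -> list:
    """Build kafka-topics.sh commands that delete a topic and recreate it."""
    base = ["kafka-topics.sh", "--bootstrap-server", BOOTSTRAP]
    delete = base + ["--delete", "--topic", topic]
    create = base + [
        "--create", "--topic", topic,
        "--partitions", str(partitions),
        "--replication-factor", str(REPLICATION_FACTOR),
    ]
    return [shlex.join(delete), shlex.join(create)]


# Two topics from the table above, used as examples.
for topic in ("ncp3.fs09000000.circulation.check-in",
              "ncp3.fs09000000.inventory.item"):
    for command in recreate_commands(topic, partitions=2):
        print(command)
```

Deleting a topic discards its messages, so this approach is only suitable for a test environment that is reset between runs.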
Methodology/Approach
- Run the commands needed to return the database to its initial state before each test run, then wait several minutes before starting the test.
- Conduct CICO load tests with different numbers of users, plus data import.
- Change the number of partitions from 10/50 to 2 for all necessary topics.
- Repeat tests.
- Compare test results.
Grafana dashboard
Please note that the dashboards expire 6 weeks after the test run.