Overview
The goal of the test is to assess the performance of the check-in/check-out (CICO) and data import scenarios with a decreased number of partitions in the Kafka topics.
The topic setup is listed under Topics setup in the Appendix.
Ticket: PERF-400
Summary
- Comparison of the load test results showed no significant degradation in response times with the decreased number of partitions: 95th percentile CICO response times changed by at most 7%, and data import durations changed by between -9% and +13%.
- Resource consumption of the server, database, and Kafka instances also did not change with the decreased number of partitions.
Test Runs
Test # | Test Conditions | Duration | Load generator size (recommended) | Load generator memory, GiB (recommended) | Notes |
---|---|---|---|---|---|
1. | Check-in/Check-out with 8, 20, 25 users | 30 min | t3.medium | 3 | |
2. | Data import with 5K, 25K, 50K, 100K Create imports | x | | | |
Results
Response Times CICO
10/50 Partitions
2 Partitions
Response time comparison CICO + DI
Transaction | Response time, 95th percentile (10/50 Partitions) | Response time, 95th percentile (2 Partitions) | Degradation, s | Degradation, % |
---|---|---|---|---|
Check-in Controller, 8 users | 0.555 s | 0.576 s | 0.021 s | 4% |
Check-out Controller, 8 users | 0.899 s | 0.897 s | -0.002 s | 0% |
Check-in Controller, 20 users | 0.554 s | 0.592 s | 0.038 s | 7% |
Check-out Controller, 20 users | 0.852 s | 0.860 s | 0.008 s | 1% |
Check-in Controller, 25 users | 0.553 s | 0.589 s | 0.036 s | 7% |
Check-out Controller, 25 users | 0.834 s | 0.894 s | 0.060 s | 7% |
Data import 5K | 2 min 8 s | 2 min 24 s | 16 s | 13% |
Data import 25K | 10 min 41 s | 11 min 27 s | 46 s | 7% |
Data import 50K | 21 min 11 s | 19 min 16 s | -115 s | -9% |
Data import 100K | 42 min 35 s | 40 min 24 s | -131 s | -5% |
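The degradation columns above follow from the two response-time columns. A minimal Python sketch of that arithmetic is shown below; the helper names `to_seconds` and `degradation` are hypothetical and are not part of the test tooling. Half-up rounding of the percentage is an assumption inferred from the 5K data import row (12.5% reported as 13%).

```python
import math
import re


def to_seconds(text: str) -> float:
    """Convert durations such as '0.555 s', '2m 8 s' or '10 min 41 s' to seconds."""
    match = re.fullmatch(
        r"\s*(?:(\d+)\s*m(?:in)?\s*)?(\d+(?:\.\d+)?)\s*s\s*", text
    )
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def degradation(baseline: str, verification: str) -> tuple[float, int]:
    """Return (difference in seconds, difference in percent of the baseline)."""
    base = to_seconds(baseline)
    delta = to_seconds(verification) - base
    # Assumption: percentages are rounded half-up, as the 5K row suggests.
    percent = math.floor(100 * delta / base + 0.5)
    return round(delta, 3), percent


if __name__ == "__main__":
    print(degradation("0.555 s", "0.576 s"))          # Check-in, 8 users
    print(degradation("2m 8 s", "2m 24 s"))           # Data import 5K
    print(degradation("21 min 11 s", "19 min 16 s"))  # Data import 50K
```

A negative value means the run with 2 partitions was faster than the baseline run.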
Instance CPU Utilization CICO
10/50 Partitions
2 Partitions
Service CPU Utilization CICO
10/50 Partitions
2 Partitions
Memory Utilization CICO
10/50 Partitions
mod-inventory-storage memory usage increased from 57 to 65 during the test. This behaviour was also reproduced for the tests with 2 partitions.
2 Partitions
RDS CPU Utilization CICO
10/50 Partitions
2 Partitions
CPU utilization increased by 5% in the 25-user test, but this was not reproduced during retesting, so it was probably caused by external factors.
RDS DB connections CICO
10/50 Partitions
2 Partitions
Kafka CPU load CICO
10/50 Partitions
2 Partitions
Database load CICO
10/50 Partitions
2 Partitions
Instance CPU Utilization DI
2 Partitions
Service CPU Utilization DI
2 Partitions
Memory Utilization DI
2 Partitions
RDS CPU Utilization DI
2 Partitions
RDS DB connections DI
2 Partitions
Kafka CPU load DI
2 Partitions
Database load DI
2 Partitions
Appendix
Infrastructure
PTF environment: ncp3
- 10 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
- 2 db.r6.xlarge database instances: writer and reader
- MSK ptf-kakfa-3 (Kafka configuration):
- 4 kafka.m5.2xlarge brokers in 2 zones
- Apache Kafka version 2.8.0
- EBS storage volume per broker: 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
Modules memory and CPU parameters:
Modules | Version | Task Definition | Running Tasks | CPU | Memory (Soft/Hard limits) | MaxMetaspaceSize | Xmx |
---|---|---|---|---|---|---|---|
mod-data-import | 2.6.2 | 4 | 1 | 256 | 1844/2048 | 512 | 1292 |
mod-data-import-cs | 1.15.1 | 1 | 2 | 128 | 896/1024 | 128 | 768 |
mod-source-record-storage | 5.5.2 | 4 | 2 | 1024 | 1440/1536 | 512 | 908 |
mod-source-record-manager | 3.5.6 | 4 | 2 | 1024 | 3688/4096 | 512 | 2048 |
mod-inventory | 19.0.2 | 7 | 2 | 1024 | 2592/2880 | 512 | 1814 |
mod-inventory-storage | 25.0.3 | 3 | 2 | 1024 | 1952/2208 | 512 | 1440 |
mod-quick-marc | 2.5.0 | 3 | 1 | 128 | 2176/2288 | 512 | 1664 |
okapi | 4.14.7 | 1 | 3 | 1024 | 1440/1684 | 512 | 922 |
mod-feesfines | 18.1.1 | 3 | 2 | 128 | 896/1024 | 128 | 768 |
mod-patron-blocks | 1.7.1 | 4 | 2 | 1024 | 896/1024 | 128 | 768 |
mod-pubsub | 2.7.0 | 4 | 2 | 1024 | 1440/1536 | 512 | 922 |
mod-authtoken | 2.12.0 | 3 | 2 | 512 | 1152/1440 | 128 | 922 |
mod-circulation-storage | 15.0.2 | 3 | 2 | 1024 | 1440/1536 | 512 | 896 |
mod-circulation | 23.3.2 | 3 | 2 | 1024 | 896/1024 | 128 | 768 |
mod-configuration | 5.9.0 | 3 | 2 | 128 | 896/1024 | 128 | 768 |
mod-users | 19.0.0 | 4 | 2 | 128 | 896/1024 | 128 | 768 |
mod-remote-storage | 1.7.1 | 3 | 2 | 128 | 1692/1872 | 512 | 1178 |
Topics setup
Topic | Partitions number (Baseline) | Partitions number (Verification) |
---|---|---|
ncp3.fs09000000.circulation.check-in | 10 | 2 |
ncp3.fs09000000.circulation.loan | 10 | 2 |
ncp3.fs09000000.circulation.request | 10 | 2 |
ncp3.fs09000000.data-export.job.command | 50 | 2 |
ncp3.fs09000000.data-export.job.update | 50 | 2 |
ncp3.fs09000000.inventory.async-migration | 50 | 2 |
ncp3.fs09000000.inventory.authority | 50 | 2 |
ncp3.fs09000000.inventory.bound-with | 50 | 2 |
ncp3.fs09000000.inventory.holdings-record | 50 | 2 |
ncp3.fs09000000.inventory.instance | 50 | 2 |
ncp3.fs09000000.inventory.instance-contribution | 50 | 2 |
ncp3.fs09000000.inventory.item | 50 | 2 |
ncp3.fs09000000.search.instance-contributor | 50 | 2 |
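To give a sense of scale, the totals implied by the table can be tallied as below. This is only an arithmetic sketch over the listed topics. The replica totals assume that `default.replication.factor=3` from the Kafka configuration applies to every topic, which this report does not confirm.

```python
# Partition counts from the table above (topic names without the
# "ncp3.fs09000000." prefix).
BASELINE = {
    "circulation.check-in": 10,
    "circulation.loan": 10,
    "circulation.request": 10,
    "data-export.job.command": 50,
    "data-export.job.update": 50,
    "inventory.async-migration": 50,
    "inventory.authority": 50,
    "inventory.bound-with": 50,
    "inventory.holdings-record": 50,
    "inventory.instance": 50,
    "inventory.instance-contribution": 50,
    "inventory.item": 50,
    "search.instance-contributor": 50,
}
# Every topic was set to 2 partitions for the verification run.
VERIFICATION = {topic: 2 for topic in BASELINE}

# Assumption: all topics use the cluster default replication factor.
REPLICATION_FACTOR = 3

baseline_total = sum(BASELINE.values())
verification_total = sum(VERIFICATION.values())

if __name__ == "__main__":
    print("baseline partitions:", baseline_total)
    print("verification partitions:", verification_total)
    print("baseline replicas:", baseline_total * REPLICATION_FACTOR)
    print("verification replicas:", verification_total * REPLICATION_FACTOR)
```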
Methodology/Approach
- Run the commands needed to return the database to its initial state before each test run, then wait several minutes before starting the test.
- Run the CICO load tests with different numbers of users, and the data import tests.
- Change the number of partitions from 10/50 to 2 for all relevant topics.
- Repeat the tests.
- Compare the test results.
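The report does not say how the partition count was changed. Kafka can only increase the partition count of an existing topic, so one possible approach is to delete each topic and recreate it with 2 partitions. The sketch below only builds the `kafka-topics.sh` command lines and does not run them. The function name, the bootstrap address `broker:9092`, and the replication factor of 3 (taken from `default.replication.factor`) are assumptions for illustration.

```python
import shlex


def recreate_topic_commands(
    topic: str,
    partitions: int,
    bootstrap_server: str,
    replication_factor: int = 3,
) -> list[str]:
    """Build the delete and create commands for one topic (nothing is executed)."""
    base = ["kafka-topics.sh", "--bootstrap-server", bootstrap_server]
    delete = base + ["--delete", "--topic", topic]
    create = base + [
        "--create",
        "--topic", topic,
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
    ]
    return [shlex.join(delete), shlex.join(create)]


if __name__ == "__main__":
    # Hypothetical bootstrap address; replace with the real MSK endpoint.
    for command in recreate_topic_commands(
        "ncp3.fs09000000.circulation.check-in", 2, "broker:9092"
    ):
        print(command)
```

Because `auto.create.topics.enable=true` is set on this cluster, a deleted topic can be recreated automatically with default settings if a client touches it before the explicit create command runs, so the modules would probably need to be stopped first.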