[Nolana] Check-IN + title-level requests retest
Overview
The goal of this test is to assess the performance of circulation check-in for items that have 10 TLRs (title-level requests) each. The difference from the previous testing is that indexes were added to the mod_circulation_storage.request and mod_circulation_storage.actual_cost_record tables in scope of CIRCSTORE-402.
Previous test report: [Nolana] Check-IN + title-level requests
Ticket: PERF-568
Summary
- Load tests showed that check-in for items with 10 TLRs each is still significantly slower than check-in for items without TLRs. In addition, check-in response time for items with TLRs increased after the indexes were added (see Response time comparison).
- Resource monitoring showed that:
- CPU consumption by several services increased significantly (see Service CPU Utilization).
- RDS CPU utilization in the 25-user test increased from 15% to 72% (see DB CPU Utilization).
- Database load doubled (see DB load).
- Query plan analysis of the top SQL queries showed that the new indexes were not used when these queries were processed.
Recommendations & Jiras
Because the added indexes did not change the query plans of the most CPU-consuming queries, CIRCSTORE-402 should be reviewed. The tests should be repeated once fixes are available.
Test Runs
Test # | Test Conditions | Duration | Load generator size (recommended) | Load generator Memory (GiB) (recommended) | Notes |
---|---|---|---|---|---|
1. | Baseline, Check-in with 1, 8, 25 users | 30 min | t3.medium | 3 | Without TLR |
2. | Verification, Check-in with 1, 8, 25 users | 30 min | t3.medium | 3 | With 10 TLR per item |
Results
Response Times
Baseline (items without TLR)
1 user
8 users
25 users
Verification (items with 10 TLRs)
1 user
8 users
25 users
Response time comparison
Items without TLR and items with TLR (both after indexes were added)
User quantity | Baseline (items without TLR): check-in response time, 95th percentile, sec | Verification (items with 10 TLR each): check-in response time, 95th percentile, sec | Degradation, sec | Degradation, % |
---|---|---|---|---|
1 user | 1.686 | 2.219 | 0.533 | 31% |
8 users | 0.498 | 1.221 | 0.723 | 145% |
25 users | 0.588 | 2.333 | 1.745 | 296% |
25 users (rerun, with ANALYZE run before the test) | 0.622 | 2.222 | 1.6 | 257% |
Tests without indexes and with indexes added (both for items with TLR)
User quantity | Before fix (without indexes): check-in response time, 95th percentile, sec | After fix (indexes added): check-in response time, 95th percentile, sec | Degradation, sec | Degradation, % |
---|---|---|---|---|
1 user | 0.953 | 2.219 | 1.266 | 132% |
8 users | 0.782 | 1.221 | 0.439 | 56% |
25 users | 2.001 | 2.333 | 0.332 | 16% |
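The degradation columns in both tables follow directly from the two 95th-percentile columns. The Python sketch below reproduces that arithmetic from the reported values. Truncating the percentage rather than rounding it is an assumption inferred from the published figures (for example, 0.533 / 1.686 = 31.6% is shown as 31%).

```python
import math


def degradation(baseline_s: float, verification_s: float) -> tuple[float, int]:
    """Return (increase in seconds, increase in whole percent) of the 95th percentile."""
    delta = round(verification_s - baseline_s, 3)
    # Assumption: percentages are truncated, matching e.g. 31.6% shown as 31%.
    percent = math.floor(delta / baseline_s * 100)
    return delta, percent


# 95th-percentile check-in times (sec) from the first table: without TLR vs. with 10 TLRs.
rows = {
    "1 user": (1.686, 2.219),
    "8 users": (0.498, 1.221),
    "25 users": (0.588, 2.333),
    "25 users (rerun)": (0.622, 2.222),
}

for users, (baseline, verification) in rows.items():
    delta, percent = degradation(baseline, verification)
    print(f"{users}: +{delta:.3f} s ({percent}%)")
```

The same function applies to the second table, for example `degradation(0.953, 2.219)` for the 1-user run before and after the indexes were added.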
Service CPU Utilization
Baseline (items without TLR)
1 user
8 users
25 users
Verification (items with 10 TLRs)
During the verification tests, CPU utilization of mod-users increased significantly: in the 25-user test it rose from 22% to 51%.
1 user
8 users
25 users
Memory Utilization
Baseline (items without TLR)
1 user
8 users
25 users
Verification (items with 10 TLRs)
1 user
8 users
25 users
DB CPU Utilization
Baseline (items without TLR)
1 user
8 users
25 users
Verification (items with 10 TLRs)
During the verification tests, RDS CPU utilization increased significantly: in the 25-user test it rose from 15% to 72%.
1 user
8 users
25 users
DB Connections
Baseline (items without TLR)
1 user
8 users
25 users
Verification (items with 10 TLRs)
1 user
8 users
25 users
DB load
Baseline (items without TLR)
1 user
8 users
25 users
Verification (items with 10 TLRs)
1 user
8 users
25 users
Top-SQL
Baseline (items without TLR)
25 users
Verification (items with 10 TLRs)
During the verification tests, two SQL queries moved to the top of the Top SQL list:
SELECT [tenant]_mod_circulation_storage.count_estimate(?)
SELECT jsonb,id FROM [tenant]_mod_circulation_storage.request WHERE ((lower(f_unaccent(request.jsonb->>?)) LIKE lower(f_unaccent(?))) AND ((((CASE WHEN length(lower(?)) <= ? THEN left(lower(request.jsonb->>?),?) LIKE lower(?) ELSE left(lower(request.jsonb->>?),?) LIKE left(lower(?),?) AND lower(request.jsonb->>?) LIKE lower(?) END) OR (CASE WHEN length(lower(?)) <= ? THEN left(lower(request.jsonb->>?),?) LIKE lower(?) ELSE left(lower(request.jsonb->>?),?) LIKE left(lower(?),?) AND lower(reques
25 users
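One way to repeat the query-plan check from the summary is to capture the plan of each top query with PostgreSQL's `EXPLAIN (FORMAT JSON)` and look for index-scan nodes. The Python sketch below walks such a plan and collects the index names it references. The two sample plans are hypothetical and only illustrate the shape of the output. The `?` placeholders in the Top SQL text would have to be replaced with concrete values before the query can be explained.

```python
import json

# PostgreSQL plan node types that report the index they read via "Index Name".
INDEX_NODE_TYPES = {"Index Scan", "Index Only Scan", "Bitmap Index Scan"}


def used_indexes(explain_json: str) -> set[str]:
    """Return every index name referenced in an EXPLAIN (FORMAT JSON) result."""
    found: set[str] = set()
    # EXPLAIN (FORMAT JSON) returns a JSON array whose entries hold a "Plan" tree.
    stack = [entry["Plan"] for entry in json.loads(explain_json)]
    while stack:
        node = stack.pop()
        if node.get("Node Type") in INDEX_NODE_TYPES and "Index Name" in node:
            found.add(node["Index Name"])
        # Child nodes, if any, are nested under "Plans".
        stack.extend(node.get("Plans", []))
    return found


# Hypothetical plans: a sequential scan versus a bitmap scan on the new index.
seq_plan = json.dumps([{"Plan": {"Node Type": "Seq Scan", "Relation Name": "request"}}])
idx_plan = json.dumps([{"Plan": {
    "Node Type": "Bitmap Heap Scan",
    "Relation Name": "request",
    "Plans": [{"Node Type": "Bitmap Index Scan", "Index Name": "request_instanceid_idx"}],
}}])

print(used_indexes(seq_plan))  # set()
print(used_indexes(idx_plan))  # {'request_instanceid_idx'}
```

An empty result for a top query on the request table would be consistent with the finding that the new indexes were not used.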
Appendix
Infrastructure
PTF environment: ncp3
- 10 m6i.2xlarge EC2 instances located in US East (N. Virginia) us-east-1
- 2 db.r6.xlarge database instances: writer and reader
- MSK cluster ptf-kakfa-3 (Kafka configuration):
- 4 kafka.m5.2xlarge brokers in 2 zones
- Apache Kafka version 2.8.0
- EBS storage volume per broker: 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
The environment has not changed since the previous testing, except that two indexes were added: request_instanceid_idx on [tenant]_mod_circulation_storage.request and actual_cost_record_expirationdate_idx on [tenant]_mod_circulation_storage.actual_cost_record.
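The methodology calls for running ANALYZE once indexes are added, so that the query planner has fresh statistics for the affected tables. A minimal Python sketch that builds those statements per tenant, assuming the `[tenant]_mod_circulation_storage` schema naming used in this report. The function name and the tenant-id check (lowercase letters, digits, underscores) are illustrative assumptions, not part of FOLIO tooling.

```python
import re

# Tables that received new indexes in this test.
INDEXED_TABLES = ("request", "actual_cost_record")

# Assumption: tenant ids consist only of lowercase letters, digits and underscores.
TENANT_ID = re.compile(r"[a-z0-9_]+")


def analyze_statements(tenant: str) -> list[str]:
    """Build ANALYZE statements that refresh planner statistics for the indexed tables."""
    if not TENANT_ID.fullmatch(tenant):
        raise ValueError(f"unexpected tenant id: {tenant!r}")
    schema = f"{tenant}_mod_circulation_storage"
    return [f"ANALYZE {schema}.{table};" for table in INDEXED_TABLES]


for statement in analyze_statements("tenant1"):  # "tenant1" is a hypothetical tenant id
    print(statement)
```

Validating the tenant id before interpolating it keeps the generated SQL limited to plain schema-qualified table names.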
Modules memory and CPU parameters:
Modules | Version | Task Definition | Running Tasks | CPU | Memory (Soft/Hard limits) | MaxMetaspaceSize | Xmx |
---|---|---|---|---|---|---|---|
okapi | 4.14.7 | 1 | 3 | 1024 | 1440/1684 | 512 | 922 |
mod-feesfines | 18.1.1 | 3 | 2 | 128 | 896/1024 | 128 | 768 |
mod-patron-blocks | 1.7.1 | 4 | 2 | 1024 | 896/1024 | 128 | 768 |
mod-pubsub | 2.7.0 | 4 | 2 | 1024 | 1440/1536 | 512 | 922 |
mod-authtoken | 2.12.0 | 3 | 2 | 512 | 1152/1440 | 128 | 922 |
mod-circulation-storage | 15.0.2 | 3 | 2 | 1024 | 1440/1536 | 512 | 896 |
mod-circulation | 23.3.2 | 3 | 2 | 1024 | 896/1024 | 128 | 768 |
mod-configuration | 5.9.0 | 3 | 2 | 128 | 896/1024 | 128 | 768 |
mod-inventory | 19.0.1 | 10 | 2 | 1024 | 2592/2880 | 512 | 1814 |
mod-inventory-storage | 25.0.3 | 3 | 2 | 1024 | 1952/2208 | 512 | 1440 |
mod-users | 19.0.0 | 4 | 2 | 128 | 896/1024 | 128 | 768 |
mod-remote-storage | 1.7.1 | 3 | 2 | 128 | 1692/1872 | 512 | 1178 |
Methodology/Approach
- Before each test run, run the necessary commands to return the database to its initial state, then wait several minutes before starting the test.
- Check out the items using the JMeter script Create_TLR.jmx with the "Create_TLR" step disabled.
- Run the baseline: check-in load tests with different numbers of users.
- Run the verification: repeat the tests with the same approach, but before each test also generate 10 TLRs for each item by running Create_TLR.jmx with both the Check-in and Create_TLR steps enabled. Important: if indexes were added, run ANALYZE on the affected tables so that the query planner has up-to-date statistics and can use the new indexes.
- Compare the test results.
Note: make sure to use the same list of items for the Create_TLR.jmx script and the check-in script. Also, select items only from instances that have exactly one item.
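The item-selection rule in the note above can be expressed as a small filter. The sketch below assumes the candidate items are available as (barcode, instance id) pairs, for example from an inventory export; the function names and sample data are hypothetical. Writing the selected barcodes to a single file that both JMeter scripts read is one way to guarantee they use the same list.

```python
from collections import Counter


def single_item_instance_barcodes(items: list[tuple[str, str]]) -> list[str]:
    """Keep only items whose instance has exactly one item, preserving input order."""
    items_per_instance = Counter(instance_id for _, instance_id in items)
    return [barcode for barcode, instance_id in items if items_per_instance[instance_id] == 1]


def same_item_list(tlr_items: list[str], checkin_items: list[str]) -> bool:
    """True if both scripts would operate on exactly the same items."""
    return sorted(tlr_items) == sorted(checkin_items)


# Hypothetical sample: inst-2 has two items, so B2 and B3 are excluded.
items = [("B1", "inst-1"), ("B2", "inst-2"), ("B3", "inst-2"), ("B4", "inst-3")]
selected = single_item_instance_barcodes(items)
print(selected)  # ['B1', 'B4']
print(same_item_list(selected, ["B4", "B1"]))  # True
```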
Grafana dashboard
Baseline (items without TLR)
1 user
8 users
25 users
Verification (items with 10 TLRs)
1 user
8 users
25 users