Overview
This document contains the results of testing the Lists App refresh of 200K records on multiple tenants with R/W split enabled (Poppy release). The goal of the testing is to assess the performance of mod-lists with the load spread across multiple tenants.
Ticket:
- PERF-665
Summary
- Tests showed the following average Lists App refresh durations for concurrent lists on 3 tenants:
- 1.5 mins for 3 concurrent lists refresh test (1 list refresh on each tenant);
- 2.3 mins for 10 concurrent lists refresh test (3-4 lists refresh on each tenant).
- The load test for 30 lists (10 lists per tenant) failed due to DB overload (100% of refresh transactions failed). After the test ended, the "isRefreshing" status remained "true" for each list and was reset manually, directly in the database.
- During the 10 lists test, CPU utilization reached 200% for mod-fqm-manager and 111% for mod-lists. Also, mod-permissions CPU utilization exceeded 100% during the 30 lists test.
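- Maximum DB CPU utilization reached 83% (writer instance) and 99% (reader instance) during the 30 lists test. In comparison with testing with R/W split disabled, overall RDS CPU utilization didn't decrease when DB R/W split was enabled: the load on the writer instance dropped, but the load on the reader instance was higher than on the writer.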
- Memory utilization for mod-permissions increased from 48% to 76% during the tests. No memory leak is suspected in any of the modules.
Test runs
Query used in lists - "Item status != Available". List refresh result is about 200K records.
Scenario | Data quantity |
---|---|
List App refresh multiple tenants | tenant 1 - 1 list, tenant 2 - 1 list, tenant 3 - 1 list |
List App refresh multiple tenants | tenant 1 - 3 lists, tenant 2 - 3 lists, tenant 3 - 4 lists |
List App refresh multiple tenants | tenant 1 - 10 lists, tenant 2 - 10 lists, tenant 3 - 10 lists |
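The per-tenant counts in the table follow from spreading the total number of concurrent lists as evenly as possible over the three tenants. A minimal sketch of that split is shown below; the function name `split_lists` is hypothetical, and giving the remainder to the last tenant is an assumption made only to match the 3/3/4 row above.

```python
def split_lists(total_lists, tenant_count):
    """Spread total_lists over tenant_count tenants as evenly as possible.

    Any remainder goes to the last tenants (assumption, chosen to
    reproduce the 3/3/4 distribution used in the 10 lists test).
    """
    base, extra = divmod(total_lists, tenant_count)
    return [
        base + (1 if i >= tenant_count - extra else 0)
        for i in range(tenant_count)
    ]


for total in (3, 10, 30):
    print(total, split_lists(total, 3))
# 3 [1, 1, 1]
# 10 [3, 3, 4]
# 30 [10, 10, 10]
```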
Results
Transaction | Duration, avg | Release | Tenants | Number of lists | R/W split | Other conditions |
---|---|---|---|---|---|---|
Lists App refresh* | 10 min 40 sec | [Orchid] | 1 tenant | 10 | disabled | |
Lists App refresh* | 8.5 min | [Poppy] | 1 tenant | 10 | disabled | |
Lists App refresh* | 17.7 min | [Poppy] | 1 tenant | 10 | disabled | Testing in parallel with DI and CICO |
Lists App refresh, current test results** | 1.5 min | [Poppy] | 3 tenants | 3 | enabled | |
Lists App refresh, current test results** | 2.3 min | [Poppy] | 3 tenants | 10 | enabled | |
Lists App refresh, current test results** | error | [Poppy] | 3 tenants | 30 | enabled | 100% of refresh transactions failed*** |
* Query used in lists - "Item status == Checked out". List refresh result is 200K records. Results are taken from previous test reports: [Poppy] List App with multiple workflows and R/W split disabled test report, [Orchid] List App test report
** Query used in lists - "Item status != Available". List refresh result is about 200K records.
*** After the test ended, the "isRefreshing" status remained "true" for each list. It was reset manually, directly in the database.
Instance CPU Utilization
Service CPU Utilization
During the 10 lists test, CPU utilization reached 200% for mod-fqm-manager and 111% for mod-lists. Also, mod-permissions CPU utilization exceeded 100% during the 30 lists test.
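Values above 100% are possible when service CPU utilization is reported relative to the CPU units reserved for the task rather than to a whole vCPU. The sketch below converts the reported percentages into approximate vCPU usage. It assumes the ECS convention of 1024 CPU units per vCPU and that the percentages are relative to the 128 CPU units listed for mod-fqm-manager and mod-lists in the appendix; neither assumption is stated in this report.

```python
CPU_UNITS_PER_VCPU = 1024  # ECS convention; assumed to apply to this environment


def used_vcpu(reserved_cpu_units, utilization_pct):
    """Approximate vCPU consumed, given reserved CPU units and utilization in %."""
    return reserved_cpu_units * utilization_pct / 100 / CPU_UNITS_PER_VCPU


# Reserved units are taken from the module table in the appendix.
print(used_vcpu(128, 200))  # mod-fqm-manager at 200%: 0.25 vCPU
print(round(used_vcpu(128, 111), 3))  # mod-lists at 111%: about 0.139 vCPU
```

Read this way, 200% on a 128-unit reservation is still only a quarter of one vCPU, so the figure says more about the size of the reservation than about host saturation.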
Memory Utilization
Memory utilization for mod-permissions increased from 48% to 76% during the tests. No memory leak is suspected in any of the modules.
DB CPU Utilization
Maximum DB CPU utilization reached 83% (writer instance) and 99% (reader instance) during the 30 lists test.
In comparison with testing with R/W split disabled, RDS CPU utilization for the writer node decreased from 70% to 29% for the 3 users test and from 95% to 68% for the 10 users test. At the same time, CPU load on the reader node was higher than on the writer.
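To put the writer-node figures on a common scale, the relative decrease can be computed from the before and after values quoted above. This is only a small arithmetic sketch; the helper name `relative_decrease` is illustrative.

```python
def relative_decrease(before_pct, after_pct):
    """Relative drop from before_pct to after_pct, in percent of before_pct."""
    return (before_pct - after_pct) / before_pct * 100


# Writer-node CPU utilization: (R/W split disabled, R/W split enabled)
writer_cpu = {"3 users": (70, 29), "10 users": (95, 68)}

for test, (before, after) in writer_cpu.items():
    drop = relative_decrease(before, after)
    print(f"{test}: {before}% -> {after}% ({drop:.1f}% lower)")
# 3 users: 70% -> 29% (58.6% lower)
# 10 users: 95% -> 68% (28.4% lower)
```

The relative benefit for the writer shrinks as concurrency grows.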
Results for multiple tenants and R/W split disabled testing (for comparison):
Details can be found here: [Poppy] List App with multiple tenants and R/W split disabled
DB Connections
DB Load
Writer DB node
Reader DB node
TOP SQL
Writer DB node
Reader DB node
Long-running queries:
select id from [tenant]_mod_fqm_manager.drv_item_callnumber_location where lower(cast(item_status as varchar)) <> lower($1) parameters: $1 = 'Available'
delete from list_contents where list_id=$1 and refresh_id=$2
Appendix
Infrastructure
PTF - environment pcp1
- 10 m6i.2xlarge EC2 instances located in US East (N. Virginia) us-east-1
- 1 database instance, writer
Name | API Name | Memory GIB | vCPUs | max_connections |
---|---|---|---|---|
R6G Extra Large | db.r6g.xlarge | 32 GiB | 4 vCPUs | 2731 |
- MSK tenant
  - 4 m5.2xlarge brokers in 2 zones
  - Apache Kafka version 2.8.0
  - EBS storage volume per broker 300 GiB
  - auto.create.topics.enable=true
  - log.retention.minutes=480
  - default.replication.factor=3
Module pcp1-pvt Fri Oct 27 08:26:47 UTC 2023 | Task Def. Revision | Task Count | Mem Hard Limit | Mem Soft limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize | R/W split enabled |
---|---|---|---|---|---|---|---|---|---|
mod-inventory-storage:27.0.0 | 10 | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512 | false |
mod-users:19.2.0 | 19 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
nginx-okapi:2023.06.14 | 8 | 2 | 1024 | 896 | 128 | 0 | 0 | 0 | false |
mod-circulation-storage:17.1.0 | 10 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | false |
okapi:5.1.1 | 9 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512 | false |
mod-inventory:20.1.0 | 9 | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512 | false |
mod-circulation:24.0.0 | 10 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | false |
pub-okapi:2023.06.14 | 8 | 2 | 1024 | 896 | 128 | 768 | 0 | 0 | false |
mod-fqm-manager:1.0.0 | 5 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | true |
mod-lists:1.0.0 | 5 | 2 | 3000 | 2600 | 128 | 2048 | 384 | 512 | false |
Methodology
- Enable R/W split for mod-fqm-manager.
- Create 10 lists with the query "Item status != Available" on each of three tenants to be able to run a test for up to 30 concurrent lists (users).
- Prepare 200K item records for the query to return. Details can be found at the link: Steps for testing process#ListApp
- Conduct tests with JMeter script for multiple tenants.
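The actual tests were run with a JMeter script. Purely as an illustration of the measurement idea (start all refreshes at once, then report the average duration and the number of failures), here is a minimal sketch. `run_concurrent_refresh` and `fake_refresh` are hypothetical names; `fake_refresh` stands in for whatever call triggers a list refresh for a given tenant and waits for it to finish, and no real mod-lists endpoint is used.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def run_concurrent_refresh(jobs, refresh_fn):
    """Run refresh_fn(tenant, list_id) for all jobs concurrently.

    Returns (average duration in seconds of successful refreshes or None,
    number of failed refreshes).
    """
    def timed(job):
        tenant, list_id = job
        start = time.perf_counter()
        try:
            refresh_fn(tenant, list_id)
            ok = True
        except Exception:
            ok = False
        return time.perf_counter() - start, ok

    # One worker per job so that every refresh starts at the same time.
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        results = list(pool.map(timed, jobs))

    durations = [d for d, ok in results if ok]
    failures = sum(1 for _, ok in results if not ok)
    return (mean(durations) if durations else None), failures


def fake_refresh(tenant, list_id):
    """Stand-in for a real refresh call; fails for one tenant on purpose."""
    if tenant == "tenant3":
        raise RuntimeError("simulated refresh failure")
    time.sleep(0.01)


jobs = [("tenant1", "list-a"), ("tenant2", "list-b"), ("tenant3", "list-c")]
avg_seconds, failures = run_concurrent_refresh(jobs, fake_refresh)
print(f"avg={avg_seconds:.3f}s failures={failures}")
```

Passing the refresh call in as a function keeps the timing logic independent of how the request is actually sent.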