[Poppy] List App with multiple tenants and R/W split disabled
Overview
This document contains the results of testing the List App refresh of 200K records on multiple tenants with R/W split disabled (Poppy release). The goal of the testing is to assess the performance of mod-lists with load spread across multiple tenants and under different settings.
Ticket:
- PERF-709
Summary
- Load tests (multiple tenants, R/W split disabled) showed the following average List App refresh durations:
- 1.9 min for the 3 concurrent lists test (1 list refreshed on each tenant);
- 3.4 min for the 10 concurrent lists test (3-4 lists refreshed on each tenant).
- During the load test with 30 lists (10 lists per tenant), some of the list refreshes failed. After the test ended, the "isRefreshing" status remained "true" for each list and was reset manually directly in the database. Details on the issue can be found here: Failed list refresh investigation.
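- Compared to the previous test results (multiple tenants, R/W split enabled), the List App refresh takes longer with the R/W split feature disabled: the R/W split enabled runs were about 20-30% faster (1.5 vs 1.9 min for 3 lists, 2.3 vs 3.4 min for 10 lists).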
- During the 10 lists refresh test, CPU utilization reached 138% for mod-fqm-manager and 118% for mod-lists. Compared to testing with R/W split enabled, CPU utilization for mod-fqm-manager is 62% lower with R/W split disabled.
- During the tests, memory utilization increased from 37% to 49% for mod-fqm-manager and from 33% to 66% for mod-permissions. No memory leaks are suspected in any of the modules.
- Maximum DB CPU utilization reached 99% (writer instance) during the 30 lists test. RDS CPU utilization of the writer instance was 20% lower in the single-tenant test than in the multiple-tenant tests.
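The size of the R/W split difference depends on which run is taken as the baseline. The short Python check below uses only the 3 and 10 lists averages from the Results table (the 30 lists runs have no average because some refreshes failed); the `compare` helper is illustrative and not part of the test tooling.

```python
# Average List App refresh durations in minutes, copied from the Results table
# (multiple tenants, Poppy release).
durations = {
    "3 lists": {"rw_split_disabled": 1.9, "rw_split_enabled": 1.5},
    "10 lists": {"rw_split_disabled": 3.4, "rw_split_enabled": 2.3},
}


def compare(disabled: float, enabled: float) -> tuple[float, float]:
    """Return (increase relative to the enabled run, saving relative to the disabled run) in %."""
    diff = disabled - enabled
    increase = diff / enabled * 100
    saving = diff / disabled * 100
    return increase, saving


for scenario, d in durations.items():
    increase, saving = compare(d["rw_split_disabled"], d["rw_split_enabled"])
    print(f"{scenario}: disabled is {increase:.0f}% slower; enabled saves {saving:.0f}%")
```

Measured against the R/W split disabled runs, enabling the split saves about 21% (3 lists) and 32% (10 lists); measured against the enabled runs, the disabled runs take about 27% and 48% longer.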
Recommendations & Jiras
- PERF-732
Test runs
Query used in lists: "Item status != Available". Each list refresh returns about 200K records.
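Scenario | Total lists | Data quantity (lists per tenant) |
---|---|---|
List App refresh, multiple tenants, R/W split disabled | 3 | tenant 1 - 1 list; tenant 2 - 1 list; tenant 3 - 1 list |
List App refresh, multiple tenants, R/W split disabled | 10 | tenant 1 - 3 lists; tenant 2 - 3 lists; tenant 3 - 4 lists |
List App refresh, multiple tenants, R/W split disabled | 30 | tenant 1 - 10 lists; tenant 2 - 10 lists; tenant 3 - 10 lists |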
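The per-tenant counts above are consistent with splitting the total number of lists as evenly as possible across the three tenants and giving any remainder to the last tenant. A minimal sketch under that assumption (the function name `lists_per_tenant` is hypothetical, not taken from the JMeter script) reproduces the Data quantity column:

```python
def lists_per_tenant(total_lists: int, tenants: int) -> list[int]:
    """Split total_lists as evenly as possible; any remainder goes to the last tenants."""
    base, remainder = divmod(total_lists, tenants)
    # The first (tenants - remainder) tenants get `base` lists, the rest get one extra.
    return [base] * (tenants - remainder) + [base + 1] * remainder


for total in (3, 10, 30):
    print(total, lists_per_tenant(total, 3))
```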
Results
List App refresh, avg | 3 lists | 10 lists | 30 lists |
---|---|---|---|
Poppy, multiple tenants, R/W split disabled | 1.9 min | 3.4 min | refresh failed for some of the lists*** |
Poppy, multiple tenants, R/W split enabled* | 1.5 min | 2.3 min | refresh failed for some of the lists*** |
Poppy, single tenant, R/W split enabled** | 3.1 min | 6.5 min | - |
*Results are taken from previous test report: [Poppy] List App with multiple tenants and R/W split enabled
**Results are taken from previous test report: [Poppy] List App with single tenant and R/W split enabled
***After the test ended, the "isRefreshing" status remained "true" for each list and was reset manually directly in the database. More details about the failed refreshes can be found at: Failed list refresh investigation
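The manual reset of "isRefreshing" amounts to a single UPDATE on the list records. The sketch below is illustrative only: the table name `list_details` and column `is_refreshing` are hypothetical (the actual mod-lists schema may differ), and an in-memory SQLite database stands in for the module's PostgreSQL database. Because FOLIO keeps each tenant's data in its own schema, the statement would presumably have to be run once per tenant.

```python
import sqlite3

# HYPOTHETICAL schema: table and column names are placeholders, not the real mod-lists schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE list_details (id TEXT PRIMARY KEY, is_refreshing INTEGER)")
conn.executemany(
    "INSERT INTO list_details (id, is_refreshing) VALUES (?, ?)",
    [("list-1", 1), ("list-2", 1), ("list-3", 0)],  # two lists stuck in "refreshing"
)


def reset_stuck_refreshes(connection: sqlite3.Connection) -> int:
    """Clear the refreshing flag on every list left stuck after a test; return rows changed."""
    with connection:  # commits the UPDATE on success, rolls back on error
        cursor = connection.execute(
            "UPDATE list_details SET is_refreshing = 0 WHERE is_refreshing = 1"
        )
    return cursor.rowcount


print(reset_stuck_refreshes(conn))
```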
Service CPU Utilization
During the 10 lists test, CPU utilization reached 138% for mod-fqm-manager and 118% for mod-lists. In addition, mod-permissions CPU utilization exceeded 107% during the 30 lists test.
Compared to testing with R/W split enabled, CPU utilization for mod-fqm-manager is 62% lower with R/W split disabled.
(Note that "users" and "lists" are interchangeable here, e.g., 30 users = 30 lists.)
Memory Utilization
During the tests, memory utilization increased from 37% to 49% for mod-fqm-manager and from 33% to 66% for mod-permissions. No memory leaks are suspected in any of the modules.
DB CPU Utilization
Maximum DB CPU utilization reached 99% (writer instance) during the 30 lists test.
RDS CPU utilization of the writer instance was 20% lower in the single-tenant test than in the multiple-tenant tests.
DB Connections
DB Load
TOP SQL
Long-running queries:
Appendix
Infrastructure
PTF environment: pcp1
- 10 m6i.2xlarge EC2 instances located in US East (N. Virginia) us-east-1
- 1 database instance, writer
Name | API Name | Memory (GiB) | vCPUs | max_connections |
---|---|---|---|---|
R6G Extra Large | db.r6g.xlarge | 32 | 4 | 2731 |
- MSK tenant
- 4 m5.2xlarge brokers in 2 zones
- Apache Kafka version 2.8.0
- EBS storage volume per broker: 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
Module | Task Def. Revision | Task Count | Mem Hard Limit (MiB) | Mem Soft Limit (MiB) | CPU units | Xmx (MiB) | MetaspaceSize (MiB) | MaxMetaspaceSize (MiB) | R/W split enabled |
---|---|---|---|---|---|---|---|---|---|
mod-inventory-storage:27.0.0 | 10 | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512 | false |
mod-users:19.2.0 | 19 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
nginx-okapi:2023.06.14 | 8 | 2 | 1024 | 896 | 128 | 0 | 0 | 0 | false |
mod-circulation-storage:17.1.0 | 10 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | false |
okapi:5.1.1 | 9 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512 | false |
mod-inventory:20.1.0 | 9 | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512 | false |
mod-circulation:24.0.0 | 10 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | false |
pub-okapi:2023.06.14 | 8 | 2 | 1024 | 896 | 128 | 768 | 0 | 0 | false |
mod-fqm-manager:1.0.0 | 4 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
mod-lists:1.0.0 | 5 | 2 | 3000 | 2600 | 128 | 2048 | 384 | 512 | false |
Methodology
- Disable R/W split for mod-fqm-manager.
- Create 10 lists with the query "Item status != Available" on each of three tenants to be able to run a test for up to 30 concurrent lists.
- Prepare 200K item records for the query to return. Details can be found at the link: Steps for testing process#ListApp
- Conduct tests with JMeter script for multiple tenants.
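The measurement behind the Methodology steps can be illustrated with a small timing harness: start the refresh of every list concurrently, poll each list's "isRefreshing" flag until it clears, and average the durations. This is a sketch under assumptions, not the JMeter script used for the tests: the `start`/`is_refreshing` hooks are hypothetical stand-ins for the mod-lists HTTP calls, and the fake client only simulates a fixed refresh time so the sketch runs offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable

# HYPOTHETICAL hooks: in the real test these are HTTP requests issued by the
# JMeter script (trigger a list refresh, then read the list's "isRefreshing" flag).
StartRefresh = Callable[[str, str], None]  # (tenant, list_id) -> None
IsRefreshing = Callable[[str, str], bool]  # (tenant, list_id) -> still refreshing?


def time_refresh(tenant: str, list_id: str, start: StartRefresh,
                 is_refreshing: IsRefreshing, poll_s: float = 0.01,
                 timeout_s: float = 60.0) -> float:
    """Start one refresh and poll until it finishes; return the duration in seconds."""
    t0 = time.monotonic()
    start(tenant, list_id)
    while is_refreshing(tenant, list_id):
        if time.monotonic() - t0 > timeout_s:
            # A refresh stuck with isRefreshing=true would otherwise be polled forever.
            raise TimeoutError(f"{tenant}/{list_id} still refreshing after {timeout_s} s")
        time.sleep(poll_s)
    return time.monotonic() - t0


def average_refresh(lists: list[tuple[str, str]], start: StartRefresh,
                    is_refreshing: IsRefreshing) -> float:
    """Refresh all (tenant, list_id) pairs concurrently; return the mean duration."""
    with ThreadPoolExecutor(max_workers=len(lists)) as pool:
        futures = [pool.submit(time_refresh, t, l, start, is_refreshing) for t, l in lists]
        return mean(f.result() for f in futures)


# Stand-in client for a dry run: every refresh "finishes" 0.05 s after it starts.
_done_at: dict[tuple[str, str], float] = {}


def fake_start(tenant: str, list_id: str) -> None:
    _done_at[(tenant, list_id)] = time.monotonic() + 0.05


def fake_is_refreshing(tenant: str, list_id: str) -> bool:
    return time.monotonic() < _done_at[(tenant, list_id)]


# One list per tenant, as in the 3 lists scenario (tenant and list IDs are placeholders).
lists = [("tenant1", "list-1"), ("tenant2", "list-2"), ("tenant3", "list-3")]
print(f"avg refresh: {average_refresh(lists, fake_start, fake_is_refreshing):.2f} s")
```

The timeout is the design choice that matters most here: in the 30 lists runs, "isRefreshing" stayed "true" after the test ended, so a harness that polls without a timeout would never finish.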