Overview

This document contains the results of testing the List App refresh of 200K records on multiple tenants with R/W split disabled (Poppy release). The goal of the testing is to assess the performance of mod-lists with the load spread across multiple tenants and under different conditions (settings).

Ticket:

PERF-709

Summary

Load tests (multiple tenants, R/W split disabled) showed the following average List App refresh durations:
- 1.9 min for the 3 concurrent lists test (1 list refresh per tenant);
- 3.4 min for the 10 concurrent lists test (3-4 list refreshes per tenant).
During the load test for 30 lists (10 lists per tenant), some of the list refreshes failed. After the test ended, the "isRefreshing" status remained "true" for each list and had to be reset manually, directly in the database. Details on the issue can be found here: Failed list refresh investigation.
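Compared with previous test results, the Lists App refresh takes longer with the R/W split feature disabled: durations with R/W split enabled were about 20-30% lower (1.5 vs 1.9 min for 3 lists, 2.3 vs 3.4 min for 10 lists).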
During the 10 lists refresh test, CPU utilization reached 138% for mod-fqm-manager and 118% for mod-lists. Compared with testing with R/W split enabled, mod-fqm-manager CPU utilization is 62% lower with R/W split disabled.
Memory utilization increased from 37% to 49% for mod-fqm-manager and from 33% to 66% for mod-permissions during the tests. No memory leaks are suspected in any of the modules.
Maximum DB CPU utilization reached 99% (writer instance) during the 30 lists test. RDS CPU utilization for the writer instance was 20% lower in the single tenant test than in the multiple tenants test.
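The relative difference between the R/W split disabled and enabled runs depends on which run is taken as the baseline. A minimal Python check using the 3- and 10-list averages from the Results table (the helper names are illustrative only):

```python
# Average refresh durations (minutes) copied from the Results table.
rw_disabled = {"3 lists": 1.9, "10 lists": 3.4}
rw_enabled = {"3 lists": 1.5, "10 lists": 2.3}


def pct_lower(value: float, baseline: float) -> float:
    """Percentage by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline * 100


def pct_higher(value: float, baseline: float) -> float:
    """Percentage by which `value` is higher than `baseline`."""
    return (value - baseline) / baseline * 100


for load, off in rw_disabled.items():
    on = rw_enabled[load]
    print(
        f"{load}: enabled is {pct_lower(on, off):.0f}% lower than disabled; "
        f"disabled is {pct_higher(off, on):.0f}% higher than enabled"
    )
```

With these inputs the enabled runs come out about 21% and 32% lower than the disabled runs, while the disabled runs are about 27% and 48% higher than the enabled runs.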

Recommendations & Jiras

PERF-732

Test runs

Query used in the lists: "Item status != Available". Each list refresh returns about 200K records.

Scenario	Data quantity
List App refresh, multiple tenants, R/W split disabled	3 lists: tenant 1 - 1 list, tenant 2 - 1 list, tenant 3 - 1 list
	10 lists: tenant 1 - 3 lists, tenant 2 - 3 lists, tenant 3 - 4 lists
	30 lists: tenant 1 - 10 lists, tenant 2 - 10 lists, tenant 3 - 10 lists
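The per-tenant split in the scenarios above is an even spread of the total list count over three tenants, with any remainder going to the last tenant. A minimal sketch of that rule (the `split_lists` helper is hypothetical and not part of the PTF tooling):

```python
def split_lists(total: int, tenants: int = 3) -> list[int]:
    """Spread `total` lists over `tenants` as evenly as possible.

    Any remainder goes to the last tenants, which reproduces the
    1/1/1, 3/3/4 and 10/10/10 distributions used in the test runs.
    """
    base, remainder = divmod(total, tenants)
    return [base + (1 if i >= tenants - remainder else 0) for i in range(tenants)]


for total in (3, 10, 30):
    print(total, split_lists(total))
```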

Results

Lists App refresh, avg	3 lists	10 lists	30 lists
Poppy, multiple tenants, R/W split disabled	1.9 min	3.4 min	refresh failed for some of the lists***
Poppy, multiple tenants, R/W split enabled*	1.5 min	2.3 min	refresh failed for some of the lists***
Poppy, single tenant, R/W split enabled**	3.1 min	6.5 min	-

*Results are taken from previous test report: [Poppy] List App with multiple tenants and R/W split enabled

**Results are taken from previous test report: [Poppy] List App with single tenant and R/W split enabled

***After the test ended, the "isRefreshing" status remained "true" for each list and had to be reset manually, directly in the database. More details about the failed refreshes can be found at the link: Failed list refresh investigation
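The failure mode above is a list whose "isRefreshing" flag never returns to "false". A minimal Python sketch of timeout-based detection, assuming a caller-supplied `is_refreshing(list_id)` function (hypothetical; for example a wrapper that reads the list from mod-lists). It only reports stuck lists and does not reset them; in this test the reset was done manually in the database.

```python
import time
from typing import Callable, Iterable


def find_stuck_refreshes(
    list_ids: Iterable[str],
    is_refreshing: Callable[[str], bool],  # hypothetical status lookup
    timeout_s: float,
    poll_interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> set[str]:
    """Poll lists until none is refreshing or `timeout_s` elapses.

    Returns the IDs whose isRefreshing flag is still true at the deadline.
    """
    pending = set(list_ids)
    deadline = clock() + timeout_s
    while pending:
        # Keep only the lists whose refresh has not finished yet.
        pending = {lid for lid in pending if is_refreshing(lid)}
        if not pending or clock() >= deadline:
            break
        sleep(poll_interval_s)
    return pending
```

Injecting `clock` and `sleep` keeps the polling logic testable without waiting for real refreshes.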

Service CPU Utilization

During the 10 lists test, CPU utilization reached 138% for mod-fqm-manager and 118% for mod-lists. mod-permissions CPU utilization also exceeded 107% during the 30 lists test.

Compared with testing with R/W split enabled, mod-fqm-manager CPU utilization is 62% lower with R/W split disabled.

(Note that "users" are synonymous and interchangeable with "lists", e.g., 30 users = 30 lists)

Memory Utilization

Memory utilization increased from 37% to 49% for mod-fqm-manager and from 33% to 66% for mod-permissions during the tests. No memory leaks are suspected in any of the modules.

DB CPU Utilization

Maximum DB CPU utilization reached 99% (writer instance) during the 30 lists test.

RDS CPU utilization for the writer instance was 20% lower in the single tenant test than in the multiple tenants test.

DB Connections

DB Load

TOP SQL

Long-running queries:

select id from [tenant]_mod_fqm_manager.drv_item_callnumber_location where lower(cast(item_status as varchar)) <> lower($1)
parameters: $1 = 'Available'

delete from list_contents where list_id=$1 and refresh_id=$2

Appendix

Infrastructure

PTF environment: pcp1

10 m6i.2xlarge EC2 instances located in US East (N. Virginia) us-east-1
1 database instance, writer
Name	API Name	Memory (GiB)	vCPUs	max_connections
R6G Extra Large	db.r6g.xlarge	32	4	2731
MSK tenant
- 4 m5.2xlarge brokers in 2 zones
- Apache Kafka version 2.8.0
- EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3

Module (pcp1-pvt, Fri Oct 27 08:26:47 UTC 2023)	Task Def. Revision	Task Count	Mem Hard Limit	Mem Soft Limit	CPU units	Xmx	MetaspaceSize	MaxMetaspaceSize	R/W split enabled
mod-inventory-storage:27.0.0	10	2	4096	3690	2048	3076	384	512	false
mod-users:19.2.0	19	2	1024	896	128	768	88	128	false
nginx-okapi:2023.06.14	8	2	1024	896	128	0	0	0	false
mod-circulation-storage:17.1.0	10	2	2880	2592	1536	1814	384	512	false
okapi:5.1.1	9	3	1684	1440	1024	922	384	512	false
mod-inventory:20.1.0	9	2	2880	2592	1024	1814	384	512	false
mod-circulation:24.0.0	10	2	2880	2592	1536	1814	384	512	false
pub-okapi:2023.06.14	8	2	1024	896	128	768	0	0	false
mod-fqm-manager:1.0.0	4	2	1024	896	128	768	88	128	false
mod-lists:1.0.0	5	2	3000	2600	128	2048	384	512	false

Methodology

Disable R/W split for mod-fqm-manager.
Create 10 lists with the query "Item status != Available" on each of three tenants to be able to run a test for up to 30 concurrent lists.
Prepare 200K item records for the query to return. Details can be found at the link: Steps for testing process#ListApp
Run the tests using the JMeter script for multiple tenants.
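The tests themselves were run with JMeter. As a rough illustration of the same load pattern (all refreshes started at once, average duration reported in minutes), here is a hypothetical Python sketch. It assumes a caller-supplied `refresh_and_wait(tenant, list_id)` function that triggers one list refresh and blocks until it completes; the actual HTTP requests to mod-lists are not shown and are left as an assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable


def run_concurrent_refreshes(
    plan: dict[str, list[str]],
    refresh_and_wait: Callable[[str, str], None],  # hypothetical, caller-supplied
) -> float:
    """Start every list refresh at once and return the average duration in minutes.

    `plan` maps a tenant ID to the list IDs to refresh on that tenant.
    """
    jobs = [(tenant, lid) for tenant, ids in plan.items() for lid in ids]
    if not jobs:
        raise ValueError("plan contains no lists")

    def timed(job: tuple[str, str]) -> float:
        start = time.monotonic()
        refresh_and_wait(*job)
        return time.monotonic() - start

    # One worker per list so all refreshes run concurrently, as in the load test.
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        durations = list(pool.map(timed, jobs))
    return mean(durations) / 60
```

For the 10 lists scenario, `plan` would hold 3, 3 and 4 list IDs for tenants 1, 2 and 3.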

Folio Development Teams

[Poppy] List App with multiple tenants and R/W split disabled