Member tenants sharing local instances

Overview

This document contains the results of testing Sharing Local Instances (SLI) for MARC source records (PERF-755).
After the improvement, the fixed version mod-inventory:20.1.7-SNAPSHOT.487 was deployed on the PCON cluster (MODINV-950).

Summary

  • For a single SLI process, the duration is about the same on all tenants (cs00000int_0001 to cs00000int_0004), averaging about 2 seconds. With 2 parallel SLIs the duration is about 2.2 seconds, with 3 parallel SLIs about 2.15 seconds, and with 4 parallel SLIs about 2.1 seconds.
  • No memory leak is suspected for the modules involved in SLI. Memory consumption changed very little compared with the "before-test" state.
  • Maximum CPU utilization was about 17%, on mod-inventory and mod-quick-marc.
  • RDS CPU utilization ranged from about 15% with 1 VU to about 20% with 4 VUs.

Recommendations and Jiras

  • Test SLI in parallel with other workflows

Test results

Test parameters: inventory.sharing.di.status.poll.interval.seconds = 1 and inventory.sharing.di.status.poll.number = 5 (default).
Test 1. 1 Virtual user working sequentially on each of the 4 tenants.

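| Tenant | TTL REQ, COUNT | THRGHPT, REQ/SEC | ERRORS, COUNT | MIN, MS | MEDIAN, MS | PCT95, MS | MAX, MS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| cs00000int_0001 | 100 | 0.298 | 4 | 1081 | 2255 | 2611 | 3719 |
| cs00000int_0002 | 100 | 0.389 | 0 | 1006 | 2218 | 2579 | 3159 |
| cs00000int_0003 | 100 | 0.395 | 1 | 602 | 2076 | 2150 | 15233 |
| cs00000int_0004 | 100 | 0.422 | 1 | 554 | 1940 | 2183 | 2989 |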
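
For readers who want to reproduce the MIN, MEDIAN, PCT95 and MAX columns from raw response times, the following is a minimal sketch. It assumes the nearest-rank percentile method; the load tool used for these tests may interpolate differently, so small deviations are possible. The sample durations are hypothetical and are not taken from the test runs.

```python
import math
from statistics import median


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # 1-based rank of the smallest value that covers pct percent of the samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical response times in milliseconds, for illustration only.
durations_ms = [1940, 2010, 2076, 2150, 2183, 2218, 2255, 2400, 2579, 2989]

summary = {
    "MIN, MS": min(durations_ms),
    "MEDIAN, MS": median(durations_ms),
    "PCT95, MS": percentile_nearest_rank(durations_ms, 95),
    "MAX, MS": max(durations_ms),
}
for column, value in summary.items():
    print(f"{column}: {value}")
```

With only 100 requests per tenant, PCT95 is determined by the few slowest requests, so a single outlier can move MAX a lot while leaving PCT95 almost unchanged.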

Test 2. 2 Virtual users working in parallel on 2 tenants.

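| Tenant | TTL REQ, COUNT | THRGHPT, REQ/SEC | ERRORS, COUNT | MIN, MS | MEDIAN, MS | PCT95, MS | MAX, MS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| cs00000int_0001 | 100 | 0.364 | 6 | 1005 | 2466 | 2689 | 16047 |
| cs00000int_0002 | 100 | 0.397 | 0 | 1019 | 2212 | 2583 | 2900 |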

Test 3. 3 Virtual users working in parallel on 3 tenants.

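| Tenant | TTL REQ, COUNT | THRGHPT, REQ/SEC | ERRORS, COUNT | MIN, MS | MEDIAN, MS | PCT95, MS | MAX, MS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| cs00000int_0001 | 100 | 0.334 | 4 | 530 | 2280 | 2641 | 3304 |
| cs00000int_0002 | 100 | 0.329 | 1 | 1001 | 2383 | 2567 | 15985 |
| cs00000int_0003 | 100 | 0.385 | 2 | 520 | 1910 | 2160 | 2955 |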

Test 4. 4 Virtual users working in parallel on 4 tenants.

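| Tenant | TTL REQ, COUNT | THRGHPT, REQ/SEC | ERRORS, COUNT | MIN, MS | MEDIAN, MS | PCT95, MS | MAX, MS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| cs00000int_0001 | 100 | 0.313 | 4 | 1096 | 2357 | 2661 | 3437 |
| cs00000int_0002 | 100 | 0.31 | 0 | 973 | 2254 | 2584 | 3237 |
| cs00000int_0003 | 100 | 0.357 | 3 | 517 | 1884 | 2117 | 2724 |
| cs00000int_0004 | 100 | 0.321 | 1 | 1098 | 2283 | 2916 | 3145 |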
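
To compare the four tests at a glance, the per-tenant MEDIAN values from Tests 1 to 4 can be averaged per test. The sketch below does only that. It is a mean of medians, which is not necessarily the statistic behind the figures quoted in the Summary, so small differences from those figures are expected.

```python
from statistics import mean

# MEDIAN, MS per tenant, copied from the result tables of Tests 1 to 4.
median_ms = {
    "Test 1 (1 VU)": [2255, 2218, 2076, 1940],
    "Test 2 (2 VUs)": [2466, 2212],
    "Test 3 (3 VUs)": [2280, 2383, 1910],
    "Test 4 (4 VUs)": [2357, 2254, 1884, 2283],
}

# Mean of the per-tenant medians for each test, in milliseconds.
mean_of_medians_ms = {name: mean(values) for name, values in median_ms.items()}

for name, value in mean_of_medians_ms.items():
    print(f"{name}: {value / 1000:.2f} s")
```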

Additional tests were performed to check the SLI duration with inventory.sharing.di.status.poll.interval.seconds = 2 and 3, keeping inventory.sharing.di.status.poll.number = 5 (default). Results for the 1-second interval from the tests above are included for comparison.
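
As background for why the poll interval affects the measured duration, here is a minimal sketch of a generic status-polling loop. It is an assumption about how such a pair of settings typically behaves, not the mod-inventory implementation: the function name `wait_for_completion`, its arguments, and the choice to sleep before each check are all hypothetical.

```python
import time


def wait_for_completion(check_status, poll_interval_seconds=1, poll_number=5,
                        sleep=time.sleep):
    """Poll check_status() up to poll_number times, waiting before each poll.

    Returns the number of polls that were needed. Raises TimeoutError when the
    status is still not complete after the last poll.
    """
    for attempt in range(1, poll_number + 1):
        sleep(poll_interval_seconds)  # assumed: wait first, then check
        if check_status():
            return attempt
    raise TimeoutError(
        f"not complete after {poll_number} polls "
        f"({poll_number * poll_interval_seconds} s)"
    )


# Demo with a fake status source that reports completion on the second poll.
# The sleep function is replaced by a recorder so the demo does not block.
statuses = iter([False, True])
waited = []
polls = wait_for_completion(lambda: next(statuses),
                            poll_interval_seconds=2, poll_number=5,
                            sleep=waited.append)
print(f"completed after {polls} polls, {sum(waited)} s of waiting")
```

Under this assumed model, completion is only observed at multiples of the poll interval, so a longer interval adds waiting time even when the underlying work finishes equally fast.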

Test 1.a 1 Virtual user working sequentially on each of the 4 tenants.

Mod-inventory parameters: di.status.poll.number=5 for all columns; di.status.poll.interval.seconds (interval) as given in the column headers.

| Tenant | TTL REQ, COUNT | MEDIAN, MS (interval=1) | PCT95, MS (interval=1) | MEDIAN, MS (interval=2) | PCT95, MS (interval=2) | MEDIAN, MS (interval=3) | PCT95, MS (interval=3) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| cs00000int_0001 | 100 | 2255 | 2611 | 3348 | 3873 | 4227 | 4672 |
| cs00000int_0002 | 100 | 2218 | 2579 | 3324 | 3784 | 3820 | 4702 |
| cs00000int_0003 | 100 | 2076 | 2150 | 2980 | 3184 | 3956 | 4146 |
| cs00000int_0004 | 100 | 1940 | 2183 | 2669 | 2951 | 3926 | 4145 |
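
The effect of the poll interval on the median duration in Test 1.a can be quantified with simple differences. The sketch below uses the MEDIAN values from the Test 1.a results and computes how much the median grows for each additional second of di.status.poll.interval.seconds.

```python
# MEDIAN, MS per tenant from Test 1.a, keyed by di.status.poll.interval.seconds.
median_ms_by_interval = {
    "cs00000int_0001": {1: 2255, 2: 3348, 3: 4227},
    "cs00000int_0002": {1: 2218, 2: 3324, 3: 3820},
    "cs00000int_0003": {1: 2076, 2: 2980, 3: 3956},
    "cs00000int_0004": {1: 1940, 2: 2669, 3: 3926},
}

# Increase of the median when the interval goes from 1 s to 2 s and from 2 s to 3 s.
steps_ms = {
    tenant: (by_interval[2] - by_interval[1], by_interval[3] - by_interval[2])
    for tenant, by_interval in median_ms_by_interval.items()
}

for tenant, (step_1_to_2, step_2_to_3) in steps_ms.items():
    print(f"{tenant}: +{step_1_to_2} ms (1 s to 2 s), +{step_2_to_3} ms (2 s to 3 s)")
```

Most steps are close to one second per additional second of interval, although individual tenants vary noticeably.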

Test 2.a 2 Virtual users working in parallel on 2 tenants.

Mod-inventory parameters: di.status.poll.number=5 for all columns; di.status.poll.interval.seconds (interval) as given in the column headers.

| Tenant | TTL REQ, COUNT | MEDIAN, MS (interval=1) | PCT95, MS (interval=1) | MEDIAN, MS (interval=2) | PCT95, MS (interval=2) | MEDIAN, MS (interval=3) | PCT95, MS (interval=3) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| cs00000int_0001 | 100 | 2466 | 2689 | 3190 | 3725 | 4107 | 4922 |
| cs00000int_0002 | 100 | 2212 | 2583 | 3013 | 3762 | 3934 | 4816 |

Test 3.a 3 Virtual users working in parallel on 3 tenants.

Mod-inventory parameters: di.status.poll.number=5 for all columns; di.status.poll.interval.seconds (interval) as given in the column headers.

| Tenant | TTL REQ, COUNT | MEDIAN, MS (interval=1) | PCT95, MS (interval=1) | MEDIAN, MS (interval=2) | PCT95, MS (interval=2) | MEDIAN, MS (interval=3) | PCT95, MS (interval=3) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| cs00000int_0001 | 100 | 2280 | 2641 | 3017 | 3701 | 4020 | 4639 |
| cs00000int_0002 | 100 | 2383 | 2567 | 3340 | 3753 | 3734 | 4710 |
| cs00000int_0003 | 100 | 1910 | 2160 | 2939 | 3154 | 3891 | 4222 |


Resource Utilization, Tests 1-4

Below are the resource utilization graphs for all tests.

Memory Utilization

Memory usage was stable across the 4 tests, and no memory leak is suspected for any module. The graph shows the 10 most memory-consuming services.

Service CPU Utilization 

CPU utilization increased only while SLI was running, and all modules returned to their baseline values after all SLI processes had finished.


RDS CPU Utilization 

For 1 VU, average RDS CPU utilization was about 14% for all 4 tenants; for 2 VUs it was about 15%, for 3 VUs about 17%, and for 4 VUs about 20%.


RDS Database Connections

The average number of DB connections before the test was 450. During SLI it was about 510 with 1 VU, 520 with 2 VUs, 530 with 3 VUs, and 540 with 4 VUs.
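
The connection counts above can be expressed as increases over the pre-test baseline. The sketch below uses only the approximate averages reported in this section.

```python
# Approximate average DB connection counts reported in this section.
baseline_connections = 450
connections_by_vu = {1: 510, 2: 520, 3: 530, 4: 540}

# Additional connections over the baseline for each number of virtual users.
extra_connections = {
    vu: count - baseline_connections for vu, count in connections_by_vu.items()
}

for vu, extra in extra_connections.items():
    print(f"{vu} VU: +{extra} connections over baseline")
```

On these figures, the first virtual user accounts for about 60 additional connections and each further virtual user for about 10 more.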

Average active sessions (AAS)

Database load sliced by SQL

Errors

Response body of a failed request. All failed requests returned the same error.
{"errors":[{"message":"ERROR: duplicate key value violates unique constraint \"uq_instance_id_source_tenant_id_target_tenant_id\"\n  Detail: Key (instance_id, source_tenant_id, target_tenant_id)=(cf2a6947-e1bb-4e8f-ad43-f3ecd600b8f4, cs00000int_0004, cs00000int) already exists.","type":"-1","code":"VALIDATION_ERROR"}]}
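
When analysing failed requests in bulk, it can help to extract the violated constraint and the conflicting key from the response body. The sketch below parses the response shown above. The helper name `parse_duplicate_key` and the regular expression are illustrative assumptions and rely on the message format seen in this one response.

```python
import json
import re

# Response body of a failed request, as captured in this report.
response_body = r'''{"errors":[{"message":"ERROR: duplicate key value violates unique constraint \"uq_instance_id_source_tenant_id_target_tenant_id\"\n  Detail: Key (instance_id, source_tenant_id, target_tenant_id)=(cf2a6947-e1bb-4e8f-ad43-f3ecd600b8f4, cs00000int_0004, cs00000int) already exists.","type":"-1","code":"VALIDATION_ERROR"}]}'''

# Matches the constraint name, the key columns and the key values in the message.
DUPLICATE_KEY = re.compile(
    r'unique constraint "(?P<constraint>[^"]+)"\s+'
    r'Detail: Key \((?P<columns>[^)]*)\)=\((?P<values>[^)]*)\)'
)


def parse_duplicate_key(body):
    """Return constraint name and key fields for a duplicate-key error, else None."""
    for error in json.loads(body).get("errors", []):
        match = DUPLICATE_KEY.search(error.get("message", ""))
        if match:
            columns = match.group("columns").split(", ")
            values = match.group("values").split(", ")
            return {
                "code": error.get("code"),
                "constraint": match.group("constraint"),
                "key": dict(zip(columns, values)),
            }
    return None


details = parse_duplicate_key(response_body)
print(details)
```

Read literally, the message says that a row with the same instance ID, source tenant and target tenant already existed when the request was processed.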

Appendix

Environment: PCON

Record parameters on each of the tenants

  • Tenant cs00000int_0001: Number of shared instances  1Source
  • Tenant cs00000int_0002: Number of shared instances  1695139 Source
  • Tenant cs00000int_0003: Number of shared instances  1695139 Source
  • Tenant cs00000int_0004: Number of shared instances  1695139 Source

PTF -environment ncp5 

Infrastructure

mod-inventory:20.1.7-SNAPSHOT is mod-inventory:20.1.6 plus changes that add two configuration options. The parameters can be changed in the task definition:

"name": "JAVA_OPTS", -Dinventory.sharing.di.status.poll.interval.seconds=2
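pcon-pvt

| Module | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize | R/W split enabled |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mod-source-record-storage | 5 | mod-source-record-storage:5.7.5 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | FALSE |
| mod-inventory | 12 | mod-inventory:20.1.7-SNAPSHOT.487 | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512 | FALSE |
| mod-di-converter-storage | 5 | mod-di-converter-storage:2.1.4 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE |
| mod-pubsub | 5 | mod-pubsub:2.11.3 | 2 | 1536 | 1440 | 1024 | 922 | 384 | 512 | FALSE |
| mod-circulation | 5 | mod-circulation:24.0.11 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | FALSE |
| mod-source-record-manager | 5 | mod-source-record-manager:3.7.7 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | FALSE |
| mod-quick-marc | 5 | mod-quick-marc:5.0.1 | 1 | 2288 | 2176 | 128 | 1664 | 384 | 512 | FALSE |
| nginx-okapi | 2 | nginx-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0 | FALSE |
| okapi-b | 2 | okapi:5.1.1 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512 | FALSE |
| mod-consortia | 4 | mod-consortia:1.0.4 | 2 | 5136 | 4776 | 1024 | 4416 | 384 | 512 | FALSE |
| mod-organizations | 3 | mod-organizations:1.8.0 | 2 | 1024 | 896 | 128 | 700 | 88 | 128 | FALSE |
| mod-tags | 3 | mod-tags:2.1.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE |
| mod-inventory-storage | 5 | mod-inventory-storage:27.0.4 | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512 | FALSE |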
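
The JAVA_OPTS override mentioned above can be generated rather than typed by hand when several poll settings need to be tested. The sketch below builds the two system properties named in this report and wraps them in a name/value environment entry. The "value" key and the helper name `sharing_java_opts` are assumptions; a real task definition would normally carry further JVM flags (heap, metaspace) in the same variable.

```python
import json


def sharing_java_opts(poll_interval_seconds, poll_number):
    """Build the -D flags for the two SLI polling options of mod-inventory."""
    return " ".join([
        f"-Dinventory.sharing.di.status.poll.interval.seconds={poll_interval_seconds}",
        f"-Dinventory.sharing.di.status.poll.number={poll_number}",
    ])


# Assumed shape of one environment entry in a container task definition.
env_entry = {"name": "JAVA_OPTS", "value": sharing_java_opts(2, 5)}
print(json.dumps(env_entry, indent=2))
```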

Methodology

Test 1. The sharing local instances process was started from the JMeter script for tenants cs00000int_0001, cs00000int_0002, cs00000int_0003, and cs00000int_0004, one after another.

Test 2. The sharing local instances process was started from the JMeter script in parallel for tenants cs00000int_0001 and cs00000int_0002.

Test 3. The sharing local instances process was started from the JMeter script in parallel for tenants cs00000int_0001, cs00000int_0002, and cs00000int_0003.

Test 4. The sharing local instances process was started from the JMeter script in parallel for tenants cs00000int_0001, cs00000int_0002, cs00000int_0003, and cs00000int_0004.
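
The difference between the sequential run (Test 1) and the parallel runs (Tests 2 to 4) can be illustrated as follows. The actual tests were driven by a JMeter script; this sketch only mirrors the scheduling. The function `share_local_instance` is a hypothetical stand-in that sleeps instead of calling the real API.

```python
import time
from concurrent.futures import ThreadPoolExecutor

TENANTS = ["cs00000int_0001", "cs00000int_0002", "cs00000int_0003", "cs00000int_0004"]


def share_local_instance(tenant):
    """Hypothetical stand-in for one SLI request; returns (tenant, duration in ms)."""
    started = time.perf_counter()
    time.sleep(0.01)  # placeholder for the real HTTP call and status polling
    return tenant, (time.perf_counter() - started) * 1000


def run_sequential(tenants):
    """Test 1 style: one virtual user visits the tenants one after another."""
    return [share_local_instance(tenant) for tenant in tenants]


def run_parallel(tenants):
    """Tests 2 to 4 style: one virtual user (thread) per tenant, started together."""
    with ThreadPoolExecutor(max_workers=len(tenants)) as pool:
        return list(pool.map(share_local_instance, tenants))


test_1 = run_sequential(TENANTS)
test_2 = run_parallel(TENANTS[:2])
test_3 = run_parallel(TENANTS[:3])
test_4 = run_parallel(TENANTS)

for name, results in [("Test 1", test_1), ("Test 2", test_2),
                      ("Test 3", test_3), ("Test 4", test_4)]:
    print(name, [tenant for tenant, _ in results])
```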