Member tenants sharing local instances (Quesnelia)
Overview
This document contains the results of testing Sharing Local Instances (SLI) for MARC source records.
- PERF-904
Summary
- With 1 SLI process run sequentially on each tenant, the average duration is about 16 seconds on the three tenants cs00000int_0001-cs00000int_0003 and about 6 seconds on cs00000int_0004, the last tenant in the run. With 2 parallel SLIs, the duration is about 16 seconds on the first tenant (cs00000int_0001) and about 6.5 seconds on the second (cs00000int_0002). With 3 and with 4 parallel SLIs, the duration is about 16 seconds on every tenant except the last one, where it is about 6 seconds.
- Duration increased significantly compared with approximately 2 seconds in the Poppy release.
- After adding the parameter -Dinventory.sharing.di.status.poll.interval.seconds=2, the SLI duration on the last tenant of each test decreased from about 6 seconds to about 2.9 seconds.
- No memory leak is suspected for the SLI modules. Memory consumption of mod-inventory grew to 56% during the third test and then remained unchanged.
- Maximum CPU utilization was about 15% on mod-inventory. On mod-consortia-b it was 23% for 1VU, 25% for 2VU, 28% for 3VU, and 35% for 4VU.
- RDS CPU utilization ranged from 8% for 1VU up to 10% for 4VU, which is about half of the Poppy values.
- All SLI processes finished successfully (without errors). Error rate = 0.
Recommendations and Jiras
- It's important to add the following parameter to the mod-inventory task definition to reduce local instance sharing time:
"name": "JAVA_OPTS",
"value": "-Dinventory.sharing.di.status.poll.interval.seconds=2"
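As a rough illustration of this recommendation, the Python sketch below merges the option into the `JAVA_OPTS` entry of a container definition's `environment` list. The helper name `with_java_opt`, the sample container definition, and the pre-existing `-XX:MaxRAMPercentage` flag are hypothetical; only the option name and value come from this report.

```python
import copy

# Option recommended in this report.
POLL_OPT = "-Dinventory.sharing.di.status.poll.interval.seconds=2"


def with_java_opt(container_def, opt):
    """Return a copy of container_def whose JAVA_OPTS value includes opt."""
    updated = copy.deepcopy(container_def)
    env = updated.setdefault("environment", [])
    key = opt.split("=", 1)[0]
    for entry in env:
        if entry.get("name") == "JAVA_OPTS":
            # Drop any earlier setting of the same property, then append the new one.
            kept = [o for o in entry.get("value", "").split()
                    if o.split("=", 1)[0] != key]
            entry["value"] = " ".join(kept + [opt])
            break
    else:
        # No JAVA_OPTS entry yet: create one.
        env.append({"name": "JAVA_OPTS", "value": opt})
    return updated


# Hypothetical container definition fragment, for illustration only.
sample = {
    "name": "mod-inventory",
    "environment": [{"name": "JAVA_OPTS", "value": "-XX:MaxRAMPercentage=66.0"}],
}
result = with_java_opt(sample, POLL_OPT)
print(result["environment"][0]["value"])
```

The helper returns a modified copy and leaves its input unchanged, so the original definition stays available for comparison or rollback.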
Test results
Test 1. Virtual user working sequentially on each of the 4 tenants.
Tenant | TTL REQ, COUNT | Average, MS | THRGHPT, REQ/SEC | ERRORS, COUNT | MIN, MS | MEDIAN, MS | PCT95, MS | MAX, MS |
---|---|---|---|---|---|---|---|---|
cs00000int_0001 | 100 | 15974 | 0.06 | 0 | 15560 | 15967 | 16369 | 16482 |
cs00000int_0002 | 100 | 16004 | 0.06 | 0 | 15594 | 16016 | 16351 | 16377 |
cs00000int_0003 | 100 | 15523 | 0.06 | 0 | 15281 | 15503 | 15760 | 15852 |
cs00000int_0004 | 100 | 5998 | 0.16 | 0 | 5714 | 5975 | 6285 | 6941 |
Test 2. Virtual users working in parallel on 2 tenants.
Tenant | TTL REQ, COUNT | Average, MS | THRGHPT, REQ/SEC | ERRORS, COUNT | MIN, MS | MEDIAN, MS | PCT95, MS | MAX, MS |
---|---|---|---|---|---|---|---|---|
cs00000int_0001 | 100 | 15958 | 0.06 | 0 | 15571 | 15945 | 16311 | 16488 |
cs00000int_0002 | 100 | 6436 | 0.15 | 0 | 5982 | 6412 | 6802 | 8082 |
Test 3. Virtual users working in parallel on 3 tenants.
Tenant | TTL REQ, COUNT | Average, MS | THRGHPT, REQ/SEC | ERRORS, COUNT | MIN, MS | MEDIAN, MS | PCT95, MS | MAX, MS |
---|---|---|---|---|---|---|---|---|
cs00000int_0001 | 100 | 15984 | 0.06 | 0 | 15554 | 15984 | 16269 | 16464 |
cs00000int_0002 | 100 | 15976 | 0.06 | 0 | 15613 | 15967 | 16323 | 16457 |
cs00000int_0003 | 100 | 5985 | 0.16 | 0 | 5655 | 5951 | 6308 | 7414 |
Test 4. Virtual users working in parallel on 4 tenants.
Tenant | TTL REQ, COUNT | Average, MS | THRGHPT, REQ/SEC | ERRORS, COUNT | MIN, MS | MEDIAN, MS | PCT95, MS | MAX, MS |
---|---|---|---|---|---|---|---|---|
cs00000int_0001 | 100 | 15993 | 0.06 | 0 | 15630 | 15994 | 16302 | 16548 |
cs00000int_0002 | 100 | 15920 | 0.06 | 0 | 15554 | 15875 | 16315 | 16405 |
cs00000int_0003 | 100 | 15515 | 0.06 | 0 | 15281 | 15507 | 15729 | 15755 |
cs00000int_0004 | 100 | 5956 | 0.16 | 0 | 5656 | 5930 | 6288 | 7321 |
Results after mod-inventory module task definition update ("value": "-Dinventory.sharing.di.status.poll.interval.seconds=2")
Test 1. Virtual user working sequentially on each of the 4 tenants.
Tenant | TTL REQ, COUNT | Average, MS | THRGHPT, REQ/SEC | ERRORS, COUNT | MIN, MS | MEDIAN, MS | PCT95, MS | MAX, MS |
---|---|---|---|---|---|---|---|---|
cs00000int_0001 | 100 | 15980 | 0.06 | 0 | 15613 | 15967 | 16332 | 16457 |
cs00000int_0002 | 100 | 15951 | 0.06 | 0 | 15583 | 15977 | 16321 | 16354 |
cs00000int_0003 | 100 | 15522 | 0.06 | 0 | 15263 | 15504 | 15755 | 15978 |
cs00000int_0004 | 100 | 2875 | 0.33 | 0 | 2603 | 2860 | 3130 | 3215 |
Test 2. Virtual users working in parallel on 2 tenants.
Tenant | TTL REQ, COUNT | Average, MS | THRGHPT, REQ/SEC | ERRORS, COUNT | MIN, MS | MEDIAN, MS | PCT95, MS | MAX, MS |
---|---|---|---|---|---|---|---|---|
cs00000int_0001 | 100 | 15972 | 0.06 | 0 | 15592 | 15994 | 16296 | 16405 |
cs00000int_0002 | 100 | 3311 | 0.28 | 0 | 1035 | 3448 | 3739 | 4692 |
Test 3. Virtual users working in parallel on 3 tenants.
Tenant | TTL REQ, COUNT | Average, MS | THRGHPT, REQ/SEC | ERRORS, COUNT | MIN, MS | MEDIAN, MS | PCT95, MS | MAX, MS |
---|---|---|---|---|---|---|---|---|
cs00000int_0001 | 100 | 15986 | 0.06 | 0 | 15643 | 15978 | 16305 | 16416 |
cs00000int_0002 | 100 | 15957 | 0.06 | 0 | 15573 | 15949 | 16264 | 16472 |
cs00000int_0003 | 100 | 2850 | 0.33 | 0 | 2579 | 2814 | 3095.2 | 4225 |
Test 4. Virtual users working in parallel on 4 tenants.
Tenant | TTL REQ, COUNT | Average, MS | THRGHPT, REQ/SEC | ERRORS, COUNT | MIN, MS | MEDIAN, MS | PCT95, MS | MAX, MS |
---|---|---|---|---|---|---|---|---|
cs00000int_0001 | 100 | 15965 | 0.06 | 0 | 15580 | 15964 | 16346 | 16473 |
cs00000int_0002 | 100 | 15959 | 0.06 | 0 | 15617 | 15962 | 16286 | 16434 |
cs00000int_0003 | 100 | 15529 | 0.06 | 0 | 15263 | 15538 | 15704 | 15889 |
cs00000int_0004 | 100 | 2874 | 0.16 | 0 | 2601 | 2852 | 3101 | 4433 |
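To make the effect of the parameter easier to read off the two sets of tables, the following Python sketch computes the relative reduction in average duration on the last tenant of each test. The values are copied from the "Average, MS" column above; the variable and function names are illustrative only.

```python
# Average duration (ms) on the last tenant of each test, copied from the tables above.
before_ms = {"Test 1": 5998, "Test 2": 6436, "Test 3": 5985, "Test 4": 5956}
after_ms = {"Test 1": 2875, "Test 2": 3311, "Test 3": 2850, "Test 4": 2874}


def reduction_pct(before, after):
    """Relative reduction of the average duration, in percent."""
    return (before - after) / before * 100


reductions = {test: reduction_pct(before_ms[test], after_ms[test]) for test in before_ms}
for test, pct in reductions.items():
    print(f"{test}: {before_ms[test]} ms -> {after_ms[test]} ms ({pct:.1f}% shorter)")
```

In every test the average duration on the last tenant drops by roughly half, while the averages on the other tenants stay near 16 seconds in both sets of tables.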
Resource Utilization for Tests 1, 2, 3, and 4
Below are the resource utilization graphs for all tests.
Service CPU Utilization
CPU utilization increased only during the SLI process, and all modules returned to their baseline values after all SLI processes finished.
mod-consortia-b
Test 1 - 25%, test 2 - 27%, test 3 - 30%, test 4 - 35%
Memory Utilization
Memory usage was stable over the 4 tests, and no memory leak is suspected for any module. The graph shows the 10 most memory-consuming services. mod-consortia: 74%, mod-inventory: 56%.
RDS CPU Utilization
For 1VU, average RDS CPU utilization was about 7% for all 4 tenants. For 2VU it was about 8%, for 3VU about 9%, and for 4VU about 10%.
RDS Database Connections
The average number of DB connections before the test was about 500. During tests the connections spiked to 620.
Average active sessions (AAS)
Database load sliced by SQL
Errors
"error": "Sharing instance with InstanceId=[UUID] to the target tenant cs00000int. Error: Failed sending record data."
Fixed after deployment of mod-di-converter-storage:2.2.2 (MODDICONV-379).
Appendix
Record parameters on each of the tenants
Tenant (QCON) | Instances | Source | Records |
---|---|---|---|
cs00000int | 1,540,307 | MARC | 1,436,417 |
cs00000int | 1,540,307 | FOLIO | 103,890 |
cs00000int_0001 | 3,147,248 | MARC | 3,042,890 |
cs00000int_0001 | 3,147,248 | FOLIO | 104,358 |
cs00000int_0002 | 2,306,678 | MARC | 2,180,113 |
cs00000int_0002 | 2,306,678 | FOLIO | 126,565 |
cs00000int_0003 | 1,540,307 | MARC | 1,436,417 |
cs00000int_0003 | 1,540,307 | FOLIO | 103,890 |
cs00000int_0004 | 1,540,307 | MARC | 1,436,417 |
cs00000int_0004 | 1,540,307 | FOLIO | 103,890 |
cs00000int_0005 | 1,540,307 | MARC | 1,436,417 |
cs00000int_0005 | 1,540,307 | FOLIO | 103,890 |
PTF environment: QCON
- 11 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
- 1 instance of db.r6.xlarge database instance: Writer instance
- MSK - tenant
- 4 kafka.m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
- OpenSearch 2.7 domain: ptf-test
- Dedicated master nodes enabled
- Instance type r6g.large.search
- Number of nodes 3
Data nodes
- Instance type r6g.2xlarge.search
- Number of nodes 4
- Topics:
{ "name": "KAFKA_PRODUCER_TENANT_COLLECTION", "value": "ALL" }
Infrastructure
Additional Files
- source files for SLI CPU and Memory dashboard creation
Methodology
JMeter request to trigger instance sharing:
- CSI_POST /consortia/{consortiumId}/sharing/instances
BODY: {"sourceTenantId":"${sourceTenantId}","instanceIdentifier":"${list_id1}","targetTenantId":"${targetTenantId}","id":"${__UUID()}"}
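For readers who want to reproduce the request outside JMeter, here is a minimal Python sketch that builds the same path and JSON body. The function name `build_sharing_request` and the sample argument values are hypothetical; the field names and the endpoint come from the request above, and `uuid.uuid4()` stands in for JMeter's `${__UUID()}`.

```python
import json
import uuid


def build_sharing_request(consortium_id, source_tenant, target_tenant, instance_id):
    """Build the path and JSON body for one instance-sharing request."""
    path = f"/consortia/{consortium_id}/sharing/instances"
    body = {
        "sourceTenantId": source_tenant,
        "instanceIdentifier": instance_id,
        "targetTenantId": target_tenant,
        "id": str(uuid.uuid4()),  # fresh request id, like ${__UUID()} in JMeter
    }
    return path, json.dumps(body)


# Placeholder identifiers, for illustration only.
path, payload = build_sharing_request(
    consortium_id="example-consortium-id",
    source_tenant="cs00000int_0001",
    target_tenant="cs00000int",
    instance_id="example-instance-id",
)
print("POST", path)
print(payload)
```

The sketch only prepares the request; sending it would additionally need the environment's base URL and authentication headers, which this report does not specify.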
Test 1. The sharing local instances process was started from the JMeter script for tenants cs00000int_0001, cs00000int_0002, cs00000int_0003, and cs00000int_0004 one after another.
Test 2. The sharing local instances process was started from the JMeter script in parallel for tenants cs00000int_0001 and cs00000int_0002.
Test 3. The sharing local instances process was started from the JMeter script in parallel for tenants cs00000int_0001, cs00000int_0002, and cs00000int_0003.
Test 4. The sharing local instances process was started from the JMeter script in parallel for tenants cs00000int_0001, cs00000int_0002, cs00000int_0003, and cs00000int_0004.
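The difference between the sequential and parallel test plans can be sketched in Python as follows. This is not the JMeter script used for the tests: `share_instances` is a hypothetical stand-in for one virtual user and makes no network calls.

```python
from concurrent.futures import ThreadPoolExecutor

TENANTS = ["cs00000int_0001", "cs00000int_0002", "cs00000int_0003", "cs00000int_0004"]


def share_instances(tenant):
    """Stand-in for one virtual user; a real run would send sharing requests for this tenant."""
    return f"{tenant}: done"


def run_sequential(tenants):
    # Test 1: one virtual user handles the tenants one after another.
    return [share_instances(t) for t in tenants]


def run_parallel(tenants):
    # Tests 2-4: one virtual user per tenant, all started together.
    with ThreadPoolExecutor(max_workers=len(tenants)) as pool:
        return list(pool.map(share_instances, tenants))


test_plans = {
    "Test 1": run_sequential(TENANTS),
    "Test 2": run_parallel(TENANTS[:2]),
    "Test 3": run_parallel(TENANTS[:3]),
    "Test 4": run_parallel(TENANTS),
}
for name, results in test_plans.items():
    print(name, results)
```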