Member tenants sharing local instances

Overview

This document contains the results of testing Sharing Local Instances (SLI) for MARC source records: https://folio-org.atlassian.net/browse/PERF-755
After the improvement, the fixed version mod-inventory:20.1.7-SNAPSHOT.487 was deployed on the PCON cluster: https://folio-org.atlassian.net/browse/MODINV-950

Summary

  • With 1 SLI process, the duration is about the same on all tenants (cs00000int_0001 to cs00000int_0004), averaging about 2 seconds. With 2 parallel SLIs the duration is about 2.2 seconds, with 3 parallel SLIs about 2.15 seconds, and with 4 parallel SLIs about 2.1 seconds.

  • No memory leak is suspected for SLI modules. Memory consumption was quite low in comparison to the "before-test" state.

  • Maximum CPU utilization was about 17% on mod-inventory and mod-quick-marc.

  • RDS CPU utilization ranged from about 15% for 1 VU up to about 20% for 4 VU.

Recommendations and Jiras

  • Test SLI in parallel with other workflows

Test results

Test parameters: inventory.sharing.di.status.poll.interval.seconds = 1 and inventory.sharing.di.status.poll.number = 5 (default).

Test 1. 1 Virtual user working sequentially on each of the 4 tenants.

| Tenant | Total requests | Throughput, req/s | Errors | Min, ms | Median, ms | 95th pct, ms | Max, ms |
|---|---|---|---|---|---|---|---|
| cs00000int_0001 | 100 | 0.298 | 4 | 1081 | 2255 | 2611 | 3719 |
| cs00000int_0002 | 100 | 0.389 | 0 | 1006 | 2218 | 2579 | 3159 |
| cs00000int_0003 | 100 | 0.395 | 1 | 602 | 2076 | 2150 | 15233 |
| cs00000int_0004 | 100 | 0.422 | 1 | 554 | 1940 | 2183 | 2989 |

Test 2. 2 Virtual users working in parallel on 2 tenants.

| Tenant | Total requests | Throughput, req/s | Errors | Min, ms | Median, ms | 95th pct, ms | Max, ms |
|---|---|---|---|---|---|---|---|
| cs00000int_0001 | 100 | 0.364 | 6 | 1005 | 2466 | 2689 | 16047 |
| cs00000int_0002 | 100 | 0.397 | 0 | 1019 | 2212 | 2583 | 2900 |

Test 3. 3 Virtual users working in parallel on 3 tenants.

| Tenant | Total requests | Throughput, req/s | Errors | Min, ms | Median, ms | 95th pct, ms | Max, ms |
|---|---|---|---|---|---|---|---|
| cs00000int_0001 | 100 | 0.334 | 4 | 530 | 2280 | 2641 | 3304 |
| cs00000int_0002 | 100 | 0.329 | 1 | 1001 | 2383 | 2567 | 15985 |
| cs00000int_0003 | 100 | 0.385 | 2 | 520 | 1910 | 2160 | 2955 |

Test 4. 4 Virtual users working in parallel on 4 tenants.

| Tenant | Total requests | Throughput, req/s | Errors | Min, ms | Median, ms | 95th pct, ms | Max, ms |
|---|---|---|---|---|---|---|---|
| cs00000int_0001 | 100 | 0.313 | 4 | 1096 | 2357 | 2661 | 3437 |
| cs00000int_0002 | 100 | 0.31 | 0 | 973 | 2254 | 2584 | 3237 |
| cs00000int_0003 | 100 | 0.357 | 3 | 517 | 1884 | 2117 | 2724 |
| cs00000int_0004 | 100 | 0.321 | 1 | 1098 | 2283 | 2916 | 3145 |
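The four tests can be compared by averaging the per-tenant medians and computing the error rate per test. The following is a minimal sketch of that arithmetic; the request counts, error counts, and medians are copied from the tables for Tests 1 to 4, while the names `RESULTS`, `summarize`, and `SUMMARY` are hypothetical and chosen only for this illustration.

```python
# Rows copied from the result tables for Tests 1-4:
# (tenant, total requests, errors, median in ms).
RESULTS = {
    "Test 1 (1 VU)": [
        ("cs00000int_0001", 100, 4, 2255),
        ("cs00000int_0002", 100, 0, 2218),
        ("cs00000int_0003", 100, 1, 2076),
        ("cs00000int_0004", 100, 1, 1940),
    ],
    "Test 2 (2 VU)": [
        ("cs00000int_0001", 100, 6, 2466),
        ("cs00000int_0002", 100, 0, 2212),
    ],
    "Test 3 (3 VU)": [
        ("cs00000int_0001", 100, 4, 2280),
        ("cs00000int_0002", 100, 1, 2383),
        ("cs00000int_0003", 100, 2, 1910),
    ],
    "Test 4 (4 VU)": [
        ("cs00000int_0001", 100, 4, 2357),
        ("cs00000int_0002", 100, 0, 2254),
        ("cs00000int_0003", 100, 3, 1884),
        ("cs00000int_0004", 100, 1, 2283),
    ],
}


def summarize(rows):
    """Return (mean of per-tenant medians in ms, error rate in percent)."""
    total_requests = sum(row[1] for row in rows)
    total_errors = sum(row[2] for row in rows)
    mean_median_ms = sum(row[3] for row in rows) / len(rows)
    error_rate_pct = 100 * total_errors / total_requests
    return mean_median_ms, error_rate_pct


SUMMARY = {name: summarize(rows) for name, rows in RESULTS.items()}

if __name__ == "__main__":
    for name, (median_ms, error_pct) in SUMMARY.items():
        print(f"{name}: mean median {median_ms:.1f} ms, errors {error_pct:.2f}%")
```

With the values above, the mean of the per-tenant medians stays between roughly 2.1 and 2.3 seconds in all four tests, and the error rate stays between 1.5% and 3%.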

Additional tests were performed to check how the SLI duration depends on the poll interval. The tables below compare inventory.sharing.di.status.poll.interval.seconds = 1, 2, and 3, with inventory.sharing.di.status.poll.number = 5 (default) in each run.
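To make the role of the two parameters concrete, here is a minimal sketch of a fixed-interval polling loop. It is not the mod-inventory implementation: it assumes that the interval is the wait between status checks and that the poll number is the maximum number of checks, and it assumes a wait-then-check order. The names `wait_for_completion`, `is_done`, and `max_wait_seconds` are hypothetical.

```python
import time
from typing import Callable


def wait_for_completion(
    is_done: Callable[[], bool],
    poll_interval_seconds: float,
    poll_number: int,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Check is_done up to poll_number times, waiting before each check.

    Assumption for illustration only: the real module may check first and
    wait afterwards, or apply the limits differently.
    """
    for _ in range(poll_number):
        sleep(poll_interval_seconds)  # wait one interval, then check the status
        if is_done():
            return True
    return False  # gave up after poll_number unsuccessful checks


def max_wait_seconds(poll_interval_seconds: float, poll_number: int) -> float:
    """Upper bound on the time spent waiting under the assumptions above."""
    return poll_interval_seconds * poll_number
```

Under these assumptions, an interval of 2 seconds with 5 polls would give up after at most 10 seconds of waiting, and a job that finishes just after a check is only noticed one full interval later, which is why a longer interval can lengthen the measured duration.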

Test 1.a 1 Virtual user working sequentially on each of the 4 tenants.

Mod-inventory parameters: di.status.poll.number=5 in all runs; the column groups differ in di.status.poll.interval.seconds (1, 2, or 3).

| Tenant | Total requests | Median, ms (interval=1) | 95th pct, ms (interval=1) | Median, ms (interval=2) | 95th pct, ms (interval=2) | Median, ms (interval=3) | 95th pct, ms (interval=3) |
|---|---|---|---|---|---|---|---|
| cs00000int_0001 | 100 | 2255 | 2611 | 3348 | 3873 | 4227 | 4672 |
| cs00000int_0002 | 100 | 2218 | 2579 | 3324 | 3784 | 3820 | 4702 |
| cs00000int_0003 | 100 | 2076 | 2150 | 2980 | 3184 | 3956 | 4146 |
| cs00000int_0004 | 100 | 1940 | 2183 | 2669 | 2951 | 3926 | 4145 |
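The effect of the poll interval in Test 1.a can be quantified by averaging the per-tenant medians for each interval and dividing the difference by the change in interval. This is a small sketch of that calculation; the medians are copied from the Test 1.a results, and the names `MEDIANS_MS`, `mean_median_ms`, and `added_ms_per_interval_second` are hypothetical.

```python
# Per-tenant medians (ms) from Test 1.a, keyed by
# di.status.poll.interval.seconds (tenants cs00000int_0001..0004).
MEDIANS_MS = {
    1: [2255, 2218, 2076, 1940],
    2: [3348, 3324, 2980, 2669],
    3: [4227, 3820, 3956, 3926],
}


def mean_median_ms(interval: int) -> float:
    """Mean of the per-tenant medians for one poll interval."""
    values = MEDIANS_MS[interval]
    return sum(values) / len(values)


def added_ms_per_interval_second() -> dict:
    """Extra median duration per additional second of poll interval."""
    intervals = sorted(MEDIANS_MS)
    return {
        (low, high): (mean_median_ms(high) - mean_median_ms(low)) / (high - low)
        for low, high in zip(intervals, intervals[1:])
    }


if __name__ == "__main__":
    for (low, high), delta in added_ms_per_interval_second().items():
        print(f"interval {low}s -> {high}s: +{delta:.0f} ms on the mean median")
```

For these numbers, each additional second of poll interval adds roughly 0.9 to 1.0 seconds to the mean median duration.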

Test 2.a 2 Virtual users working in parallel on 2 tenants.

Mod-inventory parameters: di.status.poll.number=5 in all runs; the column groups differ in di.status.poll.interval.seconds (1, 2, or 3).

| Tenant | Total requests | Median, ms (interval=1) | 95th pct, ms (interval=1) | Median, ms (interval=2) | 95th pct, ms (interval=2) | Median, ms (interval=3) | 95th pct, ms (interval=3) |
|---|---|---|---|---|---|---|---|
| cs00000int_0001 | 100 | 2466 | 2689 | 3190 | 3725 | 4107 | 4922 |
| cs00000int_0002 | 100 | 2212 | 2583 | 3013 | 3762 | 3934 | 4816 |

Test 3.a 3 Virtual users working in parallel on 3 tenants.

Mod-inventory parameters: di.status.poll.number=5 in all runs; the column groups differ in di.status.poll.interval.seconds (1, 2, or 3).

| Tenant | Total requests | Median, ms (interval=1) | 95th pct, ms (interval=1) | Median, ms (interval=2) | 95th pct, ms (interval=2) | Median, ms (interval=3) | 95th pct, ms (interval=3) |
|---|---|---|---|---|---|---|---|
| cs00000int_0001 | 100 | 2280 | 2641 | 3017 | 3701 | 4020 | 4639 |
| cs00000int_0002 | 100 | 2383 | 2567 | 3340 | 3753 | 3734 | 4710 |
| cs00000int_0003 | 100 | 1910 | 2160 | 2939 | 3154 | 3891 | 4222 |



Resource Utilization for Tests 1, 2, 3, and 4

Below are the resource utilization graphs for all tests.

Memory Utilization

Memory usage was stable across the 4 tests, and no memory leak is suspected for any module. The graph shows the 10 most memory-consuming services.

Service CPU Utilization 

CPU utilization increased only while SLI was running, and all modules returned to their baseline values after all SLI processes had finished.



RDS CPU Utilization 

For 1 VU, the average RDS CPU utilization was about 14% for all 4 tenants; for 2 VU it was about 15%, for 3 VU about 17%, and for 4 VU about 20%.



RDS Database Connections

The average number of DB connections before the test was 450. During SLI it was about 510 for 1 VU, 520 for 2 VU, 530 for 3 VU, and 540 for 4 VU.

Average active sessions (AAS)

Database load sliced by SQL

Errors

Response of a failed request. All of the failed requests returned the same error.
{"errors":[{"message":"ERROR: duplicate key value violates unique constraint \"uq_instance_id_source_tenant_id_target_tenant_id\"\n  Detail: Key (instance_id, source_tenant_id, target_tenant_id)=(cf2a6947-e1bb-4e8f-ad43-f3ecd600b8f4, cs00000int_0004, cs00000int) already exists.","type":"-1","code":"VALIDATION_ERROR"}]}
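The error message names the violated unique constraint and the conflicting key, which suggests that a sharing record for the same instance and tenant pair already existed. A test script could extract that key from the response body to report which instances were affected. The sketch below parses the response shown above; the helper name `duplicate_sharing_keys` and the regular expression are hypothetical and only assume the message format seen in this one response.

```python
import json
import re

CONSTRAINT = "uq_instance_id_source_tenant_id_target_tenant_id"

# Response body copied from the failed request; raw strings keep the
# JSON escape sequences (\" and \n) intact for json.loads.
RESPONSE_BODY = (
    r'{"errors":[{"message":"ERROR: duplicate key value violates unique constraint '
    r'\"uq_instance_id_source_tenant_id_target_tenant_id\"\n  Detail: Key '
    r'(instance_id, source_tenant_id, target_tenant_id)='
    r'(cf2a6947-e1bb-4e8f-ad43-f3ecd600b8f4, cs00000int_0004, cs00000int) already exists.",'
    r'"type":"-1","code":"VALIDATION_ERROR"}]}'
)

# Matches "Key (col1, col2, ...)=(val1, val2, ...)" in the error detail.
KEY_PATTERN = re.compile(r"Key \((?P<columns>[^)]*)\)=\((?P<values>[^)]*)\)")


def duplicate_sharing_keys(body: str) -> list:
    """Return one {column: value} dict per duplicate-key error in the body."""
    conflicts = []
    for error in json.loads(body).get("errors", []):
        message = error.get("message", "")
        if CONSTRAINT not in message:
            continue  # some other error; not handled in this sketch
        match = KEY_PATTERN.search(message)
        if match:
            columns = match["columns"].split(", ")
            values = match["values"].split(", ")
            conflicts.append(dict(zip(columns, values)))
    return conflicts


if __name__ == "__main__":
    for conflict in duplicate_sharing_keys(RESPONSE_BODY):
        print(conflict)
```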

Appendix

Environment: PCON

Record parameters on each of the tenants

  • Tenant cs00000int_0001: Number of shared instances 1695139 and not shared 606035. Source MARC = 2185072 and FOLIO = 116102.

  • Tenant cs00000int_0002: Number of shared instances 1695139 and not shared 1009666. Source MARC = 2559671 and FOLIO = 145134.

  • Tenant cs00000int_0003: Number of shared instances 1695139 and not shared 800515. Source MARC = 2380417 and FOLIO = 115237.

  • Tenant cs00000int_0004: Number of shared instances 1695139 and not shared 787757. Source MARC = 2367659 and FOLIO = 115237.

PTF environment: ncp5

Infrastructure

mod-inventory:20.1.7-SNAPSHOT is mod-inventory:20.1.6 plus changes that add two configuration options. The parameters can be changed in the task definition, for example:

"name":"JAVA_OPTS", -Dinventory.sharing.di.status.poll.interval.seconds=2