PTF - DI testing for Cornell (Iris hotfixes)

Overview

In this workflow, we check the performance of Data Import (DI) for Cornell. The tests mimic the Cornell load described in the Rally ticket https://rally1.rallydev.com/#/79944863724d/search?detail=%2Fuserstory%2F50807e3e-96eb-4ece-b24b-24665b1a4dc5&fdp=true&keywords=cornell

 

These tests were run in the PTF environment, in the icp1 hotfix-1 cluster: https://iris-cap1.int.aws.folio.org/

The following changes were made to the environment based on Cornell's load requirements:

1. Give more memory/CPU to Okapi, mod-srm, mod-srs, mod-inventory and mod-inventory-storage: 4x what was already in the Task Definition

2. Scale the database up to a 4xlarge instance (db.r5.4xlarge)

3. Vertically scale the EC2 instances up to m5.2xlarge

| Module | Memory (hard limit), MB | CPU | Task Def |
| --- | --- | --- | --- |
| Okapi | 3456 | 512 | #2 |
| mod-SRM | 5760 | 512 | #4 |
| mod-SRS | 5760 | 512 | #4 |
| mod-inventory | 7488 | 1024 | #5 |
| mod-inventory-storage | 3456 | 512 | #2 |
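Because the allocations above are described as 4x what was already in the Task Definition, the implied earlier memory limits can be back-computed. The sketch below assumes the 4x factor applies uniformly to every module's memory hard limit, which the page does not confirm module by module; the dictionary and variable names are illustrative only.

```python
# Memory hard limits (MB) copied from the table above.
memory_mb = {
    "Okapi": 3456,
    "mod-SRM": 5760,
    "mod-SRS": 5760,
    "mod-inventory": 7488,
    "mod-inventory-storage": 3456,
}

SCALE = 4  # assumption: every module was scaled by exactly 4x

# Divide each current limit by the scale factor to get the implied baseline.
implied_baseline_mb = {name: limit / SCALE for name, limit in memory_mb.items()}

for name, baseline in implied_baseline_mb.items():
    print(f"{name}: {memory_mb[name]} MB now, ~{baseline:.0f} MB implied before scaling")
```

If the factor was not exactly 4x for a given module, the implied baseline for that module will be off accordingly.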

 

Hotfix-1

  • Backend:

    • mod-data-import-2.0.2

    • mod-source-record-storage-5.0.4

    • mod-source-record-manager-3.0.7

    • okapi-4.7.3

  • Frontend:

    • folio_data-import-4.0.3

Hotfix-2

  • Backend:

    • mod-data-import-2.0.3

    • mod-source-record-storage-5.0.5

    • mod-source-record-manager-3.0.8

    • okapi-4.7.3

    • mod-inventory-16.3.3

    • mod-inventory-storage-20.2.1

  • Frontend:

    • folio_data-import-4.0.4

 

Environment:

  • 7.2 million UChi SRS records

  • 7.2 million inventory records

  • 69 FOLIO back-end modules deployed in 149 ECS services

  • 3 okapi ECS services

  • 12 m5.2xlarge EC2 instances

  • AWS RDS instances: db.r5.4xlarge writer and 1 db.r5.4xlarge reader

  • INFO logging level

Test Runs

Hotfix-1

Data Import

| Test | Profile | Load | Duration | Status | Instances | Holdings | Items | SRS MARC | Job Id | Results/Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1. | Create import 25K (lone job) | 25K | 10:21 AM - 11:26 AM EST (1 hour 6 minutes) | Completed with Error | 24960 | 24959 | 24959 | 24950 | 991 | Job failed to create all records. mod-inventory's CPU spiked. kafka.consumer.max.poll.records=10 |
| 2. | Create import 25K (lone job) | 25K | 2+ hours | Stuck at 99% | 25000 | 25000 | 24998 | 25000 | 1024 | Failed to create 2 items. mod-inventory's CPU spiked. kafka.consumer.max.poll.records=10 |
| 3. | Create import 5K (lone job) | 5K | 16:37 - 16:47 EDT | Completed Successfully | 5000 | 5000 | 5000 | 5000 | 1025 | |
| 4. | Create import 25K (lone job) | 25K | 16:50 - 17:32 EDT (42 minutes) | Completed Successfully | 25000 | 25000 | 25000 | 25000 | 1026 | mod-inventory's CPU did not spike during the import, only right after. |
| 5. | Create import 30K (lone job) | 30K | 18:05 - 19:11 EDT (1 hour 6 minutes) | Stuck at 99% | 30000 | 30000 | 29999 | 30000 | 1027 | mod-inventory's CPU did not spike during the import. A dip was observed about midway. The events_cache message rate spiked at the dip. |
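To make the shortfalls in the Hotfix-1 runs easier to see, the per-entity gaps can be computed from the counts above. This is a minimal sketch, assuming that the I | H | I | SRS Marc columns stand for instances, holdings, items, and SRS MARC records, and that a load of "25K" means exactly 25,000 records; the function and variable names are made up for illustration.

```python
# (test, load, instances, holdings, items, srs_marc) copied from the Hotfix-1 table.
runs = [
    (1, 25000, 24960, 24959, 24959, 24950),
    (2, 25000, 25000, 25000, 24998, 25000),
    (3, 5000, 5000, 5000, 5000, 5000),
    (4, 25000, 25000, 25000, 25000, 25000),
    (5, 30000, 30000, 30000, 29999, 30000),
]
LABELS = ("instances", "holdings", "items", "srs_marc")


def missing_counts(load, created):
    """Return how many records of each type fell short of the load."""
    return {label: load - n for label, n in zip(LABELS, created) if n < load}


# Map each test number to its shortfall per record type (empty dict = none).
shortfalls = {test: missing_counts(load, created) for test, load, *created in runs}

for test, gaps in shortfalls.items():
    print(f"Test {test}: {gaps if gaps else 'all records created'}")
```

Under these assumptions, tests 3 and 4 show no gaps, while tests 2 and 5 are short only on items, which matches their "Stuck at 99%" status.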

Checkin-Checkout

Ran Checkin-Checkout with 20 users for 1 hour.

Relatively stable, except that a few requests failed.

Grafana dashboard - http://carrier-io.int.folio.ebsco.com/grafana/d/q69rYQlik/jmeter-performance?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=700&var-db_name=jmeter&var-sampler_type=All&from=1624636077093&to=1624640056950

 

Hotfix-2

Data Import

| Test | Profile | Load | Duration | Status | Instances | Holdings | Items | SRS MARC | Job Id | Results/Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1. | Create import 25K (lone job) | 25K | 06/28 4:29 PM - 5:45 PM EDT (1 hour 16 minutes) | Completed with Error | 24950 | 24950 | 24950 | 24950 | 1090 | Job failed to create all records. kafka.consumer.max.poll.records=10 |
| 2. | Create import 25K (lone job) | 25K | 06/29 2:05 PM - 3:15 PM EDT (1 hour 10 minutes) | Completed | 25000 | 25000 | 25000 | 25000 | 1093 | Created all records as expected. There was one sudden spike in the mod-inventory CPU. |
| 3. | Create import 25K (lone job) | 25K | 06/29 3:31 PM - 4:30 PM EDT (1 hour 1 minute) | Completed | 25000 | 25000 | 25000 | 25000 | 1094 | kafka.consumer.max.poll.records=10. Created all records as expected. No spikes seen. |
| 4. | Create import 25K (lone job) | 25K | 06/29 17:35 - 18:23 EDT (48 minutes) | Completed | 25000 | 25000 | 25000 | 25000 | 1095 | kafka.consumer.max.poll.records=10. Created all records as expected. No spikes seen. |
| 5. | Create import 50K (lone job) | 50K | 06/29 22:51 UTC - 06/30 00:52 UTC (2 hours) | Completed with Error | 50000 | 50000 | 49999 | 50000 | 1096 | kafka.consumer.max.poll.records=10. Job stuck at 99%. Huge mod-inventory CPU spike midway (13 minutes) after 8 minutes of very low activity. events_cache spiked multiple times. |
| 6. | Create import 25K (with 20 users checkin/out) | 25K | 06/30 03:56 UTC - 06/30 04:49 UTC (53 minutes) | Completed | 25000 | 25000 | 25000 | 25000 | 1097 | kafka.consumer.max.poll.records=10 |
| 7. | Create import 25K (with 40 users checkin/out) | 25K | 06/30 4:41 PM UTC - 5:16 PM UTC (35 minutes) | Completed | 25000 | 25000 | 25000 | 25000 | 1098 | kafka.consumer.max.poll.records=10. No spikes. Okapi CPU utilization was high, around 220%, because checkin-checkout was also running in the background. |
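The Hotfix-2 durations above can be turned into a rough throughput figure for comparing runs. The sketch below uses the nominal load and the stated duration of each test (not the clock times, and not the number of records actually created), and assumes "25K" and "50K" mean exactly 25,000 and 50,000 records; the names are illustrative.

```python
# test number -> (nominal load in records, stated duration in minutes),
# copied from the Hotfix-2 Data Import table.
runs = {
    1: (25000, 76),
    2: (25000, 70),
    3: (25000, 61),
    4: (25000, 48),
    5: (50000, 120),
    6: (25000, 53),
    7: (25000, 35),
}

# Records per minute for each test.
throughput = {test: load / minutes for test, (load, minutes) in runs.items()}

for test, rate in throughput.items():
    print(f"Test {test}: ~{rate:.0f} records/min")
```

With these inputs the rates range from roughly 329 records/min (test 1) to roughly 714 records/min (test 7). Since test 1 did not create all records, its figure slightly overstates what was actually imported.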

Checkin-Checkout as background for DI above

| DI Test# from above | Users | Operation | Req/s | Min | 50th pct | 75th pct | 95th pct | 99th pct | Max | Average | Latency | Grafana dashboard link | Results/Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 6. | 20 | Total Check-in | 19.495 | 0.356 | 1.397 | 1.957 | 3.831 | 5.999 | 10.34 | 1.695 | 3.4 | 6/29/2021 11:55PM-00:55AM EDT | High CPU utilization for Okapi 180% which is normal for 20 Users |
| | | Total Check-out | 60.657 | 1.326 | 3.03 | 4.022 | 8.202 | 12.924 | 31.539 | 3.669 | 7.076 | | |
| 7. | 40 | Total Check-in | 28.509 | 0.413 | 2.706 | 3.971 | 7.911 | 10.883 | 19.222 | 3.354 | 6.578 | 6/30/2021 12:20PM-1:40PM EDT | High CPU utilization for Okapi 245% which is normal for 40 Users |
| | | Total Check-out | 86.383 | 1.669 | 6.111 | 8.417 | 17.415 | 27.235 | 53.761 | 7.551 | 14.468 | | |
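To gauge how much a concurrent Data Import slows check-in and check-out, the 95th percentile values above can be compared with those in the Checkin-Checkout Standalone results that follow. This is a minimal sketch with illustrative names; it covers only the user/operation pairs for which both tables on this page report a value.

```python
# 95th percentile values copied from the two Checkin-Checkout tables.
# Keys are (users, operation). The standalone 40-user Check-out row is not
# available on this page, so that pair is left out.
p95_with_di = {
    (20, "check-in"): 3.831,
    (20, "check-out"): 8.202,
    (40, "check-in"): 7.911,
}
p95_standalone = {
    (20, "check-in"): 1.551,
    (20, "check-out"): 3.172,
    (40, "check-in"): 4.675,
}

# Ratio of the value measured during DI to the standalone value.
slowdown = {key: p95_with_di[key] / p95_standalone[key] for key in p95_with_di}

for (users, operation), ratio in slowdown.items():
    print(f"{users} users, {operation}: {ratio:.2f}x the standalone 95th percentile")
```

On these figures the 95th percentile is roughly 2.5x the standalone value at 20 users, and about 1.7x for check-in at 40 users.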

Checkin-Checkout Standalone

| Test | Users | Operation | Req/s | Min | 50th pct | 75th pct | 95th pct | 99th pct | Max | Average | Latency | Grafana dashboard link | Results/Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1. | 20 | Total Check-in | 24.755 | 0.326 | 0.807 | 1.039 | 1.551 | 2.284 | 13.608 | 0.891 | 1.479 | 6/28/2021 2:42PM-3:50PM EDT | High CPU utilization for Okapi 140% which is normal for 20 Users |
| | | Total Check-out | 75.733 | 1.235 | 1.926 | 2.228 | 3.172 | 4.751 | 12.905 | 2.081 | 3.014 | | |
| 2. | 40 | Total Check-in | 36.529 | 0.349 | 1.724 | 2.356 | 4.675 | 6.746 | 13.136 | 2.061 | 4.02 | 6/30/2021 10:14AM-11:22AM EDT | High CPU utilization for Okapi 250% which is normal for 40 Users |