
CICO and data import with decreased number of partitions


Overview

The goal of this test is to assess the performance of check-in/check-out (CICO) and data import (DI) scenarios with a decreased number of partitions in Kafka topics.

The topics setup is listed in the Appendix, under Topics setup.

Ticket: PERF-400

Summary

  1. Comparison of the load test results showed no significant degradation in response times with the decreased number of partitions.
  2. Resource consumption of the server, database and Kafka instances also did not change with the decreased number of partitions.

Test Runs 

| Test # | Test Conditions | Duration | Load generator size (recommended) | Load generator memory, GiB (recommended) | Notes |
|---|---|---|---|---|---|
| 1 | Check-in/Check-out with 8, 20, 25 users | 30 min | t3.medium | 3 | |
| 2 | Data import with 5K, 25K, 50K, 100K Create imports | | | | x |

Results

Check In, Check Out

Response Times (CICO)

10/50 Partitions

2 Partitions

Response time comparison CICO + DI

| Transaction | Response time, 95th percentile (10/50 Partitions) | Response time, 95th percentile (2 Partitions) | Degradation, s | Degradation, % |
|---|---|---|---|---|
| Check-in Controller, 8 users | 0.555 s | 0.576 s | 0.021 s | 4% |
| Check-out Controller, 8 users | 0.899 s | 0.897 s | -0.002 s | 0% |
| Check-in Controller, 20 users | 0.554 s | 0.592 s | 0.038 s | 7% |
| Check-out Controller, 20 users | 0.852 s | 0.860 s | 0.008 s | 1% |
| Check-in Controller, 25 users | 0.553 s | 0.589 s | 0.036 s | 7% |
| Check-out Controller, 25 users | 0.834 s | 0.894 s | 0.06 s | 7% |
| Data import 5K | 2 min 8 s | 2 min 24 s | 16 s | 13% |
| Data import 25K | 10 min 41 s | 11 min 27 s | 46 s | 7% |
| Data import 50K | 21 min 11 s | 19 min 16 s | -115 s | -9% |
| Data import 100K | 42 min 35 s | 40 min 24 s | -131 s | -5% |
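
The degradation columns can be recomputed from the two measurements. The following is a minimal sketch, assuming that degradation is the 2-partition value minus the 10/50-partition value and that the percentage is taken relative to the 10/50-partition value, rounded half up. The function name is illustrative and is not part of any test tooling.

```python
import math

def degradation(baseline_s, verification_s):
    """Return (degradation in seconds, degradation in whole percent).

    baseline_s is the 10/50-partition measurement, verification_s the
    2-partition measurement. A negative result means an improvement.
    """
    delta = verification_s - baseline_s
    # Round half up, so that 12.5 becomes 13 rather than Python's default 12.
    percent = math.floor(delta / baseline_s * 100 + 0.5)
    return round(delta, 3), percent

# Check-in Controller, 8 users: 0.555 s -> 0.576 s
print(degradation(0.555, 0.576))                  # (0.021, 4)
# Data import 50K: 21 min 11 s -> 19 min 16 s
print(degradation(21 * 60 + 11, 19 * 60 + 16))    # (-115, -9)
```

Under these assumptions the sketch is consistent with every row of the comparison table.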

Instance CPU Utilization (CICO)

10/50 Partitions

2 Partitions

Service CPU Utilization (CICO)

10/50 Partitions

2 Partitions

Memory Utilization (CICO)

10/50 Partitions

Memory usage of mod-inventory-storage increased from 57 to 65 during the test. The same behaviour was observed in the tests with 2 partitions.

2 Partitions

RDS CPU Utilization (CICO)

10/50 Partitions

2 Partitions

CPU utilization increased by 5% in the 25-user test, but this was not reproduced during retesting, so it may have been caused by external factors.

RDS DB connections (CICO)

10/50 Partitions

2 Partitions

 

Kafka CPU load (CICO)

10/50 Partitions

2 Partitions

Database Load (CICO)

10/50 Partitions

2 Partitions

Data Import

Instance CPU Utilization (DI)

10/50 Partitions

2 Partitions


Service CPU Utilization (DI)

10/50 Partitions

2 Partitions


Memory Utilization (DI)

10/50 Partitions

2 Partitions


RDS CPU Utilization (DI)

10/50 Partitions

2 Partitions


RDS DB connections (DI)

2 Partitions

Kafka CPU (DI)

10/50 Partitions

2 Partitions

Database Load (DI)

2 Partitions

Bulk Edits

Jobs Duration comparison 

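| Transaction | Job duration (10/50 Partitions) | Job duration (2 Partitions) | Degradation, s | Degradation, % |
|---|---|---|---|---|
| Users 1000 records | 43 s | 44 s | 1 s | 2% |
| Users 2500 records | 1 min 49 s | 1 min 45 s | -4 s | -4% |
| Items 1000 records | 3 min 8 s | 2 min 49 s | -19 s | -10% |
| Items 10k records | 22 min 44 s | 19 min 13 s | -3 min 31 s | -15% |
| Holdings 1000 records | 1 min 52 s | 1 min 51 s | -1 s | -0.8% |
| Holdings 10k records | 11 min 14 s | 10 min 46 s | -28 s | -4% |
| Holdings 10k + Items 10k + Users 2500: Holdings 10k | 10 min 39 s | 10 min 31 s | -8 s | -1% |
| Holdings 10k + Items 10k + Users 2500: Items 10k | 19 min 10 s | 18 min 56 s | -14 s | -1% |
| Holdings 10k + Items 10k + Users 2500: Users 2500 | 1 min 42 s | 1 min 40 s | -2 s | -2% |
| Holdings 1000 + Items 1000 + Users 1000: Holdings 1000 | 1 min 47 s | 1 min 44 s | -3 s | -3% |
| Holdings 1000 + Items 1000 + Users 1000: Items 1000 | 2 min 43 s | 2 min 41 s | -2 s | -1% |
| Holdings 1000 + Items 1000 + Users 1000: Users 1000 | 42 s | 41 s | -1 s | -2% |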
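
Job durations on this page are written as text such as "1 min 49 s", so they need to be converted to seconds before they can be compared. The following is a minimal sketch, assuming only the unit spellings that occur on this page (hr, min, m, s). The helper name is illustrative.

```python
import re

# Seconds per unit, for the unit spellings used on this page.
_UNIT_SECONDS = {"hr": 3600, "min": 60, "m": 60, "s": 1}

def to_seconds(text):
    """Convert a duration such as '1 min 49 s' or '19m 10 s' to seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+)\s*(hr|min|m|s)", text):
        total += int(value) * _UNIT_SECONDS[unit]
    return total

baseline = to_seconds("22 min 44 s")      # Items 10k records, 10/50 partitions
verification = to_seconds("19 min 13 s")  # Items 10k records, 2 partitions
print(verification - baseline)            # -211, i.e. -3 min 31 s
```

The helper ignores signs, so it is meant for the duration columns rather than the degradation columns.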

Instance CPU Utilization (Bulk Edit)

Service CPU Utilization (Bulk Edit)


Memory Utilization (Bulk Edit)

RDS CPU Utilization (Bulk Edit)

RDS DB connections (Bulk Edit)

Kafka CPU (Bulk Edit)

Database Load (Bulk Edit)

Reindexing

Reindexing of instances with the flag recreateIndex = true

Duration Comparison


10/50 Partitions | 2 Partitions
Reindexing 1: 14 hr 30 min (1/29 4:30 UTC - 1/29 19:00 UTC)
Reindexing 2: 11 hr 20 min (2/3 21:30 UTC - 2/4 8:50 UTC)
Reindexing 3: 11 hr (2/4 18:30 UTC - 2/5 5:30 UTC)
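
The reindexing durations follow directly from the start and end timestamps. The following is a minimal sketch that recomputes them. The page gives only month and day, so the year 2023 is an assumption, used here to build valid dates.

```python
from datetime import datetime, timezone

YEAR = 2023  # assumption: the page states only month/day for each run

def utc(month, day, hour, minute):
    """Build a timezone-aware UTC timestamp in the assumed year."""
    return datetime(YEAR, month, day, hour, minute, tzinfo=timezone.utc)

# (start, end) of each reindexing run, as listed on this page.
runs = {
    "Reindexing 1": (utc(1, 29, 4, 30), utc(1, 29, 19, 0)),
    "Reindexing 2": (utc(2, 3, 21, 30), utc(2, 4, 8, 50)),
    "Reindexing 3": (utc(2, 4, 18, 30), utc(2, 5, 5, 30)),
}

durations = {name: end - start for name, (start, end) in runs.items()}
for name, d in durations.items():
    hours, rem = divmod(int(d.total_seconds()), 3600)
    print(f"{name}: {hours} hr {rem // 60} min")
```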

OpenSearch Graphs

Reindexing 1 

Indexing Data Rate graph 1 shows a spike of up to 126K/min for about 9 hours, then it tailed off for another 6+ hours

Indexing Data Rate graph 2 shows the tail end of the reindexing where the indexing rate drops to below 5K/min and drags out until 19:00.


Reindexing 2



Reindexing 3

Appendix

Infrastructure

PTF environment: ncp3

  • 10 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
  • 2 db.r6.xlarge database instances: one writer and one reader
  • MSK ptf-kakfa-3 [ kafka configurations]
    • 4 kafka.m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3

Modules memory and CPU parameters:

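| Module | Version | Task Definition | Running Tasks | CPU | Memory (Soft/Hard limits) | MaxMetaspaceSize | Xmx |
|---|---|---|---|---|---|---|---|
| mod-data-import | 2.6.2 | 4 | 1 | 256 | 1844/2048 | 512 | 1292 |
| mod-data-import-cs | 1.15.1 | 1 | 2 | 128 | 896/1024 | 128 | 768 |
| mod-source-record-storage | 5.5.2 | 4 | 2 | 1024 | 1440/1536 | 512 | 908 |
| mod-source-record-manager | 3.5.6 | 4 | 2 | 1024 | 3688/4096 | 512 | 2048 |
| mod-inventory | 19.0.2 | 7 | 2 | 1024 | 2592/2880 | 512 | 1814 |
| mod-inventory-storage | 25.0.3 | 3 | 2 | 1024 | 1952/2208 | 512 | 1440 |
| mod-quick-marc | 2.5.0 | 3 | 1 | 128 | 2176/2288 | 512 | 1664 |
| okapi | 4.14.7 | 1 | 3 | 1024 | 1440/1684 | 512 | 922 |
| mod-feesfines | 18.1.1 | 3 | 2 | 128 | 896/1024 | 128 | 768 |
| mod-patron-blocks | 1.7.1 | 4 | 2 | 1024 | 896/1024 | 128 | 768 |
| mod-pubsub | 2.7.0 | 4 | 2 | 1024 | 1440/1536 | 512 | 922 |
| mod-authtoken | 2.12.0 | 3 | 2 | 512 | 1152/1440 | 128 | 922 |
| mod-circulation-storage | 15.0.2 | 3 | 2 | 1024 | 1440/1536 | 512 | 896 |
| mod-circulation | 23.3.2 | 3 | 2 | 1024 | 896/1024 | 128 | 768 |
| mod-configuration | 5.9.0 | 3 | 2 | 128 | 896/1024 | 128 | 768 |
| mod-users | 19.0.0 | 4 | 2 | 128 | 896/1024 | 128 | 768 |
| mod-remote-storage | 1.7.1 | 3 | 2 | 128 | 1692/1872 | 512 | 1178 |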

Topics setup

| Topic | Partitions number (Baseline) | Partitions number (Verification) |
|---|---|---|
| ncp3.fs09000000.circulation.check-in | 10 | 2 |
| ncp3.fs09000000.circulation.loan | 10 | 2 |
| ncp3.fs09000000.circulation.request | 10 | 2 |
| ncp3.fs09000000.data-export.job.command | 50 | 2 |
| ncp3.fs09000000.data-export.job.update | 50 | 2 |
| ncp3.fs09000000.inventory.async-migration | 50 | 2 |
| ncp3.fs09000000.inventory.authority | 50 | 2 |
| ncp3.fs09000000.inventory.bound-with | 50 | 2 |
| ncp3.fs09000000.inventory.holdings-record | 50 | 2 |
| ncp3.fs09000000.inventory.instance | 50 | 2 |
| ncp3.fs09000000.inventory.instance-contribution | 50 | 2 |
| ncp3.fs09000000.inventory.item | 50 | 2 |
| ncp3.fs09000000.search.instance-contributor | 50 | 2 |
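
This page does not say how the partition counts were changed. Kafka does not support decreasing the partition count of an existing topic, so one plausible approach is to delete each topic and re-create it with 2 partitions. The sketch below only prints `kafka-topics.sh` commands for that approach and does not contact a cluster. The bootstrap address is a placeholder, the replication factor of 3 is taken from `default.replication.factor` in the Infrastructure section, and only a few of the topics are listed.

```python
BOOTSTRAP = "localhost:9092"  # placeholder, not the real MSK endpoint
REPLICATION_FACTOR = 3        # assumption: same as default.replication.factor

# A subset of the topics from the Topics setup section.
TOPICS = [
    "ncp3.fs09000000.circulation.check-in",
    "ncp3.fs09000000.circulation.loan",
    "ncp3.fs09000000.inventory.instance",
]

def recreate_commands(topic, partitions):
    """Return the delete and create commands for one topic."""
    base = f"kafka-topics.sh --bootstrap-server {BOOTSTRAP}"
    return [
        f"{base} --delete --topic {topic}",
        f"{base} --create --topic {topic} "
        f"--partitions {partitions} --replication-factor {REPLICATION_FACTOR}",
    ]

for topic in TOPICS:
    for command in recreate_commands(topic, partitions=2):
        print(command)
```

Deleting a topic discards its messages, so this approach only suits a test environment between test runs.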


Methodology/Approach

  1. Run the commands needed to return the database to its initial state. Do this before each test run, then wait several minutes before starting the test.
  2. Run the CICO load tests with different numbers of users, and the data import tests.
  3. Change the number of partitions from 10/50 to 2 for all necessary topics.
  4. Repeat the tests.
  5. Compare the test results.

Grafana dashboard

CICO tests, 10/50 partitions: http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_nolana&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&from=1673881085512&to=1673890405928

CICO tests, 2 partitions: http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_nolana&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&from=1674032625543&to=1674047619658
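
The `from` and `to` query parameters in these URLs are Unix timestamps in milliseconds, so the time range of each test run can be recovered from the link even after the dashboard has expired. The following is a minimal sketch. The function name is illustrative, and the URL is shortened to the parameters that matter here.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse, parse_qs

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def time_window(url):
    """Return the (from, to) range of a Grafana dashboard URL as UTC datetimes."""
    query = parse_qs(urlparse(url).query)
    start_ms = int(query["from"][0])
    end_ms = int(query["to"][0])
    return (EPOCH + timedelta(milliseconds=start_ms),
            EPOCH + timedelta(milliseconds=end_ms))

# Shortened URL carrying the from/to values of the 10/50-partition CICO run.
url = ("http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/"
       "jmeter-performance-test-copy?orgId=1"
       "&from=1673881085512&to=1673890405928")
start, end = time_window(url)
print(start.isoformat(), end.isoformat(), end - start)
```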

Note that the dashboards expire 6 weeks after the test run.


