Overview

This document presents the results of testing the Check-in/Check-out (CI/CO) and Data Import (DI) workflows for MARC Bibliographic records in the Quesnelia release on a new MSK instance type that does not require ZooKeeper instances. The goal is to see how kafka.m7g.2xlarge in KRaft mode affects FOLIO performance. Results for the main workflows are compared across two instance types: kafka.m5.2xlarge and kafka.m7g.2xlarge.

  • Comparing kafka.m5.2xlarge (ZooKeeper metadata mode) against kafka.m7g.2xlarge (KRaft metadata mode)
    • In tests with KRaft mode enabled, brokers use less CPU during CI/CO and during DI + CI/CO (at least 4 percentage points less according to the MSK CPU table, and 13 less for one broker), and CPU utilization is more evenly balanced across the brokers than in ZooKeeper mode.
    • The previous test results * showed that performance is effectively the same on m5 and m7g clusters, so m7g and m7g in KRaft mode are expected to perform the same way as well. Results for comparison: **
    • Main workflow KPIs do not degrade in tests of Check-in/Check-out with Data Import.
    • No memory leaks and no errors were observed during the tests.
  • Resource utilization

        • Service memory usage does not differ between tests on the two MSK clusters.
        • Service CPU utilization in ZooKeeper mode: mod-inventory-b 147%, mod-quick-marc-b 81%, mod-di-converter-storage-b 116%, nginx-okapi 79%; the rest of the modules used less than 50%.
        • Service CPU utilization in KRaft mode: mod-inventory-b 133%, mod-quick-marc-b 75%, mod-di-converter-storage-b 103%, nginx-okapi 75%; the rest of the modules used less than 50%.
        • DB CPU was at about 90% during tests on both MSK clusters.

      *Kafka Zookeeper mode - Data Import with Check-ins Check-outs (Quesnelia)[non-ECS] MSK instance type comparison

      **Kafka Zookeeper Mode vs KRaft Mode MSK - instance type comparison

      Test Runs 

      | Test # | MSK instance type | Scenario | Load level |
      |---|---|---|---|
      | 1 | kafka.m5.2xlarge | CICO + DI MARC Bib Create | 8 users + 5K, 25K sequentially |
      | 2 | kafka.m5.2xlarge | DI MARC Bib Create | 5K, 25K sequentially |
      | 3 | kafka.m5.2xlarge | CICO + DI MARC Bib Update | 8 users + 5K, 25K sequentially |
      | 4 | kafka.m5.2xlarge | DI MARC Bib Update | 5K, 25K sequentially |
      | 5 | kafka.m7g.2xlarge | CICO + DI MARC Bib Create | 8 users + 5K, 25K sequentially |
      | 6 | kafka.m7g.2xlarge | DI MARC Bib Create | 5K, 25K sequentially |
      | 7 | kafka.m7g.2xlarge | CICO + DI MARC Bib Update | 8 users + 5K, 25K sequentially |
      | 8 | kafka.m7g.2xlarge | DI MARC Bib Update | 5K, 25K sequentially |

      Test Results

      This table shows results of Check-In/Check-out and Data Import create and update jobs.

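      MSK instance: kafka.m5.2xlarge, metadata mode - ZooKeeper

      | Job profile | File size | DI Duration without CI/CO | DI Duration with CI/CO | CI with DI Average, sec | CO with DI Average, sec |
      |---|---|---|---|---|---|
      | PTF - Create 2 | 5k | 00:03:45 | 00:02:44 | 0.736 | 1.16 |
      | PTF - Create 2 | 25k | 00:14:40 | 00:13:36 | 0.787 | 1.176 |
      | PTF - Updates Success - 6 | 5k | 00:04:43 | 00:04:18 | 0.764 | 1.153 |
      | PTF - Updates Success - 6 | 25k | 00:20:21 | 00:21:25 | 0.767 | 1.179 |

      MSK instance: kafka.m7g.2xlarge, metadata mode - KRaft

      | Job profile | File size | DI Duration without CI/CO | DI Duration with CI/CO | CI with DI Average, sec | CO with DI Average, sec |
      |---|---|---|---|---|---|
      | PTF - Create 2 | 5k | 00:02:49 | 00:02:39 | 0.765 | 1.118 |
      | PTF - Create 2 | 25k | 00:13:31 | 00:12:04 | 0.777 | 1.186 |
      | PTF - Updates Success - 6 | 5k | 00:04:36 | 00:04:31 | 0.706 | 1.095 |
      | PTF - Updates Success - 6 | 25k | 00:24:07 | 00:21:50 | 0.74 | 1.16 |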


      Check-in/Check-out without DI

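      | Scenario | Load level | Request | kafka.m5.2xlarge: 95 perc, sec | kafka.m5.2xlarge: average, sec | kafka.m7g.2xlarge: 95 perc, sec | kafka.m7g.2xlarge: average, sec |
      |---|---|---|---|---|---|---|
      | Circulation Check-in/Check-out (without Data import) | 8 users | Check-in | 0.695 | 0.587 | 0.695 | 0.583 |
      | Circulation Check-in/Check-out (without Data import) | 8 users | Check-out | 1.148 | 0.958 | 1.151 | 0.944 |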

      Comparison

      Data Import durations and Check-In/Check-Out response time comparison

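      • Most Data Import durations stay within about 10% of the baseline (tests with ZooKeeper metadata mode); the largest deviations are about 25% faster (5k create without CI/CO) and about 18% slower (25k update without CI/CO).
      • Response times of CI/CO with Data Import do not differ noticeably between the two MSK clusters (deltas of 0.06 sec or less).

      Deltas are the kafka.m5.2xlarge value minus the kafka.m7g.2xlarge value, so a positive delta means the m7g (KRaft) run was faster.

      | Job Profile | File size | DELTA, DI without CI/CO | DELTA, DI+CI/CO | DELTA, CI with DI | DELTA, CO with DI |
      |---|---|---|---|---|---|
      | PTF - Create 2 | 5k | 00:00:56 | 00:00:05 | -0.029 | 0.042 |
      | PTF - Create 2 | 25k | 00:01:09 | 00:01:32 | 0.01 | -0.01 |
      | PTF - Updates Success - 6 | 5k | 00:00:07 | -00:00:13 | 0.058 | 0.058 |
      | PTF - Updates Success - 6 | 25k | -00:03:46 | -00:00:25 | 0.027 | 0.019 |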
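
      The deltas above are absolute; to judge them against the baseline it can help to express them as a percentage of the kafka.m5.2xlarge duration. The following is a minimal sketch of that arithmetic, using the durations from the Test Results tables. The function names and the dictionary layout are illustrative assumptions, not part of the test tooling.

```python
from datetime import timedelta


def parse_hms(text: str) -> timedelta:
    """Parse an HH:MM:SS string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def relative_delta(baseline: str, candidate: str) -> float:
    """Return (baseline - candidate) / baseline in percent, rounded to 0.1.

    Positive values mean the candidate run finished faster than the baseline.
    """
    base = parse_hms(baseline).total_seconds()
    cand = parse_hms(candidate).total_seconds()
    return round((base - cand) / base * 100, 1)


# Durations copied from the Test Results tables:
# (job, file size, load) -> (kafka.m5.2xlarge, kafka.m7g.2xlarge)
DURATIONS = {
    ("create", "5k", "DI only"): ("00:03:45", "00:02:49"),
    ("create", "5k", "DI + CI/CO"): ("00:02:44", "00:02:39"),
    ("create", "25k", "DI only"): ("00:14:40", "00:13:31"),
    ("create", "25k", "DI + CI/CO"): ("00:13:36", "00:12:04"),
    ("update", "5k", "DI only"): ("00:04:43", "00:04:36"),
    ("update", "5k", "DI + CI/CO"): ("00:04:18", "00:04:31"),
    ("update", "25k", "DI only"): ("00:20:21", "00:24:07"),
    ("update", "25k", "DI + CI/CO"): ("00:21:25", "00:21:50"),
}

if __name__ == "__main__":
    for key, (m5, m7g) in DURATIONS.items():
        print(key, f"{relative_delta(m5, m7g):+.1f}%")
```

      With these inputs the relative deltas range from -18.5% (25k update, DI only) to +24.9% (5k create, DI only).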


      Check-in/Check-out without DI

      • Check-in/Check-out performs the same on both MSK clusters. The difference in average response times (0.004 sec for Check-in, 0.014 sec for Check-out) is negligible.

      | Scenario | Load level | Request | kafka.m5.2xlarge: 95 perc, sec | kafka.m5.2xlarge: average, sec | kafka.m7g.2xlarge: 95 perc, sec | kafka.m7g.2xlarge: average, sec | Delta (average), sec |
      |---|---|---|---|---|---|---|---|
      | Circulation Check-in/Check-out (without Data import) | 8 users | Check-in | 0.695 | 0.587 | 0.695 | 0.583 | 0.004 |
      | Circulation Check-in/Check-out (without Data import) | 8 users | Check-out | 1.148 | 0.958 | 1.151 | 0.944 | 0.014 |
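
      As a quick check of how small these differences are, the sketch below recomputes the Delta column from the average response times and also expresses it relative to the kafka.m5.2xlarge value. The helper name `response_delta` is a hypothetical one chosen for this example.

```python
# Average response times in seconds, copied from the table above:
# request -> (kafka.m5.2xlarge, kafka.m7g.2xlarge)
AVERAGE_RESPONSE_SEC = {
    "Check-in": (0.587, 0.583),
    "Check-out": (0.958, 0.944),
}


def response_delta(m5: float, m7g: float) -> tuple[float, float]:
    """Return (absolute delta in sec, delta as a percentage of the m5 value)."""
    delta = m5 - m7g
    return round(delta, 3), round(delta / m5 * 100, 2)


if __name__ == "__main__":
    for request, (m5, m7g) in AVERAGE_RESPONSE_SEC.items():
        absolute, percent = response_delta(m5, m7g)
        print(f"{request}: {absolute} sec ({percent}%)")
```

      Both relative differences come out below 2% of the baseline average.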


      MSK resource utilization (CPU)

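      | Load scenario | Broker | MSK instance: kafka.m5.2xlarge | MSK instance: kafka.m7g.2xlarge | Delta, % |
      |---|---|---|---|---|
      | CICO | 1 | 13 | 9 | -4 |
      | CICO | 2 | 13 | 9 | -4 |
      | CICO+DI | 1 | 45 | 32 | -13 |
      | CICO+DI | 2 | 34 | 30 | -4 |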

      Response time

      MSK instance: kafka.m5.2xlarge

      Image Added

      MSK instance: kafka.m7g.2xlarge

      Image Added

      Service CPU Utilization

      CPU utilization table: MSK instance kafka.m5.2xlarge vs MSK instance kafka.m7g.2xlarge

      | Module | kafka.m5.2xlarge: CPU (CICO + 25k Create) | kafka.m5.2xlarge: CPU (CICO + 25k Update) | kafka.m7g.2xlarge: CPU (CICO + 25k Create) | kafka.m7g.2xlarge: CPU (CICO + 25k Update) |
      |---|---|---|---|---|
      | mod-inventory-b | 107.84 | 147.17 | 139.1 | 133.42 |
      | mod-quick-marc-b | 79.98 | 81.45 | 75.45 | 72.77 |
      | mod-di-converter-storage-b | 75.12 | 116.01 | 103.49 | 96.49 |
      | nginx-okapi | 51.7 | 79.05 | 75.33 | 73.68 |
      | okapi-b | 27.65 | 43.17 | 41.74 | 51.2 |
      | mod-source-record-storage-b | 24.82 | 39.79 | 38.11 | 34.22 |
      | mod-inventory-storage-b | 18.52 | 20.75 | 23.13 | 26.1 |
      | mod-source-record-manager-b | 16.94 | 18.05 | 17.16 | 16.33 |
      | mod-dcb-b | 8.03 | 7.8 | 8.32 | 9.84 |
      | mod-search-b | 7.87 | 1.44 | 7.19 | 8.53 |
      | mod-pubsub-b | 7.32 | 7.3 | 4.32 | 5.66 |
      | mod-users-b | 6.32 | 6.23 | 9.18 | 21.82 |
      | mod-entities-links-b | 3.86 | 2.27 | 2.15 | 1.81 |
      | mod-configuration-b | 3.61 | 3.4 | 3.24 | 10.32 |
      | mod-patron-b | 2.86 | 2.66 | 2.86 | 2.4 |
      | mod-authtoken-b | 2.86 | 2.13 | 2.17 | 12.71 |
      | mod-oa-b | 2.8 | 2.86 | 2.95 | 3.35 |
      | mod-feesfines-b | 2.3 | 2.15 | 2.51 | 9.08 |
      | mod-circulation-storage-b | 2.01 | 2.15 | 2.01 | 2.9 |
      | mod-data-import-b | 1.6 | 1.72 | 1.6 | 1.58 |
      | edge-patron-b | 1.08 | 1.08 | 1.13 | 1.02 |
      | mod-users-bl-b | 0.53 | 0.52 | 0.61 | 1.11 |
      | mod-patron-blocks-b | 0.47 | 0.43 | 0.41 | 0.95 |
      | mod-circulation-b | 0.35 | 0.37 | 0.55 | 2.09 |
      | pub-okapi | 0.14 | 0.15 | 0.18 | 3.98 |
      | pub-edge | 0.07 | 0.07 | 0.05 | 0.12 |

      DI MARC BIB Create and Update + CICO

      MSK instance: kafka.m5.2xlarge

      Image Added

      MSK instance: kafka.m7g.2xlarge

      Service CPU Utilization

      The CPU utilization delta shows a decrease of about 20% for mod-di-converter-storage-b during the update job and a 10% decrease for the mod-feesfines-b module. For most modules, the CPU utilization delta stays under 10%.
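
      A minimal sketch of how such per-module deltas can be derived is shown below for the four busiest modules. It assumes the values are read from the CPU utilization table for CICO + 25k Create and CICO + 25k Update, and that the delta is the kafka.m7g.2xlarge value minus the kafka.m5.2xlarge value in percentage points. The names `CPU` and `cpu_delta` are illustrative.

```python
# CPU utilization (%) per module: (CICO + 25k Create, CICO + 25k Update).
# Values are taken from the CPU utilization table; only the four busiest
# modules are listed here to keep the example short.
CPU = {
    "mod-inventory-b": {"m5": (107.84, 147.17), "m7g": (139.1, 133.42)},
    "mod-quick-marc-b": {"m5": (79.98, 81.45), "m7g": (75.45, 72.77)},
    "mod-di-converter-storage-b": {"m5": (75.12, 116.01), "m7g": (103.49, 96.49)},
    "nginx-okapi": {"m5": (51.7, 79.05), "m7g": (75.33, 73.68)},
}


def cpu_delta(module: str) -> tuple[float, float]:
    """Return (create delta, update delta) as m7g minus m5, in percentage points."""
    m5_create, m5_update = CPU[module]["m5"]
    m7g_create, m7g_update = CPU[module]["m7g"]
    return round(m7g_create - m5_create, 2), round(m7g_update - m5_update, 2)


if __name__ == "__main__":
    for module in CPU:
        create, update = cpu_delta(module)
        print(f"{module}: create {create:+.2f}, update {update:+.2f}")
```

      For mod-di-converter-storage-b the update delta computed this way is -19.52 percentage points, which matches the roughly 20% decrease noted above.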

      Service Memory Utilization

      MSK instance: kafka.m5.2xlarge vs MSK instance: kafka.m7g.2xlarge

      • The comparison of memory resource utilization revealed no difference between tests

      | Module | Memory (kafka.m5.2xlarge) | Memory (kafka.m7g.2xlarge) | Delta |
      |---|---|---|---|
      | mod-oa-b | 80.7 | - | - |
      | mod-dcb-b | 74.61 | 74.77 | 0.16 |
      | mod-inventory-b | 59.61 | 59.63 | 0.02 |
      | mod-data-import-b | 57.75 | 57.77 | 0.02 |
      | mod-users-b | 53.22 | 53.1 | -0.12 |
      | okapi-b | 49.65 | 49.68 | 0.03 |
      | mod-di-converter-storage-b | 49.61 | 49.67 | 0.06 |
      | mod-search-b | 47.97 | 48 | 0.03 |
      | mod-source-record-storage-b | 46.38 | 46.38 | 0 |
      | mod-users-bl-b | 45.87 | 45.82 | -0.05 |
      | mod-feesfines-b | 44.2 | 44.03 | -0.17 |
      | mod-patron-blocks-b | 42.91 | 42.7 | -0.21 |
      | mod-configuration-b | 39.76 | 39.73 | -0.03 |
      | mod-source-record-manager-b | 38.71 | 38.7 | -0.01 |
      | mod-quick-marc-b | 36.95 | 36.99 | 0.04 |
      | mod-pubsub-b | 36.19 | 36.32 | 0.13 |
      | mod-entities-links-b | 30.56 | 30.56 | 0 |
      | mod-inventory-storage-b | 30.51 | 30.49 | -0.02 |
      | mod-patron-b | 30.19 | 30.19 | 0 |
      | mod-circulation-storage-b | 28.98 | 28.98 | 0 |
      | mod-authtoken-b | 27.38 | 27.42 | 0.04 |
      | mod-circulation-b | 25 | 25.01 | 0.01 |
      | edge-patron-b | 23.16 | 23.16 | 0 |
      | nginx-okapi | 4.69 | 4.69 | 0 |
      | pub-okapi | 4.46 | 4.46 | 0 |
      | pub-edge | 4.35 | 4.35 | 0 |
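
      To back the "no difference" conclusion with a number, the sketch below recomputes the Delta column (kafka.m7g.2xlarge minus kafka.m5.2xlarge) and finds the module with the largest absolute shift. mod-oa-b is left out because the table has no kafka.m7g.2xlarge value for it. The names `MEMORY` and `largest_shift` are assumptions made for this example.

```python
# Memory utilization per module: (kafka.m5.2xlarge, kafka.m7g.2xlarge),
# copied from the table above. mod-oa-b is omitted (no m7g value reported).
MEMORY = {
    "mod-dcb-b": (74.61, 74.77),
    "mod-inventory-b": (59.61, 59.63),
    "mod-data-import-b": (57.75, 57.77),
    "mod-users-b": (53.22, 53.1),
    "okapi-b": (49.65, 49.68),
    "mod-di-converter-storage-b": (49.61, 49.67),
    "mod-search-b": (47.97, 48),
    "mod-source-record-storage-b": (46.38, 46.38),
    "mod-users-bl-b": (45.87, 45.82),
    "mod-feesfines-b": (44.2, 44.03),
    "mod-patron-blocks-b": (42.91, 42.7),
    "mod-configuration-b": (39.76, 39.73),
    "mod-source-record-manager-b": (38.71, 38.7),
    "mod-quick-marc-b": (36.95, 36.99),
    "mod-pubsub-b": (36.19, 36.32),
    "mod-entities-links-b": (30.56, 30.56),
    "mod-inventory-storage-b": (30.51, 30.49),
    "mod-patron-b": (30.19, 30.19),
    "mod-circulation-storage-b": (28.98, 28.98),
    "mod-authtoken-b": (27.38, 27.42),
    "mod-circulation-b": (25, 25.01),
    "edge-patron-b": (23.16, 23.16),
    "nginx-okapi": (4.69, 4.69),
    "pub-okapi": (4.46, 4.46),
    "pub-edge": (4.35, 4.35),
}


def largest_shift(memory: dict[str, tuple[float, float]]) -> tuple[str, float]:
    """Return the module with the largest absolute m7g - m5 delta and that delta."""
    deltas = {module: round(m7g - m5, 2) for module, (m5, m7g) in memory.items()}
    module = max(deltas, key=lambda name: abs(deltas[name]))
    return module, deltas[module]


if __name__ == "__main__":
    print(largest_shift(MEMORY))
```

      With the table values, the largest shift is -0.21 for mod-patron-blocks-b.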


      MSK instance: kafka.m5.2xlarge

      Image Added

      MSK instance: kafka.m7g.2xlarge

      Image Added

      DB CPU Utilization

      Average DB CPU utilization is 85% to 90% during create jobs and 87% during update jobs in tests with both MSK instance types. DB CPU utilization was 15% during the Check-In/Check-Out period without DI.

      MSK instance: kafka.m5.2xlarge

      Image Added

      MSK instance: kafka.m7g.2xlarge

      Image Added


      DB Connections

      The average connection count is about 850 to 900 for create jobs and 860 for update jobs with CI/CO, and about 730 to 770 for CI/CO without Data Import, in tests with both MSK instance types.

      MSK instance: kafka.m5.2xlarge

      Image Added

      MSK instance: kafka.m7g.2xlarge

      Image Added

      MSK instance resource utilization

      MSK resource utilization (CPU)

      • Tests with KRaft mode enabled use less broker CPU during CI/CO and during DI + CI/CO, and the load is more evenly balanced across the brokers than in ZooKeeper mode.
      • The difference is at least 4 percentage points. For broker 1 under CICO+DI the difference is 13 percentage points.

      | Load scenario | Broker | MSK instance: kafka.m5.2xlarge | MSK instance: kafka.m7g.2xlarge | Delta, % |
      |---|---|---|---|---|
      | CICO | 1 | 13 | 9 | -4 |
      | CICO | 2 | 13 | 9 | -4 |
      | CICO+DI | 1 | 45 | 32 | -13 |
      | CICO+DI | 2 | 34 | 30 | -4 |
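
      The "more balanced" observation can be quantified as the spread between the busiest and the least busy broker. The sketch below computes the mean and the spread per scenario from the table values. The structure `BROKER_CPU` and the helper `summarize` are assumptions introduced for this example.

```python
from statistics import mean

# Broker CPU (%) from the table above: scenario -> instance type -> [broker 1, broker 2]
BROKER_CPU = {
    "CICO": {"kafka.m5.2xlarge": [13, 13], "kafka.m7g.2xlarge": [9, 9]},
    "CICO+DI": {"kafka.m5.2xlarge": [45, 34], "kafka.m7g.2xlarge": [32, 30]},
}


def summarize(cpu_by_broker: list[float]) -> dict[str, float]:
    """Return the mean CPU and the spread (max - min) across brokers."""
    return {
        "mean": mean(cpu_by_broker),
        "spread": max(cpu_by_broker) - min(cpu_by_broker),
    }


if __name__ == "__main__":
    for scenario, by_instance in BROKER_CPU.items():
        for instance, cpu in by_instance.items():
            print(scenario, instance, summarize(cpu))
```

      Under CICO+DI the spread between brokers is 11 percentage points on kafka.m5.2xlarge and 2 on kafka.m7g.2xlarge, while the mean drops from 39.5% to 31%.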

      MSK resource utilization (disk) grew gradually to 10% during tests with kafka.m5.2xlarge.

      Disk usage by broker

      MSK instance: kafka.m5.2xlarge

      MSK instance: kafka.m7g.2xlarge

      CPU (User) usage by broker

      MSK instance: kafka.m5.2xlarge

      MSK instance: kafka.m7g.2xlarge

      ...

      MSK instance: kafka.m5.2xlarge

      Image Added

      Image Added


      Top SQL-queries:

      Image Added

      Image Added

      MSK instance: kafka.m7g.2xlarge

      Image Added

      Image Added


      Top SQL-queries:

      Image Added

      Image Added



      Appendix

      Infrastructure

      PTF environment qcp1

      • 10 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
      • 1 database instance, writer

        | Name | Memory | vCPUs | max_connections |
        |---|---|---|---|
        | db.r6g.xlarge | 32 GiB | 4 vCPUs | 2731 |


      • MSK ptf-mobius-testing2
        • 2 m5.2xlarge brokers in 2 zones (total 2 brokers)
        • Apache Kafka version 2.8.0

        • EBS storage volume per broker 300 GiB

        • auto.create.topics.enable=true
        • log.retention.minutes=480
        • default.replication.factor=2
        • revision - 2
        • metadata mode - ZooKeeper
        • Total topics: 1534
        • Total partitions: 12155
      • MSK ptf-KRaft-mode
        • m7g.2xlarge brokers in 2 zones (total 2 brokers)
        • Apache Kafka version 3.7.x

        • EBS storage volume per broker 300 GiB

        • auto.create.topics.enable=true
        • log.retention.minutes=480
        • default.replication.factor=3
        • revision - 26
        • metadata mode - KRaft
        • Total topics: 1474
        • Total partitions: 11909

      Task count for module mod-graphql set to 0 before test start.

      ...

      • Populate the ptf-mobius-testing2 cluster with topics from the tenant cluster
      • Run CICO for 2 hours
      • 10 minutes after CICO starts, run DI Create - Export - Update for 5k and 25k files
      • Run Data Imports alone (without CICO)
      • Create a new Kafka cluster
      • Populate the NEW cluster with topics from the tenant cluster
      • Run CICO for 2 hours
      • 10 minutes after CICO starts, run DI Create - Export - Update for 5k and 25k files
      • Run Data Imports alone (without CICO)
      • Compare MSK resource utilization and the main KPIs for CICO & DI
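
      The timing of one combined run (CICO for 2 hours, with Data Import starting 10 minutes after CICO) can be sketched as a small schedule calculation. This only illustrates the timeline described in the steps above; the function names and the start time used in the example are hypothetical and not taken from the actual test tooling.

```python
from datetime import datetime, timedelta

CICO_DURATION = timedelta(hours=2)       # "Run CICO for 2 hours"
DI_START_DELAY = timedelta(minutes=10)   # DI starts 10 minutes after CICO


def build_schedule(cico_start: datetime) -> dict[str, datetime]:
    """Return the key timestamps of one combined CICO + DI run."""
    return {
        "cico_start": cico_start,
        "di_start": cico_start + DI_START_DELAY,
        "cico_end": cico_start + CICO_DURATION,
    }


def di_window(schedule: dict[str, datetime]) -> timedelta:
    """Return how long DI jobs can overlap with the CICO load."""
    return schedule["cico_end"] - schedule["di_start"]


if __name__ == "__main__":
    # Hypothetical start time, chosen only for illustration.
    schedule = build_schedule(datetime(2024, 1, 1, 9, 0))
    for name, moment in schedule.items():
        print(name, moment.isoformat())
    print("DI window under CICO load:", di_window(schedule))
```

      Under these assumptions the Data Import jobs have a 1 hour 50 minute window in which they run concurrently with CICO.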

      Additional/Files

      Topics:

      ptf-kafka-tenantCluster-topics_2replicationfactor_BU.csv

      Excel raw data: