Kafka Zookeeper mode - Data Import with Check-ins Check-outs (Quesnelia)[non-ECS] MSK instance type comparison

Overview

This document presents the results of testing the Check-in/Check-out (CI/CO) and Data Import (DI) workflows for MARC Bibliographic records in the Quesnelia release with a new MSK instance type. The goal is to see how kafka.m7g.2xlarge affects FOLIO performance by comparing the results of the main workflows on kafka.m5.2xlarge against kafka.m7g.2xlarge.

Ticket: PERF-921

Summary

  • Comparing kafka.m5.2xlarge against kafka.m7g.2xlarge instance type
    • The main KPIs for the workflows (Data Import durations and CI/CO response times) do not differ significantly. During the 2-hour CI/CO with Data Import tests, the number of requests was similar for both MSK clusters: 287669 with m5 and 287155 with m7g. The data import update job with 25k records took about 2 minutes longer with the m7g instance type.
    • MSK resource utilization: broker CPU was slightly lower with the m7g instance type (deltas of about 1 to 7 percentage points). Memory usage stayed at the same level.
  • Resource utilization
    • Memory utilization did not differ much between the two MSK clusters.
    • Average DB CPU utilization was about 85% during create jobs and 87% during update jobs with both MSK instance types. DB CPU utilization was about 15% during the Check-In/Check-Out period without DI.
    • The average connection count was about 850 for create and update jobs with CI/CO and about 730 for CI/CO without data import, with both MSK instance types.
    • MSK instance CPU and disk utilization were similar for kafka.m7g.2xlarge and kafka.m5.2xlarge.
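    • Service CPU utilization deltas: mod-di-converter-storage-b shows a decrease of about 20 percentage points for the update job. The results table also lists a decrease of about 10 points on the mod-feesfines-b row, but in that table the row is paired with mod-source-record-manager-b from the m5 run, so it is not a like-for-like comparison. For the remaining modules the CPU utilization deltas stay under 10 points.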

Test Runs 

| Test # | MSK instance type | Scenario | Load level |
|---|---|---|---|
| 1 | kafka.m5.2xlarge | CICO + DI MARC Bib Create | 8 users + 5K, 25K sequentially |
| 2 | kafka.m5.2xlarge | DI MARC Bib Create | 5K, 25K sequentially |
| 3 | kafka.m5.2xlarge | CICO + DI MARC Bib Update | 8 users + 5K, 25K sequentially |
| 4 | kafka.m5.2xlarge | DI MARC Bib Update | 5K, 25K sequentially |
| 5 | kafka.m7g.2xlarge | CICO + DI MARC Bib Create | 8 users + 5K, 25K sequentially |
| 6 | kafka.m7g.2xlarge | DI MARC Bib Create | 5K, 25K sequentially |
| 7 | kafka.m7g.2xlarge | CICO + DI MARC Bib Update | 8 users + 5K, 25K sequentially |
| 8 | kafka.m7g.2xlarge | DI MARC Bib Update | 5K, 25K sequentially |

Test Results

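The tables below show the results of the Check-In/Check-Out and Data Import create and update jobs.

MSK instance: kafka.m5.2xlarge

| Job profile | File size | DI duration without CI/CO | DI duration with CI/CO | CI with DI, average (sec) | CO with DI, average (sec) |
|---|---|---|---|---|---|
| PTF - Create 2 | 5k | 00:02:31 | 00:02:54 | 0.899 | 1.409 |
| PTF - Create 2 | 25k | 00:11:49 | 00:12:49 | 0.724 | 1.152 |
| PTF - Updates Success - 6 | 5k | 00:03:06 | 00:03:14 | 0.807 | 1.257 |
| PTF - Updates Success - 6 | 25k | 00:15:00 | 00:15:30 | 0.784 | 1.275 |

MSK instance: kafka.m7g.2xlarge

| Job profile | File size | DI duration without CI/CO | DI duration with CI/CO | CI with DI, average (sec) | CO with DI, average (sec) |
|---|---|---|---|---|---|
| PTF - Create 2 | 5k | 00:03:05 | 00:02:39 | 0.707 | 1.104 |
| PTF - Create 2 | 25k | 00:12:03 | 00:12:08 | 0.718 | 1.129 |
| PTF - Updates Success - 6 | 5k | 00:03:36 | 00:03:34 | 0.742 | 1.124 |
| PTF - Updates Success - 6 | 25k | 00:17:05 | 00:17:33 | 0.756 | 1.148 |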

Check-in/Check-out without DI

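Response time, sec

| Scenario | Load level | Request | kafka.m5.2xlarge, 95th percentile | kafka.m5.2xlarge, average | kafka.m7g.2xlarge, 95th percentile | kafka.m7g.2xlarge, average |
|---|---|---|---|---|---|---|
| Circulation Check-in/Check-out (without Data import) | 8 users | Check-in | 0.669 | 0.570 | 0.720 | 0.606 |
| Circulation Check-in/Check-out (without Data import) | 8 users | Check-out | 1.152 | 0.960 | 1.241 | 0.969 |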

Comparison

Data Import durations and Check-In/Check-Out response time comparison

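  • Data Import durations and CI/CO response times do not differ significantly. The number of requests in the 2-hour CI/CO with Data Import tests was similar for both MSK clusters: 287669 with m5 and 287155 with m7g. The DELTA columns below list the absolute difference between the m5 and m7g results.

| Job profile | File size | DELTA, DI without CI/CO | DELTA, DI + CI/CO | DELTA, CI with DI (sec) | DELTA, CO with DI (sec) |
|---|---|---|---|---|---|
| PTF - Create 2 | 5k | 00:00:34 | 00:00:15 | 0.192 | 0.305 |
| PTF - Create 2 | 25k | 00:00:14 | 00:00:41 | 0.006 | 0.023 |
| PTF - Updates Success - 6 | 5k | 00:00:31 | 00:00:20 | 0.065 | 0.133 |
| PTF - Updates Success - 6 | 25k | 00:02:06 | 00:02:03 | 0.028 | 0.127 |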
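
The duration deltas can be reproduced from the HH:MM:SS values in the Test Results tables. The sketch below is illustrative only: the helper names (`parse_hms`, `format_hms`, `duration_deltas`) are made up for this example, and it assumes the job profile names are "PTF - Create 2" and "PTF - Updates Success - 6" with 5k and 25k files. It reports absolute differences, which is how the table appears to present them.

```python
from datetime import timedelta


def parse_hms(text: str) -> timedelta:
    """Parse an HH:MM:SS duration string as used in the result tables."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_hms(delta: timedelta) -> str:
    """Format the absolute value of a timedelta as HH:MM:SS."""
    total = int(abs(delta).total_seconds())
    return f"{total // 3600:02d}:{(total % 3600) // 60:02d}:{total % 60:02d}"


# DI duration with CI/CO: (job profile, file size) -> (m5, m7g),
# copied from the Test Results tables.
di_with_cico = {
    ("PTF - Create 2", "5k"): ("00:02:54", "00:02:39"),
    ("PTF - Create 2", "25k"): ("00:12:49", "00:12:08"),
    ("PTF - Updates Success - 6", "5k"): ("00:03:14", "00:03:34"),
    ("PTF - Updates Success - 6", "25k"): ("00:15:30", "00:17:33"),
}


def duration_deltas(rows):
    """Return the absolute m7g-vs-m5 difference and whether m7g was slower."""
    result = {}
    for key, (m5, m7g) in rows.items():
        diff = parse_hms(m7g) - parse_hms(m5)
        result[key] = (format_hms(diff), diff > timedelta(0))
    return result


deltas = duration_deltas(di_with_cico)
for (profile, size), (delta, slower_on_m7g) in deltas.items():
    print(profile, size, delta, "m7g slower" if slower_on_m7g else "m7g faster or equal")
```

Keeping the sign as a separate flag makes it visible that the create jobs with CI/CO finished faster on m7g while the update jobs finished slower, which the absolute deltas alone do not show.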

Check-in/Check-out without DI

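Response time, sec (Delta is the difference between the m7g and m5 averages)

| Scenario | Load level | Request | kafka.m5.2xlarge, 95th percentile | kafka.m5.2xlarge, average | kafka.m7g.2xlarge, 95th percentile | kafka.m7g.2xlarge, average | Delta, average |
|---|---|---|---|---|---|---|---|
| Circulation Check-in/Check-out (without Data import) | 8 users | Check-in | 0.669 | 0.570 | 0.720 | 0.606 | 0.036 |
| Circulation Check-in/Check-out (without Data import) | 8 users | Check-out | 1.152 | 0.960 | 1.241 | 0.969 | 0.009 |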

MSK resource utilization (CPU)

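| Load scenario | Broker | MSK instance: kafka.m5.2xlarge, CPU % | MSK instance: kafka.m7g.2xlarge, CPU % | Delta, percentage points |
|---|---|---|---|---|
| CICO | 1 | 13.7625025 | 10.6770835 | -3.08 |
| CICO | 2 | 11.94791625 | 9.87916575 | -2.06 |
| CICO+DI | 1 | 38.09166625 | 31.13749875 | -6.95 |
| CICO+DI | 2 | 33.82291125 | 32.53334625 | -1.28 |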
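
The delta column is a difference in percentage points (m7g minus m5), not a relative change. The sketch below recomputes both from the table values. The function names are illustrative, and truncating to two decimals is an assumption made because it reproduces the published figures; the source does not say how the values were rounded.

```python
import math

# Average broker CPU (%): (load scenario, broker) -> (m5, m7g),
# copied from the MSK resource utilization (CPU) table.
broker_cpu = {
    ("CICO", 1): (13.7625025, 10.6770835),
    ("CICO", 2): (11.94791625, 9.87916575),
    ("CICO+DI", 1): (38.09166625, 31.13749875),
    ("CICO+DI", 2): (33.82291125, 32.53334625),
}


def delta_points(m5: float, m7g: float) -> float:
    """Difference in percentage points, truncated to two decimals (assumed)."""
    return math.trunc((m7g - m5) * 100) / 100


def relative_change(m5: float, m7g: float) -> float:
    """Relative change of the m7g value against the m5 baseline, in percent."""
    return (m7g - m5) / m5 * 100


points = {key: delta_points(*pair) for key, pair in broker_cpu.items()}
relative = {key: relative_change(*pair) for key, pair in broker_cpu.items()}

for key in broker_cpu:
    print(key, f"{points[key]:+.2f} points", f"{relative[key]:+.1f}% relative")
```

The two views read quite differently: a drop of about 3 points on a broker that runs at about 14% CPU is a relative reduction of roughly a fifth, so it helps to state which one is meant when quoting the numbers.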

Response time

MSK instance: kafka.m5.2xlarge

MSK instance: kafka.m7g.2xlarge

Service CPU Utilization

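The CPU utilization deltas show a decrease of about 20 percentage points for mod-di-converter-storage-b during the update job. The table also lists a decrease of about 10 points on the mod-feesfines-b row, but the m5 and m7g tables are sorted independently and that row is paired with mod-source-record-manager-b from the m5 run, so it is not a like-for-like comparison. For most modules the CPU utilization deltas stay under 10 points.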

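 MSK instance: kafka.m5.2xlarge vs MSK instance: kafka.m7g.2xlarge

| Module (m5) | CPU, CICO + 25k Create (m5) | CPU, CICO + 25k Update (m5) | Module (m7g) | CPU, CICO + 25k Create (m7g) | CPU, CICO + 25k Update (m7g) | Delta, Create | Delta, Update |
|---|---|---|---|---|---|---|---|
| mod-inventory-b | 110.54 | 145.5 | mod-inventory-b | 115.21 | 136.94 | 4.67 | -8.56 |
| mod-quick-marc-b | 90.64 | 102.38 | mod-quick-marc-b | 95.15 | 96.4 | 4.51 | -5.98 |
| mod-di-converter-storage-b | 78.09 | 121.08 | mod-di-converter-storage-b | 81.26 | 100.43 | 3.17 | -20.65 |
| nginx-okapi | 64.1 | 98.19 | nginx-okapi | 70.58 | 88.94 | 6.48 | -9.25 |
| okapi-b | 39.14 | 58.73 | okapi-b | 38.89 | 50.55 | -0.25 | -8.18 |
| mod-source-record-storage-b | 28.06 | 44.84 | mod-source-record-storage-b | 31.61 | 39.13 | 3.55 | -5.71 |
| mod-users-b | 23.41 | 20.28 | mod-users-b | 23.6 | 22.12 | 0.19 | 1.84 |
| mod-inventory-storage-b | 20.17 | 24.74 | mod-inventory-storage-b | 21.37 | 19.9 | 1.2 | -4.84 |
| mod-source-record-manager-b | 18.9 | 19.54 | mod-feesfines-b | 18.28 | 9.11 | -0.62 | -10.43 |
| mod-feesfines-b | 17.74 | 8.11 | mod-configuration-b | 17.6 | 10.52 | -0.14 | 2.41 |
| mod-configuration-b | 14.55 | 10.3 | mod-source-record-manager-b | 17.39 | 18.27 | 2.84 | 7.97 |
| mod-dcb-b | 12.3 | 11.91 | mod-authtoken-b | 17.04 | 13.37 | 4.74 | 1.46 |
| mod-authtoken-b | 7.67 | 11.87 | mod-dcb-b | 13.23 | 12.33 | 5.56 | 0.46 |
| mod-search-b | 7.32 | 6 | mod-search-b | 7.95 | 1.83 | 0.63 | -4.17 |
| mod-pubsub-b | 6.35 | 6.8 | mod-pubsub-b | 6.82 | 6.49 | 0.47 | -0.31 |
| mod-entities-links-b | 3.58 | 2.26 | pub-okapi | 3.56 | 3.64 | -0.02 | 1.38 |
| pub-okapi | 3.42 | 3.4 | mod-circulation-storage-b | 3.35 | 2.7 | -0.07 | -0.7 |
| mod-patron-b | 2.84 | 2.77 | mod-patron-b | 2.72 | 2.79 | -0.12 | 0.02 |
| mod-circulation-storage-b | 2.83 | 2.91 | mod-entities-links-b | 2.24 | 2.23 | -0.59 | -0.68 |
| mod-data-import-b | 2.04 | 1.65 | mod-circulation-b | 1.98 | 1.8 | -0.06 | 0.15 |
| mod-circulation-b | 1.92 | 1.6 | mod-data-import-b | 1.76 | 1.88 | -0.16 | 0.28 |
| edge-patron-b | 1.15 | 1.16 | edge-patron-b | 1.13 | 1.16 | -0.02 | 0 |
| mod-patron-blocks-b | 0.99 | 0.81 | mod-patron-blocks-b | 0.97 | 1 | -0.02 | 0.19 |
| mod-users-bl-b | 0.85 | 2.51 | mod-users-bl-b | 0.68 | 0.68 | -0.17 | -1.83 |
| pub-edge | 0.07 | 0.07 | pub-edge | 0.06 | 0.06 | -0.01 | -0.01 |

Note: the Delta columns are computed row by row (m7g value minus m5 value in the same row). Because the m5 and m7g lists are sorted independently, rows where the two module names differ (for example mod-source-record-manager-b next to mod-feesfines-b) compare two different modules and do not represent a per-module change.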
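
Because the deltas in this table are taken row by row, a per-module comparison requires matching the two result sets on the module name first. The sketch below does that for a few modules, using values copied from the table; the function name `deltas_by_module` and the choice of modules are illustrative only.

```python
# CPU (%) per module as (CICO + 25k Create, CICO + 25k Update),
# copied from the m5 and m7g columns of the table for a few modules.
m5_cpu = {
    "mod-di-converter-storage-b": (78.09, 121.08),
    "mod-source-record-manager-b": (18.9, 19.54),
    "mod-feesfines-b": (17.74, 8.11),
    "mod-configuration-b": (14.55, 10.3),
}
m7g_cpu = {
    "mod-di-converter-storage-b": (81.26, 100.43),
    "mod-source-record-manager-b": (17.39, 18.27),
    "mod-feesfines-b": (18.28, 9.11),
    "mod-configuration-b": (17.6, 10.52),
}


def deltas_by_module(baseline, candidate):
    """Join on module name and return (create delta, update delta) per module."""
    result = {}
    for module in sorted(baseline.keys() & candidate.keys()):
        result[module] = tuple(
            round(new - old, 2)
            for old, new in zip(baseline[module], candidate[module])
        )
    return result


per_module = deltas_by_module(m5_cpu, m7g_cpu)
for module, (create_delta, update_delta) in per_module.items():
    print(f"{module}: create {create_delta:+.2f}, update {update_delta:+.2f}")
```

Matched this way, mod-di-converter-storage-b keeps its drop of about 20 points for the update job, while mod-feesfines-b changes by about 1 point rather than the 10 points shown on its row.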

DI MARC BIB Create and Update + CICO

MSK instance: kafka.m5.2xlarge

MSK instance: kafka.m7g.2xlarge

Service Memory Utilization

 MSK instance: kafka.m5.2xlarge vs MSK instance: kafka.m7g.2xlarge
| Module | Memory (kafka.m5.2xlarge) | Memory (kafka.m7g.2xlarge) | Delta |
|---|---|---|---|
| mod-dcb-b | 68.81 | 74.37 | 5.56 |
| mod-inventory-b | 68.23 | 70.81 | 2.58 |
| mod-users-b | 50.17 | 50.37 | 0.2 |
| mod-di-converter-storage-b | 48.62 | 46.7 | -1.92 |
| mod-feesfines-b | 45.56 | 45.44 | -0.12 |
| mod-inventory-storage-b | 45.32 | 33.4 | -11.92 |
| mod-source-record-storage-b | 44.27 | 55.53 | 11.26 |
| okapi-b | 41.85 | 42.5 | 0.65 |
| mod-data-import-b | 41.42 | 43.55 | 2.13 |
| mod-patron-blocks-b | 41.04 | 42.38 | 1.34 |
| mod-search-b | 40.55 | 45.58 | 5.03 |
| mod-users-bl-b | 39.82 | 45.36 | 5.54 |
| mod-configuration-b | 38.78 | 38.68 | -0.1 |
| mod-source-record-manager-b | 38.45 | 41.91 | 3.46 |
| mod-pubsub-b | 36.86 | 35.94 | -0.92 |
| mod-quick-marc-b | 31.25 | 42.6 | 11.35 |
| mod-patron-b | 31.19 | 30.52 | -0.67 |
| mod-entities-links-b | 27.12 | 34.49 | 7.37 |
| mod-authtoken-b | 26.17 | 27.32 | 1.15 |
| mod-circulation-b | 24.17 | 25.15 | 0.98 |
| edge-patron-b | 22.77 | 22.38 | -0.39 |
| mod-circulation-storage-b | 20.02 | 22.34 | 2.32 |
| nginx-okapi | 4.69 | 4.58 | -0.11 |
| pub-okapi | 4.52 | 4.46 | -0.06 |
| pub-edge | 4.46 | 4.41 | -0.05 |

MSK instance: kafka.m5.2xlarge

MSK instance: kafka.m7g.2xlarge

DB CPU Utilization

Average DB CPU utilization was about 85% during create jobs and 87% during update jobs with both MSK instance types. DB CPU utilization was about 15% during the Check-In/Check-Out period without DI.

MSK instance: kafka.m5.2xlarge

MSK instance: kafka.m7g.2xlarge


DB Connections

The average connection count was about 850 for create and update jobs with CI/CO and about 730 for CI/CO without data import, with both MSK instance types.

MSK instance: kafka.m5.2xlarge

MSK instance: kafka.m7g.2xlarge

MSK instance resource utilization

MSK resource utilization (CPU)

| Load scenario | Broker | MSK instance: kafka.m5.2xlarge, CPU % | MSK instance: kafka.m7g.2xlarge, CPU % | Delta, percentage points |
|---|---|---|---|---|
| CICO | 1 | 13.7625025 | 10.6770835 | -3.08 |
| CICO | 2 | 11.94791625 | 9.87916575 | -2.06 |
| CICO+DI | 1 | 38.09166625 | 31.13749875 | -6.95 |
| CICO+DI | 2 | 33.82291125 | 32.53334625 | -1.28 |

MSK disk utilization was 4.6% with kafka.m5.2xlarge and 4.3% with kafka.m7g.2xlarge; the difference is negligible.

Disk usage by broker

MSK instance: kafka.m5.2xlarge

MSK instance: kafka.m7g.2xlarge

CPU (User) usage by broker

MSK instance: kafka.m5.2xlarge

MSK instance: kafka.m7g.2xlarge

DB load

MSK instance: kafka.m5.2xlarge

Top SQL-queries:

MSK instance: kafka.m7g.2xlarge


Top SQL-queries:

insert into "marc_records_lb" ("id", "content") values (cast($1 as uuid), cast($2 as jsonb)) on conflict ("id") do update set "content" = cast($3 as jsonb)

INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id) VALUES ($1, $2)

INSERT INTO fs09000000_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)

Appendix

Infrastructure

PTF environment: qcp1

  • 10 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
  • 1 database instance, writer

    | Name | Memory, GiB | vCPUs | max_connections |
    |---|---|---|---|
    | db.r6g.xlarge | 32 | 4 | 2731 |
  • MSK ptf-mobius-testing2
    • 2 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=2
  • MSK perf-921-g2
    • m7g.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.2.tiered

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=2

The task count for modules mod-oa-b and mod-graphql was set to 0 before the tests started.

Modules

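 All qcp1 modules

Cluster: qcp1-pvt, Tue Jun 04 07:31:53 UTC 2024

| Module | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft Limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize |
|---|---|---|---|---|---|---|---|---|---|
| mod-remote-storage | 4 | mod-remote-storage:3.2.0 | 2 | 4920 | 4472 | 1024 | 3960 | 512 | 512 |
| mod-ncip | 4 | mod-ncip:1.14.4 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-finance-storage | 4 | mod-finance-storage:8.6.0 | 2 | 1024 | 896 | 1024 | 700 | 88 | 128 |
| mod-agreements | 4 | mod-agreements:7.0.0 | 2 | 1592 | 1488 | 128 | 0 | 0 | 0 |
| mod-ebsconet | 4 | mod-ebsconet:2.2.0 | 2 | 1248 | 1024 | 128 | 700 | 128 | 256 |
| mod-organizations | 4 | mod-organizations:1.9.0 | 2 | 1024 | 896 | 128 | 700 | 88 | 128 |
| mod-consortia | 2 | mod-consortia:1.1.0 | 2 | 3072 | 2048 | 128 | 2048 | 512 | 1024 |
| edge-sip2 | 2 | edge-sip2:3.2.0-SNAPSHOT.209 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-serials-management | 4 | mod-serials-management:1.0.0 | 2 | 2480 | 2312 | 128 | 1792 | 384 | 512 |
| mod-settings | 4 | mod-settings:1.0.3 | 2 | 1024 | 896 | 200 | 768 | 88 | 128 |
| mod-data-import | 7 | mod-data-import:3.1.0 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512 |
| edge-dematic | 4 | edge-dematic:2.2.0 | 1 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-search | 4 | mod-search:3.2.0 | 2 | 2592 | 2480 | 2048 | 1440 | 512 | 1024 |
| mod-inn-reach | 2 | mod-inn-reach:3.2.0-SNAPSHOT.86 | 2 | 3600 | 3240 | 1024 | 2880 | 512 | 1024 |
| mod-tags | 4 | mod-tags:2.2.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| edge-courses | 4 | edge-courses:1.4.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-authtoken | 5 | mod-authtoken:2.15.1 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128 |
| mod-inventory-update | 4 | mod-inventory-update:3.3.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-notify | 4 | mod-notify:3.2.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-configuration | 4 | mod-configuration:5.10.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-orders-storage | 4 | mod-orders-storage:13.7.0 | 2 | 1024 | 896 | 512 | 700 | 88 | 128 |
| edge-caiasoft | 4 | edge-caiasoft:2.2.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-login-saml | 4 | mod-login-saml:2.8.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-erm-usage-harvester | 4 | mod-erm-usage-harvester:4.5.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-licenses | 4 | mod-licenses:6.0.0 | 2 | 2480 | 2312 | 128 | 1792 | 384 | 512 |
| mod-gobi | 4 | mod-gobi:2.8.0 | 2 | 1024 | 896 | 128 | 700 | 88 | 128 |
| mod-password-validator | 4 | mod-password-validator:3.2.0 | 2 | 1440 | 1298 | 128 | 768 | 384 | 512 |
| mod-bulk-operations | 4 | mod-bulk-operations:2.0.0 | 2 | 3072 | 2600 | 1024 | 1536 | 384 | 512 |
| mod-fqm-manager | 4 | mod-fqm-manager:2.0.1 | 2 | 3000 | 2600 | 128 | 2048 | 384 | 512 |
| edge-dcb | 4 | edge-dcb:1.1.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-graphql | 5 | mod-graphql:1.12.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-finance | 4 | mod-finance:4.9.0 | 2 | 1024 | 896 | 128 | 700 | 88 | 128 |
| mod-erm-usage | 4 | mod-erm-usage:4.7.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-batch-print | 5 | mod-batch-print:1.1.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-copycat | 4 | mod-copycat:1.6.0 | 2 | 1024 | 512 | 128 | 768 | 88 | 128 |
| mod-lists | 4 | mod-lists:2.0.0 | 2 | 3000 | 2600 | 128 | 2048 | 384 | 512 |
| mod-entities-links | 5 | mod-entities-links:3.0.0 | 2 | 2592 | 2480 | 400 | 1440 | 0 | 1024 |
| mod-permissions | 8 | mod-permissions:6.5.0 | 2 | 1684 | 1544 | 512 | 1024 | 384 | 512 |
| pub-edge | 3 | pub-edge:2023.06.14 | 2 | 1024 | 896 | 128 | 768 | 0 | 0 |
| mod-orders | 4 | mod-orders:12.8.0 | 2 | 2048 | 1440 | 1024 | 1024 | 384 | 512 |
| edge-patron | 4 | edge-patron:5.1.0 | 2 | 1024 | 896 | 256 | 768 | 88 | 128 |
| edge-ncip | 4 | edge-ncip:1.9.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| edge-inn-reach | 2 | edge-inn-reach:3.1.1-SNAPSHOT.45 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-users-bl | 4 | mod-users-bl:7.7.0 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128 |
| mod-oa | 2 | mod-oa:2.1.0-SNAPSHOT.62 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-inventory-storage | 4 | mod-inventory-storage:27.1.0 | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512 |
| mod-invoice | 5 | mod-invoice:5.8.0 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128 |
| mod-user-import | 4 | mod-user-import:3.8.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-sender | 5 | mod-sender:1.12.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| edge-oai-pmh | 4 | edge-oai-pmh:2.9.0 | 2 | 1512 | 1360 | 1024 | 1440 | 384 | 512 |
| mod-data-export-worker | 4 | mod-data-export-worker:3.2.1 | 2 | 3072 | 2048 | 1024 | 2048 | 384 | 512 |
| mod-rtac | 4 | mod-rtac:3.6.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-circulation-storage | 4 | mod-circulation-storage:17.2.0 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 |
| mod-calendar | 4 | mod-calendar:3.1.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-source-record-storage | 4 | mod-source-record-storage:5.8.0 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 |
| mod-event-config | 4 | mod-event-config:2.7.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-courses | 4 | mod-courses:1.4.10 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-circulation-item | 4 | mod-circulation-item:1.0.0 | 2 | 1024 | 896 | 128 | 0 | 0 | 0 |
| mod-inventory | 4 | mod-inventory:20.2.0 | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512 |
| mod-email | 4 | mod-email:1.17.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-pubsub | 4 | mod-pubsub:2.13.0 | 2 | 1536 | 1440 | 1024 | 922 | 384 | 512 |
| mod-circulation | 4 | mod-circulation:24.2.0 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 |
| mod-di-converter-storage | 4 | mod-di-converter-storage:2.2.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| edge-rtac | 4 | edge-rtac:2.7.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| edge-orders | 4 | edge-orders:3.0.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-users | 5 | mod-users:19.3.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-template-engine | 4 | mod-template-engine:1.20.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-patron-blocks | 4 | mod-patron-blocks:1.10.0 | 2 | 1024 | 896 | 1024 | 768 | 88 | 128 |
| mod-audit | 4 | mod-audit:2.9.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| edge-fqm | 4 | edge-fqm:2.0.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-source-record-manager | 5 | mod-source-record-manager:3.9.0-SNAPSHOT.330 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 |
| nginx-edge | 3 | nginx-edge:2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0 |
| mod-quick-marc | 4 | mod-quick-marc:5.1.0 | 1 | 2288 | 2176 | 128 | 1664 | 384 | 512 |
| nginx-okapi | 3 | nginx-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0 |
| okapi-b | 4 | okapi:5.3.0 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512 |
| mod-feesfines | 4 | mod-feesfines:19.1.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-invoice-storage | 4 | mod-invoice-storage:5.8.0 | 2 | 1872 | 1536 | 1024 | 1024 | 384 | 512 |
| mod-dcb | 5 | mod-dcb:1.1.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-service-interaction | 4 | mod-service-interaction:4.0.1 | 2 | 2048 | 1844 | 256 | 1290 | 384 | 512 |
| mod-data-export | 13 | mod-data-export:5.0.4 | 1 | 2048 | 1844 | 2048 | 0 | 0 | 0 |
| mod-patron | 4 | mod-patron:6.1.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-oai-pmh | 4 | mod-oai-pmh:3.13.0 | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512 |
| edge-connexion | 4 | edge-connexion:1.2.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-kb-ebsco-java | 4 | mod-kb-ebsco-java:4.0.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-notes | 4 | mod-notes:5.2.0 | 2 | 1024 | 896 | 128 | 952 | 384 | 512 |
| mod-data-export-spring | 4 | mod-data-export-spring:3.2.0 | 1 | 2048 | 1844 | 256 | 1536 | 384 | 512 |
| mod-organizations-storage | 4 | mod-organizations-storage:4.7.0 | 2 | 1024 | 896 | 128 | 700 | 88 | 128 |
| mod-login | 4 | mod-login:7.11.0 | 2 | 1440 | 1298 | 1024 | 768 | 384 | 512 |
| pub-okapi | 3 | pub-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 768 | 0 | 0 |
| mod-eusage-reports | 4 | mod-eusage-reports:2.1.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |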

Methodology/Approach

  • Populate the ptf-mobius-testing2 cluster with topics from the tenant cluster
  • Run CICO for 2 hours
  • 10 minutes after the start of CICO, run DI Create - Export - Update for 5k and 25k
  • Run the Data Imports alone (without CICO)
  • Create a new Kafka cluster
  • Populate the new cluster with topics from the tenant cluster
  • Run CICO for 2 hours
  • 10 minutes after the start of CICO, run DI Create - Export - Update for 5k and 25k
  • Run the Data Imports alone (without CICO)
  • Compare the MSK resource utilization and the main KPIs for CICO and DI
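
The same sequence is run against each cluster, which is what makes the two sets of results comparable. As a rough illustration, the steps above can be written down as a small plan structure. Everything in this sketch (the `Step` class, `build_plan`, the field names) is hypothetical and is not part of the PTF tooling; the cluster names are taken from the Infrastructure section, and only the timings stated above (a 2-hour CICO run and a 10-minute delay before DI) are encoded.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    cluster: str
    action: str
    # Offset from the start of the CICO run; None for steps outside that run.
    offset_from_cico_start: Optional[timedelta] = None
    # Fixed duration where the methodology states one; None otherwise.
    duration: Optional[timedelta] = None


def build_plan(cluster: str) -> List[Step]:
    """Return the ordered test steps for one MSK cluster."""
    return [
        Step(cluster, "Populate cluster with topics from tenant cluster"),
        Step(cluster, "Run CICO", timedelta(0), timedelta(hours=2)),
        Step(cluster, "Run DI Create - Export - Update for 5k and 25k",
             timedelta(minutes=10)),
        Step(cluster, "Run Data Imports alone"),
    ]


# Cluster names as listed in the Infrastructure section.
plans = {name: build_plan(name) for name in ("ptf-mobius-testing2", "perf-921-g2")}

for name, steps in plans.items():
    print(name)
    for step in steps:
        print("  ", step.action, step.offset_from_cico_start, step.duration)
```

Generating both plans from one function is a simple way to guarantee that the only variable between the two runs is the cluster itself.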

Additional/Files

Topics: