Kafka Zookeeper mode - Data Import with Check-ins Check-outs (Quesnelia)[non-ECS] MSK instance type comparison
Overview
This document contains the results of testing the Check-in/Check-out (CI/CO) and Data Import (DI) workflows for MARC Bibliographic records in the Quesnelia release with a new MSK instance type. The goal is to see how kafka.m7g.2xlarge affects FOLIO performance by comparing results for the main workflows on two instance types: kafka.m5.2xlarge and kafka.m7g.2xlarge.
Ticket: PERF-921
Summary
- Comparison of the kafka.m5.2xlarge and kafka.m7g.2xlarge instance types.
- The main KPIs for the workflows (Data Import durations and CI/CO response times) do not differ significantly. During the 2-hour CI/CO with Data Import tests, the number of requests was similar for both MSK clusters: 287669 with the m5 and 287155 with the m7g MSK instance type. The data import update job with 25k records took about 2 minutes longer with the m7g instance type.
- MSK resource utilization: CPU utilization was slightly lower with the m7g instance type (by roughly 1 to 7 percentage points per broker). Memory usage stayed at the same level.
- Resource utilization
  - Memory utilization did not differ much between the two MSK clusters.
  - Average DB CPU utilization was 85% during create jobs and 87% during update jobs in the tests with both MSK instance types. DB CPU utilization was 15% during the Check-In/Check-Out period without DI.
  - The average connection count was about 850 for create and update jobs with CI/CO and about 730 for CI/CO without data import, in the tests with both MSK instance types.
  - MSK instance CPU and disk utilization were similar for kafka.m7g.2xlarge and kafka.m5.2xlarge.
  - CPU utilization deltas show a 20% decrease for mod-di-converter-storage-b during the update job and a 10% decrease for mod-feesfines-b. For the remaining modules, CPU utilization deltas stay under 10%.
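To put "similar" into numbers, the request totals quoted above can be compared directly. This is a minimal sketch: the two totals come from this report, while the helper name `relative_diff_percent` is introduced here only for illustration.

```python
# Request totals reported for the 2-hour CI/CO + Data Import runs.
requests_m5 = 287_669   # kafka.m5.2xlarge
requests_m7g = 287_155  # kafka.m7g.2xlarge


def relative_diff_percent(baseline: int, candidate: int) -> float:
    """Signed difference of candidate versus baseline, as a percentage of baseline."""
    return (candidate - baseline) / baseline * 100


diff = relative_diff_percent(requests_m5, requests_m7g)
print(f"m7g vs m5 request count: {diff:+.2f}%")
```

The difference is 514 requests, which is well under 1% of the m5 total.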
Test Runs
Test # | MSK instance type | Scenario | Load level |
---|---|---|---|
1 | kafka.m5.2xlarge | CICO + DI MARC Bib Create | 8 users + 5K, 25K sequentially |
2 | kafka.m5.2xlarge | DI MARC Bib Create | 5K, 25K sequentially |
3 | kafka.m5.2xlarge | CICO + DI MARC Bib Update | 8 users + 5K, 25K sequentially |
4 | kafka.m5.2xlarge | DI MARC Bib Update | 5K, 25K sequentially |
5 | kafka.m7g.2xlarge | CICO + DI MARC Bib Create | 8 users + 5K, 25K sequentially |
6 | kafka.m7g.2xlarge | DI MARC Bib Create | 5K, 25K sequentially |
7 | kafka.m7g.2xlarge | CICO + DI MARC Bib Update | 8 users + 5K, 25K sequentially |
8 | kafka.m7g.2xlarge | DI MARC Bib Update | 5K, 25K sequentially |
Test Results
The tables below show the results of the Check-in/Check-out and Data Import create and update jobs for each MSK instance type.
MSK instance: kafka.m5.2xlarge
Job profile | File size | DI duration without CI/CO | DI duration with CI/CO | CI with DI average, sec | CO with DI average, sec |
---|---|---|---|---|---|
PTF - Create 2 | 5k | 00:02:31 | 00:02:54 | 0.899 | 1.409 |
PTF - Create 2 | 25k | 00:11:49 | 00:12:49 | 0.724 | 1.152 |
PTF - Updates Success - 6 | 5k | 00:03:06 | 00:03:14 | 0.807 | 1.257 |
PTF - Updates Success - 6 | 25k | 00:15:00 | 00:15:30 | 0.784 | 1.275 |

MSK instance: kafka.m7g.2xlarge
Job profile | File size | DI duration without CI/CO | DI duration with CI/CO | CI with DI average, sec | CO with DI average, sec |
---|---|---|---|---|---|
PTF - Create 2 | 5k | 00:03:05 | 00:02:39 | 0.707 | 1.104 |
PTF - Create 2 | 25k | 00:12:03 | 00:12:08 | 0.718 | 1.129 |
PTF - Updates Success - 6 | 5k | 00:03:36 | 00:03:34 | 0.742 | 1.124 |
PTF - Updates Success - 6 | 25k | 00:17:05 | 00:17:33 | 0.756 | 1.148 |
Check-in/Check-out without DI
Scenario | Load level | Request | Response time 95 perc, sec | Response time average, sec | Response time 95 perc, sec | Response time average, sec |
---|---|---|---|---|---|---|
Circulation Check-in/Check-out (without Data import) | 8 users | Check-in | 0.669 | 0.570 | 0.720 | 0.606 |
Circulation Check-in/Check-out (without Data import) | 8 users | Check-out | 1.152 | 0.960 | 1.241 | 0.969 |
Comparison
Data Import durations and Check-In/Check-Out response time comparison
- Data Import durations and CI/CO response times do not differ significantly. The number of requests in the 2-hour CI/CO with Data Import tests was similar for both MSK clusters: 287669 with the m5 and 287155 with the m7g MSK instance type.
Job profile | File size | DELTA, DI without CI/CO | DELTA, DI+CI/CO | DELTA, CI with DI, sec | DELTA, CO with DI, sec |
---|---|---|---|---|---|
PTF - Create 2 | 5k | 00:00:34 | 00:00:15 | 0.192 | 0.305 |
PTF - Create 2 | 25k | 00:00:14 | 00:00:41 | 0.006 | 0.023 |
PTF - Updates Success - 6 | 5k | 00:00:31 | 00:00:20 | 0.065 | 0.133 |
PTF - Updates Success - 6 | 25k | 00:02:06 | 00:02:03 | 0.028 | 0.127 |
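The deltas above are absolute differences between the m5 and m7g values, so the sign (which cluster was faster) is not shown. The following sketch recomputes two of them from the durations in the Test Results tables; the helper names `parse_hms` and `abs_delta` are introduced here for illustration only.

```python
from datetime import timedelta


def parse_hms(value: str) -> timedelta:
    """Parse an HH:MM:SS string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def abs_delta(m5: str, m7g: str) -> str:
    """Absolute difference between two HH:MM:SS durations, formatted as HH:MM:SS."""
    total = int(abs(parse_hms(m7g) - parse_hms(m5)).total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# PTF - Create 2, 5k, DI without CI/CO (m7g was slower)
print(abs_delta("00:02:31", "00:03:05"))
# PTF - Create 2, 25k, DI with CI/CO (m7g was faster)
print(abs_delta("00:12:49", "00:12:08"))
```

Recomputing the two update-job deltas without CI/CO from the rounded table durations gives values one second lower than those listed (00:00:30 and 00:02:05), presumably because the listed deltas were derived from unrounded timings.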
Check-in/Check-out without DI
Scenario | Load level | Request | Response time 95 perc, sec | Response time average, sec | Response time 95 perc, sec | Response time average, sec | Delta, average, sec |
---|---|---|---|---|---|---|---|
Circulation Check-in/Check-out (without Data import) | 8 users | Check-in | 0.669 | 0.570 | 0.720 | 0.606 | 0.036 |
Circulation Check-in/Check-out (without Data import) | 8 users | Check-out | 1.152 | 0.960 | 1.241 | 0.969 | 0.009 |
MSK resource utilization (CPU)
Load scenario | Broker | MSK instance: kafka.m5.2xlarge | MSK instance: kafka.m7g.2xlarge | Delta, % |
---|---|---|---|---|
CICO | 1 | 13.7625025 | 10.6770835 | -3.08 |
CICO | 2 | 11.94791625 | 9.87916575 | -2.06 |
CICO+DI | 1 | 38.09166625 | 31.13749875 | -6.95 |
CICO+DI | 2 | 33.82291125 | 32.53334625 | -1.28 |
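The "Delta, %" column is the m7g value minus the m5 value, so it is a difference in percentage points of CPU utilization rather than a relative change. A minimal sketch of that arithmetic, using the values from the table (the dictionary layout and the `cpu_delta` helper are assumptions made for this example):

```python
# Average broker CPU utilization (%), copied from the table above.
# Key: (load scenario, broker number); value: (kafka.m5.2xlarge, kafka.m7g.2xlarge).
cpu_utilization = {
    ("CICO", 1): (13.7625025, 10.6770835),
    ("CICO", 2): (11.94791625, 9.87916575),
    ("CICO+DI", 1): (38.09166625, 31.13749875),
    ("CICO+DI", 2): (33.82291125, 32.53334625),
}


def cpu_delta(m5: float, m7g: float) -> float:
    """Difference in percentage points; negative means m7g used less CPU."""
    return m7g - m5


deltas = {key: cpu_delta(*values) for key, values in cpu_utilization.items()}

for (scenario, broker), delta in deltas.items():
    print(f"{scenario:8} broker {broker}: {delta:+.2f} percentage points")
```

All four deltas are negative, meaning each m7g broker used less CPU than its m5 counterpart. The values in the table appear to be truncated to two decimals rather than rounded, so the printed numbers can differ from the table by 0.01.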
Response time
MSK instance: kafka.m5.2xlarge
MSK instance: kafka.m7g.2xlarge
Service CPU Utilization
CPU utilization deltas show a 20% decrease for mod-di-converter-storage-b during the update job and a 10% decrease for mod-feesfines-b. For most other modules, CPU utilization deltas stay under 10%.
DI MARC BIB Create and Update + CICO
MSK instance: kafka.m5.2xlarge
MSK instance: kafka.m7g.2xlarge
Service Memory Utilization
MSK instance: kafka.m5.2xlarge
MSK instance: kafka.m7g.2xlarge
DB CPU Utilization
Average DB CPU utilization was 85% during create jobs and 87% during update jobs in the tests with both MSK instance types. DB CPU utilization was 15% during the Check-In/Check-Out period without DI.
MSK instance: kafka.m5.2xlarge
MSK instance: kafka.m7g.2xlarge
DB Connections
The average connection count was about 850 for create and update jobs with CI/CO and about 730 for CI/CO without data import, in the tests with both MSK instance types.
MSK instance: kafka.m5.2xlarge
DB load
MSK instance: kafka.m5.2xlarge
Top SQL-queries:
MSK instance: kafka.m7g.2xlarge
Top SQL-queries:
insert into "marc_records_lb" ("id", "content") values (cast($1 as uuid), cast($2 as jsonb)) on conflict ("id") do update set "content" = cast($3 as jsonb)
INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id) VALUES ($1, $2)
INSERT INTO fs09000000_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)
Appendix
Infrastructure
PTF environment: qcp1
- 10 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
- 1 database instance, writer

Name | Memory GiB | vCPUs | max_connections |
---|---|---|---|
db.r6g.xlarge | 32 GiB | 4 vCPUs | 2731 |

- MSK ptf-mobius-testing2
  - 2 m5.2xlarge brokers in 2 zones
  - Apache Kafka version 2.8.0
  - EBS storage volume per broker 300 GiB
  - auto.create.topics.enable=true
  - log.retention.minutes=480
  - default.replication.factor=2
- MSK perf-921-g2
  - 2 m7g.2xlarge brokers in 2 zones
  - Apache Kafka version 2.8.2.tiered
  - EBS storage volume per broker 300 GiB
  - auto.create.topics.enable=true
  - log.retention.minutes=480
  - default.replication.factor=2
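Both clusters use the same three broker settings. MSK custom configurations are supplied as Kafka server properties text, so the settings listed above can be rendered as shown in this sketch. The variable and function names are illustrative, and nothing here calls AWS.

```python
# Broker settings shared by both MSK clusters, as listed above.
broker_settings = {
    "auto.create.topics.enable": "true",
    "log.retention.minutes": "480",
    "default.replication.factor": "2",
}


def render_server_properties(settings: dict[str, str]) -> str:
    """Render settings as key=value lines, one per line, in insertion order."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


print(render_server_properties(broker_settings), end="")

# 480 minutes of log retention corresponds to 8 hours.
retention_hours = int(broker_settings["log.retention.minutes"]) / 60
print(f"log retention: {retention_hours:g} hours")
```

With a replication factor of 2 on a 2-broker cluster, every partition is replicated to both brokers.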
The task count for the modules mod-oa-b and mod-graphql was set to 0 before the test start.
Modules
Methodology/Approach
- Populate the ptf-mobius-testing2 cluster with topics from the tenant cluster
- Run CICO for 2 hours
- 10 minutes after the start of CICO, run DI Create - Export - Update for the 5k and 25k files
- Run the Data Imports alone (without CICO)
- Create a new Kafka cluster
- Populate the new cluster with topics from the tenant cluster
- Run CICO for 2 hours
- 10 minutes after the start of CICO, run DI Create - Export - Update for the 5k and 25k files
- Run the Data Imports alone (without CICO)
- Compare the resource utilization of MSK and the main KPIs for CICO and DI
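The timing of one combined run (2 hours of CICO, with Data Import starting 10 minutes in) can be sketched as a small schedule calculation. The start time below is hypothetical and the function name `plan_run` is introduced only for this example. Only the 2-hour duration and the 10-minute delay come from the steps above.

```python
from datetime import datetime, timedelta

CICO_DURATION = timedelta(hours=2)       # from the methodology
DI_START_DELAY = timedelta(minutes=10)   # from the methodology


def plan_run(cico_start: datetime) -> dict[str, datetime]:
    """Return the key moments of one CICO + DI run for a given start time."""
    return {
        "cico_start": cico_start,
        "di_start": cico_start + DI_START_DELAY,
        "cico_end": cico_start + CICO_DURATION,
    }


# Hypothetical start time, chosen only to make the example concrete.
plan = plan_run(datetime(2024, 1, 1, 9, 0))
for name, moment in plan.items():
    print(f"{name}: {moment:%H:%M}")
```

This leaves 1 hour 50 minutes of CICO load during which the DI create, export and update jobs can run.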
Additional/Files
Topics: