Overview
This document presents the results of testing the Check-in/Check-out (CI/CO) and Data Import (DI) workflows for MARC Bibliographic records in the Quesnelia release with a new MSK instance type. The goal is to see how the m7g series affects FOLIO performance. Results for the main workflows are compared across two instance types: kafka.m5.2xlarge and kafka.m7g.2xlarge.
Ticket: PERF-921
Summary
- Comparing kafka.m5.2xlarge against kafka.m7g.2xlarge instance type
- Data Import durations and CI/CO response times do not differ significantly. The number of requests during the 2-hour CI/CO with Data Import test was nearly the same: 287669 with the m5 and 287155 with the m7g MSK instance type.
- Resource utilization
- Memory utilization did not differ much between the two MSK clusters.
- Disk usage is lower with the m7g instance type in both the idle and CICO+DI scenarios. CPU utilization is about the same in the idle state and 3% to 18% lower under CI/CO + DI load with the m7g instance type.
- Service CPU utilization deltas show a 20% decrease for mod-di-converter-storage-b during the update job and a 10% decrease for mod-feesfines-b. For most modules, the CPU utilization delta stays under 10%.
- Average DB CPU usage for both MSK clusters during data import is 85% for create jobs and 87% for update jobs. During the Check-In/Check-Out period without DI it is 15%.
- The average DB connection count for both MSK clusters is about 850 during create and update jobs with CI/CO, and about 730 for CI/CO without data import.
- MSK instance CPU and disk utilization with kafka.m7g.2xlarge stayed at the same level as with kafka.m5.2xlarge or decreased.
Test Runs
Test # | MSK instance type | Scenario | Load level |
---|---|---|---|
1 | kafka.m5.2xlarge | CICO + DI MARC Bib Create | 8 users + 5K, 25K sequentially |
2 | kafka.m5.2xlarge | DI MARC Bib Create | 5K, 25K sequentially |
3 | kafka.m5.2xlarge | CICO + DI MARC Bib Update | 8 users + 5K, 25K sequentially |
4 | kafka.m5.2xlarge | DI MARC Bib Update | 5K, 25K sequentially |
5 | kafka.m7g.2xlarge | CICO + DI MARC Bib Create | 8 users + 5K, 25K sequentially |
6 | kafka.m7g.2xlarge | DI MARC Bib Create | 5K, 25K sequentially |
7 | kafka.m7g.2xlarge | CICO + DI MARC Bib Update | 8 users + 5K, 25K sequentially |
8 | kafka.m7g.2xlarge | DI MARC Bib Update | 5K, 25K sequentially |
Test Results
This table shows the results of the Check-In/Check-Out and Data Import create and update jobs.
The only difference between the tests is the MSK cluster instance type: cluster ptf-mobius-testing2 uses kafka.m5.2xlarge and cluster PERF-921 uses kafka.m7g.2xlarge.
MSK instance | Job | File size | DI duration without CI/CO | DI duration with CI/CO | CI average, sec | CO average, sec |
---|---|---|---|---|---|---|
kafka.m5.2xlarge | Create | 5k | 00:02:31 | 00:02:54 | 0.899 | 1.409 |
kafka.m5.2xlarge | Create | 25k | 00:11:49 | 00:12:49 | 0.724 | 1.152 |
kafka.m5.2xlarge | Update | 5k | 00:03:06 | 00:03:14 | 0.807 | 1.257 |
kafka.m5.2xlarge | Update | 25k | 00:15:00 | 00:15:30 | 0.784 | 1.275 |
kafka.m7g.2xlarge | Create | 5k | 00:03:05 | 00:02:39 | 0.707 | 1.104 |
kafka.m7g.2xlarge | Create | 25k | 00:12:03 | 00:12:08 | 0.718 | 1.129 |
kafka.m7g.2xlarge | Update | 5k | 00:03:36 | 00:03:34 | 0.742 | 1.124 |
kafka.m7g.2xlarge | Update | 25k | 00:17:05 | 00:17:33 | 0.756 | 1.148 |
Check-in/Check-out without DI
Scenario | Load level | Request | kafka.m5.2xlarge, 95 perc, sec | kafka.m5.2xlarge, average, sec | kafka.m7g.2xlarge, 95 perc, sec | kafka.m7g.2xlarge, average, sec |
---|---|---|---|---|---|---|
Circulation Check-in/Check-out (without Data import) | 8 users | Check-in | 0.669 | 0.570 | 0.720 | 0.606 |
Circulation Check-in/Check-out (without Data import) | 8 users | Check-out | 1.152 | 0.960 | 1.241 | 0.969 |
Comparison
Data Import durations and Check-In/Check-Out response time comparison
Data Import durations and CI/CO response times do not differ significantly. The number of requests during the 2-hour CI/CO with Data Import test was nearly the same: 287669 with the m5 and 287155 with the m7g MSK instance type.
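To put the two request totals in proportion, the relative difference can be worked out directly. The snippet below is a minimal sketch that uses only the two totals quoted above; the variable names are illustrative and m5 is taken as the baseline.

```python
# Total requests over the 2-hour CI/CO + Data Import test (figures from this report).
requests_m5 = 287669
requests_m7g = 287155

# Absolute and relative difference, taking the m5 run as the baseline.
diff = requests_m7g - requests_m5
relative_pct = diff / requests_m5 * 100

print(f"difference: {diff} requests ({relative_pct:.2f}%)")
# prints "difference: -514 requests (-0.18%)"
```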
Job Profile | File size | DELTA, DI | DELTA, DI+CICO | DELTA, CI | DELTA, CO |
---|---|---|---|---|---|
PTF - Create 2 | 5k | 00:00:34 | 00:00:15 | 0.192 | 0.305 |
PTF - Create 2 | 25k | 00:00:14 | 00:00:41 | 0.006 | 0.023 |
PTF - Updates Success - 6 | 5k | 00:00:31 | 00:00:20 | 0.065 | 0.133 |
PTF - Updates Success - 6 | 25k | 00:02:06 | 00:02:03 | 0.028 | 0.127 |
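The duration deltas can be recomputed from the HH:MM:SS values in the results table. The following is a rough sketch, not the tooling used for this report: the helper names parse_hms and abs_delta are hypothetical, and only the DI durations with CI/CO are shown.

```python
from datetime import timedelta


def parse_hms(text):
    """Convert an 'HH:MM:SS' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def abs_delta(m5, m7g):
    """Absolute difference between two 'HH:MM:SS' durations."""
    return abs(parse_hms(m7g) - parse_hms(m5))


# DI durations with CI/CO running, copied from the results table: (m5, m7g).
di_with_cico = {
    ("Create", "5k"): ("00:02:54", "00:02:39"),
    ("Create", "25k"): ("00:12:49", "00:12:08"),
    ("Update", "5k"): ("00:03:14", "00:03:34"),
    ("Update", "25k"): ("00:15:30", "00:17:33"),
}

for (job, size), (m5, m7g) in di_with_cico.items():
    print(job, size, abs_delta(m5, m7g))
```

The delta table reports magnitudes only, so the sketch takes the absolute value; which instance type was faster has to be read from the results table.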
Kafka resource utilization comparison table
Disk usage is lower with the m7g instance type in both the idle and CICO+DI scenarios. CPU utilization is almost the same in the idle state, but it is 3% to 18% lower under CI/CO + DI load with the m7g instance type.
Metric | State | Scenario | # | MSK instance: kafka.m5.2xlarge | MSK instance: kafka.m7g.2xlarge | Delta | % |
---|---|---|---|---|---|---|---|
Disk usage | Idle state | | 1 | 0.474 | 0.356 | -0.118 | |
Disk usage | Idle state | | 2 | 0.476 | 0.357 | -0.119 | |
Disk usage | Under load | | 1 | 4.6120204 | 4.35 | -0.26202 | |
Disk usage | Under load | | 2 | 4.611104 | 4.35 | -0.2611 | |
CPU usage | Idle state | | 1 | 6.431249875 | 6.63958275 | 0.208333 | |
CPU usage | Idle state | | 2 | 6.1516666 | 6.227083625 | 0.075417 | |
CPU usage | Under load | CICO | 1 | 13.7625025 | 10.6770835 | -3.08542 | -22.42% |
CPU usage | Under load | CICO | 2 | 11.94791625 | 9.87916575 | -2.06875 | -17.31% |
CPU usage | Under load | CICO+DI | 1 | 38.09166625 | 31.13749875 | -6.95417 | -18.26% |
CPU usage | Under load | CICO+DI | 2 | 33.82291125 | 32.53334625 | -1.28957 | -3.81% |
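The Delta and % columns for CPU usage under load follow from the two measured values in each row. As a minimal sketch (the function name percent_change is illustrative, and m5 is assumed to be the baseline), the calculation looks like this:

```python
def percent_change(baseline, candidate):
    """Relative change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100


# CPU usage under load, copied from the table: (m5, m7g).
cpu_under_load = {
    ("CICO", 1): (13.7625025, 10.6770835),
    ("CICO", 2): (11.94791625, 9.87916575),
    ("CICO+DI", 1): (38.09166625, 31.13749875),
    ("CICO+DI", 2): (33.82291125, 32.53334625),
}

for (scenario, row), (m5, m7g) in cpu_under_load.items():
    delta = m7g - m5
    print(f"{scenario} #{row}: {delta:+.5f} ({percent_change(m5, m7g):+.2f}%)")
```

A negative value means that the m7g cluster used less CPU than the m5 cluster in that row.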
This table shows the comparison of CICO results without Data Import on the two MSK clusters.
Scenario | Load level | Request | kafka.m5.2xlarge, 95 perc, sec | kafka.m5.2xlarge, average, sec | kafka.m7g.2xlarge, 95 perc, sec | kafka.m7g.2xlarge, average, sec | Delta, average, sec |
---|---|---|---|---|---|---|---|
Circulation Check-in/Check-out (without Data import) | 8 users | Check-in | 0.669 | 0.57 | 0.72 | 0.606 | 0.036 |
Circulation Check-in/Check-out (without Data import) | 8 users | Check-out | 1.152 | 0.96 | 1.241 | 0.969 | 0.009 |
Response time
MSK instance: kafka.m5.2xlarge
MSK instance: kafka.m7g.2xlarge
Service CPU Utilization
Service CPU utilization deltas show a 20% decrease for mod-di-converter-storage-b during the update job and a 10% decrease for mod-feesfines-b. For most modules, the CPU utilization delta stays under 10%.
DI MARC BIB Create and Update + CICO
MSK instance: kafka.m5.2xlarge
MSK instance: kafka.m7g.2xlarge
Service Memory Utilization
MSK instance: kafka.m5.2xlarge
MSK instance: kafka.m7g.2xlarge
DB CPU Utilization
Average DB CPU usage for both MSK clusters during data import is 85% for create jobs and 87% for update jobs. During the Check-In/Check-Out period without DI it is 15%.
MSK instance: kafka.m5.2xlarge
MSK instance: kafka.m7g.2xlarge
DB Connections
The average connection count for both MSK clusters is about 850 during create and update jobs with CI/CO, and about 730 for CI/CO without data import.
MSK instance: kafka.m5.2xlarge
DB load
MSK instance: kafka.m5.2xlarge
Top SQL-queries:
MSK instance: kafka.m7g.2xlarge
Top SQL-queries:
insert into "marc_records_lb" ("id", "content") values (cast($1 as uuid), cast($2 as jsonb)) on conflict ("id") do update set "content" = cast($3 as jsonb)
INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id) VALUES ($1, $2)
INSERT INTO fs09000000_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)
Appendix
Infrastructure
PTF - environment qcp1
- 10 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
- 1 database instance, writer
Name | Memory GiB | vCPUs | max_connections |
---|---|---|---|
db.r6g.xlarge | 32 GiB | 4 vCPUs | 2731 |
- MSK ptf-mobius-testing2
- 2 m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=2
- MSK perf-921-g2
- 2 m7g.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=2
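Both clusters list the same three broker settings. As a small sketch, they can be kept in one place and rendered as key=value lines, the usual layout of a Kafka properties file. The variable names are illustrative, and how the resulting text is applied to an MSK cluster is outside this sketch.

```python
# Broker settings listed above; identical for both clusters.
broker_settings = {
    "auto.create.topics.enable": "true",
    "log.retention.minutes": "480",
    "default.replication.factor": "2",
}

# Render as key=value lines, one setting per line.
server_properties = "\n".join(
    f"{key}={value}" for key, value in broker_settings.items()
)
print(server_properties)
```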
The task count for modules mod-oa-b and mod-graphql was set to 0 before the tests started.
Modules
Methodology/Approach
- Populate the ptf-mobius-testing2 cluster with topics from the tenant cluster
- Run CICO for 2 hours
- 10 minutes after CICO starts, run DI Create - Export - Update for the 5k and 25k files
- Run the Data Imports alone (without CICO)
- Create a new Kafka cluster
- Populate the new cluster with topics from the tenant cluster
- Run CICO for 2 hours
- 10 minutes after CICO starts, run DI Create - Export - Update for the 5k and 25k files
- Run the Data Imports alone (without CICO)
- Compare MSK resource utilization and the main KPIs for CICO and DI
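The steps above repeat the same sequence for each cluster, which can be laid out as a simple plan. This is only an illustrative sketch: it prints the order of steps and does not run any test. The function plan_for_cluster is hypothetical, and the cluster names are the ones given in the Infrastructure section.

```python
from datetime import timedelta

CICO_DURATION = timedelta(hours=2)       # CICO runs for 2 hours
DI_START_DELAY = timedelta(minutes=10)   # DI starts 10 minutes into CICO
FILE_SIZES = ("5k", "25k")               # files are imported sequentially


def plan_for_cluster(cluster):
    """Return the ordered test steps for one MSK cluster as plain strings."""
    steps = [
        f"{cluster}: populate with topics from the tenant cluster",
        f"{cluster}: start CICO for {CICO_DURATION}",
        f"{cluster}: wait {DI_START_DELAY} after CICO start",
    ]
    for size in FILE_SIZES:
        steps.append(f"{cluster}: CICO + DI Create - Export - Update, {size}")
    for size in FILE_SIZES:
        steps.append(f"{cluster}: DI alone (Create and Update), {size}")
    return steps


full_plan = []
for cluster in ("ptf-mobius-testing2", "perf-921-g2"):
    full_plan.extend(plan_for_cluster(cluster))
full_plan.append("compare MSK resource utilization and main KPIs for CICO and DI")

for number, step in enumerate(full_plan, start=1):
    print(number, step)
```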