Comparing different values of KAFKA_CONSUMER_MAX_POLL_RECORDS

Overview

The PTF was asked to evaluate whether decreasing the value of KAFKA_CONSUMER_MAX_POLL_RECORDS from 600 to 200 in mod-search’s task definition would have any performance impact on the workflows that make use of this parameter. This parameter, as its name implies, specifies the maximum number of messages returned by each poll call to Kafka. Fetching more records per poll (600) results in higher memory consumption but fewer trips to Kafka, whereas fetching fewer records (200) results in more trips but lower memory consumption and perhaps more stability for mod-search, the main actor that uses this parameter.
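To make the trade-off concrete, here is a minimal sketch of how the batch size bounds the number of trips to Kafka. It assumes every poll returns a full batch, so the figures are lower bounds only; real polls can return fewer records. The backlog size (one message per record of a 25K import) and the function name `min_polls` are illustrative assumptions, not values taken from the tests.

```python
def min_polls(total_messages: int, max_poll_records: int) -> int:
    """Lower bound on the number of poll calls needed to drain a backlog.

    Assumes every poll returns exactly max_poll_records messages until the
    backlog is exhausted (a simplification for illustration).
    """
    if total_messages < 0 or max_poll_records <= 0:
        raise ValueError("total_messages must be >= 0 and max_poll_records > 0")
    # Integer ceiling division without floating point.
    return -(-total_messages // max_poll_records)


# Hypothetical backlog: one message per record of a 25K import.
BACKLOG = 25_000

for setting in (600, 200):
    polls = min_polls(BACKLOG, setting)
    print(f"max poll records = {setting}: at least {polls} polls")
```

Under these assumptions the smaller batch size roughly triples the number of polls, while each poll holds about a third as many records in memory.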

Two workflows were chosen for testing on a Sunflower release: Data Import (of 25K records) and Reindexing (of instances in a cluster with a central tenant and 5 member tenants). Data Import jobs yield messages for records that need to be indexed, whereas reindexing creates messages for records that need to be (re)indexed. Data Import is not expected to be affected directly because indexing of the imported records happens asynchronously, whereas reindexing processes the records in real time, so its performance could be impacted directly by the parameter.

Summary

  • Data Import jobs had the same durations with KAFKA_CONSUMER_MAX_POLL_RECORDS set to 600 or 200.

  • Reindexing durations were also similar whether the value was 600 or 200.

  • All performance metrics (service CPU and memory utilization, DB CPU, and AAS) are similar between the two sets of tests for both the Data Import and Reindexing workflows. No significant differences were identified.

  • Conclusion: no performance impact from decreasing this value from 600 to 200.

Recommendations

  • KAFKA_CONSUMER_MAX_POLL_RECORDS can be set to 200 as necessary.

Test Results

The table below contains Data Import test results for Create and Update imports of 25K MARC BIB records with the KAFKA_CONSUMER_MAX_POLL_RECORDS environment variable set to 600 and to 200. The durations of the Data Import jobs are listed in [h:]mm:ss format. Evidently the Create and Update import durations don’t vary much with either setting.

 

| Import type   | KAFKA_CONSUMER_MAX_POLL_RECORDS = 600 | KAFKA_CONSUMER_MAX_POLL_RECORDS = 200 |
|---------------|---------------------------------------|---------------------------------------|
| Create Import | 0:12:38                               | 0:13:13                               |
| Create Import | 0:12:31                               | 0:12:35                               |
| Create Import | 0:12:41                               | 0:11:44                               |
| Update Import | 0:21:28                               | 0:21:48                               |
| Update Import | 0:20:22                               | 0:21:51                               |
| Update Import | 0:22:09                               | 0:21:56                               |
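As a rough cross-check of the durations above, the following sketch averages the three runs per import type and setting and reports the relative change. The helper names are illustrative, and the parser accepts both h:mm:ss and mm:ss because the source lists some values without the hour field.

```python
from statistics import mean


def to_seconds(duration: str) -> int:
    """Convert 'h:mm:ss' or 'mm:ss' into a number of seconds."""
    parts = [int(p) for p in duration.split(":")]
    while len(parts) < 3:  # pad a missing hour field with 0
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return hours * 3600 + minutes * 60 + seconds


# Durations copied from the Data Import results, keyed by variable value.
durations = {
    "Create Import": {
        600: ["0:12:38", "0:12:31", "0:12:41"],
        200: ["0:13:13", "0:12:35", "0:11:44"],
    },
    "Update Import": {
        600: ["0:21:28", "0:20:22", "0:22:09"],
        200: ["0:21:48", "0:21:51", "0:21:56"],
    },
}


def mean_seconds(values: list[str]) -> float:
    """Average duration of a list of runs, in seconds."""
    return mean(to_seconds(v) for v in values)


def percent_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100


for workflow, by_setting in durations.items():
    base = mean_seconds(by_setting[600])
    new = mean_seconds(by_setting[200])
    print(f"{workflow}: 600 -> {base:.0f} s, 200 -> {new:.0f} s "
          f"({percent_change(base, new):+.1f}%)")
```

With three runs per cell, the averages differ by under 1% for Create imports and by about 2.5% for Update imports, which is consistent with the observation that the durations do not vary much.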

The next table shows reindexing test results with KAFKA_CONSUMER_MAX_POLL_RECORDS set to 600 and to 200. Full reindexing was done on the central tenant, which has over 1M instance records. Again, the durations (in minutes) are very similar between the two runs. The table shows the duration data gathered from different components: mod-search, OpenSearch, and the database.

 

| Component (duration in minutes)                 | KAFKA_CONSUMER_MAX_POLL_RECORDS = 600 | KAFKA_CONSUMER_MAX_POLL_RECORDS = 200 |
|-------------------------------------------------|---------------------------------------|---------------------------------------|
| Reindexing (mod-search)                         | 174                                   | 169                                   |
| Indexing time (via indexing rate in OpenSearch) | 120                                   | 120                                   |
| Database                                        | 178                                   | 173                                   |
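One way to make "very similar" checkable is to compare each component's duration against a tolerance. The sketch below uses the reindexing durations above; the 5% tolerance and the function name `within_tolerance` are assumptions chosen for illustration, not criteria used by the PTF.

```python
# Reindexing durations in minutes: (value with 600, value with 200).
reindex_minutes = {
    "Reindexing (mod-search)": (174, 169),
    "Indexing time (OpenSearch)": (120, 120),
    "Database": (178, 173),
}

# Assumed tolerance for calling two durations "similar" (hypothetical).
TOLERANCE_PERCENT = 5.0


def within_tolerance(baseline: float, candidate: float,
                     tolerance_percent: float = TOLERANCE_PERCENT) -> bool:
    """True if candidate deviates from baseline by at most tolerance_percent."""
    deviation = abs(candidate - baseline) / baseline * 100
    return deviation <= tolerance_percent


for component, (with_600, with_200) in reindex_minutes.items():
    delta = (with_200 - with_600) / with_600 * 100
    verdict = "similar" if within_tolerance(with_600, with_200) else "different"
    print(f"{component}: {with_600} -> {with_200} min ({delta:+.1f}%), {verdict}")
```

Under this assumed threshold all three components fall within tolerance: the mod-search and database durations drop by a little under 3%, and the OpenSearch indexing time is unchanged.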

Metrics

Data Import

The next two graphs show that service CPU and memory utilization are similar for DI jobs whether KAFKA_CONSUMER_MAX_POLL_RECORDS is 600 or 200. Note: henceforth KAFKA_CONSUMER_MAX_POLL_RECORDS is denoted as “variable” in the graphs for conciseness.

(Note that the service CPU and memory graphs show imports of 10K, not 25K, records, because the 25K graphs are no longer available. The 25K and 10K import graphs differ only in duration, not in spike pattern or magnitude.)

image-20250731-130753.png
image-20250731-205534.png

Database CPU utilization for Create Imports: no change between the imports with KAFKA_CONSUMER_MAX_POLL_RECORDS equal to 600 or 200.

image-20250723-095936.png

Database CPU utilization for Update Imports: no change between the imports with the variable equal to 600 or 200.

image-20250723-100425.png

OpenSearch’s indexing rates and patterns for the import tests are similar to each other when the variable is 600 or 200.

image-20250723-112721.png

Reindexing

Service CPU and memory metrics for reindexing, as with Data Import, exhibit the same pattern of spikes whether the variable is 600 or 200.

image-20250731-100600.png
mod-search and mod-inventory-storage’s CPU utilizations all look similar between the two tests
image-20250731-101131.png
mod-search and mod-inventory-storage’s memory utilization shows similar usage between the two tests

Database metrics such as CPU utilizations and CPU loads show the same durations and pattern of spikes during reindexing of the dataset with the variable = 600 or 200.

image-20250723-114916.png

 

image-20250723-115222.png

Indexing rates also show the same durations and pattern of spikes during reindexing of the dataset with the variable = 600 or 200.

image-20250723-115421.png

Appendix

Infrastructure

PTF - environment secon

  • 12 r7g.2xlarge EC2 instances located in US East (N. Virginia) us-east-1

  • db.r7.xlarge database instances, writer

  • MSK fse-test

    • 4 kafka.m7g.xlarge brokers in 2 zones

    • Apache Kafka version 3.7.x (KRaft mode)

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true

    • log.retention.minutes=480

    • default.replication.factor=3

  • OpenSearch 2.13 ptf-test cluster (for Data Import tests)

    • r7g.2xlarge.search 4 data nodes

    • r6g.large.search 3 dedicated master nodes

  • OpenSearch 2.13 ptf-loc cluster (for reindexing tests)

    • r7g.xlarge.search 4 data nodes

    • m7g.large.search 3 dedicated master nodes

Cluster Resources - secon-pvt

| Module | Task Definition Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft Limit | CPU Units | Xmx | Metaspace Size | Max Metaspace Size |
|---|---|---|---|---|---|---|---|---|---|
| mod-remote-storage | 2 | mod-remote-storage:3.4.2 | 2 | 4920 | 4472 | 128 | 3960 | 512 | 512 |
| mod-remote-storage - Sidecar 1 | N/A | folio-module-sidecar:3.0.4 | N/A | 1024 | 512 | 128 | 256 | 0 | 96 |
| mod-finance-storage | 2 | mod-finance-storage:8.8.2 | 2 | 1024 | 896 | 128 | 700 | 88 | 128 |
| mod-finance-storage - Sidecar 1 | N/A | folio-module-sidecar:3.0.4 | N/A | 1024 | 512 | 128 | 256 | 0 | 96 |
| mod-ncip | 2 | mod-ncip:1.15.7 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 |
| mod-ncip - Sidecar 1 | N/A | folio-module-sidecar:3.0.4 | N/A | 1024 | 512 | 128 | 256 | 0 | 96 |
| mod-agreements | 2 | mod-agreements:7.2.2 | 2 | 4096 | 4096 | 1024 | 4096 | 0 | 0 |
| mod-agreements - Sidecar 1 | N/A | folio-module-sidecar:3.0.4 | N/A | 1024 | 512 | 256 | 256 | 0 | 96 |
| mod-ebsconet | 2 | mod-ebsconet:2.4.0 | 2 | 1248 | 1024 | 128 | 700 | 128 | 256 |
| mod-ebsconet - Sidecar 1 | N/A | folio-module-sidecar:3.0.4 | N/A | 1024 | 512 | 128 | 256 | 0 | 96 |
| mod-organizations | 2 | mod-organizations:2.1.0 | 2 | 1024 | 896 | 128 | 620 | 88 | 128 |
| mod-organizations - Sidecar 1 | N/A | folio-module-sidecar:3.0.4 | N/A | 1024 | 512 | 128 | 256 | 0 | 96 |
| mod-serials-management | 2 | mod-serials-management:2.0.3 | 2 | 2780 | 2312 | 256 | 1792 | 384 | 896 |
| mod-serials-management - Sidecar 1 | N/A | folio-module-sidecar:3.0.4 | N/A | 1024 | 512 | 128 | 256 | 0 | 96 |
| mod-settings | 2 | mod-settings:1.2.0 | 2 | 1024 | 896 | 200 | 768 | 88 | 128 |
| mod-settings - Sidecar 1 | N/A | folio-module-sidecar:3.0.4 | N/A | 1024 | 512 | 128 | 256 | 0 | 96 |
| mod-data-import | 2 | mod-data-import:3.3.3 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512 |
| mod-data-import - Sidecar 1 | N/A | folio-module-sidecar:3.0.4 | N/A | 1024 | 512 | 128 | 256 | 0 | 96 |
| mod-consortia-keycloak | 2 | mod-consortia-keycloak:1.7.1 | 2 | 5136 | 4776 | 512 | 4416 | 384 | 512 |
| mod-consortia-keycloak - Sidecar 1 | N/A | folio-module-sidecar:3.0.4 | N/A | 1024 | 512 | 128 | 256 | 0 | 96 |
| mod-search | 3 | mod-search:5.0.3 | 2 | 2592 | 2480 | 1024 | 1440 | 512 | 1024 |
| mod-search | 4 (with KAFKA_CONSUMER_MAX_POLL_RECORDS = 200) | mod-search:5.0.3 | 2 | 2592 | 2480 | 1024 | 1440 | 512 | 1024 |

mod-search - Sidecar 1

N/A

folio-module-sidecar:3.0.4

N/A

1024

512

128