Data Import MARC BIB Sunflower CSP2 [ECS]

Data Import MARC BIB Sunflower CSP2 [ECS]

 

 

Overview

This document contains the results of testing Data Import for MARC Bibliographic records with create and update profiles and different file sizes (10K, 25K, 50K, 100K) at Sunflower release [ECS] and Critical service path 2(CSP2).
Ticket: https://folio-org.atlassian.net/browse/PERF-1219

Summary

  • All Data-imports jobs finished successfully without errors.

  • The duration of data imports for creates and updates has a noticeable improvement compared to CSP1
    (More detailes are in the report https://folio-org.atlassian.net/wiki/spaces/FOLIJET/pages/1200652289 )

    • CSP2 Updates are faster than CSP 1 likely due to a mod-search query that was running in CSP 1 (which created locking waits in the DB) not running in CSP 2.

    • CSP2’s Create imports MSK CPU utilizations are much lower compared to CSP 1 Create imports

  • DI duration growth correlates with the number of records imported. 

  • No memory leak is suspected for DI modules.

  • Services CPU utilization, Service memory utilization, and DB CPU utilization have the same utilization trend and values as in previous reports

Recommendations & Jiras

  1. Improve the performance of the SQL query that causes Lock on DB: https://folio-org.atlassian.net/browse/MODORDSTOR-491

  2. Before running regression testing, please discuss the possibility of changing the contribution and subject field in MRC files that are used for DI to avoid this issue:
    https://folio-org.atlassian.net/browse/MSEARCH-972
    https://folio-org.atlassian.net/browse/MODDATAIMP-1242

Results

 

Test #

Data-import test

Profile

Duration
Sunflower

CSP1
*

Duration
Sunflower CSP1
-LWT

*2

Duration

Sunflower CSP1 +LWT

*2

Duration

Sunflower
CSP2

1.1

5k MARC BIB Create

PTF - Create 2

3 min 3 s

3 min 42 s

4 min 07 s

2 min 21 s

1.2

10k MARC BIB Create

PTF - Create 2

5 min 32 s

7 min 08 s

7 min 16 s

4 min 9 s

1.3

25k MARC BIB Create

PTF - Create 2

12 min 25 s

16 min 29 s

17 min 30 s

10 min 29 s

1.4

50k MARC BIB Create

PTF - Create 2

26 min 20 s

36 min 4 s

35 min 54 s

19 min 35 s

1.5

100k MARC BIB Create

PTF - Create 2

54 min 8 s

1 hr 11 min

1 hr 14 min

38 min 4 s

2.1

5k MARC BIB Update

PTF - Updates Success - 6

2 min 44 s

2 min 31 s

3 min 41 s

2 min 27 s

2.2

10k MARC BIB Update

PTF - Updates Success - 6

5 min 33 s

4 min 11

4 min 58

4 min 20 s

2.3

25k MARC BIB Update

PTF - Updates Success - 6

12 min 51 s

9 min 49 s

12 min 33 s

10 min 48 s

2.4

50k MARC BIB Update

PTF - Updates Success - 6

25 min 32 s

19 min 27 s

24 min 16

22 min 30 s

2.5

100k MARC BIB Update

PTF - Updates Success - 6

54 min 41 s

40 min 30 s

53 min 11 s

46 min 41 s

Testing results from previous reports
*-https://folio-org.atlassian.net/wiki/spaces/FOLIJET/pages/1200750594
*2-https://folio-org.atlassian.net/wiki/spaces/FOLIJET/pages/1289814018

 

 

Resource utilization graphs

CPU Utilization

Test 1.1 - 1.5 (5k-10k-25k-50k-100k)

image-20251027-204709.png

Test 2.1 - 2.5 (5k-10k-25k-50k-100k)

image-20251027-204814.png

 

 

 

 

Memory Utilization 

Memory utilisation showing stable trend, without spikes and drops. No signs for memory leaks.

Test 1.1 - 1.5 (5k-10k-25k-50k-100k)

image-20251027-205830.png

Test 2.1 - 2.5 (5k-10k-25k-50k-100k)

image-20251027-205928.png

 

 

RDS Metrics 

Test 1.1 - 1.5 (5k-10k-25k-50k-100k)

image-20251027-210134.png

Test 2.1 - 2.5 (5k-10k-25k-50k-100k)

image-20251027-210159.png

 

DB Connections

Test 1.1 - 1.5 (5k-10k-25k-50k-100k)

image-20251027-210304.png


Test 2.1 - 2.5 (5k-10k-25k-50k-100k)

image-20251027-210327.png

Performance insight metrics


Test 1.1 - 1.5 (5k-10k-25k-50k-100k)
High resource utilization of SELECT * FROM cs00000int_0001_mod_orders_storage.internal_lock WHERE lock_name = $1 FOR UPDATE is described in addtional section

image-20251027-210522.png
image-20251027-210535.png

Test 2.1 - 2.5 (5k-10k-25k-50k-100k)


image-20251027-210716.png
image-20251027-210728.png

MSK service

CPU usage by broker
Test 1.1 - 1.5 (5k-10k-25k-50k-100k) - Test 2.1 - 2.5 (5k-10k-25k-50k-100k)

image-20251027-210932.png

Disk usage by broker
Test 1.1 - 1.5 (5k-10k-25k-50k-100k) - Test 2.1 - 2.5 (5k-10k-25k-50k-100k)

image-20251027-211117.png

 

Additional information

Analysis of High Lock:tuple Wait Events on internal_lock Table

 

Executive Summary

The high database load observed during the test, specifically characterized by the Lock:tuple wait event, is expected behavior and not a performance defect. This load is intentionally generated by the mod-orders-storage service to ensure data consistency when processing audit events.

The behavior is the direct result of a pessimistic locking mechanism (SELECT FOR UPDATE) implemented as per the user story, "Implement lock mechanism for Outbox Event Log."

 

Detailed Analysis

  1. Observed Symptom: The AWS Performance Insights dashboard shows a high number of Average Active Sessions (AAS) where the dominant wait event is Lock:tuple. This load is directly attributed to the following query: SELECT * FROM {tenant}_mod_orders_storage.internal_lock WHERE lock_name = ...

  2. Design and Business Rationale: This lock is a deliberate design choice to solve a potential data integrity problem:

    • Function: The mod-orders-storage service reads events from an outbox_events table and sends them to Kafka for auditing (the "Outbox Pattern").

    • Problem: The system runs multiple instances of this service in parallel. Without a lock, two or more instances could read the same batch of events simultaneously and send duplicate audit messages to Kafka.

    • Solution: A distributed lock was implemented using a single row in the internal_lock table.

  3. Mechanism of Action:

    • Before an instance of mod-orders-storage can read events, it must first execute a SELECT ... FOR UPDATE query on the internal_lock table.

    • The first instance to execute this query acquires an exclusive, row-level "tuple lock" on that row.

    • All other service instances that try to run the same query are forced to wait. This "waiting" state is what Performance Insights reports as the Lock:tuple wait event.

    • Once the first instance has finished processing its batch (reading events, sending to Kafka) and its transaction is complete, the lock is released. The database then grants the lock to the next waiting instance in the queue.

Conclusion 

The high Lock:tuple wait time is a direct and healthy sign that the locking mechanism is working as intended. It represents the time service instances spend "waiting in line" to process audit events sequentially.

This behavior is by design and essential for guaranteeing that audit events are sent to Kafka exactly once, preventing data duplication. It should be documented as an accepted system characteristic rather than a performance bottleneck to be fixed.

Feature: https://folio-org.atlassian.net/browse/UXPROD-3215

User Story: https://folio-org.atlassian.net/browse/MODORDSTOR-332  


Appendix

Infrastructure

PTF -environment secon

  • 8 r7g.2xlarge EC2 instances located in US East (N. Virginia)us-east-1

  • db.r7g.xlarge database instances, writer

  • MSK fse-test

    • 4 kafka.m7g.xlarge brokers in 2 zones

    • Apache Kafka version 3.7.x (KRaft mode)

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true

    • log.retention.minutes=480

    • default.replication.factor=3

 

 

Cluster Resources - secon-pvt

Cluster Resources - secon-pvt (Mon Oct 27 12:14:50 UTC 2025)

Module

Task Definition Revision

Module Version

Task Count

Mem Hard Limit

Mem Soft Limit

CPU Units

Xmx

Metaspace Size

Max Metaspace Size

Module

Task Definition Revision

Module Version

Task Count

Mem Hard Limit

Mem Soft Limit

CPU Units

Xmx

Metaspace Size

Max Metaspace Size

mod-remote-storage

5

mod-remote-storage:3.4.3

2

4920

4472

0

3960

512

512

mod-remote-storage - Sidecar 1

N/A

folio-module-sidecar:3.0.13.nb

N/A

1024

512

0

256

0

96

mod-finance-storage

5

mod-finance-storage:8.8.4

2

1024

896

0

700

88

128

mod-finance-storage - Sidecar 1

N/A

folio-module-sidecar:3.0.13.nb

N/A

1024

512

0

256

0

96

mod-ncip

5

mod-ncip:1.16.1

2

1024

896

0

768

88

128

mod-ncip - Sidecar 1

N/A

folio-module-sidecar:3.0.13.nb

N/A

1024

512

0

256

0

96

mod-agreements

5

mod-agreements:7.2.2

2

4096

4096

0

4096

0

0

mod-agreements - Sidecar 1

N/A

folio-module-sidecar:3.0.13.nb

N/A

1024

512

0

256

0

96

mod-ebsconet

5

mod-ebsconet:2.4.0

2

1248

1024

0

700

128

256

mod-ebsconet - Sidecar 1

N/A

folio-module-sidecar:3.0.13.nb

N/A

1024

512

0

256

0

96

mod-organizations

5

mod-organizations:2.1.0

2

1024

896

0

768

88

128

mod-organizations - Sidecar 1

N/A

folio-module-sidecar:3.0.13.nb

N/A

1024

512

0

256

0

96

mod-serials-management

5

mod-serials-management:2.0.3

2