Data Import MARC BIB Sunflower [non-ECS]

Overview

This document contains the results of testing Data Import for MARC Bibliographic records in the Sunflower release [non-ECS].

Note: Starting from the Sunflower release, all FOLIO environments will be Eureka.

In scope: testing of 5K, 10K, 25K, 50K, and 100K data import creates and updates.
Ticket: https://folio-org.atlassian.net/browse/PERF-1108

 

Summary

  • All tests passed successfully.

  • Significant improvements are visible on the mod-search side (in Ramsons, a long-running query caused a performance degradation of 50-100%). Data imports are now faster for creates and more or less the same for updates.

  • On the mod-search schema, deadlocks were observed during data import creates and updates; however, the deadlocks do not affect completion of DI.

  • Results of Data Import MARC BIB Sunflower [non-ECS] - CSP1 - https://folio-org.atlassian.net/wiki/spaces/FOLIJET/pages/1200652289

Recommendations & Jiras

Results

| Test # | Data-import test     | Profile                   | Duration Sunflower (secp1) | Duration Ramsons (rcp1), mod-search enabled | Duration Quesnelia (qcp1) | Status    |
|--------|----------------------|---------------------------|----------------------------|---------------------------------------------|---------------------------|-----------|
| 1      | 5k MARC BIB Create   | PTF - Create 2            | 2 min 4 s                  | 3 min 7 s                                   | -                         | Completed |
|        | 10k MARC BIB Create  | PTF - Create 2            | 4 min 43 s                 | 6 min 15 s                                  | 6 min                     | Completed |
| 2      | 25k MARC BIB Create  | PTF - Create 2            | 10 min                     | 17 min                                      | 13 min 41 s               | Completed |
| 3      | 50k MARC BIB Create  | PTF - Create 2            | 21 min                     | 41 min 25 s                                 | 21 min 59 s               | Completed |
| 4      | 100k MARC BIB Create | PTF - Create 2            | 42 min 46 s                | 1 hr 19 min                                 | 40 min 16 s               | Completed |
|        | 5k MARC BIB Update   | PTF - Updates Success - 6 | 6 min 18 s                 | 6 min 33 s                                  | -                         | Completed |
| 6      | 10k MARC BIB Update  | PTF - Updates Success - 6 | 6 min 4 s                  | 11 min 14 s                                 | 10 min 27 s               | Completed |
| 7      | 25k MARC BIB Update  | PTF - Updates Success - 6 | 31 min                     | 28 min 43 s                                 | 23 min 16 s               | Completed |
| 8      | 50k MARC BIB Update  | PTF - Updates Success - 6 | 1 hr 8 min                 | 58 min 30 s                                 | 40 min 52 s               | Completed |
| 9      | 100k MARC BIB Update | PTF - Updates Success - 6 | 2 hr 5 min                 | 2 hr 14 min                                 | 1 hr 2 min                | Completed |
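
To make the release-to-release comparison easier to read, the durations in the results above can be converted to seconds and expressed as a percentage change of Sunflower (secp1) relative to Ramsons (rcp1). The following is a minimal sketch, not part of the test tooling: the helper names `parse_duration` and `percent_change` are hypothetical, and the input values are copied from the results as reported.

```python
import re

# Seconds per unit, covering the spellings used in the results
# ("hr"/"hrs", "min"/"minutes", "s"/"sec").
_UNIT_SECONDS = {"hr": 3600, "hrs": 3600, "min": 60, "minutes": 60, "s": 1, "sec": 1}


def parse_duration(text: str) -> int:
    """Convert a duration such as '1 hr 19 min' or '2 min 4 s' to seconds."""
    parts = re.findall(r"(\d+)\s*([a-z]+)", text.lower())
    if not parts:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(int(value) * _UNIT_SECONDS[unit] for value, unit in parts)


def percent_change(new: str, old: str) -> float:
    """Relative change of `new` against `old` in percent (negative means faster)."""
    new_s, old_s = parse_duration(new), parse_duration(old)
    return (new_s - old_s) / old_s * 100.0


# (test, Sunflower duration, Ramsons duration), copied from the results.
RESULTS = [
    ("5k create", "2 min 4 s", "3 min 7 s"),
    ("10k create", "4 min 43 s", "6 min 15 s"),
    ("25k create", "10 min", "17 min"),
    ("50k create", "21 min", "41 min 25 s"),
    ("100k create", "42 min 46 s", "1 hr 19 min"),
    ("5k update", "6 min 18 s", "6 min 33 s"),
    ("10k update", "6 min 4 s", "11 min 14 s"),
    ("25k update", "31 min", "28 min 43 s"),
    ("50k update", "1 hr 8 min", "58 min 30 s"),
    ("100k update", "2 hr 5 min", "2 hr 14 min"),
]

for name, sunflower, ramsons in RESULTS:
    print(f"{name:12s} {percent_change(sunflower, ramsons):+6.1f}%")
```

With these inputs, the 50k create comes out at roughly -49% (Sunflower faster), while the 50k update comes out at roughly +16% (Sunflower slower). Durations rounded to whole minutes in the report limit the precision of these figures.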

Memory Utilization

Memory utilization showed a stable trend during the DI create and update tests. No sudden crashes or unexpected growth in memory usage were observed.

No service exceeded 80% memory usage. The module with the highest memory usage is mod-permissions; however, it returns to its normal level after each test.

Service memory usage for DI creates and updates

image-20250424-115029.png

 

 

CPU Utilization

CPU utilization is stable and predictable for all modules during all tests.

Service CPU utilization for DI creates and updates

image-20250424-113651.png

RDS Metrics 

As expected, DB CPU usage is high, as is usual during the data import process.

DB CPU utilization for DI creates and updates

image-20250425-121935.png
image-20250425-125040.png

 

  

DB load for DI creates and updates

image-20250425-124251.png

 

image-20250425-124414.png

A slow query was detected on the mod-search side that significantly affects performance; it is still the slowest query:

Data volume:

search.instance - 4 109 321

search.instance_contributor - 8 327 231

Slow query found in mod-search

WITH cte AS (
    SELECT id, name, name_type_id, authority_id, last_updated_date
    FROM fs09000000_mod_search.contributor
    WHERE last_updated_date > $1
    ORDER BY last_updated_date
)
SELECT c.id,
       c.name,
       c.name_type_id,
       c.authority_id,
       c.last_updated_date,
       json_agg(
           CASE
               WHEN sub.instance_count IS NULL THEN NULL
               ELSE json_build_object(
                   'count', sub.instance_count,
                   'typeId', sub.type_ids,
                   'shared', sub.shared,
                   'tenantId', sub.tenant_id
               )
           END
       ) AS instances
FROM cte c
LEFT JOIN (
    SELECT cte.id,
           ins.tenant_id,
           ins.shared,
           array_agg(DISTINCT ins.type_id) FILTER (WHERE ins.type_id <> '') AS type_ids,
           count(DISTINCT ins.instance_id) AS instance_count
    FROM fs09000000_mod_search.instance_contributor ins
    INNER JOIN cte ON ins.contributor_id = cte.id
    GROUP BY cte.id, ins.tenant_id, ins.shared
) sub ON c.id = sub.id
GROUP BY c.id, c.name, c.name_type_id, c.authority_id, c.last_updated_date
ORDER BY last_updated_date ASC

 

 

MSK CPU usage

During all tests, CPU usage did not exceed 60% on any broker.

image-20250425-124624.png

Appendix

Infrastructure

PTF - environment rcp1

  • 11 m6g.2xlarge EC2 instances located in US East (N. Virginia), us-east-1

  • db.r6.xlarge database instances, writer

  • MSK fse-test

    • 4 kafka.m7g.xlarge brokers in 2 zones

    • Apache Kafka version 3.7.x (KRaft mode)

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true

    • log.retention.minutes=480

    • default.replication.factor=3

  • OpenSearch 2.13 ptf-test cluster

    • r6g.2xlarge.search 4 data nodes

    • r6g.large.search 3 dedicated master nodes

Cluster Resources - rcp1-pvt

| Module                             | Task Definition Revision | Module Version                 | Task Count | Mem Hard Limit | Mem Soft Limit | CPU Units | Xmx  | Metaspace Size | Max Metaspace Size |
|------------------------------------|--------------------------|--------------------------------|------------|----------------|----------------|-----------|------|----------------|--------------------|
| mod-remote-storage                 | 1                        | mod-remote-storage:3.4.1       | 2          | 4920           | 4472           | 128       | 3960 | 512            | 512                |
| mod-remote-storage - Sidecar 1     | N/A                      | folio-module-sidecar:3.0.1.410 | N/A        | 1024           | 512            | 128       | 256  | 0              | 96                 |
| mod-finance-storage                | 1                        | mod-finance-storage:8.8.2      | 2          | 1024           | 896            | 128       | 700  | 88             | 128                |
| mod-finance-storage - Sidecar 1    | N/A                      | folio-module-sidecar:3.0.1.410 | N/A        | 1024           | 512            | 128       | 256  | 0              | 96                 |
| mod-ebsconet                       | 1                        | mod-ebsconet:2.4.0             | 2          | 1248           | 1024           | 128       | 700  | 128            | 256                |
| mod-ebsconet - Sidecar 1           | N/A                      | folio-module-sidecar:3.0.1.410 | N/A        | 1024           | 512            | 128       | 256  | 0              | 96                 |
| edge-sip2                          | 1                        | edge-sip2:3.4.0                | 2          | 1024           | 896            | 128       | 768  | 88             | 128                |
| mod-consortia-keycloak             | 1                        | mod-consortia-keycloak:1.7.1   | 2          | 5136           | 4776           | 512       | 4416 | 384            | 512                |
| mod-consortia-keycloak - Sidecar 1 | N/A                      | folio-module-sidecar:3.0.1.410 | N/A        | 1024           | 512            | 128       | 256  | 0              | 96                 |
| mod-tags                           | 1                        | mod-tags:2.4.0                 | 2          | 1024           | 896            | 128       | 768  | 88             | 128                |
| mod-tags - Sidecar 1               | N/A                      | folio-module-sidecar:3.0.1.410 | N/A        | 1024           | 512            | 128       | 256  | 0              | 96                 |
| edge-courses                       | 1                        | edge-courses:1.6.0             | 2          | 1024           | 896            | 128       | 768  | 88             | 128                |
| mod-inventory-update               | 1                        | mod-inventory-update:4.1.0     | 2          | 1024           | 896            | 128       | 768  | 88             | 128                |
| mod-inventory-update - Sidecar 1   | N/A                      | folio-module-sidecar:3.0.1.410 | N/A        | 1024           | 512            | 128       | 256  | 0              | 96                 |
| mod-notify                         | 1                        | mod-notify:3.4.0               | 2          | 1024           | 896            | 128       | 768  | 88             | 128                |
| mod-notify - Sidecar 1             | N/A                      | folio-module-sidecar:3.0.1.410 | N/A        | 1024           | 512            | 128       | 256  | 0              | 96                 |
| mod-configuration                  | 1                        | mod-configuration:5.12.0       | 2          | 1024           | 896            | 128       | 768  | 88             | 128                |

mod-configuration - Sidecar 1

N/A

folio-module-sidecar:3.0.1.410

N/A

1024

512

128

256