Data Import BIB Sunflower [ECS]

Overview

This document contains the results of testing Data Import create and update jobs for MARC Bibliographic records with different file sizes (10K, 25K, 50K, 100K, 500K) on the Sunflower release [ECS].

Note: Starting from Sunflower, all releases are Eureka-based.
Ticket: PERF-1117: [Sunflower] [ECS] [Data import] Update and Create MARC BIB Records (Closed)

Summary

  • All data import jobs finished successfully without errors.

  • Durations of data import creates and updates are mostly the same as in the Ramsons release.

  • DI duration grows in proportion to the number of records imported.

  • No memory leak is suspected for DI modules.

  • Service CPU utilization, service memory utilization, and DB CPU utilization show the same trends and values as in the Ramsons release.

  • A mod-orders-storage query severely affects performance during data import creates. Tests were run with mod-orders-storage both enabled and disabled.

  • Follow-up task to investigate the mod-orders-storage query: PERF-1139: Define relations between mod-orders-storage and data import BIB creates (Open).

Recommendations & Jiras

PERF-1139: Define relations between mod-orders-storage and data import BIB creates (Open)

 

Results

 

| Test # | Data-import test | Profile | Duration: Sunflower (enabled orders) | Duration: Sunflower (disabled orders-storage) | Duration: Ramsons (rcon) | Duration: Quesnelia (qcon) | Duration: Quesnelia (qcp1) | Results |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 5k MARC BIB Create | PTF - Create 2 | 2 min 32 s | 1 min 26 s | - | - | - | Completed |
| 2 | 10k MARC BIB Create | PTF - Create 2 | 4 min 46 s | 2 min 50 s | 5 min 10 s | 4 min 14 s | 6 min | Completed |
| 3 | 25k MARC BIB Create | PTF - Create 2 | 11 min 3 s | 6 min 48 s | 10 min 30 s | 9 min 41 s | 13 min 41 s | Completed |
| 4 | 50k MARC BIB Create | PTF - Create 2 | 20 min 44 s | 13 min 37 s | 15 min 43 s | 18 min 18 s | 21 min 59 s | Completed |
| 5 | 100k MARC BIB Create | PTF - Create 2 | 46 min 24 s | 30 min 3 s | 31 min 51 s | 38 min 36 s | 40 min 16 s | Completed |
| 6 | 5k MARC BIB Update | PTF - Updates Success - 6 | - | 3 min 53 s | - | - | - | Completed |
| 7 | 10k MARC BIB Update | PTF - Updates Success - 6 | - | 7 min 16 s | 7 min 10 s | 5 min 59 s | 10 min 27 s | Completed |
| 8 | 25k MARC BIB Update | PTF - Updates Success - 6 | - | 18 min 38 s | 19 min 3 s | 19 min 52 s | 23 min 16 s | Completed |
| 9 | 50k MARC BIB Update | PTF - Updates Success - 6 | - | 37 min 55 s | 38 min 53 s | 37 min 53 s | 40 min 52 s | Completed |
| 10 | 100k MARC BIB Update | PTF - Updates Success - 6 | - | 1 hr 22 min | 1 hr 23 min | 1 hr 14 min | 1 hr 2 min | Completed |

 

Memory Utilization

Memory utilization shows a stable trend, without spikes or drops. No signs of a memory leak were observed.

 

image-20250505-094334.png

 

CPU Utilization 

image-20250505-090253.png

RDS Metrics 

image-20250505-094554.png

 

image-20250505-094901.png
image-20250505-095055.png

mod-orders-storage lock SQL

SELECT * FROM cs00000int_0001_mod_orders_storage.internal_lock WHERE lock_name = $1 FOR UPDATE
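This statement takes a row-level lock on the internal_lock table, so concurrent sessions queue behind it. To confirm whether it is what blocks other data import sessions, a diagnostic query against the standard PostgreSQL catalog views can be run while a create job is in progress (a sketch for investigation, not part of the test runs):

-- Sketch: list sessions waiting on a lock together with the sessions blocking them
-- (pg_stat_activity and pg_blocking_pids are standard PostgreSQL facilities).
select waiting.pid    as waiting_pid,
       waiting.query  as waiting_query,
       blocking.pid   as blocking_pid,
       blocking.query as blocking_query
from pg_stat_activity waiting
join lateral unnest(pg_blocking_pids(waiting.pid)) as b(pid) on true
join pg_stat_activity blocking on blocking.pid = b.pid
where waiting.wait_event_type = 'Lock';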

 

MSK service

image-20250505-095406.png
image-20250505-095523.png

 

Appendix

Infrastructure

PTF environment: secon

  • 11 m6g.2xlarge EC2 instances located in US East (N. Virginia, us-east-1)

  • db.r6.xlarge database instance (writer)

  • MSK fse-test

    • 4 kafka.m7g.xlarge brokers in 2 zones

    • Apache Kafka version 3.7.x (KRaft mode)

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true

    • log.retention.minutes=480

    • default.replication.factor=3

 

 

Cluster Resources - secon-pvt

 

S - Sunflower release

Dataset Size:

| Record type | Number of records |
| --- | --- |
| Instances | 1 163 924 |
| Holdings | 1 348 036 |
| Items | 2 091 901 |

Methodology/Approach

  1. Pre-generated files were used for the DI Create job profile:

    • 10K, 25K, 50K, 100K, and 500K files.

  2. Run DI Create on a single tenant (cs00000int_0001), one file at a time with a delay between jobs, using the PTF - Create 2 profile.

  3. Prepare files for DI Update with the Data Export app, using previously imported items.

  4. Run DI Update on a single tenant (cs00000int_0001), one prepared file at a time with a delay between jobs, using the PTF - Update Success 2 profile:

    • 1K, 10K, 25K, 50K, 100K, and 500K files.

  5. Data-import durations were obtained from the database using the SQL query below.

select file_name, started_date, completed_date, completed_date - started_date as duration, status from [tenant]_mod_source_record_manager.job_execution order by started_date desc limit 1000;
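For quicker comparison across runs, the same table can be queried with durations converted to seconds; the status filter below is an assumption about how finished jobs are recorded and may need adjusting:

-- Sketch: durations in seconds for recently finished jobs
-- (the 'COMMITTED' status value is an assumption; replace [tenant] with the tenant schema prefix).
select file_name,
       extract(epoch from (completed_date - started_date)) as duration_seconds,
       status
from [tenant]_mod_source_record_manager.job_execution
where status = 'COMMITTED'
order by started_date desc
limit 100;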