Data Import BIB Sunflower [ECS]
Overview
This document contains the results of testing Data Import for MARC Bibliographic record creates and updates with different file sizes (10K, 25K, 50K, 100K, 500K) on the Sunflower release [ECS].
Note: Starting from Sunflower, all releases are Eureka-based.
Ticket: PERF-1117: [Sunflower] [ECS] [Data import] Update and Create MARC BIB Records (Closed)
Summary
All data import jobs finished successfully, without errors.
Durations of data import creates and updates are mostly the same as in the Ramsons release.
DI duration grows in proportion to the number of records imported.
No memory leaks are suspected for DI modules.
Service CPU utilization, service memory utilization, and DB CPU utilization show the same trends and values as in the Ramsons release.
A mod-orders-storage query severely affects performance during data import creates; tests were run with mod-orders-storage both enabled and disabled.
Follow-up task to investigate the mod-orders-storage query: PERF-1139: Define relations between mod-orders-storage and data import BIB creates (Open)
Recommendations & Jiras
PERF-1139: Define relations between mod-orders-storage and data import BIB creates (Open)
Results
Test # | Data import test | Profile | Duration: Sunflower (orders enabled) | Duration: Sunflower (orders-storage disabled) | Duration: Ramsons (rcon) | Duration: Quesnelia (qcon) | Duration: Quesnelia (qcp1) | Results |
---|---|---|---|---|---|---|---|---|
1 | 5k MARC BIB Create | PTF - Create 2 | 2 min 32 s | 1 min 26 s | - | - | - | Completed |
2 | 10k MARC BIB Create | PTF - Create 2 | 4 min 46 s | 2 min 50 s | 5 min 10 s | 4 min 14 s | 6 min | Completed |
3 | 25k MARC BIB Create | PTF - Create 2 | 11 min 3 s | 6 min 48 s | 10 min 30 s | 9 min 41 s | 13 min 41 s | Completed |
4 | 50k MARC BIB Create | PTF - Create 2 | 20 min 44 s | 13 min 37 s | 15 min 43 s | 18 min 18 s | 21 min 59 s | Completed |
5 | 100k MARC BIB Create | PTF - Create 2 | 46 min 24 s | 30 min 3 s | 31 min 51 s | 38 min 36 s | 40 min 16 s | Completed |
6 | 5k MARC BIB Update | PTF - Updates Success - 6 | - | 3 min 53 s | - | - | - | Completed |
7 | 10k MARC BIB Update | PTF - Updates Success - 6 | - | 7 min 16 s | 7 min 10 s | 5 min 59 s | 10 min 27 s | Completed |
8 | 25k MARC BIB Update | PTF - Updates Success - 6 | - | 18 min 38 s | 19 min 3 s | 19 min 52 s | 23 min 16 s | Completed |
9 | 50k MARC BIB Update | PTF - Updates Success - 6 | - | 37 min 55 s | 38 min 53 s | 37 min 53 s | 40 min 52 s | Completed |
10 | 100k MARC BIB Update | PTF - Updates Success - 6 | - | 1 hr 22 min | 1 hr 23 min | 1 hr 14 min | 1 hr 2 min | Completed |
Memory Utilization
Memory utilization shows a stable trend, without spikes or drops. No signs of memory leaks were observed.
CPU Utilization
RDS Metrics
mod-orders-storage lock SQL
SELECT * FROM cs00000int_0001_mod_orders_storage.internal_lock WHERE lock_name = $1 FOR UPDATE
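The `FOR UPDATE` clause above serializes all callers on a single `internal_lock` row, so concurrent DI chunks queue behind whichever transaction currently holds the row. As a hypothetical diagnostic (it uses only standard PostgreSQL system views and was not part of the test runs above), sessions stuck in that queue can be listed together with the session blocking them:

```sql
-- Hypothetical diagnostic using standard PostgreSQL views: list sessions
-- waiting on a lock alongside the session(s) currently blocking them.
SELECT blocked.pid   AS blocked_pid,
       blocked.query AS blocked_query,
       blocker.pid   AS blocking_pid,
       blocker.query AS blocking_query
FROM pg_stat_activity AS blocked
JOIN LATERAL unnest(pg_blocking_pids(blocked.pid)) AS b(pid) ON true
JOIN pg_stat_activity AS blocker ON blocker.pid = b.pid
WHERE blocked.wait_event_type = 'Lock';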
MSK service
Appendix
Infrastructure
PTF environment: secon
11 m6g.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
db.r6.xlarge database instance (writer)
MSK fse-test
4 kafka.m7g.xlarge brokers in 2 zones
Apache Kafka version 3.7.x (KRaft mode)
EBS storage volume per broker: 300 GiB
auto.create.topics.enable=true
log.retention.minutes=480
default.replication.factor=3
Cluster Resources - secon-pvt
S - Sunflower release
Dataset Size:
Record type | Number of records |
---|---|
Instances | 1 163 924 |
Holdings | 1 348 036 |
Items | 2 091 901 |
Methodology/Approach
Pre-generated files (10K, 25K, 50K, 100K, and 500K records) were used for the DI Create job profile.
Run DI Create on a single tenant (cs00000int_0001), one file at a time with a delay between jobs, using the PTF - Create 2 profile.
Prepare files for DI Update with the Data Export app, using previously imported items.
Run DI Update on a single tenant (cs00000int_0001), one file at a time with a delay between jobs, using the prepared 1K, 10K, 25K, 50K, 100K, and 500K files and the PTF - Update Success 2 profile.
Data import durations were obtained from the DB using the following SQL query:
select file_name, started_date, completed_date, completed_date - started_date as duration, status
from [tenant]_mod_source_record_manager.job_execution
order by started_date desc
limit 1000;
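When a file size is run more than once, a hedged variant of the same query can summarize repeated runs (it reads the same `job_execution` table; the `COMMITTED` status as the success state is an assumption, not confirmed by the runs above):

```sql
-- Hypothetical aggregation over the same table: run count and average
-- duration per file name ('COMMITTED' as the success status is assumed).
select file_name,
       count(*)                           as runs,
       avg(completed_date - started_date) as avg_duration
from [tenant]_mod_source_record_manager.job_execution
where status = 'COMMITTED'
group by file_name
order by avg_duration desc;
```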