Data Import MARC BIB Sunflower [non-ECS]
Overview
This document contains the results of testing Data Import for MARC Bibliographic records at Sunflower release [non-ECS].
Note: Starting with the Sunflower release, all FOLIO environments will be Eureka.
In scope: data import create and update tests with 5K, 10K, 25K, 50K, and 100K records.
Ticket: https://folio-org.atlassian.net/browse/PERF-1108
Summary
All tests passed successfully.
Significant improvement is visible on the mod-search side: in Ramsons, a long-running query degraded performance by 50-100%. Compared with Ramsons, data imports are now faster for creates and more or less the same for updates (faster at 10k, somewhat slower at 25k and 50k).
Deadlocks were observed in the mod-search schema during data import creates and updates; however, they do not affect completion of DI.
Results of Data Import MARC BIB Sunflower [non-ECS] - CSP1 - https://folio-org.atlassian.net/wiki/spaces/FOLIJET/pages/1200652289
Recommendations & Jiras
mod-search deadlocks ticket: https://folio-org.atlassian.net/browse/MSEARCH-932
Results
| Test # | Data-import test | Profile | Duration Sunflower (secp1) | Duration Ramsons (rcp1), mod-search enabled | Duration Quesnelia (qcp1) | Status |
|---|---|---|---|---|---|---|
| 1 | 5k MARC BIB Create | PTF - Create 2 | 2 min 4 s | 3 min 7 s | - | Completed |
| | 10k MARC BIB Create | PTF - Create 2 | 4 min 43 s | 6 min 15 s | 6 min | Completed |
| 2 | 25k MARC BIB Create | PTF - Create 2 | 10 min | 17 min | 13 min 41 s | Completed |
| 3 | 50k MARC BIB Create | PTF - Create 2 | 21 min | 41 min 25 s | 21 min 59 s | Completed |
| 4 | 100k MARC BIB Create | PTF - Create 2 | 42 min 46 s | 1 hr 19 min | 40 min 16 s | Completed |
| | 5k MARC BIB Update | PTF - Updates Success - 6 | 6 min 18 s | 6 min 33 s | - | Completed |
| 6 | 10k MARC BIB Update | PTF - Updates Success - 6 | 6 min 4 s | 11 min 14 s | 10 min 27 s | Completed |
| 7 | 25k MARC BIB Update | PTF - Updates Success - 6 | 31 min | 28 min 43 s | 23 min 16 s | Completed |
| 8 | 50k MARC BIB Update | PTF - Updates Success - 6 | 1 hr 8 min | 58 min 30 s | 40 min 52 s | Completed |
| 9 | 100k MARC BIB Update | PTF - Updates Success - 6 | 2 hr 5 min | 2 hr 14 min | 1 hr 2 min | Completed |
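The release-to-release comparison in the table can be made explicit with a little arithmetic. The following is a minimal sketch, not part of the test tooling: it converts the Sunflower and Ramsons durations from the table to seconds and prints the relative change per test. The helper names `to_seconds` and `percent_change` and the `DURATIONS` mapping are hypothetical and exist only for this illustration.

```python
import re

# Durations copied from the Results table: (Sunflower, Ramsons) per test.
DURATIONS = {
    "5k Create": ("2 min 4 s", "3 min 7 s"),
    "10k Create": ("4 min 43 s", "6 min 15 s"),
    "25k Create": ("10 min", "17 min"),
    "50k Create": ("21 min", "41 min 25 s"),
    "100k Create": ("42 min 46 s", "1 hr 19 min"),
    "5k Update": ("6 min 18 s", "6 min 33 s"),
    "10k Update": ("6 min 4 s", "11 min 14 s"),
    "25k Update": ("31 min", "28 min 43 s"),
    "50k Update": ("1 hr 8 min", "58 min 30 s"),
    "100k Update": ("2 hr 5 min", "2 hr 14 min"),
}

# Seconds per unit, using the unit spellings found in the table.
UNIT_SECONDS = {"hr": 3600, "min": 60, "s": 1}


def to_seconds(text: str) -> int:
    """Convert a duration such as '1 hr 19 min' or '2 min 4 s' to seconds."""
    parts = re.findall(r"(\d+)\s*(hr|min|s)\b", text)
    if not parts:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(int(value) * UNIT_SECONDS[unit] for value, unit in parts)


def percent_change(new: str, old: str) -> float:
    """Relative change of `new` against `old` in percent; negative means faster."""
    old_seconds = to_seconds(old)
    return (to_seconds(new) - old_seconds) / old_seconds * 100


if __name__ == "__main__":
    for test, (sunflower, ramsons) in DURATIONS.items():
        change = percent_change(sunflower, ramsons)
        print(f"{test:<12} {change:+6.1f}%")
```

Under this calculation the 25k create comes out roughly 41% shorter than in Ramsons, while the 50k update comes out roughly 16% longer.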
Memory Utilization
Memory utilization showed a stable trend during the DI create and update tests. No sudden crashes or unexpected growth in memory usage were observed.
No service exceeded 80% memory usage. The module with the highest usage is mod-permissions; however, it returns to its normal level after each test.
Service memory usage for DI creates and updates
CPU Utilization
CPU utilization is stable and predictable for all modules during all tests.
Service CPU utilisation for DI creates and updates
RDS Metrics
As expected, DB CPU usage is high, as is usual during the data import process.
DB CPU utilization for DI creates and updates
DB load for DI creates and updates
A slow query from the mod-search side was detected that significantly affects performance; it is still the slowest query:
Data volume:
search.instance - 4 109 321
search.instance_contributor - 8 327 231
Slow query found in mod-search
WITH cte AS (
    SELECT id,
           name,
           name_type_id,
           authority_id,
           last_updated_date
    FROM fs09000000_mod_search.contributor
    WHERE last_updated_date > $1
    ORDER BY last_updated_date
)
SELECT c.id,
       c.name,
       c.name_type_id,
       c.authority_id,
       c.last_updated_date,
       json_agg(
           CASE
               WHEN sub.instance_count IS NULL THEN NULL
               ELSE json_build_object(
                   'count', sub.instance_count,
                   'typeId', sub.type_ids,
                   'shared', sub.shared,
                   'tenantId', sub.tenant_id
               )
           END
       ) AS instances
FROM cte c
LEFT JOIN (
    SELECT cte.id,
           ins.tenant_id,
           ins.shared,
           array_agg(DISTINCT ins.type_id) FILTER (WHERE ins.type_id <> '') AS type_ids,
           count(DISTINCT ins.instance_id) AS instance_count
    FROM fs09000000_mod_search.instance_contributor ins
    INNER JOIN cte
        ON ins.contributor_id = cte.id
    GROUP BY cte.id,
             ins.tenant_id,
             ins.shared
) sub ON c.id = sub.id
GROUP BY c.id,
         c.name,
         c.name_type_id,
         c.authority_id,
         c.last_updated_date
ORDER BY last_updated_date ASC
MSK CPU usage
During all tests, CPU usage did not exceed 60% on any broker.
Appendix
Infrastructure
PTF - environment rcp1
11 m6g.2xlarge EC2 instances located in US East (N. Virginia) us-east-1
db.r6.xlarge database instances, writer
MSK fse-test
4 kafka.m7g.xlarge brokers in 2 zones
Apache Kafka version 3.7.x (KRaft mode)
EBS storage volume per broker 300 GiB
auto.create.topics.enable=true
log.retention.minutes=480
default.replication.factor=3
OpenSearch 2.13 ptf-test cluster
r6g.2xlarge.search 4 data nodes
r6g.large.search 3 dedicated master nodes