Data Import test report Ramsons [non-ECS]
- 1 Overview
- 2 Summary
- 3 Results
- 3.1 Memory Utilization
- 3.2 CPU Utilization
- 3.3 RDS Metrics
- 3.3.1 DB CPU utilization for DI creates and updates with mod-search enabled
- 3.3.2 DB CPU utilisation for DI creates and updates with mod-search disabled
- 3.3.3 DB CPU utilization for DI creates and updates (combined)
- 3.3.4 DB load for DI creates and updates with mod-search enabled
- 3.3.5 DB load for DI creates and updates with mod-search disabled
- 3.3.6 DB load for DI creates and updates (combined)
- 3.4 Slow query found in mod-search
- 3.5 MSK CPU usage
- 4 Appendix
Overview
This document contains the results of testing Data Import for MARC Bibliographic records at Ramsons okapi release [non-ECS].
In scope is testing of 5K,10K,25K,50K, 100K data import create and update. Two round of tests - one mod-search was enabled and in the 2nd round mod-search was disable(task count =0).
Ticket: https://folio-org.atlassian.net/browse/PERF-966
Summary
All tests passed successfully.
With disabled mod-search all DI durations (for creates and updates) are the same in overal as in Q release.
With enabled mod-search there’s a big performance degradation for creates and updates (50-100%). It’s happening because of long running query from mod search side that causing significant load on DB and slowing overall process down. (this query is part of runtime reindexing process for contributor)
On mod-search schema side deadlocks observed during data import creates and updates, however deadlocks does not affect completion of DI.
Recommendations & Jiras
mod-search deadlocks tickethttps://folio-org.atlassian.net/browse/MSEARCH-932
mod-search slow query affecting DI performance https://folio-org.atlassian.net/browse/MSEARCH-972
Possibly related ticket for mod-search https://folio-org.atlassian.net/browse/MSEARCH-924
possible DB load improvements https://folio-org.atlassian.net/browse/MODDATAIMP-1177
Results
Test # | Data-import test | Profile | Duration Ramsons (rcp1) mod-search disabled | Duration Ramsons (rcp1) mod-search enabled | Duration Ramsons (rcon) | Duration Quesnelia (qcp1) | Duration Quesnelia (qcon) | Status |
|---|---|---|---|---|---|---|---|---|
1 | 5k MARC BIB Create | PTF - Create 2 | 1 min | 3 min 7 s | - | - | - | - |
| 10k MARC BIB Create | PTF - Create 2 | 4 min 30 s | 6 min 15 s | 5 min 10 s | 6 minutes | 4 min 14 sec | Completed |
2 | 25k MARC BIB Create | PTF - Create 2 | 11 min | 17 min | 10 min 30 s | 13 min 41 sec | 9 min 41 sec | Completed |
3 | 50k MARC BIB Create | PTF - Create 2 | 22 min | 41 min 25 s | 15 min 43 s | 21 min 59 sec | 18 min 18 sec | Completed |
4 | 100k MARC BIB Create | PTF - Create 2 | 46 min | 1 hr 19 min | 31 min 51 s | 40 min 16 sec | 38 min 36 sec | Completed |
| 5k MARC BIB Update | PTF - Updates Success - 6 | 3 min 33 s | 6 min 33 s | - | - | - | - |
6 | 10k MARC BIB Update | PTF - Updates Success - 6 | 6 min 46 s | 11 min 14 s | 7 min 10 s | 10 min 27 sec | 5 min 59 sec | Completed |
7 | 25k MARC BIB Update | PTF - Updates Success - 6 | 16 min 40 s | 28 min 43 s | 19 min 3 s | 23 min 16 sec | 19 min 52 sec | Completed |
8 | 50k MARC BIB Update | PTF - Updates Success - 6 | 33 min 45 s | 58 min 30 s | 38 min 53 sec | 40 min 52 sec | 37 min 53 sec | Completed |
9 | 100k MARC BIB Update | PTF - Updates Success - 6 | 1 hr 8 min | 2 hr 14 min | 1 hr 23 min | 1 hrs 2 min | 1 hrs 14 min | Completed |
Memory Utilization
Memory utiliяation showed stable trend during DI creates and updates tests. No sudden crashes or unexpected growth of memory usage were observed.
All services didn’t exceed 80% of memory usage. Most used module is mod-permissions and it memory growing up to 80% during tests, however after each test it returns to normal condition.
Service memory usage for DI creates and updates with mod-search enabled
Service memory usage for DI creates and updates with mod-search disabled
Service memory usage for DI creates and updates (combined)
CPU Utilization
CPU utilization is stable and predictable for all modules during all tests. Top service utilization modules are:
mod-inventory 10% max
mod-source-record-storage 7,5% max
mod-inventory-storage 5,5% max
Service CPU utilisation for DI creates and updates with enabled mod-search
Service CPU utilisation for DI creates and updates with disabled mod-search
During tests set with disabled mod-search (task count set to 0) CPU usage is higher for most of a modules. It happens as with disabled mod-search DB got more free resources and was able to process requests faster and it got reflection on CPU usage of a modules.
Service CPU utilization for DI creates and updates (combined)
On chart here is clearly visible higher CPU usage for modules with mod-search disabled. Moreover here is clearly visible that without mod-search DI is much faster.
RDS Metrics
As expected CPU usage of DB is high (as usual during data import process), however there’s visible improvement (lower) (Close to 100% with mod-search enabled and ±85% with disabled mod-search) CPU usage during DI with disabled mod-search.
DB CPU utilization for DI creates and updates with mod-search enabled
DB CPU utilisation for DI creates and updates with mod-search disabled
DB CPU utilization for DI creates and updates (combined)
DB load for DI creates and updates with mod-search enabled
DB load for DI creates and updates with mod-search disabled
DB load for DI creates and updates (combined)
On this chart is clearly visible impact of mod-search on overall DB load.
Slow query detected from mod-search side that affecting performance significantly:
Data volume:
search.instance - 4 109 321
search.instance_contributor - 8 327 231
Slow query found in mod-search
WITH cte AS (SELECT id,
name,
name_type_id,
authority_id,
last_updated_date
FROM fs09000000_mod_search.contributor
WHERE last_updated_date > $1
ORDER BY last_updated_date
)
SELECT c.id,
c.name,
c.name_type_id,
c.authority_id,
c.last_updated_date,
json_agg(
CASE
WHEN sub.instance_count IS NULL THEN NULL
ELSE json_build_object(
'count', sub.instance_count,
'typeId', sub.type_ids,
'shared', sub.shared,
'tenantId', sub.tenant_id
)
END
) AS instances
FROM cte c
LEFT JOIN
(SELECT cte.id,
ins.tenant_id,
ins.shared,
array_agg(DISTINCT ins.type_id) FILTER (WHERE ins.type_id <> '') AS type_ids,
count(DISTINCT ins.instance_id) AS instance_count
FROM fs09000000_mod_search.instance_contributor ins
INNER JOIN cte
ON ins.contributor_id = cte.id
GROUP BY cte.id,
ins.tenant_id,
ins.shared) sub ON c.id = sub.id
GROUP BY c.id,
c.name,
c.name_type_id,
c.authority_id,
c.last_updated_date
ORDER BY last_updated_date ASC
MSK CPU usage
During all tests CPU usage haven’t exceed 55% on all brokers.
Appendix
Infrastructure
PTF -environment rcp1
11 m6g.2xlarge EC2 instances located in US East (N. Virginia)us-east-1
db.r6.xlarge database instances, writer
MSK fse-test
4 kafka.m7g.xlarge brokers in 2 zones
Apache Kafka version 3.7.x (KRaft mode)
EBS storage volume per broker 300 GiB
auto.create.topics.enable=true
log.retention.minutes=480
default.replication.factor=3
OpenSearch 2.13 ptf-test cluster
r6g.2xlarge.search 4 data nodes
r6g.large.search 3 dedicated master nodes