Data Import Create MARC Holdings Records [non-ECS] [Quesnelia]

Overview

The purpose of this document is to present the results of testing Data Import Create MARC Holdings records and to detect performance trends in Quesnelia, in scope of ticket PERF-833.

Results are compared with those in the previous test report: Data Import Create MARC holdings records [non-ECS] [Poppy]

Summary

  • Data Import create-holdings jobs performed about 40% faster on average in the Quesnelia release than in Poppy (40.63%-45.35% across test files).
  • The number of holdings associated with one instance in the test files does not affect the duration of data import in non-ECS environments.
  • Top CPU utilization: mod-inventory-b - 73%, nginx-okapi - 65%, mod-quick-marc-b - 57%, mod-source-record-storage-b - 35%.
  • Top memory consumption:
    • Set #1: mod-inventory-storage-b - 87%, mod-inventory-b - 72%, mod-data-import-b - 59%. Memory spikes were observed for mod-inventory-storage-b; after the tests finished, it returned to its pre-test level.
    • Set #2: mod-inventory-storage-b - 24%, mod-inventory-b - 56%, mod-data-import-b - 58%, mod-users-b - 53%.
  • RDS CPU utilization was at the 95% level for all DI tests except the test with the 1k file (25%).
  • RDS DB connections reached 860.

Recommendations & Jiras

  • Investigate the memory growth trend for mod-inventory-storage in test set #1 (using 1 instance HRID to create all holdings).

Test Runs 

Profile used for testing - Default - Create Holdings and SRS MARC Holdings

Set of tests № | Scenario | Test Conditions
1 | DI Holdings Create (previous* approach) | 1K, 5K, 10K, 80K sequentially
2 | DI Holdings Create (new** approach) | 1K, 5K, 10K, 80K sequentially

*previous approach - Data Import of Holdings with an mrc file where 1 instance HRID is associated with all holdings in the file (1k, 5k, 10k, 80k)

**new approach - Data Import of Holdings with an mrc file where 1 instance HRID is associated with every 1000 holdings
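
For example, with the 80k file the previous approach assigns one and the same instance HRID to all 80,000 holdings, while the new approach uses 80 distinct instance HRIDs (80,000 / 1,000 = 80), one per block of 1,000 holdings.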

Test Results

Set 1 - files used to test DI create Holdings had 1 instance HRID for all created holdings (previous approach)

Set 2 - files used to test DI create Holdings had 1 unique instance HRID for every 1000 created holdings (new approach)

Test | Number of records in file | Duration: Quesnelia [non-ECS] Set #1 | Duration: Quesnelia [non-ECS] Set #2 | Status and errors: Quesnelia [non-ECS] Set #1, Set #2
1 | 1k | 19 sec | 25 sec | Success
2 | 5k | 1 min 17 sec | 1 min 24 sec | Success
3 | 10k | 2 min 32 sec | 2 min 40 sec | Success
4 | 80k | 19 min 54 sec | 21 min 44 sec | Success

Comparison

Test | Number of records in file | Duration: Poppy [non-ECS] | Duration: Quesnelia [non-ECS] Set #1 | Delta | %, Delta/Poppy duration
1 | 1k | 32 sec | 19 sec | 13 sec | 40.63%
2 | 5k | 2 min 14 sec | 1 min 17 sec | 57 sec | 42.54%
3 | 10k | 4 min 35 sec | 2 min 32 sec | 2 min 3 sec | 44.73%
4 | 80k | 36 min 25 sec | 19 min 54 sec | 16 min 31 sec | 45.35%
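
Here Delta = Poppy duration - Quesnelia duration, and the last column is Delta / Poppy duration × 100. For example, for the 80k test: 36 min 25 sec (2185 sec) - 19 min 54 sec (1194 sec) = 16 min 31 sec (991 sec), and 991 / 2185 ≈ 45.35%.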


Service CPU Utilization

 CPU utilization, 1k, 5k, 10k, 80k

Set #1

Module | CPU, % (1k) | CPU, % (5k) | CPU, % (10k) | CPU, % (80k)
mod-inventory-b | 18.91 | 71.08 | 72.14 | 72.96
nginx-okapi | 12.24 | 55.4 | 58.95 | 64.91
mod-quick-marc-b | 10.59 | 59.87 | 56.42 | 56.6
mod-source-record-storage-b | 8.59 | 26.46 | 28.82 | 34.64
mod-permissions-b | 6.87 | 2.69 | 2.71 | 2.91
okapi-b | 6.55 | 39.77 | 39.77 | 42.23
mod-users-b | 6.39 | 6.33 | 7.81 | 6.58
mod-pubsub-b | 6.06 | 6.55 | 6.58 | 6.6
mod-authtoken-b | 5.38 | 6.59 | 4.5 | 5.74
mod-inventory-storage-b | 5.23 | 25.21 | 26.5 | 31.8
mod-di-converter-storage-b | 5.03 | 17.64 | 18.01 | 22.73
mod-source-record-manager-b | 4.14 | 19.18 | 21.4 | 25.49
mod-data-import-b | 3.78 | 4.91 | 3.36 | 1.46
mod-password-validator-b | 2.24 | 2.37 | 2.61 | 2.36
mod-configuration-b | 2.14 | 2.14 | 2.65 | 2.27
mod-feesfines-b | 1.98 | 2.14 | 2.13 | 2.14
mod-circulation-storage-b | 0.56 | 0.58 | 0.78 | 0.6
mod-circulation-b | 0.33 | 0.34 | 0.36 | 0.35
pub-okapi | 0.16 | 0.1 | 0.16 | 0.13

Set #2

Module | CPU, % (1k) | CPU, % (5k) | CPU, % (10k) | CPU, % (80k)
mod-quick-marc-b | 37.33 | 157.35 | 56.36 | 51.23
mod-inventory-b | 28.33 | 82.48 | 85.27 | 66.23
okapi-b | 7.84 | 36.81 | 40.03 | 39.22
mod-pubsub-b | 6.08 | 6.54 | 6.61 | 6.55
mod-source-record-manager-b | 5.92 | 23.4 | 21.93 | 23.38
mod-users-b | 5.85 | 6.19 | 6.42 | 6.29
nginx-okapi | 5.31 | 47.85 | 58.87 | 61.95
mod-source-record-storage-b | 4.92 | 9.43 | 32.35 | 29.6
mod-di-converter-storage-b | 4.53 | 26.61 | 34.96 | 22.05
mod-inventory-storage-b | 3.38 | 29.42 | 30.18 | 32.14
mod-configuration-b | 2.2 | 2.19 | 2.4 | 2.28
mod-password-validator-b | 2.15 | 2.2 | 2.6 | 2.46
mod-feesfines-b | 1.98 | 4.03 | 2.24 | 2.14
mod-permissions-b | 1.97 | 2.75 | 4.27 | 2.68
mod-data-import-b | 1.24 | 2.77 | 1.92 | 1.45
mod-authtoken-b | 1.12 | 4.93 | 4.18 | 6.69
mod-circulation-storage-b | 0.58 | 0.6 | 0.73 | 0.62
mod-circulation-b | 0.47 | 0.35 | 0.42 | 0.33
pub-okapi | 0.12 | 0.12 | 0.12 | 0.11

Memory Utilization

 Memory consumption

Set #1

Module | Memory, %
mod-inventory-storage-b | 87.22
mod-inventory-b | 71.95
mod-data-import-b | 59.06
mod-permissions-b | 56.06
mod-source-record-manager-b | 54.96
mod-users-b | 52.87
okapi-b | 50.51
mod-quick-marc-b | 47.25
mod-di-converter-storage-b | 46.32
mod-configuration-b | 41.39
mod-feesfines-b | 39.85
mod-pubsub-b | 34.04
mod-source-record-storage-b | 32.92
mod-authtoken-b | 26.48
mod-circulation-storage-b | 21.33
mod-circulation-b | 16.13
nginx-okapi | 4.85
pub-okapi | 4.69

Set #2

Module | Memory, %
mod-data-import-b | 57.93
mod-inventory-b | 56.19
mod-permissions-b | 55.98
mod-users-b | 52.79
okapi-b | 50.56
mod-source-record-manager-b | 49.19
mod-quick-marc-b | 47.33
mod-di-converter-storage-b | 45.76
mod-configuration-b | 41.56
mod-feesfines-b | 39.88
mod-source-record-storage-b | 38.79
mod-pubsub-b | 33.96
mod-authtoken-b | 26.21
mod-inventory-storage-b | 23.8
mod-circulation-storage-b | 21.24
mod-circulation-b | 16.05
nginx-okapi | 4.8
pub-okapi | 4.69

MSK tenant cluster

Disk usage by broker

CPU (User) usage by broker

Open Search

CPU utilization master node

CPU utilization data node

Minimum free storage space (GiB)

Maximum memory utilization data node

Indexing rate

RDS CPU Utilization

RDS CPU utilization was 95% for all tests except DI Holdings with the 1k file, where it was 25%.


DB Connections



DB Load

Set #1


Set #2

SQL queries


TOP SQL Queries for Set #1
INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id) VALUES ($1, $2)
INSERT INTO fs09000000_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)
UPDATE fs09000000_mod_source_record_manager.job_execution_progress SET succeeded_records_count = succeeded_records_count + $2, error_records_count = error_records_count + $3 WHERE job_execution_id = $1 Returning *
INSERT INTO fs09000000_mod_inventory_storage.holdings_record (id, jsonb) VALUES ($1, $2) RETURNING jsonb
insert into "marc_records_lb" ("id", "content") values (cast($1 as uuid), cast($2 as jsonb)) on conflict ("id") do update set "content" = cast($3 as jsonb)

WITH input_rows(record_id, holdings_id) AS (
   VALUES ($1::uuid,$2::uuid)
)
, ins AS (
   INSERT INTO fs09000000_mod_inventory.records_holdings(record_id, holdings_id)
   SELECT * FROM input_rows
   ON CONFLICT (record_id) DO UPDATE SET record_id=EXCLUDED.record_id
   RETURNING record_id::uuid, holdings_id::uuid
   )
SELECT record_id, holdings_id
FROM   ins
UNION  ALL
SELECT c.record_id, c.holdings_id 
FROM   input_rows
JOIN   fs09000000_mod_inventory.records_holdings c USING (record_id);
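
The last statement above appears to be a standard "insert or select" upsert pattern: the ins CTE inserts the (record_id, holdings_id) pair, the ON CONFLICT ... DO UPDATE clause forces RETURNING to yield the row even when it already exists, and the trailing SELECT ... UNION ALL ... JOIN returns pairs already present in records_holdings, so the caller always gets the mapping back in a single round trip.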

TOP SQL Queries for Set #2
INSERT INTO fs09000000_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)

INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id) VALUES ($1, $2)

UPDATE fs09000000_mod_source_record_manager.job_execution_progress SET succeeded_records_count = succeeded_records_count + $2, error_records_count = error_records_count + $3 WHERE job_execution_id = $1 Returning *

INSERT INTO fs09000000_mod_inventory_storage.holdings_record (id, jsonb) VALUES ($1, $2) RETURNING jsonb

WITH input_rows(record_id, holdings_id) AS (
   VALUES ($1::uuid,$2::uuid)
)
, ins AS (
   INSERT INTO fs09000000_mod_inventory.records_holdings(record_id, holdings_id)
   SELECT * FROM input_rows
   ON CONFLICT (record_id) DO UPDATE SET record_id=EXCLUDED.record_id
   RETURNING record_id::uuid, holdings_id::uuid
   )
SELECT record_id, holdings_id
FROM   ins
UNION  ALL
SELECT c.record_id, c.holdings_id 
FROM   input_rows
JOIN   fs09000000_mod_inventory.records_holdings c USING (record_id);

Infrastructure

PTF - environment qcp1

Name | Memory GiB | vCPUs | Engine version
db.r6g.xlarge | 32 GB | 4 vCPUs | 16.1
  • MSK tenant
    • 2 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0
    • EBS storage volume per broker 300 GiB
    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=2
  • Open Search ptf-test
    • version OpenSearch_2_7_R20240502 
    • Data nodes
      • Instance type - r6g.2xlarge.search
      • Number of nodes - 4
      • Storage type - EBS
      • EBS volume size (GiB) - 500
    • Dedicated master nodes
      • Instance type - r6g.large.search
      • Number of nodes - 3
  • DB records 
    •  fs09000000
      • Instances - 25901331
      • Items - 27074913
      • Holdings - 25871735
    • fs07000001
      • Instances - 10100620
      • Items - 1484850
      • Holdings - 10522266
    • fs07000002
      • Instances - 1161275
      • Items - 1153548
      • Holdings - 1153548
Modules

Module Version | Revision | Task Count | Mem Hard Limit | Mem Soft Limit | CPU | Xmx | MetaspaceSize | MaxMetaspaceSize
mod-users-bl:7.7.0 | 5 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128
mod-configuration:5.10.0 | 5 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-authtoken:2.15.1 | 6 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128
mod-data-import:3.1.0 | 8 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512
mod-remote-storage:3.2.0 | 5 | 2 | 4920 | 4472 | 1024 | 3960 | 512 | 512
mod-inventory-storage:27.1.0 | 5 | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512
pub-okapi:2023.06.14 | 3 | 2 | 1024 | 896 | 128 | 768 | - | -
mod-feesfines:19.1.0 | 5 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
okapi:5.3.0 | 5 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512
nginx-okapi:2023.06.14 | 3 | 2 | 1024 | 896 | 128 | - | - | -
mod-quick-marc:5.1.0 | 5 | 1 | 2288 | 2176 | 128 | 1664 | 384 | 512
mod-source-record-manager:3.9.0-SNAPSHOT.330 | 6 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512
mod-patron-blocks:1.10.0 | 5 | 2 | 1024 | 896 | 1024 | 768 | 88 | 128
mod-pubsub:2.13.0 | 5 | 2 | 1536 | 1440 | 1024 | 922 | 384 | 512
mod-circulation:24.2.0 | 5 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512
mod-di-converter-storage:2.2.0 | 5 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-inventory:20.2.0 | 5 | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512
mod-source-record-storage:5.8.0 | 5 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512
mod-circulation-storage:17.2.0 | 5 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512
mod-organizations-storage:4.7.0 | 5 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-notes:5.2.0 | 5 | 2 | 1024 | 896 | 128 | 952 | 384 | 512
mod-gobi:2.8.0 | 5 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-permissions:6.5.0 | 10 | 2 | 1684 | 1544 | 512 | 1024 | 384 | 512
mod-search:3.2.0 | 5 | 2 | 2592 | 2480 | 2048 | 1440 | 512 | 1024

Methodology/Approach

  1. Prepare Data Import files (1k, 5k, 10k, 80k) with the defined number of holdings records associated with an instance HRID (1 instance HRID for all records, or 1 per 1000 records):
    1. Replace the instance HRID field with an active one from the environment (example: =004 colin00001144043).
    2. Replace the location field (example: =852 01$bme3CC$hKFN5860.A6$iC732, where me3CC is the code of the tenant location). Go to /settings/tenant-settings/location-locations and take the code of a location with active status.
    3. To replace field 004, extract instance HRIDs of active instances for this tenant using the SQL queries below.
      1. Get total jobs durations

        SQL to get job durations
        select file_name,total_records_in_file,started_date,completed_date, completed_date - started_date as duration ,status,error_status
        from [tenant]_mod_source_record_manager.job_execution
        where subordination_type = 'COMPOSITE_PARENT'
        -- where started_date > '2024-06-13 14:47:54' and completed_date < '2024-06-13 19:01:50.832' 
        order by started_date desc 
        limit 10
        
      2. Get instance HRID ids

        SQL to get instance HRIDs
        select jsonb->>'hrid' as instanceHRID
        from [tenant]_mod_inventory_storage.instance
        where jsonb->>'discoverySuppress' = 'false' and jsonb->>'source' = 'MARC'
        limit 80
      3. Put the instance HRID ids into a stringsHRID.txt file without double quotes or headers. Every row should contain only one HRID id.
      4. If needed, use the PY script to replace HRID ids in the mrc file; a minimal sketch of the replacement logic is shown after this list. The script is located in the Git repository perf-testing\workflows-scripts\data-import\Holdings\Data_preparation_steps
  2. Run Data Import jobs sequentially, one by one, from the UI with a 5-minute delay between jobs (the delay time can vary; this interval was chosen as convenient for collecting results).
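
Below is a minimal sketch of the PY replacement step (step 1.3.4 above), assuming the pymarc library. The input and output file names and the HOLDINGS_PER_HRID constant are illustrative only; the authoritative implementation is the script in the perf-testing repository.

        PY script sketch (pymarc)
        from pymarc import MARCReader, MARCWriter

        HOLDINGS_PER_HRID = 1000  # "new approach"; set larger than the record count to mimic the "previous approach"

        # stringsHRID.txt: one instance HRID per row, no quotes, no header (see step 1.3.3)
        with open("stringsHRID.txt") as f:
            hrids = [line.strip() for line in f if line.strip()]

        writer = MARCWriter(open("holdings_prepared.mrc", "wb"))  # illustrative output file name
        with open("holdings.mrc", "rb") as src:                   # illustrative input file name
            for i, record in enumerate(MARCReader(src)):
                # control field 004 of a MARC holdings record carries the related instance HRID
                for field in record.get_fields("004"):
                    field.data = hrids[i // HOLDINGS_PER_HRID]
                writer.write(record)
        writer.close()

With HOLDINGS_PER_HRID = 1000, an 80k file consumes 80 distinct HRIDs, which matches the limit 80 in the "SQL to get instance HRIDs" query above.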