...

Table of Contents
Overview

...

The purpose of this document is to report the results of testing Data Import Create MARC holdings records and to detect performance trends in the Quesnelia release, in the scope of ticket PERF-855.

...

  • Data Import create holdings job durations increased significantly in the Quesnelia release (set #1). The 10k file took 22 min 40 sec versus 4 min 35 sec in Poppy (+395%). The increase for the 80k file could not be determined because the import did not complete: it was stopped after more than 4 hours of the test run (4 hours 13 min) with only 46 of 81 split jobs committed.
  • Top CPU utilization (set #1): mod-inventory-b - 16%, nginx-okapi - 5%, mod-source-record-storage-b - 4%, mod-quick-marc-b - 7%. Such low module-side resource utilization can be explained by the very high average latency of DB queries during INSERT and UPDATE operations, which held locks on the same tuple.
  • Top memory consumption (set #1): mod-inventory-storage-b - 85%, mod-data-import-b - 52%, mod-source-record-storage-b - 45%, mod-source-record-manager-b - 43%. A growing trend was observed in test set #1 for mod-inventory-storage-b (85%).
  • When the same instance HRID was used to create all holdings, DI job duration for the same file size grew from test to test.
  • DI performs faster with files that use 1 unique instance HRID for every 1000 records. With this approach, DI duration corresponds to file size, memory is utilized without a growing trend, and CPU and RDS utilization increase because there are fewer locks in the DB.

Recommendations & Jiras

  • Investigate the growing memory trend for mod-inventory-storage in test set #1 (using 1 instance HRID to create all holdings)
  • Define the highest number of holdings associated with one instance HRID that is still realistic

Errors

  • Error status SNAPSHOT_UPDATE_ERROR for the 32nd split job during the 80k file import (set #1)

...

Set 1 - Files used to test DI create Holdings had 1 instance HRID for all created Holdings

...

Set 2 - Files used to test DI create Holdings had 1 unique instance HRID for every 1000 created Holdings (new approach)

Test | File | Duration: Orchid (previous results) | Duration: Poppy (previous results) | Duration: Quesnelia [ECS] Set #1 | Status and Errors: Quesnelia [ECS] Set #1 | Duration: Quesnelia [ECS] Set #2 | Status and Errors: Quesnelia [ECS] Set #2
1 | 1k | 45s | 32s | 1 min 22 sec | Success | 1 min 3 sec | Success
2 | 5k | 7m 47s | 2m 14s | 8 min | Success | 4 min 16 sec | Success
3 | 10k | 19m 46s | 4m 35s | 22 min 40 sec | Success | 8 min 59 sec | Success
4 | 80k | 20m (error*) | 36m 25s | 4 hours 13 min | Stopped by user after 46 of 81 jobs COMMITTED (56% finished); 1 job with status ERROR, error status SNAPSHOT_UPDATE_ERROR (job number 32, file_name = '1718290065265-80k_holdings_Create_32.mrc') | 52 min 5 sec | Success
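One way to check the summary claim that set #2 duration corresponds to file size is to normalize each Quesnelia duration to seconds per 1,000 records. The sketch below is only an illustration: the durations are transcribed from the job duration table, the set #1 80k run is left out because it was stopped before completion, and the helper name `seconds_per_1k` is hypothetical.

```python
from typing import Dict

# Quesnelia durations from the job duration table, converted to seconds.
# Set #1 80k is excluded: that run was stopped by the user before completion.
SET1: Dict[int, int] = {1_000: 82, 5_000: 480, 10_000: 1360}
SET2: Dict[int, int] = {1_000: 63, 5_000: 256, 10_000: 539, 80_000: 3125}


def seconds_per_1k(durations: Dict[int, int]) -> Dict[int, float]:
    """Return the average seconds spent per 1,000 records for each file size."""
    return {size: round(secs / (size / 1000), 1) for size, secs in durations.items()}


if __name__ == "__main__":
    print("Set #1:", seconds_per_1k(SET1))
    print("Set #2:", seconds_per_1k(SET2))
```

With these numbers, set #2 stays between roughly 39 and 63 seconds per 1,000 records. Set #1 rises from 82 to 136 seconds per 1,000 records between the 1k and 10k files, which matches the growth described in the summary.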

Compared with the results in the previous test report: Data Import Create MARC holdings records [Poppy]

Comparison

The table below compares Quesnelia and Poppy.

Set #1 (Poppy and Quesnelia compared using different data sets)

Test | File | Duration: Poppy | Duration: Quesnelia set #1 | Difference, absolute | Difference, % of Poppy duration
1 | 1k | 00:00:32 | 00:01:22 | 00:00:50 | 156%
2 | 5k | 00:02:14 | 00:08:00 | 00:05:46 | 258%
3 | 10k | 00:04:35 | 00:22:40 | 00:18:05 | 395%
4 | 80k | 00:36:25 | 04:13:00 | 03:36:35 | 595%
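The difference columns can be reproduced from the two duration columns. The absolute difference is Quesnelia minus Poppy, and the percentage is (Quesnelia - Poppy) / Poppy x 100. Rounding to the nearest integer is an assumption inferred from the published values. The helper names below are illustrative only.

```python
from typing import Tuple


def to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total: int) -> str:
    """Convert seconds back to an 'HH:MM:SS' string."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def compare(poppy: str, quesnelia: str) -> Tuple[str, int]:
    """Return (absolute difference, difference as % of the Poppy duration)."""
    p, q = to_seconds(poppy), to_seconds(quesnelia)
    return to_hms(q - p), round((q - p) / p * 100)


ROWS = [
    ("1k", "00:00:32", "00:01:22"),
    ("5k", "00:02:14", "00:08:00"),
    ("10k", "00:04:35", "00:22:40"),
    ("80k", "00:36:25", "04:13:00"),
]

if __name__ == "__main__":
    for file_size, poppy, quesnelia in ROWS:
        diff, pct = compare(poppy, quesnelia)
        print(f"{file_size}: +{diff} ({pct}%)")
```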

...

Set #1: mod-inventory-b - 16%, nginx-okapi - 5%, mod-source-record-storage-b - 4%, mod-quick-marc-b - 7%

Set #2: mod-inventory-b - 33%, nginx-okapi - 23%, mod-source-record-storage-b - 11%, mod-quick-marc-b - 7%

[Charts: CPU utilization, Set #1 and Set #2]

Memory Utilization

Set #1

Module | Memory, %
mod-inventory-storage-b | 85.62
mod-data-import-b | 51.63
mod-source-record-storage-b | 44.97
mod-source-record-manager-b | 42.86
mod-users-b | 40.38
mod-inventory-b | 39.47
mod-permissions-b | 35.82
okapi-b | 33.4
mod-di-converter-storage-b | 33.26
mod-feesfines-b | 32.37
mod-quick-marc-b | 31.46
mod-configuration-b | 29.41
mod-pubsub-b | 25.66
mod-authtoken-b | 20.55
mod-circulation-storage-b | 18.93
mod-circulation-b | 17.87
nginx-okapi | 4.8
pub-okapi | 4.8

Set #2

Module | Memory, %
mod-inventory-storage-b | 56.04
mod-data-import-b | 55.45
mod-inventory-b | 45.63
mod-source-record-manager-b | 41.19
mod-users-b | 38.95
mod-source-record-storage-b | 37.37
mod-quick-marc-b | 33.59
mod-permissions-b | 33.45
okapi-b | 32.82
mod-feesfines-b | 32.65
mod-di-converter-storage-b | 31.91
mod-configuration-b | 28.49
mod-circulation-storage-b | 26.86
mod-pubsub-b | 25.83
mod-circulation-b | 20.14
mod-authtoken-b | 19.97
nginx-okapi | 4.69
pub-okapi | 4.58


...

Name | Memory, GiB | vCPUs | Engine version | Architecture settings
db.r6g.4xlarge | 128 GiB | 16 vCPUs

...

xlarge
32 GB | 4 vCPUs
16.1
Non-multitenant architecture
  • MSK tenant
    • 2 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0
    • EBS storage volume per broker 300 GiB
    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=2

...

  1. Prepare Data Import files (1k, 5k, 10k, 80k) with a defined number of holdings records associated with each instance HRID (1 instance HRID for all records, or 1 per 1000 records)
    1. Replace the instance HRID field with an active HRID from the environment (example: =004 colin00001144043)
    2. Replace the location field (example: =852 01$bme3CC$hKFN5860.A6$iC732), where me3CC is the code of a tenant location. Go to /settings/tenant-settings/location-locations and take the code of a location with active status
    3. To replace field 004, extract the HRIDs of active instances for this tenant using the SQL query below
      1. Get total job durations

        SQL to get job durations:
        -- Replace [tenant] with the tenant id used in the test.
        select file_name, total_records_in_file, started_date, completed_date,
               completed_date - started_date as duration, status, error_status
        from [tenant]_mod_source_record_manager.job_execution
        where subordination_type = 'COMPOSITE_PARENT'
        -- optional: restrict the results to the test window
        -- and started_date > '2024-06-13 14:47:54' and completed_date < '2024-06-13 19:01:50.832'
        order by started_date desc
        limit 10;
        


      2. Get instance HRIDs

        SQL to get instance HRIDs:
        -- MARC-source instances that are not discovery-suppressed.
        -- 80 HRIDs cover an 80k file at 1 HRID per 1000 records.
        select jsonb->>'hrid' as instanceHRID
        from [tenant]_mod_inventory_storage.instance
        where jsonb->>'discoverySuppress' = 'false' and jsonb->>'source' = 'MARC'
        limit 80;


      3. Put the instance HRIDs into a stringsHRID.txt file without double quotes or headers. Each row should contain only one HRID
      4. If needed, use the Python script to replace the HRIDs in the .mrc file. The script is located in the Git repository perf-testing\workflows-scripts\data-import\Holdings\Data_preparation_steps (attached file: PY.zip)
  2. Run the Data Import jobs sequentially, one by one, from the UI with a 5-minute delay between them (the delay can vary; 5 minutes was chosen as a convenient interval for collecting results).
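The file-preparation steps above can be sketched in Python. This is a hypothetical illustration, not the PY.zip script from the perf-testing repository. It assumes the records are available as MARC mnemonic text (lines such as =004 ... and =852 ..., as in the examples above); binary .mrc files would need a MARC library so that the leader and directory stay consistent. All function names are invented for the example. The ceiling division in `hrids_needed` shows why 80 HRIDs are enough for an 80k file at 1 HRID per 1000 records.

```python
import re
from pathlib import Path
from typing import Iterable, List, Optional

# Hypothetical sketch (not the PY.zip script): edits MARC mnemonic text lines
# such as "=004 colin00001144043", not binary .mrc records.
HRID_FIELD = re.compile(r"^(=004\s+)")
LOCATION_SUBFIELD = re.compile(r"\$b[^$]*")


def hrids_needed(record_count: int, records_per_hrid: int = 1000) -> int:
    """Number of distinct instance HRIDs required (ceiling division)."""
    return -(-record_count // records_per_hrid)


def read_hrids(path: str) -> List[str]:
    """Read stringsHRID.txt: one HRID per line, no quotes, no header."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def prepare_records(
    lines: Iterable[str],
    hrids: List[str],
    records_per_hrid: Optional[int] = 1000,
    location_code: Optional[str] = None,
) -> List[str]:
    """Rewrite =004 with an active HRID and, optionally, the =852 $b location code.

    records_per_hrid=None reuses hrids[0] for every record (set #1 style);
    records_per_hrid=1000 moves to the next HRID every 1000 records (set #2 style).
    Records are counted by their =LDR line.
    """
    out: List[str] = []
    record_index = -1
    for line in lines:
        if line.startswith("=LDR"):
            record_index += 1
        match = HRID_FIELD.match(line)
        if match:
            if records_per_hrid is None:
                pos = 0
            else:
                pos = max(record_index, 0) // records_per_hrid
            if pos >= len(hrids):
                raise ValueError(f"need at least {pos + 1} HRIDs, got {len(hrids)}")
            # Keep the original "=004 " prefix and spacing, swap only the value.
            line = match.group(1) + hrids[pos]
        elif location_code is not None and line.startswith("=852"):
            # Replace only the first $b subfield (the location code).
            line = LOCATION_SUBFIELD.sub(lambda _: "$b" + location_code, line, count=1)
        out.append(line)
    return out


if __name__ == "__main__":
    # "in00000000001" and "NEWLOC" are made-up values for illustration.
    sample = [
        "=LDR  00000nx  a2200000un 4500",
        "=004 colin00001144043",
        "=852 01$bme3CC$hKFN5860.A6$iC732",
    ]
    print(hrids_needed(80_000))  # 80
    print(prepare_records(sample, ["in00000000001"], location_code="NEWLOC"))
```

Raising an error when there are too few HRIDs, instead of cycling through the list, keeps the "1 HRID per 1000 records" ratio explicit and catches a short stringsHRID.txt early.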

...