
Overview

The purpose of this document is to present the results of testing Data Import Create MARC holdings records and to detect performance trends in the Quesnelia release, in the scope of the ticket below.

Jira: PERF-855

Compared with results in previous test report: Data Import Create MARC holdings records [Poppy]

Summary

  • Data import create holdings job durations increased significantly in the Quesnelia release: about 4 times longer with the 10k file, and the 80k test failed to complete - it was stopped after 4 hours of the test run with only 46 of 81 split jobs committed.
  • Top CPU utilization: mod-inventory-b - 16%, nginx-okapi - 5%, mod-source-record-storage-b - 4%, mod-quick-marc-b - 7%. Such low module-side resource utilization can be explained by the very high average latency of DB queries while INSERT and UPDATE statements held locks on the same tuple.
  • Top memory consumption: mod-inventory-storage-b - 85%, mod-data-import-b - 52%, mod-source-record-storage-b - 45%, mod-source-record-manager-b - 43%. A growing memory trend was observed in test set #1 for mod-inventory-storage-b (up to 85%).
  • When the same instance HRID was used to create all holdings, DI job duration for the same file size grew from test to test.
  • DI performs faster with files that use 1 unique instance HRID for every 1000 records: DI duration corresponds to file size with this approach, memory shows no growing trend, and module CPU and RDS CPU utilization increase because of the higher load.

Recommendations & Jiras

  • Investigate the growing memory trend for mod-inventory-storage in test set #1 (using 1 instance HRID to create all Holdings)
  • Define the highest number of Holdings associated with one instance HRID that is still realistic
  • Consider limiting the request /inventory/items-by-holdings-id (it is currently called with limit=0).
    Jira: MODINVSTOR-1229
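The bounded-limit idea from the recommendation above can be sketched as follows. This is illustrative only: the endpoint path and the existence of a limit parameter come from the recommendation itself, while the helper name, the `query` parameter, and the page size of 200 are assumptions, not the agreed fix for MODINVSTOR-1229.

```python
from urllib.parse import urlencode

def items_by_holdings_url(okapi_base: str, holdings_id: str,
                          limit: int = 200, offset: int = 0) -> str:
    """Build an /inventory/items-by-holdings-id request with an explicit page limit.

    limit=0 (the current behaviour) requests an unbounded result set; a bounded
    limit plus offset paging keeps individual responses small.
    """
    query = urlencode({"query": f"holdingsRecordId=={holdings_id}",
                       "limit": limit, "offset": offset})
    return f"{okapi_base}/inventory/items-by-holdings-id?{query}"

url = items_by_holdings_url("https://okapi.example.org",
                            "db87a6b4-d1f5-4e3d-b34b-d4bf06426127")
```

The caller would then page with increasing `offset` until fewer than `limit` items are returned.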

Errors

  • The 32nd split job of the 80k file import finished with error status SNAPSHOT_UPDATE_ERROR
  • Log message: 
    ERROR taImportKafkaHandler org.folio.inventory.dataimport.exceptions.CacheLoadingException: Error loading jobProfileSnapshot by id: 'aee287c2-0d40-4e8d-9879-4c1c54bcd819', status code: 503

Test Runs 

Profile used for testing - Default - Create Holdings and SRS MARC Holdings

Set of tests № | Scenario | Test Conditions | Status
1 | DI Holdings Create (previous* approach) | 1K, 5K, 10K, 80K sequentially | 1k, 5k, 10k - Completed; 80k - Failed
2 | DI Holdings Create (new** approach) | 1K, 5K, 10K, 80K sequentially | Completed

*previous approach - Data Import of Holdings with an .mrc file where 1 instance HRID is associated with all holdings (1k, 5k, 10k, 80k)

**new approach - Data Import of Holdings with an .mrc file where 1 instance HRID is associated with every 1000 holdings

Test Results

Set 1 - Files used to test DI create Holdings had 1 instance HRID for all created Holdings

Set 2 - Files used to test DI create Holdings had 1 unique instance HRID for every 1000 created Holdings (new approach)

Test | File | Duration: Orchid (previous results) | Duration: Poppy (previous results) | Duration: Quesnelia [ECS] Set #1 | Status and Errors Quesnelia [ECS] Set #1 | Duration: Quesnelia [ECS] Set #2 | Status and Errors Quesnelia [ECS] Set #2
1 | 1k | 45s | 32s | 1 min 22 sec | Success | 1 min 3 sec | Success
2 | 5k | 7m 47s | 2m 14s | 8 min | Success | 4 min 16 sec | Success
3 | 10k | 19m 46s | 4m 35s | 22 min 40 sec | Success | 8 min 59 sec | Success
4 | 80k | 20m (error*) | 36m 25s | 4 hours 13 min | Stopped by user after 46 of 81 jobs COMMITTED (56% finished); 1 job with status ERROR and error status SNAPSHOT_UPDATE_ERROR (job number 32, file_name = '1718290065265-80k_holdings_Create_32.mrc') | 52 min 5 sec | Success
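The difference between the two approaches can be quantified as records imported per second, a quick calculation derived from the durations in the table above (the helper function is illustrative, not part of the original report):

```python
def rate(records: int, minutes: int, seconds: int = 0) -> float:
    """Records imported per second for a completed DI job."""
    return records / (minutes * 60 + seconds)

# 10k file: Set #1 took 22 min 40 sec, Set #2 took 8 min 59 sec
set1 = rate(10_000, 22, 40)   # ~7.4 rec/s
set2 = rate(10_000, 8, 59)    # ~18.6 rec/s
speedup = set2 / set1         # Set #2 is ~2.5x faster on the 10k file
```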

Previous test report: Data Import Create MARC holdings records [Poppy]

Service CPU Utilization

Expand
titleCPU utilization, 1k, 5k, 10k, 80k

Set #1

Module | CPU (1k) | CPU (5k) | CPU (10k) | CPU (80k)
mod-inventory-b | 12.3 | 13.5 | 16.2 | 13.94
nginx-okapi | 10.67 | 13.07 | 10.37 | 5.13
mod-inventory-storage-b | 9.93 | 14.49 | 12.87 | 10.95
mod-quick-marc-b | 8.24 | 7.34 | 7.95 | 6.82
mod-source-record-storage-b | 7.99 | 8.42 | 6.47 | 3.79
mod-users-b | 7.16 | 5.95 | 5.93 | 6.58
okapi-b | 6.76 | 7.8 | 6.15 | 3.75
mod-pubsub-b | 5.69 | 5.73 | 5.78 | 5.69
mod-data-import-b | 5.15 | 1.27 | 1.6 | 1.23
mod-source-record-manager-b | 3.7 | 4.55 | 4.01 | 3.06
mod-authtoken-b | 3.69 | 1.26 | 1.46 | 0.99
mod-password-validator-b | 2.36 | 2.34 | 2.34 | 3.33
mod-feesfines-b | 2.34 | 2.12 | 2.2 | 2.16
mod-configuration-b | 2.22 | 2.08 | 1.98 | 2.71
mod-permissions-b | 1.66 | 0.8 | 1.11 | 0.94
mod-circulation-storage-b | 0.61 | 0.61 | 0.61 | 0.68
mod-circulation-b | 0.36 | 0.39 | 0.33 | 0.4
pub-okapi | 0.19 | 0.13 | 0.13 | 0.13

Set #2

Module | CPU (1k) | CPU (5k) | CPU (10k) | CPU (80k)
mod-inventory-b | 11.93 | 28.85 | 32.97 | 33.23
mod-quick-marc-b | 7.19 | 8.1 | 8.19 | 7.64
mod-pubsub-b | 6.2 | 6.56 | 6.72 | 6.55
mod-users-b | 5.79 | 6.47 | 6.09 | 6.82
mod-configuration-b | 3.12 | 3.21 | 3.47 | 3.4
mod-feesfines-b | 2.39 | 2.47 | 2.55 | 2.34
mod-password-validator-b | 2.26 | 2.4 | 3.27 | 2.41
mod-di-converter-storage-b | 1.95 | 10.22 | 9.39 | 8.98
mod-source-record-storage-b | 1.79 | 11.45 | 11.64 | 10.84
mod-source-record-manager-b | 1.65 | 6.44 | 6.42 | 5.87
mod-data-import-b | 1.36 | 1.48 | 1.83 | 1.48
okapi-b | 1.28 | 13.81 | 14 | 14.93
mod-authtoken-b | 1.02 | 1.24 | 1.6 | 1.6
mod-circulation-storage-b | 0.7 | 0.71 | 0.71 | 0.73
nginx-okapi | 0.59 | 20.99 | 20.39 | 22.89
mod-permissions-b | 0.48 | 4.95 | 1.65 | 1.63
mod-circulation-b | 0.35 | 0.36 | 0.35 | 0.36
mod-inventory-storage-b | 0.33 | 14.24 | 14.54 | 13.96
pub-okapi | 0.17 | 0.24 | 0.24 | 0.23


Set #1: mod-inventory-b - 16%, nginx-okapi - 5%, mod-source-record-storage-b - 4%, mod-quick-marc-b - 7%

Set #1

Image Added

Set #2

Set #2: mod-inventory-b - 33%, nginx-okapi - 23%, mod-source-record-storage-b - 11%, mod-quick-marc-b - 7%

Image Added


Memory Utilization

Expand
titleMemory consumption

Set #1

Module | Memory
mod-inventory-storage-b | 85.62
mod-data-import-b | 51.63
mod-source-record-storage-b | 44.97
mod-source-record-manager-b | 42.86
mod-users-b | 40.38
mod-inventory-b | 39.47
mod-permissions-b | 35.82
okapi-b | 33.4
mod-di-converter-storage-b | 33.26
mod-feesfines-b | 32.37
mod-quick-marc-b | 31.46
mod-configuration-b | 29.41
mod-pubsub-b | 25.66
mod-authtoken-b | 20.55
mod-circulation-storage-b | 18.93
mod-circulation-b | 17.87
nginx-okapi | 4.8
pub-okapi | 4.8

Set #2

Module | Memory
mod-inventory-storage-b | 56.04
mod-data-import-b | 55.45
mod-inventory-b | 45.63
mod-source-record-manager-b | 41.19
mod-users-b | 38.95
mod-source-record-storage-b | 37.37
mod-quick-marc-b | 33.59
mod-permissions-b | 33.45
okapi-b | 32.82
mod-feesfines-b | 32.65
mod-di-converter-storage-b | 31.91
mod-configuration-b | 28.49
mod-circulation-storage-b | 26.86
mod-pubsub-b | 25.83
mod-circulation-b | 20.14
mod-authtoken-b | 19.97
nginx-okapi | 4.69
pub-okapi | 4.58



Set #1

Image Added

Set #2

Image Added

MSK tenant cluster

Disk usage by broker

Set #1

Image Added

Set #2

Image Added

CPU (User) usage by broker

Set #1

Image Added

Set #2

Image Added

RDS CPU Utilization

Set #1

62% for the major part of the tests, which is 20% less than in Poppy. It rose to 73% with the 80k file after 50 minutes of the test run.

Image Added

Set #2

99% during all tests

Image Added

DB Connections

Set #1

DB connections number: 1430

Image Added

Set #2

DB connections number: 1500

Image Added

DB Load

Set #1

Image Added

Image Added

Set #2

Image Added

Image Added


SQL queries

Set #1

Image Added

Image Added

Set #2

Image Added

Image Added

Code Block
languagesql
themeFadeToGrey
titleTOP SQL Queries for Set #1
UPDATE cs00000int_0001_mod_inventory_storage.holdings_record SET jsonb = $1::jsonb WHERE id = '[UUID]'
INSERT INTO cs00000int_0001_mod_inventory_storage.holdings_record (id, jsonb) VALUES ($1, $2) RETURNING jsonb
autovacuum: VACUUM cs00000int_mod_entities_links.authority
autovacuum: VACUUM cs00000int_mod_entities_links.authority_archive
autovacuum: VACUUM pg_toast.pg_toast_40004

  INSERT INTO cs00000int_mod_search.consortium_instance (tenant_id, instance_id, json, created_date, updated_date)
  VALUES ($1, $2, $3::json, $4, $5)
  ON CONFLICT (tenant_id, instance_id)
  DO UPDATE SET json = EXCLUDED.json, updated_date = EXCLUDED.updated_date

SELECT jsonb,id FROM cs00000int_0001_mod_inventory_storage.instance_holdings_item_view WHERE id='db87a6b4-d1f5-4e3d-b34b-d4bf06426127' LIMIT 1 OFFSET 0


Code Block
languagesql
themeFadeToGrey
titleTOP SQL Queries for Set #2
INSERT INTO cs00000int_0001_mod_inventory_storage.holdings_record (id, jsonb) VALUES ($1, $2) RETURNING jsonb
UPDATE cs00000int_0001_mod_inventory_storage.holdings_record SET jsonb = $1::jsonb WHERE id = '47ee9b78-3d8f-4e8b-b09e-82e9396eb3b3'

with "cte" as (select count(*) from "records_lb" where ("records_lb"."snapshot_id" <> cast($1 as uuid) and "records_lb"."external_id" = cast($2 as uuid) and "records_lb"."record_type" = $3::"record_type")) select "records_lb"."id", "records_lb"."snapshot_id", "records_lb"."matched_id", "records_lb"."generation", "records_lb"."record_type", "records_lb"."external_id", "records_lb"."state", "records_lb"."leader_record_status", "records_lb"."order", "records_lb"."suppress_discovery", "records_lb"."created_by_user_id", "records_lb"."created_date", "records_lb"."updated_by_user_id", "records_lb"."updated_date", "records_lb"."external_hrid", "marc_records_lb"."content" as "parsed_record_content", "raw_records_lb"."content" as "raw_record_content", "error_records_lb"."content" as "error_record_content", "error_records_lb"."description", "count" from "records_lb" left outer join "marc_records_lb" on "records_lb"."id" = "marc_records_lb"."id" left outer join "raw_records_lb" on "records_lb"."id" = "raw_records_lb"."id" left outer join "error_records_lb" on "records_lb"."id" = "error_records_lb"."id" right outer join (select * from "cte") as "alias_80949780" on true where ("records_lb"."snapshot_id" <> cast($4 as uuid) and "records_lb"."external_id" = cast($5 as uuid) and "records_lb"."record_type" = $6::"record_type") offset $7 rows fetch next $8 rows only

INSERT INTO cs00000int_0001_mod_source_record_manager.events_processed (handler_id, event_id) VALUES ($1, $2)
INSERT INTO cs00000int_0001_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)

autovacuum: VACUUM pg_toast.pg_toast_40004

Infrastructure

PTF - environment qcon

Name | Memory GiB | vCPUs | Engine version
db.r6g.xlarge | 32 | 4 | 16.1
  • MSK tenant
    • 2 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0
    • EBS storage volume per broker 300 GiB
    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=2


Expand
titleModules


Module | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft Limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize
mod-inventory-b | 3 | mod-inventory:20.2.0 | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512
mod-quick-marc-b | 1 | mod-quick-marc:5.1.0 | 2 | 2288 | 2176 | 128 | 1664 | 384 | 512
nginx-okapi | 1 | nginx-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | - | - | -
mod-di-converter-storage-b | 2 | mod-di-converter-storage:2.2.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
okapi-b | 1 | okapi:5.3.0 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512
mod-source-record-storage-b | 1 | mod-source-record-storage:5.8.0 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512
mod-source-record-manager-b | 1 | mod-source-record-manager:3.8.0 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512
mod-inventory-storage-b | 2 | mod-inventory-storage:27.2.0-SNAPSHOT.738 | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512
mod-pubsub-b | 1 | mod-pubsub:2.13.0 | 2 | 1536 | 1440 | 1024 | 922 | 384 | 922
mod-users-b | 1 | mod-users:19.3.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128
mod-data-import-b | 1 | mod-data-import:3.1.0 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512
mod-organizations-storage-b | 1 | mod-organizations-storage:4.7.0 | 2 | 1024 | 896 | 128 | 700 | 88 | 128
mod-notes-b | 1 | mod-notes:5.2.0 | 2 | 1024 | 896 | 128 | 952 | 384 | 512
mod-gobi-b | 1 | mod-gobi:2.8.0 | 2 | 1024 | 896 | 128 | 700 | 88 | 128
mod-permissions-b | 2 | mod-permissions:6.5.0 | 2 | 1684 | 1544 | 512 | 1024 | 384 | 512
mod-search-b | 9 | mod-search:3.3.0-SNAPSHOT.224 | 2 | 2592 | 2480 | 2048 | 1440 | 512 | 1024
mod-circulation-storage-b | 1 | mod-circulation-storage:17.2.0 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512
mod-circulation-b | 2 | mod-circulation:24.2.1 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512
pub-okapi | 1 | pub-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 768 | - | -


Methodology/Approach

  1. Prepare Data Import files (1k, 5k, 10k, 80k) with the defined number of holdings records associated with an instance HRID (1 instance HRID for all records, or 1 per 1000 records)
    1. replace the instance HRID field with an active one from the environment (example: =004 colin00001144043)
    2. replace the location field (example: =852 01$bme3CC$hKFN5860.A6$iC732), where me3CC is the code of the tenant location. Go to /settings/tenant-settings/location-locations and take the code of a location with active status
    3. to replace field 004, extract the instance HRIDs of active instances for this tenant. Use the SQL queries below
      1. Get total job durations

        Code Block
        languagesql
        themeFadeToGrey
        titleSQL to get job durations
        select file_name,total_records_in_file,started_date,completed_date, completed_date - started_date as duration ,status,error_status
        from [tenant]_mod_source_record_manager.job_execution
        where subordination_type = 'COMPOSITE_PARENT'
        -- where started_date > '2024-06-13 14:47:54' and completed_date < '2024-06-13 19:01:50.832' 
        order by started_date desc 
        limit 10
        


      2. Get instance HRID ids

        Code Block
        languagesql
        themeFadeToGrey
        titleSQL to get instance HRIDs
        select jsonb->>'hrid' as instanceHRID
        from [tenant]_mod_inventory_storage.instance
        where jsonb->>'discoverySuppress' = 'false' and jsonb->>'source' = 'MARC'
        limit 80


      3. Put the instance HRID ids into a stringsHRID.txt file without double quotes or headers. Every row should contain only one HRID id
      4. Use the PY script to replace HRID ids in the .mrc file if needed. The script is located in the Git repository perf-testing\workflows-scripts\data-import\Holdings\Data_preparation_steps
  2. Run Data Import jobs sequentially, one by one, from the UI with a 5 min delay (the delay time can vary - this delay was found comfortable for getting results).
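The HRID-replacement step above can be sketched as follows. This is illustrative only: the real script lives in the perf-testing repository, and the MarcEdit mnemonic (.mrk) handling, blank-line record separator, and the group-size parameter are assumptions based on the description (the document shows fields in mnemonic form, e.g. =004 colin00001144043).

```python
def replace_004_fields(mrk_text: str, hrids: list[str], group_size: int = 1000) -> str:
    """Rewrite every =004 line in a MarcEdit mnemonic (.mrk) file so that each
    group of `group_size` records shares one instance HRID from `hrids`.

    Assumes records are separated by blank lines and each holdings record
    carries exactly one =004 line with the parent instance HRID.
    """
    out_lines = []
    record_idx = 0
    for line in mrk_text.splitlines():
        if line.startswith("=004 "):
            # pick one HRID per group of `group_size` records
            hrid = hrids[record_idx // group_size]
            out_lines.append(f"=004 {hrid}")
            record_idx += 1  # count records by their =004 field
        else:
            out_lines.append(line)
    return "\n".join(out_lines)

# demonstration with one record per HRID group
sample = "=004 colin00001144043\n=852 01$bme3CC\n\n=004 colin00001144043\n=852 01$bme3CC"
print(replace_004_fields(sample, ["in00000000001", "in00000000002"], group_size=1))
```

With group_size=1000 and the 80 HRIDs extracted by the SQL above, an 80k-record file gets 1 unique instance HRID per 1000 holdings, matching the "new approach" in set #2.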