Overview
The purpose of this document is to present the results of testing Data Import Create MARC holdings records and to detect performance trends in the Quesnelia release, in scope of ticket PERF-833.
Compared with results in the previous test report: Data Import Create MARC holdings records [non-ECS] [Poppy]
Summary
- Data import create holdings jobs perform about 40% faster on average in the Quesnelia release compared with Poppy.
- The number of holdings associated with one instance in the files used in tests does not affect the duration of data import in non-ECS environments.
- Top CPU utilization: mod-inventory-b - 73%, nginx-okapi - 65%, mod-quick-marc-b - 57%, mod-source-record-storage-b - 35%
- Top Memory consumption:
- Set #1: mod-inventory-storage-b - 87%, mod-inventory-b - 72%, mod-data-import-b - 59%. Memory spikes were observed for mod-inventory-storage-b; after the tests finished, it returned to its "before tests" state.
- Set #2: mod-inventory-storage-b - 24%, mod-inventory-b - 56%, mod-data-import-b - 58%, mod-users-b - 53%.
- RDS CPU utilization was at the 95% level for all DI tests except the test with the 1k file.
- The RDS DB connection count was 860.
Recommendations & Jiras
- Investigate the memory growth trend for mod-inventory-storage in test set #1 (using 1 instance HRID to create all holdings).
Test Runs
Profile used for testing - Default - Create Holdings and SRS MARC Holdings
Set of tests № | Scenario | Test Conditions |
---|---|---|
1 | DI Holdings Create (previous* approach) | 1K, 5K, 10K, 80K sequentially |
2 | DI Holdings Create (new** approach) | 1K, 5K, 10K, 80K sequentially |
*previous approach - Data Import of Holdings with an mrc file where 1 instance HRID is associated with all holdings (1k, 5k, 10k, 80k)
**new approach - Data Import of Holdings with an mrc file where 1 instance HRID is associated with each 1000 holdings
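For illustration, a minimal sketch of how a holdings record's position in the file maps to an instance HRID under the two approaches (function and parameter names are illustrative, not part of the test tooling):

```python
# Illustrative only: pick the instance HRID for the i-th holdings record in a file.
# hrids is a list of active instance HRIDs extracted from the target environment.
def hrid_for_record(i: int, hrids: list[str], new_approach: bool) -> str:
    if new_approach:
        # New approach: a unique instance HRID for each block of 1000 holdings.
        return hrids[i // 1000]
    # Previous approach: a single instance HRID shared by all holdings.
    return hrids[0]
```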
Test Results
Set 1 - Files used to test DI create Holdings had 1 instance HRID for all created Holdings
Set 2 - Files used to test DI create Holdings had 1 unique instance HRID for every 1000 created Holdings (new approach)
Test | Number of records in File | Duration: Quesnelia [non-ECS] Set #1 | Duration: Quesnelia [non-ECS] Set #2 | Status and Errors Quesnelia [non-ECS] Set #1, Set #2 |
---|---|---|---|---|
1 | 1k | 19 sec | 25 sec | Success |
2 | 5k | 1 min 17 sec | 1 min 24 sec | Success |
3 | 10k | 2 min 32 sec | 2 min 40 sec | Success |
4 | 80k | 19 min 54 sec | 21 min 44 sec | Success |
Comparison
Test | Number of records in File | Duration: Poppy [non-ECS] | Duration: Quesnelia [non-ECS] Set #1 | Delta | Delta/Poppy Duration, % |
---|---|---|---|---|---|
1 | 1k | 32 sec | 19 sec | 13 sec | 40.63% |
2 | 5k | 2 min 14 sec | 1 min 17 sec | 57 sec | 42.54% |
3 | 10k | 4 min 35 sec | 2 min 32 sec | 2 min 3 sec | 44.73% |
4 | 80k | 36 min 25 sec | 19 min 54 sec | 16 min 31 sec | 45.35% |
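The Delta columns follow directly from the measured durations; a minimal sketch of the arithmetic, using the 80k row as an example (the helper below is illustrative):

```python
def to_seconds(text: str) -> int:
    """Parse durations like '36 min 25 sec' or '19 sec' into seconds."""
    tokens = text.split()
    return sum(int(v) * (60 if u.startswith("min") else 1)
               for v, u in zip(tokens[::2], tokens[1::2]))

poppy = to_seconds("36 min 25 sec")      # 2185 sec
quesnelia = to_seconds("19 min 54 sec")  # 1194 sec
delta = poppy - quesnelia                # 991 sec = 16 min 31 sec
pct = round(delta / poppy * 100, 2)      # 45.35 (%, Delta/Poppy Duration)
```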
Service CPU Utilization
Memory Utilization
MSK tenant cluster
Disk usage by broker
CPU (User) usage by broker
Open Search
CPU utilization master node
CPU utilization data node
Maximum free storage space (GiB)
Indexing rate
RDS CPU Utilization
For all tests - 95%, except DI Holdings with the 1k file - 25%.
DB Connections
DB Load
Set #1
Set #2
SQL queries
```sql
INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id)
VALUES ($1, $2);

INSERT INTO fs09000000_mod_source_record_manager.journal_records
  (id, job_execution_id, source_id, source_record_order, entity_type, entity_id,
   entity_hrid, action_type, action_status, error, action_date, title, instance_id,
   holdings_id, order_id, permanent_location_id, tenant_id)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17);

UPDATE fs09000000_mod_source_record_manager.job_execution_progress
SET succeeded_records_count = succeeded_records_count + $2,
    error_records_count = error_records_count + $3
WHERE job_execution_id = $1 RETURNING *;

INSERT INTO fs09000000_mod_inventory_storage.holdings_record (id, jsonb)
VALUES ($1, $2) RETURNING jsonb;

insert into "marc_records_lb" ("id", "content")
values (cast($1 as uuid), cast($2 as jsonb))
on conflict ("id") do update set "content" = cast($3 as jsonb);

WITH input_rows(record_id, holdings_id) AS (
  VALUES ($1::uuid, $2::uuid)
), ins AS (
  INSERT INTO fs09000000_mod_inventory.records_holdings(record_id, holdings_id)
  SELECT * FROM input_rows
  ON CONFLICT (record_id) DO UPDATE SET record_id = EXCLUDED.record_id
  RETURNING record_id::uuid, holdings_id::uuid
)
SELECT record_id, holdings_id FROM ins
UNION ALL
SELECT c.record_id, c.holdings_id
FROM input_rows
JOIN fs09000000_mod_inventory.records_holdings c USING (record_id);
```
```sql
INSERT INTO fs09000000_mod_source_record_manager.journal_records
  (id, job_execution_id, source_id, source_record_order, entity_type, entity_id,
   entity_hrid, action_type, action_status, error, action_date, title, instance_id,
   holdings_id, order_id, permanent_location_id, tenant_id)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17);

INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id)
VALUES ($1, $2);

UPDATE fs09000000_mod_source_record_manager.job_execution_progress
SET succeeded_records_count = succeeded_records_count + $2,
    error_records_count = error_records_count + $3
WHERE job_execution_id = $1 RETURNING *;

INSERT INTO fs09000000_mod_inventory_storage.holdings_record (id, jsonb)
VALUES ($1, $2) RETURNING jsonb;

WITH input_rows(record_id, holdings_id) AS (
  VALUES ($1::uuid, $2::uuid)
), ins AS (
  INSERT INTO fs09000000_mod_inventory.records_holdings(record_id, holdings_id)
  SELECT * FROM input_rows
  ON CONFLICT (record_id) DO UPDATE SET record_id = EXCLUDED.record_id
  RETURNING record_id::uuid, holdings_id::uuid
)
SELECT record_id, holdings_id FROM ins
UNION ALL
SELECT c.record_id, c.holdings_id
FROM input_rows
JOIN fs09000000_mod_inventory.records_holdings c USING (record_id);
```
Infrastructure
PTF - environment qcp1
- 10 m6g.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
- 1 database instance, writer
Name | Memory (GiB) | vCPUs | Engine version |
---|---|---|---|
db.r6g.xlarge | 32 | 4 | 16.1 |
- MSK tenant
- 2 m5.2xlarge brokers in 2 zones
- Apache Kafka version 2.8.0
- EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=2
- Open Search ptf-test
- version OpenSearch_2_7_R20240502
- Data nodes
- Instance type - r6g.2xlarge.search
- Number of nodes - 4
- Storage type - EBS
- EBS volume size (GiB) - 500
- Dedicated master nodes
- Instance type - r6g.large.search
- Number of nodes - 3
- DB records
- fs09000000
- Instances - 25901331
- Items - 27074913
- Holdings - 25871735
- fs07000001
- Instances - 10100620
- Items - 1484850
- Holdings - 10522266
- fs07000002
- Instances - 1161275
- Items - 1153548
- Holdings - 1153548
Methodology/Approach
- Prepare Data Import files (1k, 5k, 10k, 80k) with the defined number of holdings records associated with an instance HRID (1 instance HRID for all records, or 1 per 1000 records)
- replace the instance HRID field with an active one from the environment (example: =004 colin00001144043)
- replace the location field (example: =852 01$bme3CC$hKFN5860.A6$iC732), where me3CC is the code of a tenant location. Go to /settings/tenant-settings/location-locations and take the code of a location with active status
- to replace field 004, extract the instance HRIDs of active instances for this tenant using the SQL query below
Get total job durations
SQL to get job durations:

```sql
select file_name, total_records_in_file, started_date, completed_date,
       completed_date - started_date as duration, status, error_status
from [tenant]_mod_source_record_manager.job_execution
where subordination_type = 'COMPOSITE_PARENT'
-- where started_date > '2024-06-13 14:47:54' and completed_date < '2024-06-13 19:01:50.832'
order by started_date desc
limit 10;
```
Get instance HRID ids
SQL to get instance HRIDs:

```sql
select jsonb->>'hrid' as instanceHRID
from [tenant]_mod_inventory_storage.instance
where jsonb->>'discoverySuppress' = 'false' and jsonb->>'source' = 'MARC'
limit 80;
```
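The HRIDs returned by this query feed the stringsHRID.txt file used in the steps below; a minimal sketch of exporting them with psycopg2 (connection details are placeholders, and the export can equally be done with any SQL client):

```python
# Illustrative only: export active MARC instance HRIDs into stringsHRID.txt.
# Assumes psycopg2 is installed and read access to the tenant's schema.
import psycopg2

TENANT = "fs09000000"  # tenant schema prefix used in this report
QUERY = f"""
    select jsonb->>'hrid'
    from {TENANT}_mod_inventory_storage.instance
    where jsonb->>'discoverySuppress' = 'false' and jsonb->>'source' = 'MARC'
    limit 80
"""

with psycopg2.connect(host="<db-host>", dbname="<db-name>",
                      user="<user>", password="<password>") as conn:
    with conn.cursor() as cur:
        cur.execute(QUERY)
        with open("stringsHRID.txt", "w") as out:
            for (hrid,) in cur:  # one HRID per row, no quotes, no header
                out.write(hrid + "\n")
```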
- Put the instance HRID ids into a stringsHRID.txt file without double quotes or headers. Every row should contain only one HRID id
- Use the PY script to replace HRID ids in the mrc file if needed (a sketch is shown after this list). The script is located in the Git repository perf-testing\workflows-scripts\data-import\Holdings\Data_preparation_steps
- Run Data Import jobs sequentially, one by one, from the UI with a 5-minute delay between jobs (the delay time can vary; this interval was chosen as comfortable for collecting results).
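A minimal sketch of such a replacement step, assuming the pymarc library and the stringsHRID.txt format described above; file names and the records-per-HRID constant are illustrative, and the actual script in the perf-testing repository may differ:

```python
from pymarc import MARCReader, MARCWriter

HOLDINGS_PER_HRID = 1000  # new approach: one instance HRID per 1000 holdings records

# One active instance HRID per line, as prepared in the previous steps.
with open("stringsHRID.txt") as f:
    hrids = [line.strip() for line in f if line.strip()]

with open("holdings.mrc", "rb") as src, open("holdings_prepared.mrc", "wb") as dst:
    writer = MARCWriter(dst)
    for i, record in enumerate(MARCReader(src)):
        # Control field 004 of an MFHD record carries the related bib record's HRID.
        record["004"].data = hrids[i // HOLDINGS_PER_HRID]
        writer.write(record)
    writer.close()
```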