Overview
The purpose of this document is to present the results of testing Data Import Create MARC holdings records and to detect performance trends in the Quesnelia release, in scope of ticket PERF-833.
Results are compared with the previous test report: Data Import Create MARC holdings records [non-ECS] [Poppy]
Summary
- Data Import Create Holdings jobs perform on average about 40% faster in the Quesnelia release compared with Poppy.
- The number of holdings associated with one instance in the test files does not affect the duration of data import in non-ECS environments.
- Top CPU utilization: mod-inventory-b - 73%, nginx-okapi - 65%, mod-quick-marc-b - 57%, mod-source-record-storage-b - 35%
- Top Memory consumption:
- Set #1: mod-inventory-storage-b - 87%, mod-inventory-b - 72%, mod-data-import-b - 59%. Memory spikes were observed for mod-inventory-storage-b; after the tests finished, it returned to its pre-test level.
- Set #2: mod-inventory-storage-b - 24%, mod-inventory-b - 56%, mod-data-import-b - 58%, mod-users-b - 53%.
- RDS CPU utilization was at 95% for all DI tests except the test with the 1k file.
- The number of RDS DB connections was 860.
Recommendations & Jiras
- Investigate the memory growth trend for mod-inventory-storage in test set #1 (where 1 instance HRID is used to create all holdings).
Test Runs
Profile used for testing: Default - Create Holdings and SRS MARC Holdings
Set of tests № | Scenario | Test Conditions |
---|---|---|
1 | DI Holdings Create (previous* approach) | 1K, 5K, 10K, 80K sequentially |
2 | DI Holdings Create (new** approach) | 1K, 5K, 10K, 80K sequentially |
*previous approach - Data Import of holdings using an .mrc file where 1 instance HRID is associated with all holdings in the file (1k, 5k, 10k, 80k)
**new approach - Data Import of holdings using an .mrc file where 1 instance HRID is associated with every 1000 holdings (illustrated in the sketch below)
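To make the difference concrete, here is a minimal sketch (hypothetical code, not the actual PTF data-preparation script) of how an instance HRID is chosen for each holdings record under the two approaches:

```python
def hrid_for_record(record_index: int, hrids: list[str], approach: int) -> str:
    """Pick the instance HRID for the holdings record at record_index.

    Approach 1 (previous): every holdings record references the same instance.
    Approach 2 (new): each block of 1000 holdings references its own instance.
    """
    if approach == 1:
        return hrids[0]
    return hrids[record_index // 1000]

# An 80k-record file needs 80 distinct instance HRIDs under the new approach.
hrids = [f"colin{i:011d}" for i in range(80)]  # placeholder HRIDs; format is illustrative
assert hrid_for_record(999, hrids, approach=2) == hrids[0]
assert hrid_for_record(1000, hrids, approach=2) == hrids[1]
```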
Test Results
Set 1 - Files used to test DI Create Holdings had 1 instance HRID for all created holdings (previous approach)
Set 2 - Files used to test DI Create Holdings had 1 unique instance HRID for every 1000 created holdings (new approach)
Test | Number of records in File | Duration: Quesnelia [non-ECS] Set #1 | Duration: Quesnelia [non-ECS] Set #2 | Status and Errors Quesnelia [non-ECS] Set #1, Set #2 |
---|---|---|---|---|
1 | 1k | 19 sec | 25 sec | Success |
2 | 5k | 1 min 17 sec | 1 min 24 sec | Success |
3 | 10k | 2 min 32 sec | 2 min 40 sec | Success |
4 | 80k | 19 min 54 sec | 21 min 44 sec | Success |
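As a quick cross-check of the summary claim that holdings-per-instance distribution has limited effect on duration, the relative difference between the two sets can be computed from the table above (a minimal sketch with durations hardcoded from the table):

```python
# (Set #1, Set #2) durations in seconds, taken from the results table above.
durations = {"1k": (19, 25), "5k": (77, 84), "10k": (152, 160), "80k": (1194, 1304)}

for size, (set1, set2) in durations.items():
    print(f"{size}: Set #2 slower by {(set2 - set1) / set1 * 100:.1f}%")
# 1k: 31.6%, 5k: 9.1%, 10k: 5.3%, 80k: 9.2% -- a few seconds to ~2 minutes in absolute terms
```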
Comparison
Test | Number of records in File | Duration: Poppy [non-ECS] | Duration: Quesnelia [non-ECS] Set #1 | Delta, sec | %, Delta/Poppy Duration |
---|---|---|---|---|---|
1 | 1k | 32 sec | 19 sec | 13 sec | 40.63% |
2 | 5k | 2 min 14 sec | 1 min 17 sec | 57 sec | 42.54% |
3 | 10k | 4 min 35 sec | 2 min 32 sec | 2 min 3 sec | 44.73% |
4 | 80k | 36 min 25 sec | 19 min 54 sec | 16 min 31 sec | 45.35% |
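The Delta and % columns follow directly from the durations; a minimal sketch of the computation (durations hardcoded from the comparison table):

```python
# (Poppy, Quesnelia Set #1) durations in seconds, taken from the comparison table.
pairs = {"1k": (32, 19), "5k": (134, 77), "10k": (275, 152), "80k": (2185, 1194)}

for size, (poppy, quesnelia) in pairs.items():
    delta = poppy - quesnelia
    print(f"{size}: delta {delta} sec, {delta / poppy * 100:.1f}% faster than Poppy")
# 1k: 40.6%, 5k: 42.5%, 10k: 44.7%, 80k: 45.4% -- consistent with the table
```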
Service CPU Utilization
Memory Utilization
MSK tenant cluster
Disk usage by broker
CPU (User) usage by broker
Open Search
RDS CPU Utilization
For all tests, RDS CPU utilization was at 95%, except for the DI Holdings test with the 1k file, where it was 25%.
DB Connections
DB Load
Set #1
Set #2
SQL queries
INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id)
VALUES ($1, $2);

INSERT INTO fs09000000_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17);

UPDATE fs09000000_mod_source_record_manager.job_execution_progress
SET succeeded_records_count = succeeded_records_count + $2, error_records_count = error_records_count + $3
WHERE job_execution_id = $1
RETURNING *;

INSERT INTO fs09000000_mod_inventory_storage.holdings_record (id, jsonb)
VALUES ($1, $2)
RETURNING jsonb;

insert into "marc_records_lb" ("id", "content")
values (cast($1 as uuid), cast($2 as jsonb))
on conflict ("id") do update set "content" = cast($3 as jsonb);

WITH input_rows(record_id, holdings_id) AS (VALUES ($1::uuid, $2::uuid)),
ins AS (
  INSERT INTO fs09000000_mod_inventory.records_holdings(record_id, holdings_id)
  SELECT * FROM input_rows
  ON CONFLICT (record_id) DO UPDATE SET record_id = EXCLUDED.record_id
  RETURNING record_id::uuid, holdings_id::uuid
)
SELECT record_id, holdings_id FROM ins
UNION ALL
SELECT c.record_id, c.holdings_id
FROM input_rows
JOIN fs09000000_mod_inventory.records_holdings c USING (record_id);
INSERT INTO fs09000000_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17);

INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id)
VALUES ($1, $2);

UPDATE fs09000000_mod_source_record_manager.job_execution_progress
SET succeeded_records_count = succeeded_records_count + $2, error_records_count = error_records_count + $3
WHERE job_execution_id = $1
RETURNING *;

INSERT INTO fs09000000_mod_inventory_storage.holdings_record (id, jsonb)
VALUES ($1, $2)
RETURNING jsonb;

WITH input_rows(record_id, holdings_id) AS (VALUES ($1::uuid, $2::uuid)),
ins AS (
  INSERT INTO fs09000000_mod_inventory.records_holdings(record_id, holdings_id)
  SELECT * FROM input_rows
  ON CONFLICT (record_id) DO UPDATE SET record_id = EXCLUDED.record_id
  RETURNING record_id::uuid, holdings_id::uuid
)
SELECT record_id, holdings_id FROM ins
UNION ALL
SELECT c.record_id, c.holdings_id
FROM input_rows
JOIN fs09000000_mod_inventory.records_holdings c USING (record_id);
Infrastructure
PTF - environment qcp1
- 10 m6g.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
- 1 database instance (writer)
Name | Memory, GiB | vCPUs | Engine version |
---|---|---|---|
db.r6g.xlarge | 32 | 4 | 16.1 |
- MSK tenant
  - 2 m5.2xlarge brokers in 2 zones
  - Apache Kafka version 2.8.0
  - EBS storage volume per broker 300 GiB
  - auto.create.topics.enable=true
  - log.retention.minutes=480
  - default.replication.factor=2
- Open Search ptf-test
  - version OpenSearch_2_7_R20240502
  - Data nodes
    - Instance type - r6g.2xlarge.search
    - Number of nodes - 4
    - Storage type - EBS
    - EBS volume size (GiB) - 500
  - Dedicated master nodes
    - Instance type - r6g.large.search
    - Number of nodes - 3
- DB records
  - fs09000000
    - Instances - 25901331
    - Items - 27074913
    - Holdings - 25871735
  - fs07000001
    - Instances - 10100620
    - Items - 1484850
    - Holdings - 10522266
  - fs07000002
    - Instances - 1161275
    - Items - 1153548
    - Holdings - 1153548
Methodology/Approach
- Prepare Data Import files of 1k, 5k, 10k, and 80k records with the defined number of holdings records associated with an instance HRID (1 instance HRID for all records, or 1 per 1000 records)
- replace the instance HRID field with an active one from the environment (example: =004 colin00001144043)
- replace the location field (example: =852 01$bme3CC$hKFN5860.A6$iC732, where me3CC is the code of a tenant location). Go to /settings/tenant-settings/location-locations and take the code of a location with active status
- to replace field 004, extract the instance HRIDs of active instances for this tenant using the SQL query below
Get total job durations
SQL to get job durations:

select file_name, total_records_in_file, started_date, completed_date, completed_date - started_date as duration, status, error_status
from [tenant]_mod_source_record_manager.job_execution
where subordination_type = 'COMPOSITE_PARENT'
-- where started_date > '2024-06-13 14:47:54' and completed_date < '2024-06-13 19:01:50.832'
order by started_date desc
limit 10
Get instance HRID ids
SQL to get instance HRIDs:

select jsonb->>'hrid' as instanceHRID
from [tenant]_mod_inventory_storage.instance
where jsonb->>'discoverySuppress' = 'false' and jsonb->>'source' = 'MARC'
limit 80
- Put the instance HRIDs into a stringsHRID.txt file without double quotes or headers. Every row should contain only one HRID
- Use the PY script to replace HRIDs in the .mrc file if needed. The script is located in the Git repository at perf-testing\workflows-scripts\data-import\Holdings\Data_preparation_steps (a rough illustrative sketch follows this list)
- Run the Data Import jobs sequentially, one by one, from the UI with a 5-minute delay between jobs (the delay time can vary; this interval was chosen as convenient for collecting results).
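The actual replacement script lives at the repository path referenced above; as a rough illustration of the same preparation step (an assumption, using the pymarc library rather than the real script), the sketch below reads HRIDs from stringsHRID.txt and rewrites the 004 control field of every holdings record, switching to the next HRID after each 1000 records:

```python
from pymarc import MARCReader, MARCWriter  # assumes pymarc is installed

# HRIDs extracted with the SQL above: one per line, no quotes, no header.
with open("stringsHRID.txt") as f:
    hrids = [line.strip() for line in f if line.strip()]

# File names are placeholders; the real inputs are the prepared 1k/5k/10k/80k .mrc files.
with open("holdings_80k.mrc", "rb") as src, open("holdings_80k_prepared.mrc", "wb") as dst:
    writer = MARCWriter(dst)
    for i, record in enumerate(MARCReader(src)):
        # Field 004 carries the related instance HRID; assign one HRID per 1000 holdings.
        record["004"].data = hrids[i // 1000]
        writer.write(record)
    writer.close()
```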