Overview
The purpose of this document is to present the results of testing Data Import Create MARC holdings records and to detect performance trends in the Quesnelia release, in scope of ticket PERF-833.
Compared with results in the previous test report: Data Import Create MARC holdings records [non-ECS] [Poppy]
Summary
- Data import create holdings jobs perform about 40% faster on average in the Quesnelia release compared with Poppy.
- The number of holdings associated with one instance in the files used in tests does not affect the duration of data import in non-ECS environments.
- Top CPU utilization: mod-inventory-b - 73%, nginx-okapi - 65%, mod-quick-marc-b - 57%, mod-source-record-storage-b - 35%
- Top Memory consumption:
- Set #1: mod-inventory-storage-b - 87%, mod-inventory-b - 72%, mod-data-import-b - 59%. Memory spikes were observed for mod-inventory-storage-b; after the tests finished, it returned to its "before tests" state.
- Set #2: mod-inventory-storage-b - 24%, mod-inventory-b - 56%, mod-data-import-b - 58%, mod-users-b - 53%.
- RDS CPU utilization was at the 95% level for all DI tests except the test with the 1k file.
- The RDS DB connection count was 860.
Recommendations & Jiras
- Investigate the memory growth trend for mod-inventory-storage in test set #1 (using 1 instance HRID to create all holdings).
Test Runs
Profile used for testing - Default - Create Holdings and SRS MARC Holdings
Set of tests № | Scenario | Test Conditions |
---|---|---|
1 | DI Holdings Create (previous* approach) | 1K, 5K, 10K, 80K sequentially |
2 | DI Holdings Create (new** approach) | 1K, 5K, 10K, 80K sequentially |
*previous approach - Data Import of Holdings with an mrc file where 1 instance HRID is associated with all holdings (1k, 5k, 10k, 80k)
**new approach - Data Import of Holdings with an mrc file where 1 instance HRID is associated with each 1000 holdings
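For illustration, a minimal sketch of how a holdings record's position in the file maps to an instance HRID under the two approaches (function and parameter names are illustrative, not part of the test tooling):

```python
# Illustrative only: pick the instance HRID for the i-th holdings record in a file.
# hrids is a list of active instance HRIDs extracted from the target environment.
def hrid_for_record(i: int, hrids: list[str], new_approach: bool) -> str:
    if new_approach:
        # New approach: a unique instance HRID for each block of 1000 holdings.
        return hrids[i // 1000]
    # Previous approach: a single instance HRID shared by all holdings.
    return hrids[0]
```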
Test Results
Set 1 - Files used to test DI create Holdings had 1 instance HRID for all created Holdings
Set 2 - Files used to test DI create Holdings had 1 unique instance HRID for every 1000 created Holdings (new approach)
Test | Number of records in File | Duration: Quesnelia [non-ECS] Set #1 | Duration: Quesnelia [non-ECS] Set #2 | Status and Errors Quesnelia [non-ECS] Set #1, Set #2 |
---|---|---|---|---|
1 | 1k | 19 sec | 25 sec | Success |
2 | 5k | 1 min 17 sec | 1 min 24 sec | Success |
3 | 10k | 2 min 32 sec | 2 min 40 sec | Success |
4 | 80k | 19 min 54 sec | 21 min 44 sec | Success |
Comparison
Test | Number of records in File | Duration: Poppy [non-ECS] | Duration: Quesnelia [non-ECS] Set #1 | Delta | Delta/Poppy Duration, % |
---|---|---|---|---|---|
1 | 1k | 32 sec | 19 sec | 13 sec | 40.63% |
2 | 5k | 2 min 14 sec | 1 min 17 sec | 57 sec | 42.54% |
3 | 10k | 4 min 35 sec | 2 min 32 sec | 2 min 3 sec | 44.73% |
4 | 80k | 36 min 25 sec | 19 min 54 sec | 16 min 31 sec | 45.35% |
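The Delta columns follow directly from the measured durations; a minimal sketch of the arithmetic, using the 80k row as an example (the helper below is illustrative):

```python
def to_seconds(text: str) -> int:
    """Parse durations like '36 min 25 sec' or '19 sec' into seconds."""
    tokens = text.split()
    return sum(int(v) * (60 if u.startswith("min") else 1)
               for v, u in zip(tokens[::2], tokens[1::2]))

poppy = to_seconds("36 min 25 sec")      # 2185 sec
quesnelia = to_seconds("19 min 54 sec")  # 1194 sec
delta = poppy - quesnelia                # 991 sec = 16 min 31 sec
pct = round(delta / poppy * 100, 2)      # 45.35 (%, Delta/Poppy Duration)
```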
Service CPU Utilization
Memory Utilization
MSK tenant cluster
Disk usage by broker
CPU (User) usage by broker
Open Search
CPU utilization master node
CPU utilization data node
Maximum free storage space (GiB)
Indexing rate
RDS CPU Utilization
For all tests - 95%, except DI Holdings with the 1k file - 25%.
DB Connections
DB Load
Set #1
Set #2
SQL queries
```sql
INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id)
VALUES ($1, $2);

INSERT INTO fs09000000_mod_source_record_manager.journal_records
  (id, job_execution_id, source_id, source_record_order, entity_type, entity_id,
   entity_hrid, action_type, action_status, error, action_date, title, instance_id,
   holdings_id, order_id, permanent_location_id, tenant_id)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17);

UPDATE fs09000000_mod_source_record_manager.job_execution_progress
SET succeeded_records_count = succeeded_records_count + $2,
    error_records_count = error_records_count + $3
WHERE job_execution_id = $1 RETURNING *;

INSERT INTO fs09000000_mod_inventory_storage.holdings_record (id, jsonb)
VALUES ($1, $2) RETURNING jsonb;

insert into "marc_records_lb" ("id", "content")
values (cast($1 as uuid), cast($2 as jsonb))
on conflict ("id") do update set "content" = cast($3 as jsonb);

WITH input_rows(record_id, holdings_id) AS (
  VALUES ($1::uuid, $2::uuid)
), ins AS (
  INSERT INTO fs09000000_mod_inventory.records_holdings(record_id, holdings_id)
  SELECT * FROM input_rows
  ON CONFLICT (record_id) DO UPDATE SET record_id = EXCLUDED.record_id
  RETURNING record_id::uuid, holdings_id::uuid
)
SELECT record_id, holdings_id FROM ins
UNION ALL
SELECT c.record_id, c.holdings_id
FROM input_rows
JOIN fs09000000_mod_inventory.records_holdings c USING (record_id);
```
```sql
INSERT INTO fs09000000_mod_source_record_manager.journal_records
  (id, job_execution_id, source_id, source_record_order, entity_type, entity_id,
   entity_hrid, action_type, action_status, error, action_date, title, instance_id,
   holdings_id, order_id, permanent_location_id, tenant_id)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17);

INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id)
VALUES ($1, $2);

UPDATE fs09000000_mod_source_record_manager.job_execution_progress
SET succeeded_records_count = succeeded_records_count + $2,
    error_records_count = error_records_count + $3
WHERE job_execution_id = $1 RETURNING *;

INSERT INTO fs09000000_mod_inventory_storage.holdings_record (id, jsonb)
VALUES ($1, $2) RETURNING jsonb;

WITH input_rows(record_id, holdings_id) AS (
  VALUES ($1::uuid, $2::uuid)
), ins AS (
  INSERT INTO fs09000000_mod_inventory.records_holdings(record_id, holdings_id)
  SELECT * FROM input_rows
  ON CONFLICT (record_id) DO UPDATE SET record_id = EXCLUDED.record_id
  RETURNING record_id::uuid, holdings_id::uuid
)
SELECT record_id, holdings_id FROM ins
UNION ALL
SELECT c.record_id, c.holdings_id
FROM input_rows
JOIN fs09000000_mod_inventory.records_holdings c USING (record_id);
```
Infrastructure
PTF - environment qcp1
- 10 m6g.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
- 1 database instance, writer
Name | Memory (GiB) | vCPUs | Engine version |
---|---|---|---|
db.r6g.xlarge | 32 | 4 | 16.1 |
- MSK tenant
- 2 m5.2xlarge brokers in 2 zones
- Apache Kafka version 2.8.0
- EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=2
- Open Search ptf-test
- version OpenSearch_2_7_R20240502
- Data nodes
- Instance type - r6g.2xlarge.search
- Number of nodes - 4
- Storage type - EBS
- EBS volume size (GiB) - 500
- Dedicated master nodes
- Instance type - r6g.large.search
- Number of nodes - 3
- DB records
- fs09000000
- Instances - 25901331
- Items - 27074913
- Holdings - 25871735
- fs07000001
- Instances - 10100620
- Items - 1484850
- Holdings - 10522266
- fs07000002
- Instances - 1161275
- Items - 1153548
- Holdings - 1153548
Methodology/Approach
- Prepare Data Import files (1k, 5k, 10k, 80k) with the defined number of holdings records associated with an instance HRID (1 instance HRID for all records, or 1 per 1000 records)
- replace the instance HRID field with an active one from the environment (example: =004 colin00001144043)
- replace the location field (example: =852 01$bme3CC$hKFN5860.A6$iC732), where me3CC is the code of a tenant location. Go to /settings/tenant-settings/location-locations and take the code of a location with active status
- to replace field 004, extract the instance HRIDs of active instances for this tenant using the SQL query below
Get total job durations
SQL to get job durations:

```sql
select file_name, total_records_in_file, started_date, completed_date,
       completed_date - started_date as duration, status, error_status
from [tenant]_mod_source_record_manager.job_execution
where subordination_type = 'COMPOSITE_PARENT'
-- where started_date > '2024-06-13 14:47:54' and completed_date < '2024-06-13 19:01:50.832'
order by started_date desc
limit 10;
```
Get instance HRID ids
SQL to get instance HRIDs:

```sql
select jsonb->>'hrid' as instanceHRID
from [tenant]_mod_inventory_storage.instance
where jsonb->>'discoverySuppress' = 'false' and jsonb->>'source' = 'MARC'
limit 80;
```
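The HRIDs returned by this query feed the stringsHRID.txt file used in the steps below; a minimal sketch of exporting them with psycopg2 (connection details are placeholders, and the export can equally be done with any SQL client):

```python
# Illustrative only: export active MARC instance HRIDs into stringsHRID.txt.
# Assumes psycopg2 is installed and read access to the tenant's schema.
import psycopg2

TENANT = "fs09000000"  # tenant schema prefix used in this report
QUERY = f"""
    select jsonb->>'hrid'
    from {TENANT}_mod_inventory_storage.instance
    where jsonb->>'discoverySuppress' = 'false' and jsonb->>'source' = 'MARC'
    limit 80
"""

with psycopg2.connect(host="<db-host>", dbname="<db-name>",
                      user="<user>", password="<password>") as conn:
    with conn.cursor() as cur:
        cur.execute(QUERY)
        with open("stringsHRID.txt", "w") as out:
            for (hrid,) in cur:  # one HRID per row, no quotes, no header
                out.write(hrid + "\n")
```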
- Put the instance HRID ids into a stringsHRID.txt file without double quotes or headers. Every row should contain only one HRID id
- Use the PY script to replace HRID ids in the mrc file if needed (a sketch is shown after this list). The script is located in the Git repository perf-testing\workflows-scripts\data-import\Holdings\Data_preparation_steps
- Run Data Import jobs sequentially, one by one, from the UI with a 5-minute delay between jobs (the delay time can vary; this interval was chosen as comfortable for collecting results).
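A minimal sketch of such a replacement step, assuming the pymarc library and the stringsHRID.txt format described above; file names and the records-per-HRID constant are illustrative, and the actual script in the perf-testing repository may differ:

```python
from pymarc import MARCReader, MARCWriter

HOLDINGS_PER_HRID = 1000  # new approach: one instance HRID per 1000 holdings records

# One active instance HRID per line, as prepared in the previous steps.
with open("stringsHRID.txt") as f:
    hrids = [line.strip() for line in f if line.strip()]

with open("holdings.mrc", "rb") as src, open("holdings_prepared.mrc", "wb") as dst:
    writer = MARCWriter(dst)
    for i, record in enumerate(MARCReader(src)):
        # Control field 004 of an MFHD record carries the related bib record's HRID.
        record["004"].data = hrids[i // HOLDINGS_PER_HRID]
        writer.write(record)
    writer.close()
```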