[Ramsons] [ECS] Bulk Edit of MARC and FOLIO Instance records
Overview
- This document contains the results of the testing Bulk Edit of MARC and FOLIO Instance records by submitting list of identifiers for 100, 1000, 10 000 records or triggering bulk edit by submitting the query. This time there're no tests with 100 000 records file and tests with queries (scenario 2, 4, 6 - MODFQMMGR-667 - Getting issue details... STATUS ).
- Tests carried out on Ramsons release with ECS Eureka like environment (central tenant) in scope of Eureka testing
- The process of Bulk Edit in scenarios 1, 3, 5 includes two steps. First step - uploading file, second one - committing changes.
- Jira PERF-1032 - Getting issue details... STATUS
Summary
- Bulk edit operations with instance Ids from csv file completed successfully in all scenarios for 100, 1000 and 10000 records. Scenario 5 lasts 3 times longer in average than scenario 1 (FOLIO) or scenario 3 (MARC).
- Memory usage was stable during tests. No memory leaks revealed. Top usage observed from folio-keycloak module - 92% and mod-roles-keycloak - 67% in Bulk Edit with MARC records (scenario 3, 10000 records)
- CPU was utilized in accordance with the load.
- DB CPU utilized 99% of resources during each test.
- DB connections were 1150 in average (scenario 1, scenario 5).
- During scenario 3 (MARC records) with 1000 records deadlocks observed which affected 2 records to fail with timeout error message. Issue described in ticket MSEARCH-961 - Getting issue details... STATUS
Jiras
- MSEARCH-1004 - Getting issue details... STATUS high latency on DB queries
Test description
| Test name | source | Number of records | Action | query-1 | query-2 |
|---|---|---|---|---|---|
Scenario 1 | FOLIO | 100, 1000, 10 000 |
| from .csv file | from .csv file |
Scenario 2 | FOLIO | 2007 |
| (instance.source == "FOLIO") AND (instance.format_names contains any (<format names supported in test environment>)) [computer -- online resource] | English (354) (instance.source == "folio") AND (instance.languages contains all ("")) |
Scenario 3 | MARC | 100, 1000, 10 000 |
| from .csv file | from .csv file |
Scenario 4 | MARC | 1148 |
| (instance.source == "MARC") AND (instance.format_names contains any (<format names supported in test environment>)) [unmediated -- object] | Albanian (1055) (instance.source == "marc") AND (instance.languages contains all ("")) |
Scenario 5 | FOLIO & MARC | 100, 1000, 10 000 |
| from .csv file | from .csv file |
Scenario 6 | FOLIO & MARC | 562 |
| (instance.format_names contains any (<format names supported in test environment>)) [audio -- other] | Georgian (1294) (instance.languages contains all ("")) |
Test Results
| Test # | Source | file size | Upload Duration (hh:mm:ss) | Update Duration (hh:mm:ss) | Total time (hh:mm:ss) | Results |
|---|---|---|---|---|---|---|
| Scenario 1 | FOLIO | 100 | 00:00:12 | 00:00:14 | 00:00:26 | Completed |
| FOLIO | 1000 | 00:01:18 | 00:02:28 | 00:03:46 | Completed | |
| FOLIO | 10 000 | 00:09:38 | 00:24:13 | 00:33:51 | Completed | |
Scenario 2 | FOLIO | 2166 (query-1) 388 (query-2) | Not executed because of MODFQMMGR-667 - Getting issue details... STATUS | |||
| Scenario 3 | MARC | 100 | 00:00:11 | 00:00:28 | 00:00:39 | Completed |
| MARC | 1000 | 00:01:37 | 00:01:26 | 00:03:03 | Completed* | |
| MARC | 10 000 | 00:11:26 | 00:14:46 | 00:26:12 | Completed | |
Scenario 4 | MARC | 1055 (query-2) | Not executed because of MODFQMMGR-667 - Getting issue details... STATUS | |||
| Scenario 5 | FOLIO & MARC | 100 | 00:00:06 | 00:00:58 | 00:01:04 | Completed |
| FOLIO & MARC | 1000 | 00:00:57 | 00:08:00 | 00:08:57 | Completed | |
| FOLIO & MARC | 10 000 | 00:09:12 | 01:15:00 | 01:24:12 | Completed | |
| Scenario 6 | FOLIO & MARC | 1294 (query-2) | Not executed because of MODFQMMGR-667 - Getting issue details... STATUS |
* Timeout issue with 2 records observed during the test. Also deadlocks observed during the test (see Other RDS metrics and observations).
Resource utilization
Instance CPU Utilization
Resource utilization was very similar for all of the tests. I took the most representative.
Scenario 1, Scenario 5
Service CPU Utilization
We observe spike of edge-inventory-b with 60% at the very beginning of test #5 (mixed flow)
Service Memory Utilization
Scenario 1, Scenario 3, Scenario 5 graphs didn't show any memory growing trend
DB CPU Utilization
Scenario 1, Scenario 5 show that database utilized all its available resources during Bulk Edit - 99%
Scenario 3 - 97%
DB Connections
All scenarios used almost the same number of connections - in range of 1150 - 1250
Scenario 1, Scenario 5 graph - 1150
Scenario 3 - 1250
DB load
These database queries were observed for all of the scenarios.
Scenario 1, Scenario 5
Top SQL-queries
Scenario 1, Scenario 5
During all Bulk Edit tests high latency observed during reindexing
| # | TOP 5 SQL statements |
|---|---|
| 1 | |
| 2 | |
| 3 | |
| 4 | |
| 5 | |
| 6 |
Other RDS metrics and observations
During updating records high latency observed with the request
SQL text for digest ID: -2473324649671274300 WITH deleted_ids as ( DELETE FROM cs00000int_mod_search.instance_subject WHERE instance_id IN ($1::uuid,$2::uuid,$3::uuid,$4::uuid) RETURNING subject_id ) UPDATE cs00000int_mod_search.subject SET last_updated_date = CURRENT_TIMESTAMP WHERE id IN (SELECT * FROM deleted_ids)
During BE with 1000 MARC records deadlock happened once resulting Timeout error message as output and failing 2 records.
Appendix
Infrastructure
PTF - eureka environment Ramsons (relc)
- 12 r7g.2xlarge EC2 instances located in US East (N. Virginia)us-east-1
1 database instance, writer
Name Memory GIB vCPUs db.r7g.xlarge
32 GiB 4 vCPUs - MSK fse-test
- 4 m5.2xlarge brokers in 2 zones
Apache Kafka version 3.7.x
EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=2
Methodology/Approach
Test set
Data preparation: prepare . csv files with instance UUIDs for each of the test step
The script to prepare files is here.
Populate the database with FOLIO records if needed with this script.
- Manually tested all of the scenarios started on one tenant only:
- Trigger bulk edit by submitting a list of identifiers for 100, 1000, 10 000 records or run a query
- Wait until Bulk edit finds all matched records and record the startTime and endTime of the process from the response.
- Edit instances and start the second part of the bulk edit to commit the changes.
- Record the start time of committing changes from Headers
- Database preparation script after the test run required to restore Suppress from discovery and Staff suppress parameters.
Additional screenshots