[Ramsons] [non-ECS] Bulk Edit of MARC and FOLIO Instance records

[Ramsons] [non-ECS] Bulk Edit of MARC and FOLIO Instance records

Overview

  • This document contains the results of the testing Bulk Edit of MARC and FOLIO Instance records by submitting list of identifiers for 100, 1000, 10 000, 100 000 records or triggering bulk edit by submitting the query
  •  Ramsons release on rcp1 environment on a non-ecs okapi environment (tenant fs09000000).
  • Jira PERF-1034 - Getting issue details... STATUS

Summary

  • Bulk edits could be performed with .csv files of 100, 1000, 10 000, and 100 000 records; however, updating 100k records is very slow and can take up to 11 hours.
  • Bulk edits could be performed by submitting the query but due to MODFQMMGR-667 - Getting issue details... STATUS it could be unsuccessful.
  • Memory consumption for all modules shows a stable trend with periodical increasing and decreasing memory usage. No memory leaks are suspected.
  • Service CPU utilization corresponded to the load and did not exceed 2.1%.
  • Instance CPU Utilization did not exceed 13.5% 
  • DB CPU utilization was very high 99.7%.
  • DB connections - up to 1146 on average.
  • Database deadlocks were observed during the Bulk edit updating records step MSEARCH-961 - Getting issue details... STATUS

Jiras

MSEARCH-961 - Getting issue details... STATUS

MODEXPW-560 - Getting issue details... STATUS

MODFQMMGR-667 - Getting issue details... STATUS

Test description

Test name

source

Number of recordsActionquery-1query-2

Scenario 1

FOLIO

100, 1000, 10 000, 100 000
  1. Add  Action note: "Regular data cleanup."

  2. Mark the instance as Staff suppress 

from .csv filefrom .csv file

Scenario 2

FOLIO

2007 
  1. Add  Action note: "Regular data cleanup."

  2. Mark the instance as Staff suppress 

(instance.source == "FOLIO") AND (instance.format_names contains any (<format names supported in test environment>)) [computer -- online resource]English (354)
(instance.source == "folio") AND (instance.languages contains all (""))

Scenario 3

MARC

100, 1000, 10 000, 100 000
  1. Add 500 field with both identifiers empty and subfield “a” with the text “From the papers of the Smith family.” and subfield “5” set to “DLC”

from .csv filefrom .csv file

Scenario 4

MARC

1148 
  1. Add 500 field with both identifiers empty and subfield “a” with the text “From the papers of the Smith family.” and subfield “5” set to “DLC”

(instance.source == "MARC") AND (instance.format_names contains any (<format names supported in test environment>)) [unmediated -- object]Albanian (1055)
(instance.source == "marc") AND (instance.languages contains all (""))

Scenario 5

FOLIO &

MARC

100, 1000, 10 000, 100 000
  1. Suppress from discovery including option for holdings and items

from .csv filefrom .csv file

Scenario 6

FOLIO &

MARC

562
  1. Suppress from discovery including option for holdings and items

(instance.format_names contains any (<format names supported in test environment>)) [audio -- other]Georgian (1294)
(instance.languages contains all (""))

Test Results

Test #Sourcefile size

Upload Duration (hh:mm:ss)

Update Duration  (hh:mm:ss)

Total time
(hh:mm:ss)

Results

Scenario 1

FOLIO

10000:00:1200:00:2800:00:40Completed

FOLIO

100000:01:2200:03:0500:04:27Completed

FOLIO

10 00000:13:3500:31:3000:45:05Completed

FOLIO

100 00002:21:5105:56:3008:18:21Completed

Scenario 2

FOLIO

2166 (query-1)

388 (query-2)

00:06:05

00:01:11

00:09:43

00:02:15

00:15:48

00:03:26

query-1 Completed
query-2 Completed

Scenario 3

MARC

10000:00:1700:00:2400:00:41Completed

MARC

100000:02:1400:01:5000:04:04Completed

MARC

10 00000:22:2100:32:5200:55:13Completed

MARC

100 00004:13:4803:45:3807:59:26Completed

Scenario 4

MARC

1055 (query-2)---

Test query-1 not finished in 2 hours
Test query-2 not finished in 1 hour
MODFQMMGR-667 - Getting issue details... STATUS

Scenario 5

FOLIO & MARC

10000:00:1400:00:2700:00:41Completed

FOLIO & MARC

100000:01:1800:04:0700:05:25Completed

FOLIO & MARC

10 00000:17:1300:53:3001:10:43Completed

FOLIO & MARC

100 00002:54:4608:09:4811:04:34Completed

FOLIO & MARC

1294 (query-2)---

Test query-1 not finished in 2 hours

Test query-2 not finished in 1 hour
MODFQMMGR-667 - Getting issue details... STATUS


Resource utilization

Instance CPU Utilization

Resource utilization was very similar for all of the tests. I took the most representative.

Scenario 1

scenario 5

Service CPU Utilization

Here we can see that the modules used 1.7% CPU

Scenario 1

All modules graph: We can see spikes in mod-bulk-operations and mod-source-record-storage due to manual service restart but average CPU Utilization for these 2 modules did not exceed 1%.

Scenario 5

Service Memory Utilization

Here we can see that all modules show a stable trend.

Scenario 1

No constantly growing trend is observed.

Scenario 5

DB CPU Utilization

Scenario 1

DB CPU on average was up to 99.7% while updating 100k records because updating is the process that uses data import and is very consuming in database CPU resources.

Scenario 5

DB Connections

Scenario 1 (almost the same for all of the tests)

DB connections was up to 1149.

Scenario 5 (almost the same for all of the tests)

DB load

 These database queries were observed for all of the scenarios.                                                                                            

Top SQL-queries



#TOP 5 SQL statements
1
WITH deleted_ids as (
    DELETE
    FROM fs09000000_mod_search.instance_contributor
    WHERE instance_id IN ($1::uuid)
    RETURNING contributor_id
)
UPDATE fs09000000_mod_search.contributor
SET last_updated_date = CURRENT_TIMESTAMP
WHERE id IN (SELECT * FROM deleted_ids)
2
WITH cte AS (SELECT id,
                    name,
                    name_type_id,
                    authority_id,
                    last_updated_date
             FROM fs09000000_mod_search.contributor
             WHERE last_updated_date > $1
             ORDER BY last_updated_date
             )
SELECT c.id,
       c.name,
       c.name_type_id,
       c.authority_id,
       c.last_updated_date,
       json_agg(
               CASE
                                WHEN sub.instance_count 
3
call delete_old_marc_indexers_versions($1)
4
SELECT fs09000000_mod_inventory_storage.count_estimate(?)
SELECT fs09000000_mod_inventory_storage.count_estimate('SELECT jsonb,id FROM fs09000000_mod_inventory_storage.service_point WHERE ((true) AND ( (get_tsvector(f_unaccent(service_point.jsonb->>''ecsRequestRouting'')) @@ tsquery_phrase(f_unaccent(''true''))) IS NOT TRUE)) AND (id=''3a7de149-f17d-4602-adfe-0b09bf8e884a'')')
5
SELECT jsonb FROM fs09000000_mod_permissions.permissions
6
INSERT INTO fs09000000_mod_search.instance (id, tenant_id, shared, is_bound_with, json)
  VALUES ($1::uuid, $2, $3, $4, $5::jsonb)
  ON CONFLICT (id)
  DO UPDATE SET shared = EXCLUDED.shared,
  tenant_id = EXCLUDED.tenant_id,
  is_bound_with = EXCLUDED.is_bound_with,
  json = EXCLUDED.json

Other RDS metrics and observations

During updating records from time to time database deadlocks were observed 
Locking transactions:

SQL text for digest ID: -206532979961713955

WITH deleted_ids as (
DELETE
FROM fs09000000_mod_search.instance_contributor
WHERE instance_id IN ($1::uuid,$2::uuid)
RETURNING contributor_id
)
UPDATE fs09000000_mod_search.contributor
SET last_updated_date = CURRENT_TIMESTAMP
WHERE id IN (SELECT * FROM deleted_ids)
--------------------------------------with----------------------------------------------
INSERT INTO fs09000000_mod_search.contributor (id, name, name_type_id, authority_id)
VALUES ($1, $2, $3, $4),($5, $6, $7, $8)
ON CONFLICT (id) DO UPDATE SET last_updated_date = CURRENT_TIMESTAMP

Appendix

Infrastructure

PTF - environment Ramsons (rcp1)

  • 6 r7g.2xlarge EC2 instances located in US East (N. Virginia)us-east-1
  • 1 database instance, writer

    NameMemory GIBvCPUs

    db.r6g.xlarge

    32 GiBvCPUs
  • MSK fse-test
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 3.7.x

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=2


 Quesnelia modules memory and CPU parameters
Cluster Resources - rcp1-pvt (Thu Feb 06 08:00:39 UTC 2025)









ModuleTask Definition RevisionModule VersionTask CountMem Hard LimitMem Soft LimitCPU UnitsXmxMetaspace SizeMax Metaspace Size
mod-remote-storage12/folio/mod-remote-storage:3.3.524920447203960512512
mod-ncip9/folio/mod-ncip:1.15.621024896076888128
mod-finance-storage10/folio/mod-finance-storage:8.7.321024896070088128
mod-agreements12/folio/mod-agreements:7.1.52159214880000
mod-ebsconet11/folio/mod-ebsconet:2.3.12124810240700128256
mod-organizations9/folio/mod-organizations:2.0.021024896070088128
mod-consortia4/folio/mod-consortia:1.2.2251364776044165121024
edge-sip210/folio/edge-sip2:3.3.121024896076888128
mod-settings10/folio/mod-settings:1.1.021024896076888128
mod-serials-management12/folio/mod-serials-management:1.1.322480231201792384512
edge-dematic9/folio/edge-dematic:2.3.111024896076888128
mod-data-import10/folio/mod-data-import:3.2.412048184401292384512
mod-search24/folio/mod-search:4.0.8225922480014405121024
mod-inn-reach5/folio/mod-inn-reach:3.2.1-SNAPSHOT.102236003240028805121024
mod-record-specifications18/folio/mod-record-specifications:1.0.221024896076888128
mod-tags9/folio/mod-tags:2.3.021024896076888128
mod-authtoken10/folio/mod-authtoken:2.16.1214401152092288128
edge-courses11/folio/edge-courses:1.5.121024896076888128
mod-notify9/folio/mod-notify:3.3.021024896076888128
mod-inventory-update9/folio/mod-inventory-update:4.0.021024896076888128
mod-configuration9/folio/mod-configuration:5.11.021024896076888128
mod-orders-storage10/folio/mod-orders-storage:13.8.321024896070088128
edge-caiasoft9/folio/edge-caiasoft:2.3.221024896076888128
mod-login-saml9/folio/mod-login-saml:2.9.321024896076888128
mod-erm-usage-harvester10/folio/mod-erm-usage-harvester:5.0.221024896076888128
mod-password-validator9/folio/mod-password-validator:3.3.02144012980768384512
mod-gobi9/folio/mod-gobi:2.9.021024896070088128
mod-licenses10/folio/mod-licenses:6.1.222480231201792384512
edge-dcb10/folio/edge-dcb:1.2.221024896076888128
mod-bulk-operations13/folio/mod-bulk-operations:2.1.1223072260001536384512
mod-fqm-manager24/folio/mod-fqm-manager:3.0.1023000260002048384512
mod-graphql11/folio/mod-graphql:1.13.121024896076888128
mod-finance11/folio/mod-finance:5.0.121024896070088128
mod-erm-usage9/folio/mod-erm-usage:5.0.021024896076888128
mod-batch-print9/folio/mod-batch-print:1.2.021024896076888128
mod-tlr5/folio/mod-tlr:1.0.0-SNAPSHOT.821024896076888128
mod-lists19/folio/mod-lists:3.0.8260002600076888128
mod-copycat9/folio/mod-copycat:1.7.021024512076888128
mod-entities-links15/folio/mod-entities-links:3.1.32259224800144001024
mod-permissions20/folio/mod-permissions:6.6.121684154401024384512
pub-edge7/folio/pub-edge:2023.06.1421024896076800
mod-orders12/folio/mod-orders:12.9.1022048144001024384512
edge-patron11/folio/edge-patron:5.2.121024896076888128
edge-ncip9/folio/edge-ncip:1.10.121024896076888128
mod-marc-migrations31/folio/mod-marc-migrations:1.0.021024896076888128
edge-inn-reach4/folio/edge-inn-reach:3.3.0-SNAPSHOT.6921024896076888128
mod-users-bl10/folio/mod-users-bl:7.9.3214401152092288128
mod-oa6/folio/mod-oa:2.1.0-SNAPSHOT.66220488960180088512
mod-invoice11/folio/mod-invoice:5.9.2214401152092288128
mod-inventory-storage18/folio/mod-inventory-storage:28.1.0-SNAPSHOT.792240963690030765121024
mod-user-import9/folio/mod-user-import:3.9.021024896076888128
mod-sender9/folio/mod-sender:1.13.021024896076888128
edge-oai-pmh10/folio/edge-oai-pmh:2.10.021512136001440384512
mod-data-export-worker12/folio/mod-data-export-worker:3.3.823072204802048384512
mod-rtac9/folio/mod-rtac:3.7.021024896076888128
mod-circulation-storage12/folio/mod-circulation-storage:17.3.522880259201814384512
mod-source-record-storage23/folio/mod-source-record-storage:5.9.825600500003500384512
mod-calendar10/folio/mod-calendar:3.2.021024896076888128
mod-event-config9/folio/mod-event-config:2.8.021024896076888128
mod-courses9/folio/mod-courses:1.4.1121024896076888128
mod-circulation-item9/folio/mod-circulation-item:1.1.0210248960000
mod-inventory12/folio/mod-inventory:21.0.922880259201814384512
mod-email11/folio/mod-email:1.18.121024896076888128
mod-requests-mediated13/folio/mod-requests-mediated:1.0.0-SNAPSHOT.421024896076888128
mod-di-converter-storage10/folio/mod-di-converter-storage:2.3.121024896076888128
mod-pubsub12/folio/mod-pubsub:2.15.42153614400922384512
mod-circulation12/folio/mod-circulation:24.3.1122880259201814384512
edge-orders9/folio/edge-orders:3.1.021024896076888128
edge-rtac9/folio/edge-rtac:2.8.021024896076888128
mod-template-engine9/folio/mod-template-engine:1.21.021024896076888128
mod-users11/folio/mod-users:19.4.521024896076888128
mod-patron-blocks10/folio/mod-patron-blocks:1.11.121024896076888128
mod-audit11/folio/mod-audit:2.10.221024896076888128
edge-fqm11/folio/edge-fqm:3.0.221024896076888128
mod-source-record-manager16/folio/mod-source-record-manager:3.9.725600500003500384512
nginx-edge7/folio/nginx-edge:2023.06.14210248960000
mod-quick-marc9/folio/mod-quick-marc:6.0.012288217601664384512
nginx-okapi7/folio/nginx-okapi:2023.06.14210248960000
okapi-b10/folio/okapi:6.1.132024144001024384512
mod-feesfines10/folio/mod-feesfines:19.2.221024896076888128
mod-invoice-storage10/folio/mod-invoice-storage:5.9.121872153601024384512
mod-reading-room10/folio/mod-reading-room:1.0.021024896076888128
mod-service-interaction11/folio/mod-service-interaction:4.1.122048184401290384512
mod-dcb12/folio/mod-dcb:1.2.621024896076888128
mod-data-export14/folio/mod-data-export:5.1.61204818440000
mod-patron12/folio/mod-patron:6.2.621024896076888128
mod-oai-pmh11/folio/mod-oai-pmh:3.14.424096369003076384512
edge-connexion9/folio/edge-connexion:1.3.121024896076888128
mod-notes9/folio/mod-notes:6.0.0210248960952384512
mod-kb-ebsco-java9/folio/mod-kb-ebsco-java:5.0.021024896076888128
mod-login10/folio/mod-login:7.12.12144012980768384512
mod-organizations-storage9/folio/mod-organizations-storage:4.8.121024896070088128
mod-data-export-spring11/folio/mod-data-export-spring:3.4.312048184401536384512
pub-okapi7/folio/pub-okapi:2023.06.1421024896076800
edge-erm8/folio/edge-erm:1.3.021024896076888128
mod-eusage-reports9/folio/mod-eusage-reports:3.0.021024896076888128


Methodology/Approach

Test set

Data preparation: prepare . csv files with instance UUIDs for each of the test step 
The script to prepare files is here.
Populate the database with FOLIO records if needed with this script.

  1.  Manually tested all of the scenarios started on one tenant only:
    1. Trigger bulk edit by submitting a list of identifiers for 100, 1000, 10 000, and 100 000 records or run a query
    2. Wait until Bulk edit finds all matched records and record the startTime and endTime of the process from the response.
    3. Edit instances and start the second part of the bulk edit to commit the changes.
    4. Record the start time of committing changes from Headers
  2. Database preparation script after the test run required to restore Suppress from discovery and Staff suppress parameters.

Additional screenshots




Related content