PTF - Migrate/Update a large number of MARC Authority records (Ramsons - ECS)

Overview

  • This document contains the results of testing marc-migration for over 12 million records on the Ramsons ECS environment. In the Ramsons release, Spitfire created mod-marc-migrations to separate the data migration process from the module upgrade. In this test, we analyze the performance of the new module, mod-marc-migrations, when migrating over 12 million Authority records, and collect baseline performance measurements for the marc-migration process on the central tenant.

PERF-1005

Summary

  • During the tests, we collected mapping_duration and saving_duration for the central tenant, together with total_saving_duration across all member tenants; the latter was collected only during the final test (Test №7). When the saving process started for the central tenant, it asynchronously triggered the update and saving processes for all member tenants. This behavior is specific to the ECS environment.
  • Every test ended with the status DATA_SAVING_FAILED, and not all records were saved. This occurred because the central tenant contained record IDs that were not present in the member tenants. The percentage of unsaved records was 0.69% in Test №1 but 5.35% in the last test, Test №7; this increase should be investigated.
  • We gathered baseline performance metrics for the marc-migration process on the central tenant. However, we recommend collecting results for both the central and member tenants, as well as separate metrics for each individual member tenant.

Recommendations and Jiras

  • Repeat tests to collect results for both the central and member tenants.
  • Run tests to collect separate metrics for each individual member tenant.
  • Fix the test data set so that the central tenant does not contain record IDs that are absent from the member tenants.
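One way to validate the test data set before a run is to check which central-tenant record IDs are missing from each member tenant. The sketch below assumes the authority record IDs of every tenant have already been exported into Python collections (how they are queried is not covered here); the function names and tenant names are hypothetical and only illustrate the set-difference check.

```python
from typing import Dict, Iterable, Set


def missing_per_member(
    central_ids: Iterable[str], member_ids: Dict[str, Iterable[str]]
) -> Dict[str, Set[str]]:
    """For each member tenant, return the central-tenant IDs that tenant lacks."""
    central = set(central_ids)
    return {tenant: central - set(ids) for tenant, ids in member_ids.items()}


def missing_from_all_members(
    central_ids: Iterable[str], member_ids: Dict[str, Iterable[str]]
) -> Set[str]:
    """Return central-tenant IDs that do not exist in any member tenant."""
    # set().union() with no arguments yields an empty set, so an empty dict is safe.
    present_somewhere = set().union(*(set(ids) for ids in member_ids.values()))
    return set(central_ids) - present_somewhere


if __name__ == "__main__":
    # Toy data (hypothetical); real IDs would be exported from each tenant.
    central = ["id-1", "id-2", "id-3"]
    members = {"member_a": ["id-1", "id-2"], "member_b": ["id-1"]}
    print(missing_per_member(central, members))
    print(missing_from_all_members(central, members))
```

Any ID reported by either function would point to a gap in the data set that could lead to unsaved records during the migration.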


Test № | status | total_num_of_records | mapped_num_of_records | saved_num_of_records | Percentage of unsaved records
Test №1 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11983692 | 0.69%
Test №2 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11959281 | 0.89%
Test №3 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11921442 | 1.21%
Test №4 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11811976 | 2.12%
Test №5 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11815927 | 2.08%
Test №6 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11672697 | 3.27%
Test №7 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11421743 | 5.35%
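The percentage of unsaved records is (total_num_of_records - saved_num_of_records) / total_num_of_records × 100. A minimal Python sketch that recomputes it from the table values; the function and constant names are illustrative only.

```python
TOTAL_RECORDS = 12_067_250  # total_num_of_records (same in every test)

# Test № -> saved_num_of_records, copied from the table
SAVED_BY_TEST = {
    1: 11_983_692,
    2: 11_959_281,
    3: 11_921_442,
    4: 11_811_976,
    5: 11_815_927,
    6: 11_672_697,
    7: 11_421_743,
}


def unsaved_percentage(total: int, saved: int) -> float:
    """Percentage of records that were not saved: (total - saved) / total * 100."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (total - saved) / total * 100


if __name__ == "__main__":
    for test, saved in SAVED_BY_TEST.items():
        unsaved = TOTAL_RECORDS - saved
        pct = unsaved_percentage(TOTAL_RECORDS, saved)
        print(f"Test №{test}: {unsaved:,} unsaved records ({pct:.2f}%)")
```

Rounded to two decimals, the computed percentages match the last column of the table.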


Test Results

The table below lists the durations for mapping and saving MARC Authority records.

Test № | CHUNK_FETCH_IDS_COUNT | RECORDS_CHUNK_SIZE | mapping_duration (Central Tenant) | saving_duration (Central Tenant) | total_saving_duration (Central and Member Tenants)*
Test №1 | 500 | 500 | 3:01:49 | 1:18:09 | n/a
Test №2 | 2000 | 1000 | 2:22:24 | 1:06:29 | n/a
Test №3 | 4000 | 2000 | 2:07:42 | 0:50:57 | n/a
Test №4 | 5000 | 2500 | 2:00:05 | 1:26:05 | n/a
Test №5 | 7000 | 3500 | 2:01:27 | 0:52:20 | n/a
Test №6 | 10000 | 5000 | 2:06:32 | 0:50:41 | n/a
Test №7 | 12000 | 4000 | 2:21:13 | 0:54:05 | 2:04:41

*The total saving duration for the central and member tenants should be collected after each test run. For Test №7, results were collected only from the automatic migrations triggered from the central tenant; separate tests for individual member tenants were not run.
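To compare the runs, the H:MM:SS durations can be converted to seconds and turned into throughput figures. The sketch below uses only the central-tenant values from the table; "mapping throughput" (mapped records divided by mapping_duration) and "combined time" (mapping_duration plus saving_duration) are our own derived metrics, not values reported by mod-marc-migrations.

```python
from datetime import timedelta

MAPPED_RECORDS = 12_067_250  # mapped_num_of_records, identical in every test

# Test № -> (CHUNK_FETCH_IDS_COUNT, RECORDS_CHUNK_SIZE, mapping_duration, saving_duration)
# Values copied from the Test Results table (central tenant only).
RUNS = {
    1: (500, 500, "3:01:49", "1:18:09"),
    2: (2000, 1000, "2:22:24", "1:06:29"),
    3: (4000, 2000, "2:07:42", "0:50:57"),
    4: (5000, 2500, "2:00:05", "1:26:05"),
    5: (7000, 3500, "2:01:27", "0:52:20"),
    6: (10000, 5000, "2:06:32", "0:50:41"),
    7: (12000, 4000, "2:21:13", "0:54:05"),
}


def to_timedelta(hms: str) -> timedelta:
    """Parse an H:MM:SS string such as '3:01:49' into a timedelta."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def summarize(runs):
    """Return per-test mapping throughput and combined mapping+saving time."""
    summary = {}
    for test, (_fetch_ids, _chunk_size, mapping, saving) in runs.items():
        mapping_td = to_timedelta(mapping)
        summary[test] = {
            "mapping_records_per_sec": MAPPED_RECORDS / mapping_td.total_seconds(),
            "combined": mapping_td + to_timedelta(saving),
        }
    return summary


if __name__ == "__main__":
    results = summarize(RUNS)
    for test, row in results.items():
        fetch_ids, chunk_size = RUNS[test][:2]
        print(
            f"Test №{test} (fetch={fetch_ids}, chunk={chunk_size}): "
            f"{row['mapping_records_per_sec']:.0f} records/s mapped, "
            f"mapping+saving {row['combined']}"
        )
    fastest = min(results, key=lambda t: results[t]["combined"])
    print(f"Shortest combined central-tenant time: Test №{fastest}")
```

Test №7's total_saving_duration is left out on purpose: member-tenant saving runs asynchronously, so it cannot simply be added to the central-tenant durations.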

Tests №1-7

Introduction: The Baseline RCON Environment configuration was applied, and CPU=0 was set for all modules.

Objective: The objective of these tests was to collect performance measurements for the marc-migration process on the central tenant.

Results: Results were collected for the central tenant in all tests, and for the member tenants only in the last test (Test №7).

Instance CPU Utilization

Service CPU Utilization

The mod-entities-links module had CPU spikes of up to 90% of instance CPU capacity, while the mod-marc-migrations module used 20%.

Service Memory Utilization

The mod-entities-links module had memory usage spikes of up to 90%.


Kafka metrics

OpenSearch Data Nodes metrics

DB CPU Utilization

DB CPU had spikes of up to 99%.

DB Connections

The maximum number of DB connections was 1250.

DB load


Top SQL queries


Appendix

Infrastructure

PTF - Baseline RCON environment configuration

  • 10 m6g.2xlarge EC2 instances located in US East (N. Virginia) us-east-1
  • 1 database instance, writer

    Name | Memory (GiB) | vCPUs
    db.r6g.xlarge | 32 | 4
  • OpenSearch ptf-test
    • Data nodes
      • Instance type - r6g.2xlarge.search
      • Number of nodes - 4
      • Version: OpenSearch_2_7_R20240502
    • Dedicated master nodes
      • Instance type - r6g.large.search
      • Number of nodes - 3
  • MSK fse-tenant
    • Brokers: kafka.m7g.xlarge, in 2 zones
    • Apache Kafka version 3.7.x
    • EBS storage volume per broker: 300 GiB
    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3


RCON module memory and CPU parameters

Cluster Resources - rcon-pvt
Fri Oct 18 05:45:21 UTC 2024

Module | Task Definition Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft Limit | CPU Units | Xmx | Metaspace Size | Max Metaspace Size
mod-remote-storage | 2 | mod-remote-storage:3.2.1-SNAPSHOT.171 | 2 | 4920 | 4472 | 0 | 3960 | 512 | 512
mod-finance-storage | 2 | mod-finance-storage:8.7.0-SNAPSHOT.183 | 2 | 1024 | 896 | 0 | 700 | 88 | 128
mod-ncip | 2 | mod-ncip:1.14.6-SNAPSHOT.233 | 2 | 1024 | 896 | 0 | 768 | 88 | 128
mod-agreements | 2 | mod-agreements:7.1.0-SNAPSHOT.237 | 2 | 1592 | 1488 | 0 | 0 | 0 | 0
mod-ebsconet | 2 | mod-ebsconet:2.3.0-SNAPSHOT.80 | 2 | 1248 | 1024 | 0 | 700 | 128 | 256
mod-organizations | 2 | mod-organizations:2.0.0-SNAPSHOT.95 | 2 | 1024 | 896 | 0 | 700 | 88 | 128
mod-consortia | 2 | mod-consortia:1.2.0-SNAPSHOT.22 | 2 | 5136 | 4776 | 0 | 4416 | 384 | 512
edge-sip2 | 2 | edge-sip2:3.3.0-SNAPSHOT.264 | 2 | 1024 | 896 | 0 | 768 | 88 | 128
mod-serials-management | 2 | mod-serials-management:1.1.0-SNAPSHOT.46 | 2 | 2480 | 2312 | 0 | 1792 | 384 | 512
mod-settings | 2 | mod-settings:1.0.4-SNAPSHOT.67 | 2 | 1024 | 896 | 0 | 768 | 88 | 128
mod-data-import | 2 | mod-data-import:3.2.0-SNAPSHOT.189 | 1 | 2048 | 1844 | 0 | 1292 | 384 | 512
mod-search | 6 | mod-search:4.0.0-SNAPSHOT.278 | 2 | 2592 | 2480 | 0 | 1440 | 512 | 1024
edge-dematic | 2 | edge-dematic:2.3.0-SNAPSHOT.143 | 1 | 1024 | 896 | 0 | 768 | 88 | 128
mod-inn-reach | 2 | mod-inn-reach:3.2.1-SNAPSHOT.102 | 2 | 3600 | 3240 | 0 | 2880 | 512 | 1024
mod-record-specifications | 2 | mod-record-specifications:1.0.0-SNAPSHOT.4 | 2 | 1024 | 896 | 0 | 768 | 88 | 128
mod-tags | 2 | mod-tags:2.2.1-SNAPSHOT.138 | 2 | 1024 | 896 | 0 | 768 | 88 | 128
mod-authtoken | 3 | mod-authtoken:2.16.0-SNAPSHOT.303 | 2 | 1440 | 1152 | 0 | 922 | 88 | 128
edge-courses | 2 | edge-courses:1.5.0-SNAPSHOT.1160 | 2 | 1024 | 896 | 0 | 768 | 88 | 128
mod-notify | 2 | mod-notify:3.2.1-SNAPSHOT.268 | 2 | 1024 | 896 | 0 | 768 | 88 | 128
mod-inventory-update | 2 | mod-inventory-update:3.4.2-SNAPSHOT.100 | 2 | 1024 | 896 | 0 | 768 | 88 | 128
mod-configuration | 2 | mod-configuration:5.11.0-SNAPSHOT.355 | 2 | 1024 | 896 | 0 | 768 | 88 | 128
mod-orders-storage | 2 | mod-orders-storage:13.8.0-SNAPSHOT.245 | 2 | 1024 | 896 | 0 | 700