PTF - Migrate/Update large number of Marc Authority records (Ramsons - ECS)

Overview

  • This document contains the results of testing marc-migration with more than 12 million records on the Ramsons ECS environment. In the Ramsons release, Spitfire created mod-marc-migrations to separate the data migration process from the module upgrade. This test analyzes the performance of the new module, mod-marc-migrations, when migrating 12 million Authority records and collects baseline performance measurements for the marc-migration process on the central tenant.

PERF-1005: Performance Test (MODMARCMIG): Migrate/Update large number of Marc Authority records on ECS env (Closed)

Summary

  • During the tests, we collected the mapping_duration and saving_duration for the central tenant. The total_saving_duration for all member tenants was collected only during the final test. When the saving process started for the central tenant, it triggered the update and saving processes asynchronously for all member tenants. This behavior is specific to the ECS environment.

  • The saving process ended with the status DATA_SAVING_FAILED in every test, and not all records were updated. This issue occurred because the central tenant contained record IDs that were not present in the member tenants. The percentage of unsaved records grew from 0.69% in Test №1 to 5.35% in the last test, Test №7; this growth should be investigated.

  • We gathered baseline performance metrics for the marc-migration process on the central tenant. However, we recommend collecting results for both the central and member tenants, with separate metrics for each individual member tenant.

Recommendations and Jiras

  • Repeat tests to collect results for both the central and member tenants.

  • Run tests to collect separate metrics for each individual member tenant.

  • Fix the test data set so that the central tenant does not contain record IDs that are not present in the member tenants.
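Fixing the data set starts with finding which authority record IDs exist in the central tenant but are missing from a member tenant. The following is a minimal sketch, assuming the IDs of each tenant have already been exported into in-memory collections. The function name and the sample IDs are hypothetical and are not part of mod-marc-migrations.

```python
def find_missing_ids(central_ids, member_ids):
    """Return the sorted IDs that are in the central tenant but not in the member tenant."""
    return sorted(set(central_ids) - set(member_ids))


# Hypothetical sample data, for illustration only.
central = ["a1", "a2", "a3", "a4"]
member = ["a1", "a3"]

missing = find_missing_ids(central, member)
print(missing)
```

A set difference keeps the comparison linear in the number of IDs, which matters at the scale of 12 million records.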

 

| Test № | status | total_num_of_records | mapped_num_of_records | saved_num_of_records | percentage of unsaved records |
|---|---|---|---|---|---|
| Test №1 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11983692 | 0.69% |
| Test №2 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11959281 | 0.89% |
| Test №3 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11921442 | 1.21% |
| Test №4 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11811976 | 2.12% |
| Test №5 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11815927 | 2.08% |
| Test №6 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11672697 | 3.27% |
| Test №7 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11421743 | 5.35% |
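The percentage of unsaved records in the table above can be reproduced from total_num_of_records and saved_num_of_records. The following is a minimal sketch of that arithmetic. The helper name `unsaved_percentage` is introduced here for illustration only, and the input numbers are copied from the table.

```python
TOTAL_RECORDS = 12_067_250

# saved_num_of_records per test, copied from the table above.
SAVED_RECORDS = {
    1: 11_983_692,
    2: 11_959_281,
    3: 11_921_442,
    4: 11_811_976,
    5: 11_815_927,
    6: 11_672_697,
    7: 11_421_743,
}


def unsaved_percentage(total, saved):
    """Share of records that were not saved, in percent, rounded to two decimals."""
    return round((total - saved) / total * 100, 2)


for test_no, saved in SAVED_RECORDS.items():
    unsaved = TOTAL_RECORDS - saved
    print(f"Test №{test_no}: {unsaved} unsaved ({unsaved_percentage(TOTAL_RECORDS, saved)}%)")
```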

 

Test Results

This table contains the mapping and saving durations for the Marc Authority records.

| Test № | CHUNK_FETCH_IDS_COUNT | RECORDS_CHUNK_SIZE | mapping_duration (Central Tenant) | saving_duration (Central Tenant) | total_saving_duration (Central and Member Tenants)* |
|---|---|---|---|---|---|
| Test №1 | 500 | 500 | 3:01:49 | 1:18:09 | |
| Test №2 | 2000 | 1000 | 2:22:24 | 1:06:29 | |
| Test №3 | 4000 | 2000 | 2:07:42 | 0:50:57 | |
| Test №4 | 5000 | 2500 | 2:00:05 | 1:26:05 | |
| Test №5 | 7000 | 3500 | 2:01:27 | 0:52:20 | |
| Test №6 | 10000 | 5000 | 2:06:32 | 0:50:41 | |
| Test №7 | 12000 | 4000 | 2:21:13 | 0:54:05 | 2:04:41 |

*The total saving duration for the central and member tenants has to be collected after each test run. The results for Test №7 were collected only from the automatic migrations triggered from the central tenant. Separate tests for each member tenant were not run.
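To compare the chunk-size configurations, the mapping durations above can be converted into an approximate throughput. The following is a minimal sketch, assuming throughput is simply mapped_num_of_records divided by mapping_duration. The helper names are introduced here for illustration and are not part of mod-marc-migrations.

```python
MAPPED_RECORDS = 12_067_250

# mapping_duration (Central Tenant) per test, copied from the table above.
MAPPING_DURATIONS = {
    1: "3:01:49",
    2: "2:22:24",
    3: "2:07:42",
    4: "2:00:05",
    5: "2:01:27",
    6: "2:06:32",
    7: "2:21:13",
}


def to_seconds(duration):
    """Convert an H:MM:SS string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in duration.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def records_per_second(records, duration):
    """Average number of records processed per second over the given duration."""
    return records / to_seconds(duration)


for test_no, duration in MAPPING_DURATIONS.items():
    rate = records_per_second(MAPPED_RECORDS, duration)
    print(f"Test №{test_no}: {rate:.0f} records/s")

# Test with the shortest mapping duration.
fastest = min(MAPPING_DURATIONS, key=lambda n: to_seconds(MAPPING_DURATIONS[n]))
print(f"Shortest mapping duration: Test №{fastest}")
```

The same helpers can be applied to saving_duration, using saved_num_of_records as the record count, because not all records were saved.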

Test №1-2-3-4-5-6-7

Introduction: The Baseline RCON Environment configuration was applied, and CPU=0 was set for all modules.

Objective: The objective of these tests was to collect performance measurements for the marc-migration process on the central tenant.

Results: Results were collected for the central tenant in all tests and for the member tenants only in the last test.

Instance CPU Utilization

Service CPU Utilization

Here we can see that the mod-entities-links module had spikes of up to 90% of instance CPU power, and the mod-marc-migrations module used 20% of instance CPU power.

Service Memory Utilization

Here we can see that mod-entities-links had spikes of up to 90% memory utilization.

 

Kafka metrics

OpenSearch Data Nodes metrics

DB CPU Utilization

DB CPU utilization had spikes of up to 99%.

DB Connections

Max number of DB connections was 1250.

DB load

Top SQL-queries

 

Appendix

Infrastructure

PTF - Baseline RCON environment configuration

  • 10 m6g.2xlarge EC2 instances located in US East (N. Virginia), us-east-1

  • 1 database instance, writer

  • OpenSearch ptf-test

    • Data nodes

      • Instance type - r6g.2xlarge.search

      • Number of nodes - 4

      • Version: OpenSearch_2_7_R20240502

    • Dedicated master nodes

      • Instance type - r6g.large.search

      • Number of nodes - 3

  • MSK fse-tenant

    • brokers, kafka.m7g.xlarge brokers in 2 zones

    • Apache Kafka version 3.7.x 

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true

    • log.retention.minutes=480

    • default.replication.factor=3

 

Cluster Resources - rcon-pvt
Fri Oct 18 05:45:21 UTC 2024

| Module | Task Definition Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft Limit | CPU Units | Xmx | Metaspace Size | Max Metaspace Size |
|---|---|---|---|---|---|---|---|---|---|
| mod-remote-storage | 2 | mod-remote-storage:3.2.1-SNAPSHOT.171 | 2 | 4920 | 4472 | 0 | 3960 | 512 | 512 |
| mod-finance-storage | 2 | mod-finance-storage:8.7.0-SNAPSHOT.183 | 2 | 1024 | 896 | 0 | 700 | 88 | 128 |
| mod-ncip | 2 | mod-ncip:1.14.6-SNAPSHOT.233 | 2 | 1024 | 896 | 0 | 768 | 88 | 128 |
| mod-agreements | 2 | mod-agreements:7.1.0-SNAPSHOT.237 | 2 | 1592 | 1488 | 0 | 0 | 0 | 0 |
| mod-ebsconet | 2 | mod-ebsconet:2.3.0-SNAPSHOT.80 | 2 | 1248 | 1024 | 0 | 700 | 128 | 256 |
| mod-organizations | 2 | mod-organizations:2.0.0-SNAPSHOT.95 | 2 | 1024 | 896 | 0 | 700 | 88 | 128 |
| mod-consortia | 2 | mod-consortia:1.2.0-SNAPSHOT.22 | 2 | 5136 | 4776 | 0 | 4416 | 384 | 512 |
| edge-sip2 | 2 | edge-sip2:3.3.0-SNAPSHOT.264 | 2 | 1024 | 896 | 0 | 768 | 88 | 128 |
| mod-serials-management | 2 | mod-serials-management:1.1.0-SNAPSHOT.46 | 2 | 2480 | 2312 | 0 | 1792 | 384 | 512 |
| mod-settings | 2 | mod-settings:1.0.4-SNAPSHOT.67 | 2 | 1024 | 896 | 0 | 768 | 88 | 128 |
| mod-data-import | 2 | mod-data-import:3.2.0-SNAPSHOT.189 | 1 | 2048 | 1844 | 0 | 1292 | 384 | 512 |
| mod-search | 6 | mod-search:4.0.0-SNAPSHOT.278 | 2 | 2592 | 2480 | 0 | 1440 | 512 | 1024 |

edge-dematic

2

edge-dematic:2.3.0-SNAPSHOT.143

1

1024

896

0