PTF - Migrate/Update large number of Marc Authority records (Ramsons - ECS)
Overview
- This document contains the results of testing marc-migration for over 12 million records on the Ramsons ECS environment. In the Ramsons release, Spitfire created mod-marc-migrations to separate the data migration process from the module upgrade. In this test, we analyze the performance of the new module, mod-marc-migrations, when migrating 12 million Authority records, and collect baseline performance measurements for the marc-migration process on the central tenant.
- PERF-1005
Summary
- During the tests, we collected `mapping_duration` and `saving_duration` for the central tenant and, in the final test only, `total_saving_duration` for all member tenants. When the saving process started for the central tenant, it asynchronously triggered the update and saving processes for all member tenants. This behavior is specific to the ECS environment.
- The saving process ended with the status `DATA_SAVING_FAILED`, and not all records were updated. This occurred because the central tenant contained record IDs that were not present in the member tenants. The percentage of unsaved records was 0.69% in Test №1 and 5.35% in the final Test №7; this issue should be investigated.
- We gathered baseline performance metrics for the marc-migration process on the central tenant. However, we recommend collecting results for both the central and member tenants, with separate metrics for each individual member tenant.
Recommendations and Jiras
- Repeat tests to collect results for both the central and member tenants.
- Run tests to collect separate metrics for each individual member tenant.
- Fix the test data set so that the central tenant no longer contains record IDs that are not present in the member tenants.
Test № | status | total_num_of_records | mapped_num_of_records | saved_num_of_records | Percentage of unsaved records |
---|---|---|---|---|---|
Test №1 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11983692 | 0.69% |
Test №2 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11959281 | 0.89% |
Test №3 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11921442 | 1.21% |
Test №4 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11811976 | 2.12% |
Test №5 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11815927 | 2.08% |
Test №6 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11672697 | 3.27% |
Test №7 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11421743 | 5.35% |
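The last column of the table above can be recomputed from the record counts. The following is a minimal sketch, assuming the percentage is defined as (total - saved) / total; the helper name `unsaved_percentage` and the dictionary names are illustrative and not part of mod-marc-migrations.

```python
# Recompute the percentage of unsaved records from the counts in the table.
TOTAL_RECORDS = 12_067_250  # total_num_of_records, identical for every test

# saved_num_of_records per test number, copied from the table
saved_by_test = {
    1: 11_983_692,
    2: 11_959_281,
    3: 11_921_442,
    4: 11_811_976,
    5: 11_815_927,
    6: 11_672_697,
    7: 11_421_743,
}


def unsaved_percentage(total: int, saved: int) -> float:
    """Share of records that were not saved, in percent (assumed definition)."""
    return (total - saved) / total * 100


# Rounded to two decimals, as in the report
unsaved_by_test = {
    test_no: round(unsaved_percentage(TOTAL_RECORDS, saved), 2)
    for test_no, saved in saved_by_test.items()
}

for test_no, pct in unsaved_by_test.items():
    unsaved = TOTAL_RECORDS - saved_by_test[test_no]
    print(f"Test №{test_no}: {unsaved} unsaved records ({pct:.2f}%)")
```

Keeping the absolute number of unsaved records next to the percentage makes it easier to compare the gap with the count of record IDs missing from the member tenants.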
Test Results
This table contains the mapping and saving durations (h:mm:ss) for the migrated MARC Authority records.
Test № | CHUNK_FETCH_IDS_COUNT | RECORDS_CHUNK_SIZE | mapping_duration Central Tenant | saving_duration Central Tenant | total_saving_duration Central and Member Tenants* |
---|---|---|---|---|---|
Test №1 | 500 | 500 | 3:01:49 | 1:18:09 | |
Test №2 | 2000 | 1000 | 2:22:24 | 1:06:29 | |
Test №3 | 4000 | 2000 | 2:07:42 | 0:50:57 | |
Test №4 | 5000 | 2500 | 2:00:05 | 1:26:05 | |
Test №5 | 7000 | 3500 | 2:01:27 | 0:52:20 | |
Test №6 | 10000 | 5000 | 2:06:32 | 0:50:41 | |
Test №7 | 12000 | 4000 | 2:21:13 | 0:54:05 | 2:04:41 |
*The total saving duration for the central and member tenants should be collected after each test run. The value for Test №7 was collected only from the automatic migrations triggered from the central tenant. Separate tests for each member tenant were not run.
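To compare the chunk-size settings on a single number, the central-tenant durations can be combined and turned into a mapping rate. This is a rough sketch under two assumptions that the report does not state: that mapping and saving run one after the other, so their durations can be added, and that all 12067250 records were mapped in each run (as the mapped_num_of_records column indicates). The function and variable names are illustrative.

```python
from datetime import timedelta

TOTAL_RECORDS = 12_067_250  # mapped_num_of_records in every test


def parse_hms(text: str) -> timedelta:
    """Parse an 'h:mm:ss' string from the table into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# test number -> (mapping_duration, saving_duration) for the central tenant
central_durations = {
    1: ("3:01:49", "1:18:09"),
    2: ("2:22:24", "1:06:29"),
    3: ("2:07:42", "0:50:57"),
    4: ("2:00:05", "1:26:05"),
    5: ("2:01:27", "0:52:20"),
    6: ("2:06:32", "0:50:41"),
    7: ("2:21:13", "0:54:05"),
}

summary = {}
for test_no, (mapping_text, saving_text) in central_durations.items():
    mapping = parse_hms(mapping_text)
    saving = parse_hms(saving_text)
    summary[test_no] = {
        # Assumption: the two phases are sequential, so durations add up
        "combined": mapping + saving,
        # Records mapped per second on the central tenant
        "mapping_rate": TOTAL_RECORDS / mapping.total_seconds(),
    }

fastest = min(summary, key=lambda t: summary[t]["combined"])

for test_no, row in summary.items():
    print(f"Test №{test_no}: combined {row['combined']}, "
          f"{row['mapping_rate']:.0f} records/s mapped")
print(f"Shortest combined central-tenant time: Test №{fastest}")
```

Because the number of saved records differs between runs, the combined time is only an approximate basis for comparing settings; the saving phase in the later tests processed fewer records successfully.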
Tests №1-7
Introduction: The Baseline RCON environment configuration was applied, and CPU=0 was set for all modules.
Objective: The objective of these tests was to collect performance measurements for the marc-migration process on the central tenant.
Results: Results were collected for the central tenant in every test, and for the member tenants only in the last test.
Instance CPU Utilization
Service CPU Utilization
Here we can see that the mod-entities-links module had spikes of up to 90% of instance CPU capacity, while the mod-marc-migrations module used 20%.
Service Memory Utilization
Here we can see that mod-entities-links had spikes up to 90% memory.
Kafka metrics
OpenSearch Data Nodes metrics
DB CPU Utilization
DB CPU utilization had spikes of up to 99%.
DB Connections
Max number of DB connections was 1250.
DB load
Top SQL-queries
Appendix
Infrastructure
PTF - Baseline RCON environment configuration
- 10 m6g.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
- 1 database instance, writer: db.r6g.xlarge (32 GB memory, 4 vCPUs)
- OpenSearch ptf-test
  - Data nodes
    - Instance type - r6g.2xlarge.search
    - Number of nodes - 4
    - Version: OpenSearch_2_7_R20240502
  - Dedicated master nodes
    - Instance type - r6g.large.search
    - Number of nodes - 3
- MSK fse-tenant
  - 2 brokers, kafka.m7g.xlarge brokers in 2 zones
  - Apache Kafka version 3.7.x
  - EBS storage volume per broker 300 GiB
  - auto.create.topics.enable=true
  - log.retention.minutes=480
  - default.replication.factor=3