Overview
- This document contains the results of testing marc-migration with over 12 million records on the Ramsons ECS environment. In the Ramsons release, Spitfire created mod-marc-migrations to separate the data migration process from the module upgrade. This test analyzes the performance of the new module, mod-marc-migrations, when migrating 12 million Authority records.
- PERF-1005
Summary
- During the tests, we collected the mapping_duration and saving_duration for the central tenant. The total_saving_duration for all member tenants was collected only during the final test. When the saving process started for the central tenant, it triggered the update and saving processes asynchronously for all member tenants. This behavior is specific to the ECS environment.
- The saving process ended with the status DATA_SAVING_FAILED, and not all records were updated. This issue occurred because the central tenant contained record IDs that were not present in the member tenants. The percentage of unsaved records was 0.69% for Test №1 but grew to 5.35% for the last test, Test №7. This issue should be investigated.
- We gathered baseline performance metrics for the marc-migration process on the central tenant. However, our recommendation is to collect results for both the central and member tenants, and separate metrics for each individual member tenant.
Recommendations and Jiras
- Repeat tests to collect results for both the central and member tenants.
- Run tests to collect separate metrics for each individual member tenant.
Test № | status | total_num_of_records | mapped_num_of_records | saved_num_of_records | percentage of Unsaved Records |
---|---|---|---|---|---|
Test №1 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11983692 | 0.69% |
Test №2 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11959281 | 0.89% |
Test №3 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11921442 | 1.21% |
Test №4 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11811976 | 2.12% |
Test №5 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11815927 | 2.08% |
Test №6 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11672697 | 3.27% |
Test №7 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11421743 | 5.35% |
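The percentage of unsaved records in the table above can be reproduced from total_num_of_records and saved_num_of_records. The following is a minimal sketch of that arithmetic; the function name unsaved_percentage is an illustrative assumption and is not part of mod-marc-migrations.

```python
def unsaved_percentage(total: int, saved: int) -> float:
    """Return the share of records that were not saved, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (total - saved) / total * 100


# (total_num_of_records, saved_num_of_records) copied from the table above.
results = {
    "Test №1": (12067250, 11983692),
    "Test №2": (12067250, 11959281),
    "Test №3": (12067250, 11921442),
    "Test №4": (12067250, 11811976),
    "Test №5": (12067250, 11815927),
    "Test №6": (12067250, 11672697),
    "Test №7": (12067250, 11421743),
}

for name, (total, saved) in results.items():
    unsaved = total - saved
    print(f"{name}: {unsaved} unsaved records, {unsaved_percentage(total, saved):.2f}%")
```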
Test Results
This table contains the durations for mapping and saving Marc Authority records.
Test № | CHUNK_FETCH_IDS_COUNT | RECORDS_CHUNK_SIZE | mapping_duration Central Tenant | saving_duration Central Tenant | total_saving_duration Central and Member Tenants |
---|---|---|---|---|---|
Test №1 | 500 | 500 | 3:01:49 | 1:18:09 | |
Test №2 | 2000 | 1000 | 2:22:24 | 1:06:29 | |
Test №3 | 4000 | 2000 | 2:07:42 | 0:50:57 | |
Test №4 | 5000 | 2500 | 2:00:05 | 1:26:05 | |
Test №5 | 7000 | 3500 | 2:01:27 | 0:52:20 | |
Test №6 | 10000 | 5000 | 2:06:32 | 0:50:41 | |
Test №7 | 12000 | 4000 | 2:21:13 | 0:54:05 | 2:04:41 |
*The total saving duration for the central and member tenants has to be collected after each test run. The results for Test №7 were collected only from the automatic migrations triggered from the central tenant. Separate tests for each member tenant were not run.
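To compare the configurations in the table above, the H:MM:SS durations can be converted to seconds and turned into a mapping throughput and a combined central-tenant duration. This is a minimal sketch: the helper names are illustrative assumptions, only three rows are copied in for brevity, and the throughput uses the mapped record count from the first table (12067250).

```python
from datetime import timedelta

MAPPED_RECORDS = 12_067_250  # mapped_num_of_records, identical for every test


def parse_duration(text: str) -> timedelta:
    """Parse an H:MM:SS string such as '2:07:42' into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def records_per_second(records: int, duration: str) -> float:
    """Average number of records processed per second over the duration."""
    return records / parse_duration(duration).total_seconds()


# (mapping_duration, saving_duration) for the central tenant, from the table.
durations = {
    "Test №1": ("3:01:49", "1:18:09"),
    "Test №3": ("2:07:42", "0:50:57"),
    "Test №5": ("2:01:27", "0:52:20"),
}

for name, (mapping, saving) in durations.items():
    combined = parse_duration(mapping) + parse_duration(saving)
    rate = records_per_second(MAPPED_RECORDS, mapping)
    print(f"{name}: mapping {rate:.0f} records/s, mapping + saving {combined}")
```

The saving throughput is left out on purpose, because the number of saved records differs per test and the saving step did not complete for all records.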
Test №1-2-3-4-5-6-7
Introduction: The Baseline RCON Environment configuration was applied, and CPU=0 was set for all modules.
Objective: The objective of these tests was to collect performance measurements for the marc-migration process on the central tenant.
Results: Results were collected for the central tenant in every test, and for the member tenants only in the last test.
Instance CPU Utilization
Service CPU Utilization
Here we can see that the mod-entities-links module had spikes of up to 90% of the instance CPU capacity, while the mod-marc-migrations module used 20%.
Service Memory Utilization
Here we can see that mod-entities-links had spikes up to 90% memory.
Kafka metrics
OpenSearch Data Nodes metrics
DB CPU Utilization
DB CPU had spikes up to 99%
DB Connections
Max number of DB connections was 1250.
DB load
Top SQL-queries
Appendix
Infrastructure
PTF - Baseline RCON environment configuration
- 10 m6g.2xlarge EC2 instances located in US East (N. Virginia) us-east-1
- 1 database instance, writer
Name | Memory GiB | vCPUs |
---|---|---|
db.r6g.xlarge | 32 | 4 |
- Open Search ptf-test
- Data nodes
- Instance type - r6g.2xlarge.search
- Number of nodes - 4
- Version: OpenSearch_2_7_R20240502
- Dedicated master nodes
- Instance type - r6g.large.search
- Number of nodes - 3
- MSK fse-tenant
- 2 brokers, kafka.m7g.xlarge brokers in 2 zones
Apache Kafka version 3.7.x
EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
Methodology/Approach
Baseline RCON Environment configuration
Test preparation:
- 12 million Marc Authority records present in database
- Background 'Deleting old records from marc_indexers' jobs were disabled by adding the following configuration to mod-source-record-storage:
{ "name": "srs.marcIndexers.delete.interval.seconds", "value": "86400" }
- Check Migration Operation for central tenant from the database:
select end_time_mapping - start_time_mapping as mapping_duration,
       end_time_saving - start_time_saving as saving_duration,
       *
from cs00000int_mod_marc_migrations.operation
order by start_time_mapping desc
- Check Migration Operation for member tenants from the database:
WITH date_range AS (
    SELECT '10/18/2024 6:22:40 AM'::timestamp AS start_date,
           '10/18/2024 3:23:28 PM'::timestamp AS end_date
)
SELECT AGE(MAX(updated_date), MIN(updated_date)) AS total_saving_duration,
       MIN(updated_date) AS start_total_saving_process,
       MAX(updated_date) AS end_total_saving_process
FROM (
    SELECT updated_date FROM cs00000int_mod_entities_links.authority, date_range
    WHERE updated_date >= date_range.start_date AND updated_date <= date_range.end_date
    UNION ALL
    SELECT updated_date FROM cs00000int_0001_mod_entities_links.authority, date_range
    WHERE updated_date >= date_range.start_date AND updated_date <= date_range.end_date
    UNION ALL
    SELECT updated_date FROM cs00000int_0002_mod_entities_links.authority, date_range
    WHERE updated_date >= date_range.start_date AND updated_date <= date_range.end_date
    UNION ALL
    SELECT updated_date FROM cs00000int_0003_mod_entities_links.authority, date_range
    WHERE updated_date >= date_range.start_date AND updated_date <= date_range.end_date
    UNION ALL
    SELECT updated_date FROM cs00000int_0004_mod_entities_links.authority, date_range
    WHERE updated_date >= date_range.start_date AND updated_date <= date_range.end_date
    UNION ALL
    SELECT updated_date FROM cs00000int_0005_mod_entities_links.authority, date_range
    WHERE updated_date >= date_range.start_date AND updated_date <= date_range.end_date
) AS combined;
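The member-tenant query above repeats the same SELECT for each schema, so it can be generated rather than edited by hand when the number of member tenants or the time window changes. The sketch below is one possible way to do that. The helper names are hypothetical, and the schema naming pattern (prefix, optional four-digit member number, module name) is an assumption inferred from the six schemas listed in the query. The values are interpolated directly into the SQL text, so it is suitable only for trusted, hand-entered inputs.

```python
from typing import List


def member_schemas(prefix: str, member_count: int, module: str) -> List[str]:
    """Return the central schema followed by the numbered member schemas."""
    central = f"{prefix}_{module}"
    members = [f"{prefix}_{i:04d}_{module}" for i in range(1, member_count + 1)]
    return [central, *members]


def build_total_saving_query(schemas: List[str], start: str, end: str) -> str:
    """Build the total_saving_duration query over all given schemas."""
    selects = "\n    UNION ALL\n".join(
        f"    SELECT updated_date FROM {schema}.authority, date_range\n"
        "    WHERE updated_date >= date_range.start_date"
        " AND updated_date <= date_range.end_date"
        for schema in schemas
    )
    return (
        "WITH date_range AS (\n"
        f"    SELECT '{start}'::timestamp AS start_date,\n"
        f"           '{end}'::timestamp AS end_date\n"
        ")\n"
        "SELECT AGE(MAX(updated_date), MIN(updated_date)) AS total_saving_duration,\n"
        "       MIN(updated_date) AS start_total_saving_process,\n"
        "       MAX(updated_date) AS end_total_saving_process\n"
        "FROM (\n"
        f"{selects}\n"
        ") AS combined;"
    )


schemas = member_schemas("cs00000int", 5, "mod_entities_links")
query = build_total_saving_query(
    schemas, "10/18/2024 6:22:40 AM", "10/18/2024 3:23:28 PM"
)
print(query)
```

Keeping the schema list separate from the query text also makes it straightforward to run the same aggregation for a single member tenant, which is what the recommendation about per-tenant metrics would require.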
Test execution:
- Marc-Migration tests were initiated by a JMeter script from a local machine with the following configurations for the mod-marc-migrations service. The migration of Marc Authority records was started for the central tenant, and automatic migrations were triggered for the member tenants.
Test № | CHUNK_FETCH_IDS_COUNT | RECORDS_CHUNK_SIZE |
---|---|---|
Test №1 | 500 | 500 |
Test №2 | 2000 | 1000 |
Test №3 | 4000 | 2000 |
Test №4 | 5000 | 2500 |
Test №5 | 7000 | 3500 |
Test №6 | 10000 | 5000 |
Test №7 | 12000 | 4000 |