...
- During the tests, we collected the `mapping_duration` and `saving_duration` for the central tenant, along with the `total_saving_duration` for all member tenants, specifically during the final test. When the saving process started for the central tenant, it asynchronously triggered the update and saving processes for all member tenants. This behavior is specific to the ECS environment.
- The saving process ended with the status `DATA_SAVING_FAILED`, and not all records were updated. This occurred because the central tenant contained record IDs that were not present in the member tenants. The percentage of unsaved records grew from 0.69% in Test №1 to 5.35% in the final Test №7; this growth should be investigated.
- We gathered baseline performance metrics for the marc-migration process on the central tenant. However, we recommend collecting results for both the central and member tenants, with separate metrics for each individual member tenant.
Recommendations and Jiras
- Repeat tests to collect results for both the central and member tenants.
- Run tests to collect separate metrics for each individual member tenant.
- Fix the test data set so that the central tenant no longer contains record IDs that are not present in the member tenants.
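
One way to check a data set for the mismatch described above is a plain set difference over record IDs. The sketch below is illustrative only: the function name `ids_missing_in_member` and the idea of loading IDs into memory as strings are assumptions, not part of the marc-migration tooling, and the sample IDs are made up.

```python
from typing import Iterable, Set


def ids_missing_in_member(central_ids: Iterable[str],
                          member_ids: Iterable[str]) -> Set[str]:
    """Return record IDs present in the central tenant but absent in a member tenant.

    Hypothetical helper: how the IDs are exported from each tenant is out of scope here.
    """
    return set(central_ids) - set(member_ids)


if __name__ == "__main__":
    # Made-up sample IDs, for illustration only.
    central = ["rec-1", "rec-2", "rec-3"]
    member = ["rec-1", "rec-3"]
    print(sorted(ids_missing_in_member(central, member)))
```

Running such a check once per member tenant before a test would show whether the data set still contains the problematic IDs.
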
Test № | status | total_num_of_records | mapped_num_of_records | saved_num_of_records | percentage of Unsaved Records |
---|---|---|---|---|---|
Test №1 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11983692 | 0.69% |
Test №2 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11959281 | 0.89% |
Test №3 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11921442 | 1.21% |
Test №4 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11811976 | 2.12% |
Test №5 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11815927 | 2.08% |
Test №6 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11672697 | 3.27% |
Test №7 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11421743 | 5.35% |
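
The last column can be reproduced from the other columns. The sketch below assumes the percentage is computed as (mapped − saved) / mapped × 100, rounded to two decimals; the report does not state the formula, but this one matches every row of the table. The function name `unsaved_percentage` is hypothetical.

```python
def unsaved_percentage(mapped: int, saved: int) -> float:
    """Share of mapped records that were not saved, in percent (two decimals)."""
    if mapped <= 0:
        raise ValueError("mapped must be a positive record count")
    return round((mapped - saved) / mapped * 100, 2)


# (test number, mapped_num_of_records, saved_num_of_records) from the table above.
RESULTS = [
    (1, 12067250, 11983692),
    (2, 12067250, 11959281),
    (3, 12067250, 11921442),
    (4, 12067250, 11811976),
    (5, 12067250, 11815927),
    (6, 12067250, 11672697),
    (7, 12067250, 11421743),
]

if __name__ == "__main__":
    for test_no, mapped, saved in RESULTS:
        print(f"Test {test_no}: {mapped - saved} unsaved, "
              f"{unsaved_percentage(mapped, saved)}%")
```
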
...
Here we can see that the mod-entities-links module had spikes of up to 90% of the instances' CPU capacity, while the mod-marc-migrations module used 20% of the instances' CPU capacity.
...
Here we can see that mod-entities-links had memory utilization spikes of up to 90%.
Kafka metrics
OpenSearch Data Nodes metrics
...
PTF - Baseline RCON environment configuration
- 10 m6g.2xlarge EC2 instances located in US East (N. Virginia) us-east-1
- 1 database instance, writer

Name | Memory (GiB) | vCPUs |
---|---|---|
db.r6g.xlarge | 32 | 4 |

- OpenSearch ptf-test
  - Data nodes
    - Instance type - r6g.2xlarge.search
    - Number of nodes - 4
  - Version: OpenSearch_2_7_R20240502
  - Dedicated master nodes
    - Instance type - r6g.large.search
    - Number of nodes - 3
- MSK fse-tenant
  - 2 kafka.m7g.xlarge brokers in 2 zones
  - Apache Kafka version 3.7.x
  - EBS storage volume per broker: 300 GiB
  - auto.create.topics.enable=true
  - log.retention.minutes=480
  - default.replication.factor=3
...