...
Test № | status | total_num_of_records | mapped_num_of_records | saved_num_of_records | percentage of Unsaved Records |
---|---|---|---|---|---|
Test №1 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11983692 | 0.69% |
Test №2 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11959281 | 0.89% |
Test №3 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11921442 | 1.21% |
Test №4 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11811976 | 2.12% |
Test №5 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11815927 | 2.08% |
Test №6 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11672697 | 3.27% |
Test №7 | DATA_SAVING_FAILED | 12067250 | 12067250 | 11421743 | 5.35% |
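The last column can be reproduced from the two record counts. The sketch below recomputes it for the rows above; the function name `unsaved_percentage` is a hypothetical helper introduced here for illustration and is not part of any module.

```python
def unsaved_percentage(total_records: int, saved_records: int) -> float:
    """Share of records that were not saved, in percent, rounded to 2 places."""
    unsaved = total_records - saved_records
    return round(unsaved / total_records * 100, 2)


# (total_num_of_records, saved_num_of_records) copied from the table above
results = {
    1: (12067250, 11983692),
    2: (12067250, 11959281),
    3: (12067250, 11921442),
    4: (12067250, 11811976),
    5: (12067250, 11815927),
    6: (12067250, 11672697),
    7: (12067250, 11421743),
}

for test_no, (total, saved) in results.items():
    print(f"Test {test_no}: {unsaved_percentage(total, saved)}% unsaved")
```

For example, Test №1 lost 12067250 - 11983692 = 83558 records, which is 0.69% of the total.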
Recommendations and Jiras
- Increase the default CPU allocation for the mod-entities-links service, or set it to 0.
- Use CHUNK_FETCH_IDS_COUNT=12000 and RECORDS_CHUNK_SIZE=4000 to decrease migration time; note that mod-entities-links will then use 25% more CPU.
- Use only 1 container (task) for mod-marc-migrations.
- While data mapping is running, the data files are stored directly in the running mod-marc-migrations container. Afterwards, all files are moved to the S3 bucket and deleted from the container (if no S3 bucket is provided, data mapping fails).
If the container fails during the data mapping process, all files are lost and data mapping hangs indefinitely.
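To get a feel for the recommended chunk settings, the sketch below estimates how many ID fetches and record chunks the tested data set would produce. It assumes that CHUNK_FETCH_IDS_COUNT is the number of record IDs fetched per query and that RECORDS_CHUNK_SIZE is the number of records per chunk; this reading of the two variables is an assumption, and `chunk_plan` is a hypothetical helper, not part of mod-marc-migrations.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return (a + b - 1) // b


def chunk_plan(total_records: int, chunk_fetch_ids_count: int, records_chunk_size: int) -> dict:
    """Estimate the work units for a migration run (assumed semantics, see above)."""
    return {
        "id_fetches": ceil_div(total_records, chunk_fetch_ids_count),
        "chunks": ceil_div(total_records, records_chunk_size),
        "chunks_per_fetch": ceil_div(chunk_fetch_ids_count, records_chunk_size),
    }


# 12067250 records is the total from the tests in this report
plan = chunk_plan(12_067_250, chunk_fetch_ids_count=12_000, records_chunk_size=4_000)
print(plan)  # {'id_fetches': 1006, 'chunks': 3017, 'chunks_per_fetch': 3}
```

Under these assumptions, the recommended values yield about 3017 chunks for the 12067250 records used in the tests.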
Test Results
This table shows how long it took to migrate and save the MARC Authority records.
...
Here we can see that the mod-entities-links module's CPU usage spiked up to 90% of the instances' CPU capacity, while the mod-marc-migrations module used 20%.
...
Here we can see that mod-entities-links memory usage spiked up to 90%.
Kafka metrics
OpenSearch Data Nodes metrics
...
PTF - Baseline RCON environment configuration
- 10 m6g.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
- 1 database instance, writer
Name | Memory GiB | vCPUs |
---|---|---|
db.r6g.xlarge | 32 GB | 4 vCPUs |
- OpenSearch ptf-test
- Data nodes
- Instance type - r6g.2xlarge.search
- Number of nodes - 4
- Version: OpenSearch_2_7_R20240502
- Dedicated master nodes
- Instance type - r6g.large.search
- Number of nodes - 3
- MSK fse-tenant
- 2 kafka.m7g.xlarge brokers in 2 zones
- Apache Kafka version 3.7.x
- EBS storage volume per broker: 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
...