In Progress
Table of Contents |
---|
Overview
...
In Progress
Table of Contents |
---|
Overview
The Data Import Task Force (DITF) implements a feature that splits large input MARC files into smaller ones, resulting in smaller jobs, so that the big files could be imported and be imported consistently. This document contains the results of performance tests on the feature and also an analysis the feature's performance with respect to the baseline tests. The following Jiras were implemented.
...
- One record on one tenant could be discarded with error: io.netty.channel.StacklessClosedChannelException.
Reproduces in both cases with and without splitting feature enabled in at least 30% of test runs with 500k record files and multitenant testing.Jira Legacy server System Jira serverId 01505d01-b853-3c2e-90f1-ee9b165564fc key MODDATAIMP-748 - During the new Data Import splitting feature testing, items for update were discarded with the error: io.vertx.core.impl.NoStackTraceThrowable: Cannot get actual Item by id: org.folio.inventory.exceptions.InternalServerErrorException: Access for user 'data-import-system-user' (f3486d35-f7f7-4a69-bcd0-d8e5a35cb292) requires permission: inventory-storage.items.item.get. Less than 1% of records could be discarded due to missing permission for 'data-import-system-user'. Permission was not added automatically during the service deployment. I added permission manually to the database and the error does not occur anymore.
Jira Legacy server System Jira serverId 01505d01-b853-3c2e-90f1-ee9b165564fc key MODDATAIMP-930 - UI issue, when canceled or completed with error Job progress bar cannot be deleted from the screen.
Jira Legacy server System Jira serverId 01505d01-b853-3c2e-90f1-ee9b165564fc key MODDATAIMP-929 - Usage:
- Should not use less than 1000 for RECORDS_PER_SPLIT_FILE. The system is stable enough to ingest 1000 records consistently and smaller amounts will incur more overheads, resulting in longer jobs' durations.
- When toggling the file-splitting feature, mod-source-record-storage, mod-source-record-manager's tasks need to be restarted.
- Keep in mind about the Kafka broker's disk size (as bigger jobs - up to 500K - can be run now), consecutive jobs may use up the disk quickly because the messages' retention time currently is set at 8 hours. For example with 300GB disk size, consecutive jobs of 250K, 500K, 500K sizes will exhaust the disk.
- More CPU could be allocated to mod-inventory and mod-di-converter-storage
...
** - up to 10 items were discarded with the error: io.vertx.core.impl.NoStackTraceThrowable: Cannot get actual Item by id: org.folio.inventory.exceptions.InternalServerErrorException: Access for user 'data-import-system-user' (f3486d35-f7f7-4a69-bcd0-d8e5a35cb292) requires permission: inventory-storage.items.item.get. Less than 1% of records could be discarded due to missing permission for 'data-import-system-user'. Permission was not added automatically during the service deployment. I added permission manually to the database and the error does not occur anymore. Jira Legacy server System Jira serverId 01505d01-b853-3c2e-90f1-ee9b165564fc key MODDATAIMP-930
...
With CI/CO 20 users and DI 25k records on each of the 3 tenants Splitting Feature Disabled
ocp3-mod-data-import:12
Data Import Robustness Enhancement Jira Legacyserver System Jira serverId 01505d01-b853-3c2e-90f1-ee9b165564fc key PERF-646
server | System Jira |
---|---|
serverId | 01505d01-b853-3c2e-90f1-ee9b165564fc |
key | PERF-646 |
25K records | RECORDS_PER_SPLIT_FILE | ||||||||||
Number of concurrent tenants | Job profile | 500 | Status | 1K | Status | 5K | Status | 10K | Status | Test with Split disabled | Status |
---|---|---|---|---|---|---|---|---|---|---|---|
1 Tenant test#1 | PTF - Create 2 | 12 minutes 55 seconds | Completed | 11 minutes 48 seconds | Completed | 09 minutes 21 seconds | Completed | 9 minutes 2 sec | Completed | 10 min 35 sec | Completed |
1 Tenant test#2 | 10 minutes 31 seconds | Completed | 09 minutes 32 seconds | Completed | 9 minutes 6 sec | Completed | 9 minutes 14 sec | Completed | 11 min 27 sec | Completed | |
2 Tenants test#1 | PTF - Create 2 | 19 minutes 29 seconds | Completed | 15 minutes 47 seconds | Completed | 16 minutes 15 seconds | Completed | 16 minutes 3 seconds | Completed | 19 min 18 sec | Completed |
2 Tenants test#2 | 18 minutes 19 seconds | Completed | 15 minutes 47 seconds | Completed | 16 minutes 11 sec | Completed | 16 min 41 sec | Completed | 20 min 33 sec | Completed | |
3 Tenants test#1 | PTF - Create 2 | 24 minutes 15 seconds | Completed | 25 minutes 47 seconds | Completed | 23 minutes | Completed | 23 minutes 27 seconds | Completed | 30 min 2 sec | Completed |
3 Tenants test#2 | 24 minutes 38 seconds | Completed | 23 minutes 28 seconds | Completed | 23 minutes 2 sec | Completed | 23 minutes 26 seconds | Completed | T1 - "00:33:35.1" Error T2 - "01:23:36.144" T3 - "01:16:26.391" | Completed with error and long proccesing* |
* on the first tenantproccesing stoped wit error " LOGS in progress "
it caused the spike of CPU utilization on Kafka (tenant cluster) up to 94%
Instance CPU Utilization
...
Memory utilization rich maximal value for mod-source-record-storage-b 88% and for mod-source-record-manager-b 85%.
Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test.
...
Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test.
RDS CPU Utilization
Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 500, 2 runs for each test. Maximal CPU Utilization = 95%
...
Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test. Maximal CPU Utilization = 94%
RDS Database Connections
Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 500, 2 runs for each test.
...
- tenant0_mod_source_record_storage.marc_records_lb = 9674629
- tenant2_mod_source_record_storage.marc_records_lb = 0
- tenant3_mod_source_record_storage.marc_records_lb = 0
- tenant0_mod_source_record_storage.raw_records_lb = 9604805
- tenant2_mod_source_record_storage.raw_records_lb = 0
- tenant3_mod_source_record_storage.raw_records_lb = 0
- tenant0_mod_source_record_storage.records_lb = 9674677
- tenant2_mod_source_record_storage.records_lb = 0
- tenant3_mod_source_record_storage.records_lb = 0
- tenant0_mod_source_record_storage.marc_indexers = 620042011
- tenant2_mod_source_record_storage.marc_indexers = 0
- tenant3_mod_source_record_storage.marc_indexers = 0
- tenant0_mod_source_record_storage.marc_indexers with field_no 010 = 3285833
- tenant2_mod_source_record_storage.marc_indexers with field_no 010 = 0
- tenant3_mod_source_record_storage.marc_indexers with field_no 010 = 0
- tenant0_mod_source_record_storage.marc_indexers with field_no 035 = 19241844
- tenant2_mod_source_record_storage.marc_indexers with field_no 035 = 0
- tenant3_mod_source_record_storage.marc_indexers with field_no 035 = 0
- tenant0_mod_inventory_storage.authority = 4
- tenant2_mod_inventory_storage.authority = 0
- tenant3_mod_inventory_storage.authority = 0
- tenant0_mod_inventory_storage.holdings_record = 9592559
- tenant2_mod_inventory_storage.holdings_record = 16
- tenant3_mod_inventory_storage.holdings_record = 16
- tenant0_mod_inventory_storage.instance = 9976519
- tenant2_mod_inventory_storage.instance = 32
- tenant3_mod_inventory_storage.instance = 32
- tenant0_mod_inventory_storage.item = 10787893
- tenant2_mod_inventory_storage.item = 19
- tenant3_mod_inventory_storage.item = 19
PTF -environment ocp3
- 10 m6i.2xlarge EC2 instances located in US East (N. Virginia)us-east-1
2 database instances, one reader, and one writer
Name API Name Memory GIB vCPUs max_connections R6G Extra Large db.r6g.xlarge 32 GiB 4 vCPUs 2731 - MSK ptf-kakfa-3
- 4 m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
- Kafka topics partitioning: - 2 partitions for DI topics
...