In Progress
...
- One record on one tenant could be discarded with the error: io.netty.channel.StacklessClosedChannelException. This reproduces both with and without the splitting feature enabled, in at least 30% of test runs with 500k record files and multitenant testing. (MODDATAIMP-748)
- During testing of the new Data Import splitting feature, items for update were discarded with the error: io.vertx.core.impl.NoStackTraceThrowable: Cannot get actual Item by id: org.folio.inventory.exceptions.InternalServerErrorException: Access for user 'data-import-system-user' (f3486d35-f7f7-4a69-bcd0-d8e5a35cb292) requires permission: inventory-storage.items.item.get. Less than 1% of records could be discarded due to the missing permission for 'data-import-system-user'. The permission was not added automatically during the service deployment. After the permission was added manually to the database, the error no longer occurred. (MODDATAIMP-930)
- UI issue: when a job is canceled or completed with an error, its progress bar cannot be deleted from the screen. (MODDATAIMP-929)
Usage:
- Do not set RECORDS_PER_SPLIT_FILE below 1000. The system is stable enough to ingest 1000 records consistently, and smaller values incur more overhead, resulting in longer job durations.
- When toggling the file-splitting feature, the mod-source-record-storage and mod-source-record-manager tasks need to be restarted.
- Keep the Kafka brokers' disk size in mind. Bigger jobs (up to 500K records) can now be run, and because message retention is currently set to 8 hours, consecutive jobs may use up the disk quickly. For example, with a 300 GB disk, consecutive jobs of 250K, 500K, and 500K records will exhaust the disk.
- More CPU could be allocated to mod-inventory and mod-di-converter-storage.
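The Kafka disk note above can be made concrete with a rough estimate of how much of a run of consecutive jobs still falls inside the 8-hour retention window when the last job finishes. This is a minimal sketch, not a sizing tool: the `Job` class and `records_within_retention` function are hypothetical names, the durations (1 hour 32 min for 250K, 3 hours 29 min for 500K) are taken from the Results table, and it assumes jobs run back to back and produce messages at a uniform rate. Kafka deletes whole log segments rather than individual messages, so actual disk usage will differ.

```python
from dataclasses import dataclass


@dataclass
class Job:
    records: int          # number of records in the import file
    duration_min: float   # wall-clock job duration in minutes


def records_within_retention(jobs, retention_min=480.0):
    """Estimate how many records' messages are still retained when the last job ends.

    Assumption: jobs run back to back and each job produces its messages
    at a uniform rate over its duration.
    """
    end_of_run = sum(job.duration_min for job in jobs)
    window_start = end_of_run - retention_min
    retained = 0.0
    job_start = 0.0
    for job in jobs:
        job_end = job_start + job.duration_min
        # Portion of this job that overlaps the retention window.
        overlap = max(0.0, job_end - max(job_start, window_start))
        retained += job.records * overlap / job.duration_min
        job_start = job_end
    return retained


# Consecutive 250K, 500K, 500K jobs with durations from the Results table.
jobs = [Job(250_000, 92), Job(500_000, 209), Job(500_000, 209)]
retained = records_within_retention(jobs)
total = sum(job.records for job in jobs)
print(f"{retained:,.0f} of {total:,} records still within retention")
```

Under these assumptions almost all messages from the three jobs are still on disk when the last job ends, which is consistent with the observation that such a sequence can exhaust a 300 GB volume.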
Results
Test # | Scenario | Job profile | Splitting Feature Enabled | Results | Splitting Feature Disabled | Results | Before Splitting Feature Deployed | Results |
---|---|---|---|---|---|---|---|---|
1 | 100K MARC Create | PTF - Create 2 | 37-39 min | Completed | 40 min | Completed | 32-33 min | Completed |
1 | 250K MARC Create | PTF - Create 2 | 1 hour 32 min | Completed | 1 hour 41 min | Completed | 1 hour 33 min - 1 hour 57 min | Completed |
1 | 500K MARC Create | PTF - Create 2 | 3 hours 29 min | Completed* | 3 hours 55 min | Completed | 3 hours 33 min | Completed |
2 | Multitenant MARC Create (100k, 50k, and 1 record) | PTF - Create 2 | 2 hours 40 min | Completed* | 3 hours 1 min | Completed | | |
3 | CI/CO + DI MARC Create (20 users CI/CO, 25k records DI on 3 tenants) | PTF - Create 2 | 24 min 18 sec | Completed | 24 min | Completed* | | |
4 | 100K MARC Update (Create new file) | PTF - Updates Success - 1 | 58 min 25 sec; 57 min 19 sec | Completed; Completed | 1 hour 3 min | Completed | - | - |
4 | 250K MARC Update | PTF - Updates Success - 1 | 2 hours 2 min **; 2 hours 12 min | Completed with errors **; Completed | 1 hour 53 min | Completed | - | - |
4 | 500K MARC Update | PTF - Updates Success - 1 | 4 hours 43 min; 4 hours 38 min | Completed; Completed | 5 hours 59 min | Completed | - | - |
* - One record on one tenant could be discarded with the error: io.netty.channel.StacklessClosedChannelException. This reproduces both with and without the splitting feature, in at least 30% of test runs with 500k record files and multitenant testing. (MODDATAIMP-748)
...
** - Up to 10 items were discarded with the error: io.vertx.core.impl.NoStackTraceThrowable: Cannot get actual Item by id: org.folio.inventory.exceptions.InternalServerErrorException: Access for user 'data-import-system-user' (f3486d35-f7f7-4a69-bcd0-d8e5a35cb292) requires permission: inventory-storage.items.item.get. Less than 1% of records could be discarded due to the missing permission for 'data-import-system-user'. The permission was not added automatically during the service deployment. After the permission was added manually to the database, the error no longer occurred. (MODDATAIMP-930)
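To compare runs of different sizes, the durations in the results table can be converted to throughput. The sketch below is illustrative only: `to_minutes` and `throughput` are hypothetical helper names, and the inputs are the 250K and 500K MARC Create durations with the splitting feature enabled and disabled, copied from the table.

```python
import re

# Minutes per unit; "hours"/"minutes"/"seconds" match via their prefixes.
UNITS = {"hour": 60.0, "min": 1.0, "sec": 1.0 / 60.0}


def to_minutes(text):
    """Convert strings such as '1 hour 32 min' or '24 min 18 sec' to minutes."""
    total = 0.0
    for value, unit in re.findall(r"(\d+)\s*(hour|min|sec)", text):
        total += int(value) * UNITS[unit]
    return total


def throughput(records, duration_text):
    """Records ingested per minute for one job."""
    return records / to_minutes(duration_text)


# (records, duration with splitting enabled, duration with splitting disabled)
runs = {
    "250K MARC Create": (250_000, "1 hour 32 min", "1 hour 41 min"),
    "500K MARC Create": (500_000, "3 hours 29 min", "3 hours 55 min"),
}

for name, (records, enabled, disabled) in runs.items():
    saved = 1 - to_minutes(enabled) / to_minutes(disabled)
    print(
        f"{name}: {throughput(records, enabled):.0f} rec/min enabled, "
        f"{throughput(records, disabled):.0f} rec/min disabled, "
        f"{saved:.1%} shorter with splitting"
    )
```

Single-run durations vary between tests, so these figures are indicative rather than a precise measure of the splitting feature's effect.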
...
Data Import Robustness Enhancement (PERF-646)
Job durations for 25K-record files by RECORDS_PER_SPLIT_FILE value (500, 1K, 5K, 10K):
Number of concurrent tenants | Job profile | 500 | Status | 1K | Status | 5K | Status | 10K | Status |
---|---|---|---|---|---|---|---|---|---|
1 Tenant test#1 | PTF - Create 2 | 12 min 55 sec | Completed | 11 min 48 sec | Completed | 9 min 21 sec | Completed | 9 min 2 sec | Completed |
1 Tenant test#2 | PTF - Create 2 | 10 min 31 sec | Completed | 9 min 32 sec | Completed | 9 min 6 sec | Completed | 9 min 14 sec | Completed |
2 Tenants test#1 | PTF - Create 2 | 19 min 29 sec | Completed | 15 min 47 sec | Completed | 16 min 15 sec | Completed | 16 min 3 sec | Completed |
2 Tenants test#2 | PTF - Create 2 | 18 min 19 sec | Completed | 15 min 47 sec | Completed | 16 min 11 sec | Completed | 16 min 41 sec | Completed |
3 Tenants test#1 | PTF - Create 2 | 24 min 15 sec | Completed | 25 min 47 sec | Completed | 23 min | Completed | 23 min 27 sec | Completed |
3 Tenants test#2 | PTF - Create 2 | 24 min 38 sec | Completed | 23 min 28 sec | Completed | 23 min 2 sec | Completed | 23 min 26 sec | Completed |
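For orientation, the number of split files produced for each RECORDS_PER_SPLIT_FILE value tested above can be estimated with simple arithmetic. This is a sketch under an assumption: it supposes the input file is cut into consecutive chunks of at most RECORDS_PER_SPLIT_FILE records, with a possibly smaller final chunk. The function name `split_file_count` is hypothetical and is not part of any FOLIO module.

```python
import math


def split_file_count(total_records, records_per_split_file):
    """Number of chunks if the file is cut into pieces of at most the given size."""
    if records_per_split_file <= 0:
        raise ValueError("records_per_split_file must be positive")
    return math.ceil(total_records / records_per_split_file)


# The four RECORDS_PER_SPLIT_FILE values used in the 25K-record tests.
for size in (500, 1_000, 5_000, 10_000):
    chunks = split_file_count(25_000, size)
    print(f"RECORDS_PER_SPLIT_FILE={size}: {chunks} split files")
```

Smaller values multiply the number of chunks per job, which fits the longer durations seen at 500 in the single-tenant and two-tenant runs.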
Instance CPU Utilization
...
Memory utilization reached a maximum of 88% for mod-source-record-storage-b and 85% for mod-source-record-manager-b.
Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test.
Service CPU Utilization
Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 500, 2 runs for each test.
...
RDS CPU Utilization
Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 500, 2 runs for each test.
RDS
...
MARC BIB CREATE
DB CPU usage reaches approximately 95%.
Test#1 500k records DI
Database Connections
Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 500, 2 runs for each test.
Test #3 With CI/CO 20 users and DI 25k records on each of the 3 tenants
...
Maximal DB CPU usage is about 95%
Test#3 With CI/CO
RDS Database Connections
MARC BIB CREATE
For the DI Create job, the connection count is 535.
...
Test#3 With CI/CO
Appendix
Infrastructure
...
ocp3 with the "Bugfest" Dataset
Record counts:
- tenant0_mod_source_record_storage.marc_records_lb = 9674629
- tenant2_mod_source_record_storage.marc_records_lb = 0
- tenant3_mod_source_record_storage.marc_records_lb = 0
- tenant0_mod_source_record_storage.raw_records_lb = 9604805
- tenant2_mod_source_record_storage.raw_records_lb = 0
- tenant3_mod_source_record_storage.raw_records_lb = 0
- tenant0_mod_source_record_storage.records_lb = 9674677
- tenant2_mod_source_record_storage.records_lb = 0
- tenant3_mod_source_record_storage.records_lb = 0
- tenant0_mod_source_record_storage.marc_indexers = 620042011
- tenant2_mod_source_record_storage.marc_indexers = 0
- tenant3_mod_source_record_storage.marc_indexers = 0
- tenant0_mod_source_record_storage.marc_indexers with field_no 010 = 3285833
- tenant2_mod_source_record_storage.marc_indexers with field_no 010 = 0
- tenant3_mod_source_record_storage.marc_indexers with field_no 010 = 0
- tenant0_mod_source_record_storage.marc_indexers with field_no 035 = 19241844
- tenant2_mod_source_record_storage.marc_indexers with field_no 035 = 0
- tenant3_mod_source_record_storage.marc_indexers with field_no 035 = 0
- tenant0_mod_inventory_storage.authority = 4
- tenant2_mod_inventory_storage.authority = 0
- tenant3_mod_inventory_storage.authority = 0
- tenant0_mod_inventory_storage.holdings_record = 9592559
- tenant2_mod_inventory_storage.holdings_record = 16
- tenant3_mod_inventory_storage.holdings_record = 16
- tenant0_mod_inventory_storage.instance = 9976519
- tenant2_mod_inventory_storage.instance = 32
- tenant3_mod_inventory_storage.instance = 32
- tenant0_mod_inventory_storage.item = 10787893
- tenant2_mod_inventory_storage.item = 19
- tenant3_mod_inventory_storage.item = 19
PTF - environment ocp3
- 10 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
- 2 database instances, one reader and one writer

Name | API Name | Memory GiB | vCPUs | max_connections |
---|---|---|---|---|
R6G Extra Large | db.r6g.xlarge | 32 GiB | 4 vCPUs | 2731 |

- MSK ptf-kakfa-3
- 4 m5.2xlarge brokers in 2 zones
- Apache Kafka version 2.8.0
- EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
- Kafka topics partitioning:
- 2 partitions for DI topics
...