Overview
This document contains the results of testing the Data Import Splitting Feature for MARC Bibliographic records in the Orchid release, to establish a baseline for ocp3. Related tickets: PERF-644, PERF-645, PERF-647, PERF-646, PERF-671.
Splitting feature documentation: Detailed Release Notes for Data Import Splitting Feature
Summary
- DI duration grows with the number of records imported (100k records: 38 min; 250k: 1 hour 32 min; 500k: 3 hours 29 min).
- Multitenant DI completed successfully with up to 9 jobs in parallel. Large jobs start one by one, in order, within each tenant but are processed in parallel across the 3 tenants. A small DI job (1 record) can finish sooner, out of order. Check-In/Check-Out durations roughly double while DI is running.
- Memory utilization increased because the modules had been restarted beforehand (daily cluster shutdown process); no memory leak is suspected for any of the modules.
- Average CPU usage was 144% for mod-inventory and about 107% for mod-di-converter-storage; no other module exceeded 100%. CPU usage of mod-data-import spiked up to 260% at the beginning of Data Import jobs.
- DB CPU usage reached approximately 95%.
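The relationship between file size and duration in the first bullet can be checked with simple arithmetic. The sketch below is illustrative only: it uses the three single-tenant durations quoted above, and the function name `records_per_minute` is a hypothetical helper, not part of any FOLIO tooling.

```python
# Single-tenant MARC Create durations quoted in the Summary (in minutes).
runs = {
    100_000: 38,            # 38 min
    250_000: 1 * 60 + 32,   # 1 hour 32 min
    500_000: 3 * 60 + 29,   # 3 hours 29 min
}


def records_per_minute(records: int, minutes: int) -> float:
    """Average import throughput for one run."""
    return records / minutes


throughput = {n: records_per_minute(n, m) for n, m in runs.items()}

for n, rate in throughput.items():
    print(f"{n:>7} records: {rate:,.0f} records/min")
```

Throughput stays in the same range across the three file sizes, which is consistent with duration scaling roughly linearly with record count.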
Recommendations and Jiras
1) One record on one tenant can be discarded with the error io.netty.channel.StacklessClosedChannelException (MODDATAIMP-748). This reproduces both with and without the splitting feature in at least 30% of test runs with 500k-record files and in multitenant testing.
2) During testing of the new Data Import splitting feature, items for update were discarded with the error: io.vertx.core.impl.NoStackTraceThrowable: Cannot get actual Item by id: org.folio.inventory.exceptions.InternalServerErrorException: Access for user 'data-import-system-user' (f3486d35-f7f7-4a69-bcd0-d8e5a35cb292) requires permission: inventory-storage.items.item.get. Less than 1% of records were discarded because of the missing permission for 'data-import-system-user'. The permission was not added automatically during service deployment. After the permission was added manually in the database, the error no longer occurred (MODDATAIMP-930).
3) UI issue: when a job is canceled or completes with an error, its progress bar cannot be removed from the screen (MODDATAIMP-929).
Results
Test # | Scenario | Job profile | Duration (Splitting Feature enabled) | Status | Duration (Splitting Feature disabled) | Status | Duration (before Splitting Feature released) | Status |
---|---|---|---|---|---|---|---|---|
1 | 100K MARC Create | PTF - Create 2 | 37 min - 39 min | Completed | 40 min | Completed | 32 min - 33 min | Completed |
1 | 250K MARC Create | PTF - Create 2 | 1 hour 32 min | Completed | 1 hour 41 min | Completed | 1 hour 33 min - 1 hour 57 min | Completed |
1 | 500K MARC Create | PTF - Create 2 | 3 hours 29 min | Completed* | 3 hours 55 min | Completed | 3 hours 33 min | Completed |
2 | Multitenant MARC Create (100k, 50k, and 1 record) | PTF - Create 2 | 2 hours 40 min | Completed* | 3 hours 1 min | Completed | - | - |
3 | CI/CO + DI MARC Create (20 users CI/CO, 25k records DI on 3 tenants) | PTF - Create 2 | 24 min | Completed* | - | - | - | - |
4 | 100K MARC Update (Create new file) | PTF - Updates Success - 1 | 58 min 25 sec; 57 min 19 sec | Completed | 1 hour 3 min | Completed | - | - |
4 | 250K MARC Update | PTF - Updates Success - 1 | 2 hours 2 min**; 2 hours 12 min | Completed with errors**; Completed | 1 hour 53 min | Completed | - | - |
4 | 500K MARC Update | PTF - Updates Success - 1 | 4 hours 43 min; 4 hours 38 min | Completed; Completed | 5 hours 59 min | Completed | - | - |
* - One record on one tenant can be discarded with the error io.netty.channel.StacklessClosedChannelException (MODDATAIMP-748). This reproduces both with and without the splitting feature in at least 30% of test runs with 500k-record files and in multitenant testing.
** - Up to 10 items were discarded with the error: io.vertx.core.impl.NoStackTraceThrowable: Cannot get actual Item by id: org.folio.inventory.exceptions.InternalServerErrorException: Access for user 'data-import-system-user' (f3486d35-f7f7-4a69-bcd0-d8e5a35cb292) requires permission: inventory-storage.items.item.get. Less than 1% of records were discarded because of the missing permission for 'data-import-system-user'. The permission was not added automatically during service deployment. After the permission was added manually in the database, the error no longer occurred (MODDATAIMP-930).
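To compare runs with the splitting feature enabled and disabled, the free-text durations in the Results table can be converted to minutes. The following is a minimal sketch, not the method used for this report; `to_minutes` and `percent_change` are hypothetical helper names, and the two inputs are the 500K MARC Create durations from the table.

```python
import re

# Minutes per unit; the pattern also matches "hours", "minutes", "seconds".
_UNIT_MINUTES = {"hour": 60.0, "min": 1.0, "sec": 1.0 / 60.0}


def to_minutes(text: str) -> float:
    """Convert strings such as '3 hours 29 min' or '58 min 25 sec' to minutes."""
    total = 0.0
    for value, unit in re.findall(r"(\d+)\s*(hour|min|sec)", text):
        total += int(value) * _UNIT_MINUTES[unit]
    return total


def percent_change(enabled: str, disabled: str) -> float:
    """Relative change of the 'enabled' duration versus the 'disabled' one."""
    base = to_minutes(disabled)
    return (to_minutes(enabled) - base) / base * 100.0


# 500K MARC Create: splitting enabled vs. disabled.
change_500k = percent_change("3 hours 29 min", "3 hours 55 min")
print(f"500K Create: {change_500k:+.1f}% with splitting enabled")
```

A negative value means the run with the splitting feature enabled was shorter.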
Data Import Robustness Enhancement (PERF-646)
25K records: DI duration by RECORDS_PER_SPLIT_FILE value
Number of concurrent tenants | Job profile | 500 | Status | 1K | Status | 5K | Status | 10K | Status |
---|---|---|---|---|---|---|---|---|---|
1 Tenant test#1 | PTF - Create 2 | 12 minutes 55 seconds | Completed | 11 minutes 48 seconds | Completed | 9 minutes 21 seconds | Completed | 9 minutes 2 seconds | Completed |
1 Tenant test#2 | PTF - Create 2 | 10 minutes 31 seconds | Completed | 9 minutes 32 seconds | Completed | 9 minutes 6 seconds | Completed | 9 minutes 14 seconds | Completed |
2 Tenants test#1 | PTF - Create 2 | 19 minutes 29 seconds | Completed | 15 minutes 47 seconds | Completed | 16 minutes 15 seconds | Completed | 16 minutes 3 seconds | Completed |
2 Tenants test#2 | PTF - Create 2 | 18 minutes 19 seconds | Completed | 15 minutes 47 seconds | Completed | 16 minutes 11 seconds | Completed | 16 minutes 41 seconds | Completed |
3 Tenants test#1 | PTF - Create 2 | 24 minutes 15 seconds | Completed | 25 minutes 47 seconds | Completed | 23 minutes | Completed | 23 minutes 27 seconds | Completed |
3 Tenants test#2 | PTF - Create 2 | 24 minutes 38 seconds | Completed | 23 minutes 28 seconds | Completed | 23 minutes 2 seconds | Completed | 23 minutes 26 seconds | Completed |
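For context on the RECORDS_PER_SPLIT_FILE values in the table above, the sketch below estimates how many chunks a 25K-record file would produce per setting. It assumes the file is cut into consecutive chunks of at most RECORDS_PER_SPLIT_FILE records each; that is an assumption about the splitting behavior, not something measured in these tests, and `chunk_count` is a hypothetical helper.

```python
import math

TOTAL_RECORDS = 25_000
SPLIT_SIZES = [500, 1_000, 5_000, 10_000]  # RECORDS_PER_SPLIT_FILE values tested


def chunk_count(total_records: int, records_per_split_file: int) -> int:
    """Number of chunks, assuming each holds at most records_per_split_file records."""
    return math.ceil(total_records / records_per_split_file)


chunks = {size: chunk_count(TOTAL_RECORDS, size) for size in SPLIT_SIZES}

for size, count in chunks.items():
    print(f"RECORDS_PER_SPLIT_FILE={size:>6}: {count} chunk(s)")
```

Under that assumption, the smallest setting produces many more chunks per file than the largest, which may help when reading the duration differences between the 500 and 10K columns.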
Instance CPU Utilization
Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 500, 2 runs for each test. The maximal CPU Utilization value is 38%.
Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test.
Memory Utilization
Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 500, 2 runs for each test.
Most modules were stable during the test, and no memory leak is suspected for the DI modules. Only 2 modules increased their memory consumption after the tests began: memory utilization reached a maximum of 88% for mod-source-record-storage-b and 85% for mod-source-record-manager-b.
Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test.
Service CPU Utilization
Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 500, 2 runs for each test.
RDS CPU Utilization
MARC BIB CREATE
DB CPU usage reached approximately 95%.
Test#1 500k records DI
Test #3 With CI/CO 20 users and DI 25k records on each of the 3 tenants
Test#3 | Duration with DI | Duration without DI |
---|---|---|
Check-In | 1.138 | 0.517 |
Check-Out | 1.552 | 0.796 |
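The Check-In/Check-Out slowdown during DI can be expressed as a ratio of the two columns above. This is a small illustrative calculation using only the values in the table; the ratio does not depend on the unit of the durations.

```python
# (duration with DI, duration without DI) from the Test#3 table above.
cico = {
    "Check-In": (1.138, 0.517),
    "Check-Out": (1.552, 0.796),
}

# Ratio > 1 means the operation was slower while DI was running.
slowdown = {op: with_di / without_di for op, (with_di, without_di) in cico.items()}

for op, ratio in slowdown.items():
    print(f"{op}: {ratio:.2f}x slower during DI")
```

Both ratios are close to 2, which matches the Summary statement that CI/CO durations roughly double during DI.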
Test#3 | DI Duration with CI/CO | DI Duration without CI/CO* |
---|---|---|
Tenant _1 | 20 min | 14 min (18 min for run 2) |
Tenant _2 | 19 min | 16 min (18 min for run 2) |
Tenant _3 | 16 min | 16 min (15 min for run 2) |
* - Same approach testing DI: 3 DI jobs total on 3 tenants without CI/CO. Start the second job after the first one reaches 30%, and start another job on a third tenant after the first job reaches 60% completion. DI file size: 25k
Memory Utilization
Memory utilization increased because the modules had been restarted beforehand (daily cluster shutdown process); no memory leak is suspected for the DI modules.
MARC BIB CREATE
Test#1 100k, 250k, 500k records DI
Test#2 Multitenant DI (9 concurrent jobs)
Test#3 With CI/CO
Service CPU Utilization
MARC BIB CREATE
Average CPU usage was 144% for mod-inventory and about 107% for mod-di-converter-storage; no other module exceeded 100%. CPU usage of mod-data-import spiked up to 260% at the beginning of Data Import jobs.
Test#1 500k records DI
Test#2 Multitenant
Test#3 With CI/CO
Instance CPU Utilization
Test#1 500k records DI
Test#2 Multitenant DI (9 concurrent jobs)
RDS CPU Utilization
MARC BIB CREATE
DB CPU usage reached approximately 95%.
Test#1 500k records DI
Test#2 Multitenant DI (9 concurrent jobs)
Maximum DB CPU usage was about 95%.
Test#3 With CI/CO
RDS Database Connections
MARC BIB CREATE
DI Create job: 535 database connections.
Test#1 500k records DI
Test#2 Multitenant
Test#3 With CI/CO
Appendix
Infrastructure ocp3
Record counts:
- tenant0_mod_source_record_storage.marc_records_lb = 9674629
- tenant2_mod_source_record_storage.marc_records_lb = 0
- tenant3_mod_source_record_storage.marc_records_lb = 0
- tenant0_mod_source_record_storage.raw_records_lb = 9604805
- tenant2_mod_source_record_storage.raw_records_lb = 0
- tenant3_mod_source_record_storage.raw_records_lb = 0
- tenant0_mod_source_record_storage.records_lb = 9674677
- tenant2_mod_source_record_storage.records_lb = 0
- tenant3_mod_source_record_storage.records_lb = 0
- tenant0_mod_source_record_storage.marc_indexers = 620042011
- tenant2_mod_source_record_storage.marc_indexers = 0
- tenant3_mod_source_record_storage.marc_indexers = 0
- tenant0_mod_source_record_storage.marc_indexers with field_no 010 = 3285833
- tenant2_mod_source_record_storage.marc_indexers with field_no 010 = 0
- tenant3_mod_source_record_storage.marc_indexers with field_no 010 = 0
- tenant0_mod_source_record_storage.marc_indexers with field_no 035 = 19241844
- tenant2_mod_source_record_storage.marc_indexers with field_no 035 = 0
- tenant3_mod_source_record_storage.marc_indexers with field_no 035 = 0
- tenant0_mod_inventory_storage.authority = 4
- tenant2_mod_inventory_storage.authority = 0
- tenant3_mod_inventory_storage.authority = 0
- tenant0_mod_inventory_storage.holdings_record = 9592559
- tenant2_mod_inventory_storage.holdings_record = 16
- tenant3_mod_inventory_storage.holdings_record = 16
- tenant0_mod_inventory_storage.instance = 9976519
- tenant2_mod_inventory_storage.instance = 32
- tenant3_mod_inventory_storage.instance = 32
- tenant0_mod_inventory_storage.item = 10787893
- tenant2_mod_inventory_storage.item = 19
- tenant3_mod_inventory_storage.item = 19
PTF environment ocp3
- 10 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
- 2 database instances, one reader and one writer
Name | API Name | Memory GiB | vCPUs | max_connections |
---|---|---|---|---|
R6G Extra Large | db.r6g.xlarge | 32 GiB | 4 vCPUs | 2731 |
- MSK ptf-kakfa-3
- 4 m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
- Kafka topics partitioning: 2 partitions for DI topics
Before Splitting Feature released
Module ocp3-pvt Mon Sep 11 09:33:28 UTC 2023 | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize | R/W split enabled |
---|---|---|---|---|---|---|---|---|---|---|
mod-remote-storage | 13 | mod-remote-storage:2.0.3 | 2 | 4920 | 4472 | 1024 | 3960 | 512 | 512 | false |
mod-agreements | 8 | mod-agreements:5.5.2 | 2 | 1592 | 1488 | 128 | 968 | 384 | 512 | false |
mod-data-import | 7 | mod-data-import:2.7.1 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512 | false |
mod-search | 30 | mod-search:2.0.1 | 2 | 2592 | 2480 | 2048 | 1440 | 512 | 1024 | false |
mod-authtoken | 7 | mod-authtoken:2.13.0 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128 | false |
mod-configuration | 7 | mod-configuration:5.9.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
mod-inventory-storage | 1 | mod-inventory-storage:26.1.0-SNAPSHOT.665 | 0 | 2208 | 1952 | 1024 | 1440 | 384 | 512 | false |
mod-circulation-storage | 15 | mod-circulation-storage:16.0.1 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | false |
mod-source-record-storage | 11 | mod-source-record-storage:5.6.7 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | false |
mod-calendar | 7 | mod-calendar:2.4.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
mod-inventory | 12 | mod-inventory:20.0.6 | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512 | false |
mod-circulation | 9 | mod-circulation:23.5.6 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | false |
mod-di-converter-storage | 8 | mod-di-converter-storage:2.0.5 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
mod-pubsub | 8 | mod-pubsub:2.9.1 | 2 | 1536 | 1440 | 1024 | 922 | 384 | 512 | false |
mod-users | 8 | mod-users:19.1.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
mod-patron-blocks | 8 | mod-patron-blocks:1.8.0 | 2 | 1024 | 896 | 1024 | 768 | 88 | 128 | false |
mod-source-record-manager | 9 | mod-source-record-manager:3.6.4 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | false |
nginx-edge | 7 | nginx-edge:2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0 | false |
mod-quick-marc | 7 | mod-quick-marc:3.0.0 | 1 | 2288 | 2176 | 128 | 1664 | 384 | 512 | false |
nginx-okapi | 7 | nginx-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0 | false |
okapi-b | 8 | okapi:5.0.1 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512 | false |
mod-feesfines | 7 | mod-feesfines:18.2.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
mod-patron | 7 | mod-patron:5.5.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
mod-notes | 7 | mod-notes:5.0.1 | 2 | 1024 | 896 | 128 | 952 | 384 | 512 | false |
pub-okapi | 7 | pub-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 768 | 0 | 0 | false |
Service versions for Splitting Feature test
Module ocp3-pvt Mon Sep 25 12:43:06 UTC 2023 | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize | R/W split enabled |
---|---|---|---|---|---|---|---|---|---|---|
mod-data-import | 10 | mod-data-import:2.7.2-SNAPSHOT.137 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512 | false |
mod-search | 30 | mod-search:2.0.1 | 2 | 2592 | 2480 | 2048 | 1440 | 512 | 1024 | false |
mod-configuration | 8 | mod-configuration:5.9.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
mod-bulk-operations | 7 | mod-bulk-operations:1.0.6 | 2 | 3072 | 2600 | 1024 | 1536 | 384 | 512 | false |
mod-inventory-storage | 1 | mod-inventory-storage:26.1.0-SNAPSHOT.665 | 0 | 2208 | 1952 | 1024 | 1440 | 384 | 512 | false |
mod-circulation-storage | 15 | mod-circulation-storage:16.0.1 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | false |
mod-source-record-storage | 12 | mod-source-record-storage:5.6.7 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | false |
mod-calendar | 7 | mod-calendar:2.4.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
mod-inventory | 12 | mod-inventory:20.0.6 | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512 | false |
mod-circulation | 9 | mod-circulation:23.5.6 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | false |
mod-di-converter-storage | 8 | mod-di-converter-storage:2.0.5 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
mod-pubsub | 9 | mod-pubsub:2.9.1 | 2 | 1536 | 1440 | 1024 | 922 | 384 | 512 | false |
mod-users | 9 | mod-users:19.1.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
mod-patron-blocks | 9 | mod-patron-blocks:1.8.0 | 2 | 1024 | 896 | 1024 | 768 | 88 | 128 | false |
mod-source-record-manager | 12 | mod-source-record-manager:3.6.5-SNAPSHOT.245 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | false |
mod-quick-marc | 7 | mod-quick-marc:3.0.0 | 1 | 2288 | 2176 | 128 | 1664 | 384 | 512 | false |
nginx-okapi | 7 | nginx-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0 | false |
okapi-b | 8 | okapi:5.0.1 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512 | false |
mod-feesfines | 8 | mod-feesfines:18.2.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
mod-notes | 7 | mod-notes:5.0.1 | 2 | 1024 | 896 | 128 | 952 | 384 | 512 | false |
pub-okapi | 7 | pub-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 768 | 0 | 0 | false |
Methodology/Approach
To enable the splitting feature, see: Detailed Release Notes for Data Import Splitting Feature
Test 1: Manually tested 100k, 250k, and 500k record files, started one by one on one tenant only.
Test 2: Manually tested DI of 100k, 50k, and 1-record files started simultaneously on each of the 3 tenants (9 jobs total).
Test 3: Run CI/CO on one tenant and DI jobs on 3 tenants, including the one that runs CI/CO. Start the second job after the first one reaches 30%, and start another job on a third tenant after the first job reaches 60% completion. CI/CO: 20 users; DI file size: 25k.
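The staggered start in Test 3 was performed manually. If it were scripted, the decision of when to start each additional job could look like the sketch below. The tenant labels, the `START_THRESHOLDS` mapping, and `tenants_to_start` are hypothetical names for illustration; how the first job's progress is obtained is left out.

```python
# Percent completion of the first job at which each further DI job starts
# (30% and 60%, as described for Test 3). Tenant labels are illustrative.
START_THRESHOLDS = {"tenant_2": 30, "tenant_3": 60}


def tenants_to_start(first_job_progress: float, started: set[str]) -> list[str]:
    """Return tenants whose DI job should be started at this progress value."""
    return [
        tenant
        for tenant, threshold in START_THRESHOLDS.items()
        if first_job_progress >= threshold and tenant not in started
    ]


# Example: walk through a few progress readings of the first job.
started: set[str] = set()
for progress in (10, 35, 65):
    due = tenants_to_start(progress, started)
    started.update(due)
    print(f"first job at {progress}%: start {due or 'nothing'}")
```

Tracking the already started tenants in a set keeps the check idempotent, so polling the progress repeatedly does not start the same job twice.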