In Progress(in review) + retesting results will be add to these report in scope of the Jira Legacyserver System Jira serverId 01505d01-b853-3c2e-90f1-ee9b165564fc key PERF-681
server | System Jira |
---|---|
serverId | 01505d01-b853-3c2e-90f1-ee9b165564fc |
key | PERF-681 |
...
- One record on one tenant could be discarded with error: io.netty.channel.StacklessClosedChannelException.
Reproduces in both cases with and without splitting feature enabled in at least 30% of test runs with 500k record files and multitenant testing.Jira Legacy server System Jira serverId 01505d01-b853-3c2e-90f1-ee9b165564fc key MODDATAIMP-748 - During the new Data Import splitting feature testing, items for update were discarded with the error: io.vertx.core.impl.NoStackTraceThrowable: Cannot get actual Item by id: org.folio.inventory.exceptions.InternalServerErrorException: Access for user 'data-import-system-user' (f3486d35-f7f7-4a69-bcd0-d8e5a35cb292) requires permission: inventory-storage.items.item.get. Less than 1% of records could be discarded due to missing permission for 'data-import-system-user'. Permission was not added automatically during the service deployment. I added permission manually to the database and the error does not occur anymore.
Jira Legacy server System Jira serverId 01505d01-b853-3c2e-90f1-ee9b165564fc key MODDATAIMP-930 - UI issue, when canceled or completed with error Job progress bar cannot be deleted from the screen.
Jira Legacy server System Jira serverId 01505d01-b853-3c2e-90f1-ee9b165564fc key MODDATAIMP-929 - Usage:
- Should not use less than 1000 for RECORDS_PER_SPLIT_FILE. The system is stable enough to ingest 1000 records consistently and smaller amounts will incur more overheads, resulting in longer jobs' durations. CPU utilization for mod-di-converter-storage for 500 RECORDS_PER_SPLIT_FILE(RPSF) = 160%, for 1000RPSF =180%, for 5K RPSF =380% and for 10K RPSF =433%, so in the case of selecting configurations 5K or 10K we recommend to add more CPU to mod-di-converter-storage service.
- When toggling the file-splitting feature, mod-source-record-storage, mod-source-record-manager's tasks need to be restarted.
- Keep in mind about the Kafka broker's disk size (as bigger jobs - up to 500K - can be run now), consecutive jobs may use up the disk quickly because the messages' retention time currently is set at 8 hours. For example with 300GB disk size, consecutive jobs of 250K, 500K, 500K sizes will exhaust the disk.
- More CPU could be allocated to mod-inventory and mod-di-converter-storage
...
** - up to 10 items were discarded with the error: io.vertx.core.impl.NoStackTraceThrowable: Cannot get actual Item by id: org.folio.inventory.exceptions.InternalServerErrorException: Access for user 'data-import-system-user' (f3486d35-f7f7-4a69-bcd0-d8e5a35cb292) requires permission: inventory-storage.items.item.get. Less than 1% of records could be discarded due to missing permission for 'data-import-system-user'. Permission was not added automatically during the service deployment. I added permission manually to the database and the error does not occur anymore. Jira Legacy server System Jira serverId 01505d01-b853-3c2e-90f1-ee9b165564fc key MODDATAIMP-930
...
With CI/CO 20 users and DI 25k records on each of the 3 tenants Splitting Feature Disabled
ocp3-mod-data-import:12
Data Import Robustness Enhancement
...
Memory utilization rich maximal value for mod-source-record-storage-b 88% and for mod-source-record-manager-b 85%.
Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test.
...
Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test.
RDS CPU Utilization
Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 500, 2 runs for each test. Maximal CPU Utilization = 95%
...
The average number of DB connections during the test changed from 400 to 1200.
CICO responce time graph
Retesting DI file-splitting feature on Poppy release with Refresh Token Rotation (RTR) and file-splitting feature
The goal of the tests was to investigate how the file-splitting feature caused Data-import on Poppy release and the impact of Refresh Token Rotation (RTR). The tests were performed on ocp3(Poppy), pcp1(Poppy) and ncp5(Orchid) environments.
Refresh Token Rotation (RTR) Jira Legacy server System Jira serverId 01505d01-b853-3c2e-90f1-ee9b165564fc key PERF-723
Brief comparison summary
Test results
DI tests/ Configuration | ncp5 Orchid | ocp3 FSF true without RTR token | ocp3 FSF false without RTR token | *ocp3 FSF deleted without token | ocp3 FSF false AT =RT= 300; | ocp3 FSF false AT =RT= 1000000000 | pcp1 FSF false AT =RT= 10000000 | pcp1 FSF false without token retest* |
---|---|---|---|---|---|---|---|---|
250k_bib_Create_1.mrc | not tested | not tested | failed | failed | failed | failed | failed | failed |
100k_bib_Create.mrc | 00:41:41 | 00:54:32 | 00:54:36 | 00:53:59 | 00:48:56 | 00:54:42.05 | 00:47:17 | "01:01:39" |
50k_bib_Create.mrc | 00:19:43 | 00:30:40 | 00:30:20 | 00:22:17 | 00:27:05 | 00:30:09 | 00:21:45 | 00:20:46 |
25k_bib_Create.mrc | 00:10:11 | 00:13:53 | 00:17:20 | 00:10:33 | 00:12:42 | 00:13:25 | 00:11:54 | 00:10:53 |
10k_bib_Create.mrc | 00:04:19 | 00:07:22 | 00:05:35 | 00:04:38 | not tested | 00:05:33. | 00:04:42 | 00:04:36 |
5k_bib_Create.mrc | 00:02:35 | 00:04:31 | 00:02:43 | 00:02:55 | not tested | 00:03:07 | 00:02:55 | 00:02:30 |
1k_bib_Create.mrc | not tested | not tested | not tested | not tested | not tested | not tested | 00:00:54 | not tested |
DI-25K-Update.mrc | not tested | not tested | finished successfully | failed | failed | finished successfully | failed | finished successfully |
Resource utilization during testing
Service CPU utilization during the Data-import process
The next data import jobs were carried out
1) 5k_bib_Create 2) 10k_bib_Create 3) 25k_bib_Create 4) 50k_bib_Create 5) 50k_bib_Create 6) 100k_bib_Create 7) 50k_bib_Create 8) 25k_bib_Create 9) 25k_bib_Update 10) 50k_bib_Update(stopped)
CPU utilization was stable during all jobs, but some spikes of data-import jobs were at the beginning of all tests.
Memory Utilization
Most of the modules were stable during the test, and no memory leak is suspected for DI modules, except mod-inventory-b which consumed about 92% of memory during all DI processes.
RDS CPU Utilization
Maximal CPU Utilization = 95%
RDS Database Connections
The maximal number of DB connections during the tests was about 580.
Database load
Top SQL queries
Appendix
Infrastructure ocp3 with the "Bugfest" Dataset
...
- tenant0_mod_source_record_storage.marc_records_lb = 9674629
- tenant2_mod_source_record_storage.marc_records_lb = 0
- tenant3_mod_source_record_storage.marc_records_lb = 0
- tenant0_mod_source_record_storage.raw_records_lb = 9604805
- tenant2_mod_source_record_storage.raw_records_lb = 0
- tenant3_mod_source_record_storage.raw_records_lb = 0
- tenant0_mod_source_record_storage.records_lb = 9674677
- tenant2_mod_source_record_storage.records_lb = 0
- tenant3_mod_source_record_storage.records_lb = 0
- tenant0_mod_source_record_storage.marc_indexers = 620042011
- tenant2_mod_source_record_storage.marc_indexers = 0
- tenant3_mod_source_record_storage.marc_indexers = 0
- tenant0_mod_source_record_storage.marc_indexers with field_no 010 = 3285833
- tenant2_mod_source_record_storage.marc_indexers with field_no 010 = 0
- tenant3_mod_source_record_storage.marc_indexers with field_no 010 = 0
- tenant0_mod_source_record_storage.marc_indexers with field_no 035 = 19241844
- tenant2_mod_source_record_storage.marc_indexers with field_no 035 = 0
- tenant3_mod_source_record_storage.marc_indexers with field_no 035 = 0
- tenant0_mod_inventory_storage.authority = 4
- tenant2_mod_inventory_storage.authority = 0
- tenant3_mod_inventory_storage.authority = 0
- tenant0_mod_inventory_storage.holdings_record = 9592559
- tenant2_mod_inventory_storage.holdings_record = 16
- tenant3_mod_inventory_storage.holdings_record = 16
- tenant0_mod_inventory_storage.instance = 9976519
- tenant2_mod_inventory_storage.instance = 32
- tenant3_mod_inventory_storage.instance = 32
- tenant0_mod_inventory_storage.item = 10787893
- tenant2_mod_inventory_storage.item = 19
- tenant3_mod_inventory_storage.item = 19
PTF -environment ocp3
- 10 m6i.2xlarge EC2 instances located in US East (N. Virginia)us-east-1
2 database instances, one reader, and one writer
Name API Name Memory GIB vCPUs max_connections R6G Extra Large db.r6g.xlarge 32 GiB 4 vCPUs 2731 - MSK ptf-kakfa-3
- 4 m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
- Kafka topics partitioning: - 2 partitions for DI topics
...