...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
Overview
The Data Import Task Force (DITF) implemented a feature that splits large input MARC files into smaller ones, resulting in smaller jobs, so that big files can be imported, and imported consistently. This document contains the results of performance tests on the feature, including tests with 1, 2, and 3 tenants' concurrent jobs in different configurations, and an analysis of the feature's performance with respect to the baseline tests. The following Jiras were implemented:
Summary
- The file-splitting feature is stable and makes Data Import jobs more robust, even with the current infrastructure configuration. If there are failures, it is now easier to find the exact failed records and take action on them.
- No jobs got stuck in any of the tests performed.
- There were errors (see below) in some partial jobs, but they still completed, so the status of the entire job is "Completed with errors".
- Both kinds of imports, MARC BIB create and update, worked well with the file-splitting feature both enabled and disabled.
- At this point there is no performance degradation (jobs do not get slower) for single-tenant imports. For multitenant imports, performance is a little better.
- DI duration correlates with the number of records imported (100K records: 38 min, 250K: 1 hour 32 min, 500K: 3 hours 29 min).
- Multitenant DI was performed successfully with up to 9 jobs in parallel. Big jobs start one by one, in order, for each tenant, but are processed in parallel across the 3 tenants. A small DI job (1 record) can finish sooner, out of order.
- No memory leak is suspected for any of the modules.
- Average CPU usage was 144% for mod-inventory and about 107% for mod-di-converter-storage; all other modules did not exceed 100%. CPU usage of mod-data-import spikes up to 260% at the beginning of Data Import jobs. This is a big improvement over the previous version (without file splitting) for 500K imports, where CPU utilization of mod-di-converter-storage was 462% and other modules were above 100%, up to 150%.
- DB CPU usage is up to approximately 95%.
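As a rough check on the duration-versus-size statement in the summary, the single-tenant create durations (38 min, 1 hour 32 min, 3 hours 29 min) can be turned into throughput figures. This is a back-of-the-envelope sketch: it treats each duration as a single run and ignores run-to-run variance.

```python
# Single-tenant MARC BIB Create durations with the splitting feature enabled,
# as listed in the summary: record count -> duration in minutes.
DURATIONS_MIN = {
    100_000: 38,   # 38 min
    250_000: 92,   # 1 hour 32 min
    500_000: 209,  # 3 hours 29 min
}


def records_per_minute(records: int, minutes: int) -> float:
    """Average import throughput of one job, in records per minute."""
    return records / minutes


for records, minutes in DURATIONS_MIN.items():
    print(f"{records:>7} records: {records_per_minute(records, minutes):,.0f} records/min")
```

Throughput stays roughly between 2,400 and 2,700 records per minute, so duration grows close to linearly with file size; the 500K job is somewhat slower per record.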
Recommendations and Jiras
- One record on one tenant could be discarded with the error io.netty.channel.StacklessClosedChannelException. This reproduces both with and without the splitting feature enabled, in at least 30% of test runs with 500K-record files and in multitenant testing. Jira: MODDATAIMP-748
- During testing of the new Data Import splitting feature, items for update were discarded with the error: io.vertx.core.impl.NoStackTraceThrowable: Cannot get actual Item by id: org.folio.inventory.exceptions.InternalServerErrorException: Access for user 'data-import-system-user' (f3486d35-f7f7-4a69-bcd0-d8e5a35cb292) requires permission: inventory-storage.items.item.get. Less than 1% of records could be discarded due to the missing permission for 'data-import-system-user'. The permission was not added automatically during service deployment; after it was added manually to the database, the error no longer occurs. Jira: MODDATAIMP-930
- UI issue: when a job is canceled or completed with errors, its progress bar cannot be removed from the screen. Jira: MODDATAIMP-929
- Usage:
  - Do not set RECORDS_PER_SPLIT_FILE (RPSF) below 1000. The system is stable enough to ingest 1000 records consistently, and smaller values incur more overhead, resulting in longer job durations. CPU utilization of mod-di-converter-storage was 160% for RPSF = 500, 180% for RPSF = 1K, 380% for RPSF = 5K, and 433% for RPSF = 10K, so if 5K or 10K is selected, we recommend allocating more CPU to the mod-di-converter-storage service.
  - When toggling the file-splitting feature, the mod-source-record-storage and mod-source-record-manager tasks need to be restarted.
  - Keep the Kafka brokers' disk size in mind: bigger jobs (up to 500K records) can now be run, and consecutive jobs may use up the disk quickly because the message retention time is currently set to 8 hours. For example, with a 300 GB disk, consecutive jobs of 250K, 500K, and 500K records will exhaust the disk.
  - More CPU could be allocated to mod-inventory and mod-di-converter-storage.
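The overhead argument for RECORDS_PER_SPLIT_FILE can be made concrete by counting how many partial jobs a file produces. The sketch below assumes one partial job per chunk and a smaller final chunk for the remainder (ceiling division). The helper and constant names are hypothetical and not part of any FOLIO module; the actual value is configured in the deployment as described in the release notes referenced under Methodology/Approach.

```python
import math

# Lower bound recommended in this report; smaller values add per-chunk overhead.
MIN_RECORDS_PER_SPLIT_FILE = 1000


def partial_job_count(total_records: int, records_per_split_file: int) -> int:
    """Number of partial jobs a file is split into (hypothetical helper).

    Assumes every chunk holds records_per_split_file records except the last,
    which holds the remainder, so the count is a ceiling division.
    """
    if records_per_split_file < MIN_RECORDS_PER_SPLIT_FILE:
        raise ValueError(
            f"RECORDS_PER_SPLIT_FILE={records_per_split_file} is below the "
            f"recommended minimum of {MIN_RECORDS_PER_SPLIT_FILE}"
        )
    return math.ceil(total_records / records_per_split_file)


for rpsf in (1000, 5000, 10000):
    print(f"500K records, RPSF={rpsf}: {partial_job_count(500_000, rpsf)} partial jobs")
```

Halving RPSF doubles the number of partial jobs, and each partial job adds some fixed overhead; larger values reduce the count but, per the CPU figures above, shift load onto mod-di-converter-storage.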
Results
Test # | Test | Job profile | Splitting Feature Enabled: duration | Results | Splitting Feature Disabled: duration | Results | Before Splitting Feature Deployed: duration | Results |
---|---|---|---|---|---|---|---|---|
1 | 100K MARC BIB Create | PTF - Create 2 | 37 min - 39 min | Completed | 40 min | Completed | 32-33 min | Completed |
1 | 250K MARC BIB Create | PTF - Create 2 | 1 hour 32 min | Completed | 1 hour 41 min | Completed | 1 hour 33 min - 1 hour 57 min | Completed |
1 | 500K MARC BIB Create | PTF - Create 2 | 3 hours 29 min | Completed* | 3 hours 55 min | Completed | 3 hours 33 min | Completed |
2 | Multitenant MARC Create (100K, 50K, and 1 record) | PTF - Create 2 | 2 hours 40 min | Completed* | 2 hours 43 min | Completed* | 3 hours 1 min | Completed |
3 | CI/CO + DI MARC BIB Create (20 users CI/CO, 25K records DI on 3 tenants) | PTF - Create 2 | 24 min 18 sec | Completed | 31 min 31 sec | Completed | 24 min | Completed* |
4 | 100K MARC BIB Update (Create new file) | PTF - Updates Success - 1 | 58 min 25 sec / 57 min 19 sec | Completed | 1 hour 3 min | Completed | - | - |
4 | 250K MARC BIB Update | PTF - Updates Success - 1 | 2 hours 2 min** / 2 hours 12 min | Completed with errors** / Completed | 1 hour 53 min | Completed | - | - |
4 | 500K MARC BIB Update | PTF - Updates Success - 1 | 4 hours 43 min / 4 hours 38 min | Completed / Completed | 5 hours 59 min | Completed | - | - |
* One record on one tenant could be discarded with the error io.netty.channel.StacklessClosedChannelException.
This reproduces both with and without the splitting feature in at least 30% of test runs with 500K-record files and in multitenant testing. Jira: MODDATAIMP-748
...
** Up to 10 items were discarded with the error: io.vertx.core.impl.NoStackTraceThrowable: Cannot get actual Item by id: org.folio.inventory.exceptions.InternalServerErrorException: Access for user 'data-import-system-user' (f3486d35-f7f7-4a69-bcd0-d8e5a35cb292) requires permission: inventory-storage.items.item.get. Less than 1% of records could be discarded due to the missing permission for 'data-import-system-user'. The permission was not added automatically during service deployment; after it was added manually to the database, the error no longer occurs. Jira: MODDATAIMP-930
Tests 1 and 2: 100K, 250K, 500K, and multitenant MARC BIB Create
Memory Utilization
Memory utilization increases here because the modules had been restarted beforehand (the daily cluster shutdown process); no memory leak is suspected for DI modules.
...
Test#2 Multitenant DI (9 concurrent jobs)
Service CPU Utilization
MARC BIB CREATE
Average CPU usage was 144% for mod-inventory and about 107% for mod-di-converter-storage; all other modules did not exceed 100%. CPU usage of mod-data-import spikes up to 260% at the beginning of Data Import jobs.
Test#1 500k records DI
Test#2 Multitenant
Instance CPU Utilization
Test#1 500k records DI
Test#2 Multitenant DI (9 concurrent jobs)
RDS CPU Utilization
MARC BIB CREATE
DB CPU usage is up to approximately 95%.
...
Maximal DB CPU usage is about 95%
RDS Database Connections
MARC BIB CREATE
For the DI Create job, the DB connection count was 535.
Test#1 500k records DI
Test#2 Multitenant
Test 3. CI/CO with 20 users and DI of 25K records on each of the 3 tenants, Splitting Feature Enabled & Splitting Feature Disabled
Response time | Without DI, before Splitting Feature deployed | With DI, before Splitting Feature deployed | Without DI, Splitting Feature disabled | With DI, Splitting Feature disabled | Without DI, Splitting Feature enabled | With DI (Average), Splitting Feature enabled |
---|---|---|---|---|---|---|
Check-In | 0.517s | 1.138s | 0.542s | 1.1s | 0.505s | 1.067s |
Check-Out | 0.796s | 1.552s | 0.841s | 1.6s | 0.804s | 1.48s |
DI duration | Without CI/CO, before Splitting Feature deployed | With CI/CO, before Splitting Feature deployed | Without CI/CO, Splitting Feature disabled | With CI/CO, Splitting Feature disabled | Without CI/CO, Splitting Feature enabled | With CI/CO, Splitting Feature enabled |
---|---|---|---|---|---|---|
Tenant_1 | 14 min (18 min for run 2) | 20 min | 27 min 47 sec | 31 min 30 sec | 16 min 18 sec | 16 min 53 sec |
Tenant_2 | 16 min (18 min for run 2) | 19 min | 23 min 16 sec | 26 min 22 sec | 20 min 13 sec | 20 min 39 sec |
Tenant_3 | 16 min (15 min for run 2) | 16 min | 18 min 40 sec | 20 min 44 sec | 17 min 42 sec | 17 min 54 sec |
* DI duration without CI/CO was measured with the same approach: 3 DI jobs in total on 3 tenants, without CI/CO. The second job starts after the first one reaches 30% completion, and the job on the third tenant starts after the first job reaches 60% completion. DI file size: 25K.
Response time graph
With CI/CO 20 users and DI 25k records on each of the 3 tenants Splitting Feature Disabled
ocp3-mod-data-import:12
Data Import Robustness Enhancement
...
...
25K records, duration by RECORDS_PER_SPLIT_FILE (RPSF) value
Number of concurrent tenants | Job profile | RPSF = 500 | Status | RPSF = 1K | Status | RPSF = 5K | Status | RPSF = 10K | Status | Test with Split disabled | Status |
---|---|---|---|---|---|---|---|---|---|---|---|
1 Tenant test#1 | PTF - Create 2 | 12 min 55 sec | Completed | 11 min 48 sec | Completed | 9 min 21 sec | Completed | 9 min 2 sec | Completed | 10 min 35 sec | Completed |
1 Tenant test#2 | PTF - Create 2 | 10 min 31 sec | Completed | 9 min 32 sec | Completed | 9 min 6 sec | Completed | 9 min 14 sec | Completed | 11 min 27 sec | Completed |
2 Tenants test#1 | PTF - Create 2 | 19 min 29 sec | Completed | 15 min 47 sec | Completed | 16 min 15 sec | Completed | 16 min 3 sec | Completed | 19 min 18 sec | Completed |
2 Tenants test#2 | PTF - Create 2 | 18 min 19 sec | Completed | 15 min 47 sec | Completed | 16 min 11 sec | Completed | 16 min 41 sec | Completed | 20 min 33 sec | Completed |
3 Tenants test#1 | PTF - Create 2 | 24 min 15 sec | Completed | 25 min 47 sec | Completed | 23 min | Completed | 23 min 27 sec | Completed | 30 min 2 sec | Completed |
3 Tenants test#2 | PTF - Create 2 | 24 min 38 sec | Completed | 23 min 28 sec | Completed | 23 min 2 sec | Completed | 23 min 26 sec | Completed | 29 min 54 sec | Completed * |
* T1 - "00:33:35.1"
...
Error T2 - "01:23:36.144"
...
 T3 - "01:16:26.391"
...
* On the first tenant, processing stopped with the error "LOGS in progress io.vertx.core.impl.NoStackTraceThrowable: Connection is not active now, current status: CLOSED".
This caused a CPU utilization spike on Kafka (tenant cluster) of up to 94%.
Instance CPU Utilization
Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 500, 2 runs for each test. The maximum CPU utilization is 38%.
...
Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test. The maximum CPU utilization is 37%.
Memory Utilization
Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 500, 2 runs for each test.
...
Memory utilization reached a maximum of 88% for mod-source-record-storage-b and 85% for mod-source-record-manager-b.
Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test.
Service CPU Utilization
Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 500, 2 runs for each test.
...
Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test.
RDS CPU Utilization
Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 500, 2 runs for each test. Maximum CPU utilization: 95%.
...
Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test. Maximum CPU utilization: 94%.
RDS Database Connections
Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 500, 2 runs for each test.
...
Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test.
Appendix
Infrastructure ocp3 with the "Bugfest" Dataset
Records count:
- tenant0_mod_source_record_storage.marc_records_lb = 9674629
- tenant2_mod_source_record_storage.marc_records_lb = 0
- tenant3_mod_source_record_storage.marc_records_lb = 0
- tenant0_mod_source_record_storage.raw_records_lb = 9604805
- tenant2_mod_source_record_storage.raw_records_lb = 0
- tenant3_mod_source_record_storage.raw_records_lb = 0
- tenant0_mod_source_record_storage.records_lb = 9674677
- tenant2_mod_source_record_storage.records_lb = 0
- tenant3_mod_source_record_storage.records_lb = 0
- tenant0_mod_source_record_storage.marc_indexers = 620042011
- tenant2_mod_source_record_storage.marc_indexers = 0
- tenant3_mod_source_record_storage.marc_indexers = 0
- tenant0_mod_source_record_storage.marc_indexers with field_no 010 = 3285833
- tenant2_mod_source_record_storage.marc_indexers with field_no 010 = 0
- tenant3_mod_source_record_storage.marc_indexers with field_no 010 = 0
- tenant0_mod_source_record_storage.marc_indexers with field_no 035 = 19241844
- tenant2_mod_source_record_storage.marc_indexers with field_no 035 = 0
- tenant3_mod_source_record_storage.marc_indexers with field_no 035 = 0
- tenant0_mod_inventory_storage.authority = 4
- tenant2_mod_inventory_storage.authority = 0
- tenant3_mod_inventory_storage.authority = 0
- tenant0_mod_inventory_storage.holdings_record = 9592559
- tenant2_mod_inventory_storage.holdings_record = 16
- tenant3_mod_inventory_storage.holdings_record = 16
- tenant0_mod_inventory_storage.instance = 9976519
- tenant2_mod_inventory_storage.instance = 32
- tenant3_mod_inventory_storage.instance = 32
- tenant0_mod_inventory_storage.item = 10787893
- tenant2_mod_inventory_storage.item = 19
- tenant3_mod_inventory_storage.item = 19
PTF environment ocp3
...
2 database instances, one reader, and one writer
...
- 4 m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
...
Before Splitting Feature released
...
Retesting DI file-splitting feature on Poppy release
The DI file-splitting feature was retested to make sure that the new changes have not affected performance negatively. The following scenarios were retested:
Brief comparison summary
Data import duration has increased on Poppy compared with Orchid; in particular (diff = Poppy processing time - Orchid processing time):
- 250K MARC BIB Create, PTF - Create 2: 44 minutes
- 250K MARC BIB Update, PTF - Updates Success - 1: 45 minutes
- Multitenant MARC Create (100K, 50K, and 1 record), PTF - Create 2: 1 hour 35 minutes
CI/CO response times have also increased (Poppy - Orchid):
- Check-Out without DI: ~200 ms
- Check-In without DI: ~65 ms
- Check-Out with DI: ~770 ms
- Check-In with DI: ~330 ms
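The diff values are plain subtractions of wall-clock durations. The sketch below parses durations written the way this report writes them ("2 hours 16 min") and reproduces the 250K create and update diffs from the Test 1 table (2 hours 16 min vs 1 hour 32 min, and 3 hours 1 min vs 2 hours 16 min). The parsing rules are an assumption that only covers the hour/minute formats used here.

```python
import re


def to_minutes(duration: str) -> int:
    """Convert strings such as '2 hours 16 min' or '4 hours 14min' to minutes."""
    hours = re.search(r"(\d+)\s*hours?", duration)
    minutes = re.search(r"(\d+)\s*min", duration)
    return (int(hours.group(1)) * 60 if hours else 0) + (int(minutes.group(1)) if minutes else 0)


def diff_minutes(poppy: str, orchid: str) -> int:
    """diff = Poppy processing time - Orchid processing time, in minutes."""
    return to_minutes(poppy) - to_minutes(orchid)


print(diff_minutes("2 hours 16 min", "1 hour 32 min"))  # 44 (250K create)
print(diff_minutes("3 hours 1 min", "2 hours 16 min"))  # 45 (250K update)
```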
Resource utilization:
- Service CPU utilization on Poppy is about the same as on Orchid;
- Memory utilization on Poppy is about the same as on Orchid;
- RDS CPU utilization during all tests and on both releases was about 96%;
- The number of DB connections was about the same on both releases, from 550 (Test 1.1) to 1200 (Test 1.4).
Test 1. Single tenant (primary fs09000000): create and update 250K file
Test # | Test parameters | Profile | Duration (Poppy), Splitting Feature Enabled | Status | Previous results (Orchid), Duration | diff = Poppy processing time - Orchid processing time | Duration (Poppy), Splitting Feature Disabled |
---|---|---|---|---|---|---|---|
1.1 | 250K MARC BIB Create | PTF - Create 2 | 2 hours 16 min | Completed | 1 hour 32 min | 44 minutes | failed |
1.2 | 250K MARC BIB Update | PTF - Updates Success - 1 | 3 hours 1 min | Completed | 2 hours 16 min | 45 minutes | failed |
1.3 | Multitenant MARC Create (100k, 50k, and 1 record) | PTF - Create 2 | 4 hours 14min | Completed | 2 hours 40 min | 1 hour 35 minutes | failed |
On Poppy with the splitting feature disabled, large files stopped processing. A ticket was created for this problem.
Test 1.4. CI/CO with 20 users and DI of 25K records on each of the 3 tenants
Splitting Feature enabled (average response time) | Release: Orchid, without DI | Release: Orchid, with DI | Release: Poppy, without DI | Release: Poppy, with DI | diff = Poppy - Orchid, without DI | diff = Poppy - Orchid, with DI |
---|---|---|---|---|---|---|
Check-Out | 0.804s | 1.48s | 1.03s | 2.26s | ~200ms | 770ms |
Check-In | 0.505s | 1.067s | 0.570s | 1.4s | 65ms | 330ms |
Tenant | Release: Orchid, DI duration with CI/CO | Release: Poppy, DI duration with CI/CO |
---|---|---|
Tenant_1 | 16 min 53 sec | 34 min 55 sec |
Tenant_2 | 20 min 39 sec | 27 min 39 sec |
Tenant_3 | 17 min 54 sec | 25 min 17 sec |
Resource utilization during testing
Test 1.1. Data-import of 250K records file with "PTF - Create 2" job profile
Service CPU Utilization
There is a sharp CPU spike at the beginning of the test; we see similar behavior in all of the DI tests. Otherwise, CPU consumption was uniform during the test.
Memory Utilization
Memory consumption was not affected: the memory usage of the mod-source-record-manager service increased from 45% to 60% during the test, but after the test it started returning to the pre-test value.
RDS CPU Utilization
Database CPU consumption was 97% throughout the test.
RDS Database Connections
The average number of DB connections during the test was about 550.
Test 1.2. Data-import of 250K records file with "PTF - Update" job profile
Service CPU Utilization
CPU consumption was stable during the test, except for the mod-inventory service, whose CPU usage was about 140% at the beginning of the test and about 200% at the end.
Memory Utilization
The memory was stable and without memory leaks.
RDS CPU Utilization
Database CPU consumption was 97% throughout the test.
RDS Database Connections
The average number of DB connections during the test was about 550.
Test 1.3. Multitenant MARC Create (100k, 50k, and 1 record)
Service CPU Utilization
CPU consumption was stable during the test. However, in the last hour of the test, the CPU utilization of the mod-inventory and mod-quick-marc services increased by 75%.
Memory Utilization
The memory was stable and without memory leaks.
RDS CPU Utilization
Database CPU consumption was 96% throughout the test.
RDS Database Connections
The average number of DB connections during the test was about 800.
Test 1.4. CI/CO with 20 users and DI of 25K records on each of the 3 tenants
Service CPU Utilization
Memory Utilization
The memory was stable and without memory leaks.
RDS CPU Utilization
Database CPU consumption was 96% throughout the test.
RDS Database Connections
The average number of DB connections during the test changed from 400 to 1200.
CI/CO response time graph
Retesting the DI file-splitting feature on the Poppy release with Refresh Token Rotation (RTR)
The goal of the tests was to investigate how the file-splitting feature affects Data Import on the Poppy release, and the impact of Refresh Token Rotation (RTR). The tests were performed on the ocp3 (Poppy), pcp1 (Poppy), and ncp5 (Orchid) environments.
Refresh Token Rotation (RTR): Jira PERF-723
Brief comparison summary
- The Refresh Token Rotation configuration does not affect the data import process in any way, with either the create or the update job profile.
- In the Poppy release, data import of 250,000 records with the PTF - Create 2 job profile failed, and data import of 50,000 records with the PTF - Updates Success - 1 job profile also failed in all of the tests, except for the configuration with FSF=true.
- Data import is slower on Poppy than on Orchid.
- As the number of records in the data import file increases, the processing time also increases. Up to 25,000 records, the duration of data import is approximately the same.
- In the Poppy release, data import with the file-splitting feature enabled is slower than data import with the file-splitting feature disabled.
- Data import is approximately 5% faster when the file-splitting feature parameters are absent from the task definition configuration.
Test results
DI test / Configuration | ncp5 Orchid | ocp3 FSF true, without RTR token | *ocp3 FSF false, without RTR token | ocp3 FSF deleted, without token | ocp3 FSF false, AT = RT = 300 | ocp3 FSF false, AT = RT = 1000000000 | pcp1 FSF false, AT = RT = 10000000 | pcp1 FSF false, without token (retest)* |
---|---|---|---|---|---|---|---|---|
250k_bib_Create_1.mrc | not tested | not tested | failed | failed | failed | failed | failed | failed |
100k_bib_Create.mrc | 00:41:41 | 00:54:32 | 00:54:36 | 00:53:59 | 00:48:56 | 00:54:42.05 | 00:47:17 | 01:01:39 |
50k_bib_Create.mrc | 00:19:43 | 00:30:40 | 00:25:39 | 00:22:17 | 00:27:05 | 00:30:09 | 00:21:45 | 00:20:46 |
25k_bib_Create.mrc | 00:10:11 | 00:13:53 | 00:12:46 | 00:10:33 | 00:12:42 | 00:13:25 | 00:11:54 | 00:10:53 |
10k_bib_Create.mrc | 00:04:19 | 00:07:22 | 00:05:35 | 00:04:38 | not tested | 00:05:33 | 00:04:42 | 00:04:36 |
5k_bib_Create.mrc | 00:02:35 | 00:04:31 | 00:02:43 | 00:02:55 | not tested | 00:03:07 | 00:02:55 | 00:02:30 |
1k_bib_Create.mrc | not tested | not tested | not tested | not tested | not tested | not tested | 00:00:54 | not tested |
DI-25K-Update.mrc | not tested | not tested | finished successfully | failed | failed | finished successfully | failed | finished successfully |
* The "pcp1 FSF false without token" column contains results for a configuration similar to "ocp3 FSF false without RTR token".
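To compare configurations in the table above, each hh:mm:ss cell can be converted to seconds and expressed as a relative change against the ncp5 Orchid column. This is a minimal sketch; the helper names are illustrative, and the parser also tolerates the stray quotes, trailing dot, and fractional seconds that appear in some cells of the raw results.

```python
def to_seconds(cell: str) -> float:
    """Parse an hh:mm:ss cell such as '00:41:41', '"01:01:39"' or '00:54:42.05'."""
    cleaned = cell.strip().strip('"').rstrip(".")
    hours, minutes, seconds = cleaned.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def relative_change(candidate: str, baseline: str) -> float:
    """(candidate - baseline) / baseline; a positive value means slower than the baseline."""
    base = to_seconds(baseline)
    return (to_seconds(candidate) - base) / base


# 100k_bib_Create.mrc: ocp3 FSF true vs ncp5 Orchid
print(f"{relative_change('00:54:32', '00:41:41'):.1%}")  # 30.8%
```

Applying it row by row gives the per-file-size slowdown of each Poppy configuration relative to Orchid.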
Resource utilization during testing
Service CPU utilization during the Data-import process
The following data import jobs were carried out:
1) 5k_bib_Create 2) 10k_bib_Create 3) 25k_bib_Create 4) 50k_bib_Create 5) 50k_bib_Create 6) 100k_bib_Create 7) 50k_bib_Create 8) 25k_bib_Create 9) 25k_bib_Update 10) 50k_bib_Update(stopped)
CPU utilization was stable during all jobs, but there were some CPU spikes at the beginning of each data import test.
Memory Utilization
Most of the modules were stable during the test, and no memory leak is suspected for DI modules. The exception is mod-inventory-b, which consumed about 92% of memory during all DI processes.
RDS CPU Utilization
Maximum CPU utilization: 95%.
RDS Database Connections
The maximal number of DB connections during the tests was about 580.
Database load
Top SQL queries
Appendix
Infrastructure ocp3 with the "Bugfest" Dataset
PTF environment ocp3
- 10 m6i.2xlarge EC2 instances located in US East (N. Virginia)us-east-1
2 database instances, one reader, and one writer
Name | API Name | Memory GiB | vCPUs | max_connections |
---|---|---|---|---|
R6G Extra Large | db.r6g.xlarge | 32 GiB | 4 vCPUs | 2731 |
- MSK ptf-kakfa-3
- 4 m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
- Kafka topics partitioning: 2 partitions for DI topics
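The disk-exhaustion warning in the recommendations follows from these settings: with log.retention.minutes=480, everything that consecutive jobs write within an 8-hour window stays on disk at the same time. The sketch below models that window to find the per-record Kafka footprint (summed over all DI topics) at which a given job sequence would fill one 300 GiB broker volume. It is a rough model under stated assumptions, not how Kafka actually reclaims space: it ignores compression, segment-based deletion (which frees space later than the nominal retention), non-DI topics, and uneven partition placement. The names Job, peak_gib_per_broker, and breakeven_bytes_per_record are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence

# Values from the MSK configuration listed above.
RETENTION_MINUTES = 480    # log.retention.minutes
REPLICATION_FACTOR = 3     # default.replication.factor
BROKERS = 4                # 4 m5.2xlarge brokers
DISK_GIB_PER_BROKER = 300  # EBS storage volume per broker


@dataclass
class Job:
    start_minute: int      # when the job starts producing Kafka messages
    duration_minutes: int  # how long it keeps producing them
    records: int           # number of MARC records in the job


def peak_gib_per_broker(jobs: Sequence[Job], bytes_per_record: float) -> float:
    """Estimate the peak retained Kafka data per broker, in GiB.

    Assumes each job writes records * bytes_per_record bytes evenly over its
    duration, data is deleted exactly RETENTION_MINUTES after it is written,
    and replicas are spread evenly across the brokers.
    """
    horizon = max(job.start_minute + job.duration_minutes for job in jobs)
    written = [0.0] * horizon  # bytes written during each minute
    for job in jobs:
        per_minute = job.records * bytes_per_record / job.duration_minutes
        for minute in range(job.start_minute, job.start_minute + job.duration_minutes):
            written[minute] += per_minute

    peak = window = 0.0
    for minute, amount in enumerate(written):
        window += amount  # bytes written within the last RETENTION_MINUTES minutes
        if minute >= RETENTION_MINUTES:
            window -= written[minute - RETENTION_MINUTES]
        peak = max(peak, window)

    return peak * REPLICATION_FACTOR / BROKERS / 2**30


def breakeven_bytes_per_record(jobs: Sequence[Job]) -> float:
    """Per-record footprint at which the jobs would just fill one broker's disk.

    The peak estimate is linear in bytes_per_record, so evaluating it once at
    1 byte per record is enough.
    """
    return DISK_GIB_PER_BROKER / peak_gib_per_broker(jobs, 1.0)


# Consecutive 250K, 500K and 500K create jobs, using durations from the Results
# table (1 hour 32 min and 3 hours 29 min).
consecutive = [Job(0, 92, 250_000), Job(92, 209, 500_000), Job(301, 209, 500_000)]
print(f"disk fills above ~{breakeven_bytes_per_record(consecutive) / 1024:.0f} KiB per record")
```

Comparing the printed threshold with the bytes actually written per imported record (measured on the cluster) indicates whether a planned job sequence is at risk.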
Before Splitting Feature released
Module ocp3-pvt Mon Sep 11 09:33:28 UTC 2023 | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize | R/W split enabled |
---|---|---|---|---|---|---|---|---|---|---|
mod-remote-storage | 13 | mod-remote-storage:2.0.3 | 2 | 4920 | 4472 | 1024 | 3960 | 512 | 512 | false |
mod-agreements | 8 | mod-agreements:5.5.2 | 2 | 1592 | 1488 | 128 | 968 | 384 | 512 | false |
mod-data-import | 7 | mod-data-import:2.7.1 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512 | false |
mod-search | 30 | mod-search:2.0.1 | 2 | 2592 | 2480 | 2048 | 1440 | 512 | 1024 | false |
mod-authtoken | 7 | mod-authtoken:2.13.0 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128 | false |
mod-configuration | 7 | mod-configuration:5.9.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
mod-inventory-storage | 1 | mod-inventory-storage:26.1.0-SNAPSHOT.665 | 0 | 2208 | 1952 | 1024 | 1440 | 384 | 512 | false |
mod-circulation-storage | 15 | mod-circulation-storage:16.0.1 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | false |
mod-source-record-storage | 11 | mod-source-record-storage:5.6.7 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | false |
mod-calendar | 7 | mod-calendar:2.4.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
mod-inventory | 12 | mod-inventory:20.0.6 | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512 | false |
mod-circulation | 9 | mod-circulation:23.5.6 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | false |
mod-di-converter-storage | 8 | mod-di-converter-storage:2.0.5 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
mod-pubsub | 8 | mod-pubsub:2.9.1 | 2 | 1536 | 1440 | 1024 | 922 | 384 | 512 | false |
mod-users | 8 | mod-users:19.1.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
mod-patron-blocks | 8 | mod-patron-blocks:1.8.0 | 2 | 1024 | 896 | 1024 | 768 | 88 | 128 | false |
mod-source-record-manager | 9 | mod-source-record-manager:3.6.4 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | false |
nginx-edge | 7 | nginx-edge:2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0 | false |
mod-quick-marc | 7 | mod-quick-marc:3.0.0 | 1 | 2288 | 2176 | 128 | 1664 | 384 | 512 | false |
nginx-okapi | 7 | nginx-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0 | false |
okapi-b | 8 | okapi:5.0.1 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512 | false |
mod-feesfines | 7 | mod-feesfines:18.2.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
mod-patron | 7 | mod-patron:5.5.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
mod-notes | 7 | mod-notes:5.0.1 | 2 | 1024 | 896 | 128 | 952 | 384 | 512 | false |
pub-okapi | 7 | pub-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 768 | 0 | 0 | false |
Service versions for Splitting Feature test
Module ocp3-pvt Mon Sep 25 12:43:06 UTC 2023 | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize | R/W split enabled |
---|---|---|---|---|---|---|---|---|---|---|
mod-data-import | 10 | mod-data-import:2.7.2-SNAPSHOT.137 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512 | false |
mod-search | 30 | mod-search:2.0.1 | 2 | 2592 | 2480 | 2048 | 1440 | 512 | 1024 | false |
mod-configuration | 8 | mod-configuration:5.9.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
mod-bulk-operations | 7 | mod-bulk-operations:1.0.6 | 2 | 3072 | 2600 | 1024 | 1536 | 384 | 512 | false |
mod-inventory-storage | 1 | mod-inventory-storage:26.1.0-SNAPSHOT.665 | 0 | 2208 | 1952 | 1024 | 1440 | 384 | 512 | false |
mod-circulation-storage | 15 | mod-circulation-storage:16.0.1 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | false |
mod-source-record-storage | 12 | mod-source-record-storage:5.6.7 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | false |
mod-calendar | 7 | mod-calendar:2.4.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
mod-inventory | 12 | mod-inventory:20.0.6 | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512 | false |
mod-circulation | 9 | mod-circulation:23.5.6 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | false |
mod-di-converter-storage | 8 | mod-di-converter-storage:2.0.5 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
mod-pubsub | 9 | mod-pubsub:2.9.1 | 2 | 1536 | 1440 | 1024 | 922 | 384 | 512 | false |
mod-users | 9 | mod-users:19.1.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
mod-patron-blocks | 9 | mod-patron-blocks:1.8.0 | 2 | 1024 | 896 | 1024 | 768 | 88 | 128 | false |
mod-source-record-manager | 12 | mod-source-record-manager:3.6.5-SNAPSHOT.245 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | false |
mod-quick-marc | 7 | mod-quick-marc:3.0.0 | 1 | 2288 | 2176 | 128 | 1664 | 384 | 512 | false |
nginx-okapi | 7 | nginx-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0 | false |
okapi-b | 8 | okapi:5.0.1 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512 | false |
mod-feesfines | 8 | mod-feesfines:18.2.1 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | false |
mod-notes | 7 | mod-notes:5.0.1 | 2 | 1024 | 896 | 128 | 952 | 384 | 512 | false |
pub-okapi | 7 | pub-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 768 | 0 | 0 | false |
Service versions for retesting the Splitting Feature on the Poppy release
Module ocp3-pvt | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize | R/W split enabled |
---|---|---|---|---|---|---|---|---|---|---|
mod-circulation-storage | 16 | mod-circulation-storage:17.1.0 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | FALSE |
mod-source-record-storage | 13 | mod-source-record-storage:5.7.0 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | FALSE |
mod-calendar | 8 | mod-calendar:2.5.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE |
mod-inventory | 13 | mod-inventory:20.1.0 | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512 | FALSE |
mod-circulation | 10 | mod-circulation:24.0.0 | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | FALSE |
mod-di-converter-storage | 9 | mod-di-converter-storage:2.1.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE |
mod-pubsub | 10 | mod-pubsub:2.11.0 | 2 | 1536 | 1440 | 1024 | 922 | 384 | 512 | FALSE |
mod-users | 10 | mod-users:19.2.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE |
mod-patron-blocks | 10 | mod-patron-blocks:1.9.0 | 2 | 1024 | 896 | 1024 | 768 | 88 | 128 | FALSE |
mod-source-record-manager | 15 | mod-source-record-manager:3.7.0 | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | FALSE |
mod-quick-marc | 8 | mod-quick-marc:5.0.0 | 1 | 2288 | 2176 | 128 | 1664 | 384 | 512 | FALSE |
nginx-okapi | 8 | nginx-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0 | FALSE |
okapi-b | 9 | okapi:5.1.1 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512 | FALSE |
mod-feesfines | 9 | mod-feesfines:19.0.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE |
mod-notes | 8 | mod-notes:5.1.0 | 2 | 1024 | 896 | 128 | 952 | 384 | 512 | FALSE |
pub-okapi | 8 | pub-okapi:2023.06.14 | 2 | 1024 | 896 | 128 | 768 | 0 | 0 | FALSE |
mod-data-import | 36 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-data-import:3.0.3 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512 | FALSE |
mod-search | 31 | 579891902283.dkr.ecr.us-east-1.amazonaws.com/folio/mod-search:3.0.0 | 2 | 2592 | 2480 | 2048 | 1440 | 512 | 1024 | FALSE |
mod-configuration | 9 | mod-configuration:5.9.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE |
mod-bulk-operations | 8 | mod-bulk-operations:1.1.0 | 2 | 3072 | 2600 | 1024 | 1536 | 384 | 512 | FALSE |
edge-ncip | 8 | edge-ncip:1.9.0 | 2 | 1024 | | | | | | FALSE |
mod-inventory-storage | 8 | mod-inventory-storage:27.0.0 | 2 | 8961 | | | | | | FALSE |
Methodology/Approach
To configure the splitting feature, see: Detailed Release Notes for Data Import Splitting Feature
...
Test 3: Run CI/CO on one tenant and DI jobs on 3 tenants, including the one that runs CI/CO. Start the second job after the first one reaches 30% completion, and start another job on the third tenant after the first job reaches 60% completion. CI/CO: 20 users; DI file size: 25K.
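The staggered start used in Test 3 can be expressed as a small polling loop. This is a sketch, not the PTF team's tooling: start_job and first_job_progress are hypothetical hooks standing in for whatever the test harness uses to launch a DI job on a tenant and to read the first job's completion as a fraction from 0.0 to 1.0.

```python
import time
from typing import Callable, Sequence

# From the Test 3 methodology: the 1st job starts immediately, the 2nd when the
# 1st job reaches 30% completion, and the 3rd when the 1st job reaches 60%.
START_THRESHOLDS = (0.0, 0.30, 0.60)


def run_staggered(
    tenants: Sequence[str],
    start_job: Callable[[str], None],
    first_job_progress: Callable[[], float],
    poll_seconds: float = 10.0,
) -> None:
    """Start one DI job per tenant, gated on the first job's progress.

    start_job and first_job_progress are hypothetical harness hooks,
    not FOLIO APIs.
    """
    if len(tenants) > len(START_THRESHOLDS):
        raise ValueError("one start threshold is needed per tenant")
    started = 0
    progress = 0.0  # the first job has not started yet
    while True:
        # Start every job whose threshold has been reached by the first job.
        while started < len(tenants) and progress >= START_THRESHOLDS[started]:
            start_job(tenants[started])
            started += 1
        if started == len(tenants):
            return
        time.sleep(poll_seconds)
        progress = first_job_progress()
```

The CI/CO load on the first tenant is started separately and runs alongside this loop.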
Test 4. To determine the optimal value of RECORDS_PER_SPLIT_FILE (500, 1K, 5K, 10K), data import jobs with the PTF - Create 2 profile were run on 25K-record files for 1 tenant, and simultaneously for 2 tenants and for 3 tenants.
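A simple way to read the Test 4 results is to average the two runs for each RPSF value and rank them. The sketch below does this for the 1-tenant rows of the 25K-record RECORDS_PER_SPLIT_FILE table (durations written as mm:ss). It deliberately ignores the CPU cost discussed under Recommendations, so the fastest value is not automatically the recommended one.

```python
from statistics import mean


def mm_ss(value: str) -> int:
    """Convert an 'mm:ss' duration to seconds."""
    minutes, seconds = value.split(":")
    return int(minutes) * 60 + int(seconds)


# 1-tenant runs (test#1, test#2) from the RECORDS_PER_SPLIT_FILE table.
ONE_TENANT_RUNS = {
    "500": ["12:55", "10:31"],
    "1K": ["11:48", "09:32"],
    "5K": ["09:21", "09:06"],
    "10K": ["09:02", "09:14"],
    "split disabled": ["10:35", "11:27"],
}

averages = {cfg: mean(mm_ss(run) for run in runs) for cfg, runs in ONE_TENANT_RUNS.items()}
for cfg, seconds in sorted(averages.items(), key=lambda item: item[1]):
    print(f"{cfg:>15}: {seconds / 60:.1f} min")
```

For the 1-tenant rows, 10K and 5K come out fastest at roughly 9 minutes on average, which is the trade-off described in the recommendations: larger chunks are faster here but cost more mod-di-converter-storage CPU.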