In progress (in review). Retesting results will be added to this report in scope of PERF-681.
...
- One record on one tenant could be discarded with the error: io.netty.channel.StacklessClosedChannelException. This reproduces both with and without the splitting feature enabled, in at least 30% of test runs with 500K-record files and multitenant testing (MODDATAIMP-748).
- During testing of the new Data Import splitting feature, items for update were discarded with the error: io.vertx.core.impl.NoStackTraceThrowable: Cannot get actual Item by id: org.folio.inventory.exceptions.InternalServerErrorException: Access for user 'data-import-system-user' (f3486d35-f7f7-4a69-bcd0-d8e5a35cb292) requires permission: inventory-storage.items.item.get. Less than 1% of records could be discarded due to the missing permission for 'data-import-system-user'. The permission was not added automatically during the service deployment. I added the permission manually to the database, and the error no longer occurs (MODDATAIMP-930).
- UI issue: when a job is canceled or completes with an error, its progress bar cannot be deleted from the screen (MODDATAIMP-929).
Usage:
- Do not set RECORDS_PER_SPLIT_FILE below 1000. The system is stable enough to ingest 1000 records consistently, and smaller values incur more overhead, resulting in longer job durations. CPU utilization for mod-di-converter-storage was 160% with RECORDS_PER_SPLIT_FILE (RPSF) = 500, 180% with RPSF = 1000, 380% with RPSF = 5K, and 433% with RPSF = 10K, so when selecting the 5K or 10K configuration we recommend adding more CPU to the mod-di-converter-storage service.
- When toggling the file-splitting feature, the tasks of mod-source-record-storage and mod-source-record-manager need to be restarted.
- Keep the Kafka brokers' disk size in mind: because bigger jobs (up to 500K records) can now be run, consecutive jobs may use up the disk quickly, as the message retention time is currently set to 8 hours. For example, with a 300 GB disk, consecutive jobs of 250K, 500K, and 500K records will exhaust the disk.
- More CPU could be allocated to mod-inventory and mod-di-converter-storage.
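To make the CPU figures in the usage notes easier to compare, the following minimal sketch expresses each reported mod-di-converter-storage CPU utilization as a multiple of the 1000-record setting. The percentages are the ones quoted above; the names `CPU_BY_RPSF` and `relative_cpu` are illustrative only, and choosing 1000 as the baseline is an assumption made for this example.

```python
# CPU utilization (%) of mod-di-converter-storage per RECORDS_PER_SPLIT_FILE value,
# as quoted in the usage notes.
CPU_BY_RPSF = {500: 160, 1000: 180, 5000: 380, 10000: 433}


def relative_cpu(cpu_by_rpsf, baseline_rpsf=1000):
    """Return the CPU utilization of each setting as a multiple of the baseline setting."""
    baseline = cpu_by_rpsf[baseline_rpsf]
    return {rpsf: round(cpu / baseline, 2) for rpsf, cpu in cpu_by_rpsf.items()}


if __name__ == "__main__":
    for rpsf, factor in relative_cpu(CPU_BY_RPSF).items():
        print(f"RECORDS_PER_SPLIT_FILE={rpsf}: {CPU_BY_RPSF[rpsf]}% CPU, {factor}x the baseline")
```

With these numbers, the 5K and 10K settings use roughly 2.1 and 2.4 times the CPU of the 1000 setting, which is consistent with the recommendation to add CPU for those configurations.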
...
** - up to 10 items were discarded with the error: io.vertx.core.impl.NoStackTraceThrowable: Cannot get actual Item by id: org.folio.inventory.exceptions.InternalServerErrorException: Access for user 'data-import-system-user' (f3486d35-f7f7-4a69-bcd0-d8e5a35cb292) requires permission: inventory-storage.items.item.get. Less than 1% of records could be discarded due to the missing permission for 'data-import-system-user'. The permission was not added automatically during the service deployment. I added the permission manually to the database, and the error no longer occurs (MODDATAIMP-930).
...
With CI/CO 20 users and DI 25k records on each of the 3 tenants Splitting Feature Disabled
ocp3-mod-data-import:12
Data Import Robustness Enhancement
...
Memory utilization reached a maximum of 88% for mod-source-record-storage-b and 85% for mod-source-record-manager-b.
Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test.
...
Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test.
RDS CPU Utilization
Test 1. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 500, 2 runs for each test. Maximal CPU Utilization = 95%
...
Test 2. Test with 1, 2, and 3 tenants' concurrent jobs with configuration RECORDS_PER_SPLIT_FILE = 10K, 2 runs for each test. Maximal CPU Utilization = 94%
Retesting
...
DI file-splitting feature on Poppy release
Retest the DI feature to be sure that the new changes have not affected performance negatively. Retest the DI file-splitting feature for the following scenarios:
Test 1. Single tenant: create and update 250K file
...
Previous results (Orchid )
Duration
...
Brief comparison summary
The duration of data import has increased. In particular (Poppy processing time minus Orchid processing time):
- 250K MARC BIB Create, PTF - Create 2 ---> 44 minutes
- 250K MARC BIB Update, PTF - Updates Success - 1 ---> 45 minutes
- Multitenant MARC Create (100k, 50k, and 1 record)
...
- PTF - Create 2
...
Test 1.4 With CI/CO 20 users and DI 25k records on each of the 3 tenants
...
Release: Orchid
Response time without DI (Average)
Release: Orchid
Response time with DI (Average)
...
Release: Poppy
Response time without DI (Average)
...
Release: Poppy
Response time with DI (Average)
...
Release: Orchid
DI Duration with CI/CO
...
Release: Poppy
DI Duration with CI/CO
...
Resource utilization during testing
...
- ---> 1 hour 35 minutes
- Check-Out without DI ~ 200 ms
- Check-In without DI ~ 65 ms
- Check-Out with DI ~ 770 ms
- Check-In with DI ~ 330 ms
Resource utilization:
- Service CPU utilization on Poppy is about the same as on Orchid;
- Memory utilization on Poppy is about the same as on Orchid;
- RDS CPU utilization was about 96% during all tests on both releases;
- The number of DB connections was about the same on both releases, ranging from 550 (Test 1.1) to 1200 (Test 1.4).
Test 1. Single tenant: create and update 250K file
Test # | Test parameters | Profile | Duration (Poppy) | Status | Previous results (Orchid) Duration |
---|---|---|---|---|---|
1.1 | 250K MARC BIB Create | PTF - Create 2 | 2 hours 16 min | Completed | 1 hour 32 min |
1.2 | 250K MARC BIB Update | PTF - Updates Success - 1 | 3 hours 1 min | Completed | 2 hours 16 min |
1.3 | Multitenant MARC Create (100k, 50k, and 1 record) | PTF - Create 2 | 4 hours 14 min | Completed | 2 hours 40 min |
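The difference between the Poppy and Orchid durations in the table above can be recomputed with a short script. This is a minimal sketch: the duration strings are copied from the table, while the helper names `to_minutes` and `slowdown` are hypothetical and not part of any test tooling used for this report.

```python
import re

# (test number, Poppy duration, Orchid duration) as listed in the Test 1 table.
RESULTS = [
    ("1.1", "2 hours 16 min", "1 hour 32 min"),
    ("1.2", "3 hours 1 min", "2 hours 16 min"),
    ("1.3", "4 hours 14min", "2 hours 40 min"),
]


def to_minutes(text):
    """Convert strings such as '2 hours 16 min' or '4 hours 14min' to whole minutes."""
    hours = re.search(r"(\d+)\s*hours?", text)
    minutes = re.search(r"(\d+)\s*min", text)
    total = int(hours.group(1)) * 60 if hours else 0
    return total + (int(minutes.group(1)) if minutes else 0)


def slowdown(poppy, orchid):
    """Return (extra minutes, increase in whole percent) of Poppy over Orchid."""
    p, o = to_minutes(poppy), to_minutes(orchid)
    return p - o, round(100 * (p - o) / o)


if __name__ == "__main__":
    for test, poppy, orchid in RESULTS:
        extra, percent = slowdown(poppy, orchid)
        print(f"Test {test}: +{extra} min ({percent}% longer on Poppy)")
```

Because the table durations are given in whole minutes, a result may differ by about a minute from figures quoted elsewhere in the report that were presumably derived from more precise timings.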
Test 1.4 With CI/CO 20 users and DI 25k records on each of the 3 tenants
Splitting Feature enabled | Release: Orchid Response time without DI (Average) | Release: Orchid Response time with DI (Average) | Release: Poppy Response time without DI (Average) | Release: Poppy Response time with DI (Average) |
---|---|---|---|---|
Check-Out | 0.804s | 1.48s | 1.03s | 2.26s |
Check-In | 0.505s | 1.067s | 0.570s | 1.4s |
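The Check-Out and Check-In differences between releases follow directly from the table. The sketch below works in milliseconds to avoid floating-point noise and assumes the four value columns are, in order, Orchid without DI, Orchid with DI, Poppy without DI, and Poppy with DI, matching the order in which these headers are listed earlier in the report. The names used are illustrative.

```python
# Average response times in milliseconds from the Test 1.4 table.
# Assumed column order: Orchid without DI, Orchid with DI, Poppy without DI, Poppy with DI.
RESPONSE_MS = {
    "Check-Out": (804, 1480, 1030, 2260),
    "Check-In": (505, 1067, 570, 1400),
}


def poppy_minus_orchid(times):
    """Return (difference without DI, difference with DI) in milliseconds."""
    orchid_no_di, orchid_di, poppy_no_di, poppy_di = times
    return poppy_no_di - orchid_no_di, poppy_di - orchid_di


if __name__ == "__main__":
    for operation, times in RESPONSE_MS.items():
        without_di, with_di = poppy_minus_orchid(times)
        print(f"{operation}: +{without_di} ms without DI, +{with_di} ms with DI")
```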
Tenant | Release: Orchid DI Duration with CI/CO | Release: Poppy DI Duration with CI/CO |
---|---|---|
Tenant_1 | 16 min 53 sec | 34 min 55 sec |
Tenant_2 | 20 min 39 sec | 27 min 39 sec |
Tenant_3 | 17 min 54 sec | 25 min 17 sec |
Resource utilization during testing
Test 1.1. Data-import of 250K records file with "PTF - Create 2" job profile
Service CPU Utilization
There was a sharp CPU spike at the beginning of Test 1; we see similar behavior in all of the DI tests. After that, CPU consumption was uniform during the test.
Memory Utilization
Memory consumption was largely unaffected. The mod-source-record-manager service increased its memory usage from 45% to 60% during the test, but after the test the memory started to return to its pre-test value.
RDS CPU Utilization
Consumption of the database CPU was 97% throughout the test.
RDS Database Connections
The average number of DB connections during the test was about 550.
...
Test 1.2. Data-import of 250K records file with "PTF - Update" job profile
...
CPU consumption was stable during the test, except for the mod-inventory service, whose CPU usage rose from about 140% at the beginning of the test to about 200% at the end.
Memory Utilization
The memory was stable and without memory leaks.
RDS CPU Utilization
Consumption of the database CPU was 97% throughout the test.
RDS Database Connections
The average number of DB connections during the test was about 550.
Test 1.3. Multitenant MARC Create (100k, 50k, and 1 record)
Service CPU Utilization
CPU consumption was stable during the test. However, in the last hour of the test, the mod-inventory and mod-quick-marc services increased their CPU utilization by 75%.
Memory Utilization
The memory was stable and without memory leaks.
RDS CPU Utilization
Consumption of the database CPU was 96% throughout the test.
RDS Database Connections
The average number of DB connections during the test was about 800.
Test 1.4. With CI/CO 20 users and DI 25k records on each of the 3 tenants
Service CPU Utilization
Memory Utilization
The memory was stable and without memory leaks.
RDS CPU Utilization
Consumption of the database CPU was 96% throughout the test.
RDS Database Connections
The number of DB connections during the test ranged from 400 to 1200.
CI/CO response time graph
Appendix
...
- tenant0_mod_source_record_storage.marc_records_lb = 9674629
- tenant2_mod_source_record_storage.marc_records_lb = 0
- tenant3_mod_source_record_storage.marc_records_lb = 0
- tenant0_mod_source_record_storage.raw_records_lb = 9604805
- tenant2_mod_source_record_storage.raw_records_lb = 0
- tenant3_mod_source_record_storage.raw_records_lb = 0
- tenant0_mod_source_record_storage.records_lb = 9674677
- tenant2_mod_source_record_storage.records_lb = 0
- tenant3_mod_source_record_storage.records_lb = 0
- tenant0_mod_source_record_storage.marc_indexers = 620042011
- tenant2_mod_source_record_storage.marc_indexers = 0
- tenant3_mod_source_record_storage.marc_indexers = 0
- tenant0_mod_source_record_storage.marc_indexers with field_no 010 = 3285833
- tenant2_mod_source_record_storage.marc_indexers with field_no 010 = 0
- tenant3_mod_source_record_storage.marc_indexers with field_no 010 = 0
- tenant0_mod_source_record_storage.marc_indexers with field_no 035 = 19241844
- tenant2_mod_source_record_storage.marc_indexers with field_no 035 = 0
- tenant3_mod_source_record_storage.marc_indexers with field_no 035 = 0
- tenant0_mod_inventory_storage.authority = 4
- tenant2_mod_inventory_storage.authority = 0
- tenant3_mod_inventory_storage.authority = 0
- tenant0_mod_inventory_storage.holdings_record = 9592559
- tenant2_mod_inventory_storage.holdings_record = 16
- tenant3_mod_inventory_storage.holdings_record = 16
- tenant0_mod_inventory_storage.instance = 9976519
- tenant2_mod_inventory_storage.instance = 32
- tenant3_mod_inventory_storage.instance = 32
- tenant0_mod_inventory_storage.item = 10787893
- tenant2_mod_inventory_storage.item = 19
- tenant3_mod_inventory_storage.item = 19
PTF environment: ocp3
- 10 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
- 2 database instances: one reader and one writer
Name | API Name | Memory GiB | vCPUs | max_connections |
---|---|---|---|---|
R6G Extra Large | db.r6g.xlarge | 32 GiB | 4 vCPUs | 2731 |
- MSK ptf-kakfa-3
  - 4 m5.2xlarge brokers in 2 zones
  - Apache Kafka version 2.8.0
  - EBS storage volume per broker: 300 GiB
  - auto.create.topics.enable=true
  - log.retention.minutes=480
  - default.replication.factor=3
- Kafka topics partitioning:
  - 2 partitions for DI topics
...
Test 3. Run CI/CO on one tenant and DI jobs on 3 tenants, including the one that runs CI/CO. Start the second job after the first one reaches 30% completion, and start another job on a third tenant after the first job reaches 60% completion. CI/CO: 20 users; DI file size: 25K records.
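The staggered start described for Test 3 can be expressed as a simple threshold check. This is only a sketch of the scheduling rule, not the harness used for these tests: the tenant labels and the function name are placeholders, and how the first job's progress is obtained is left out.

```python
# Start thresholds from the Test 3 description: the second DI job starts once the
# first job reaches 30% completion, the third once the first reaches 60%.
# The tenant labels are placeholders for illustration.
START_THRESHOLDS = {"tenant_2": 30, "tenant_3": 60}


def jobs_to_start(first_job_progress, already_started):
    """Return the tenants whose DI job should be started at the given progress (0-100)."""
    return [
        tenant
        for tenant, threshold in START_THRESHOLDS.items()
        if first_job_progress >= threshold and tenant not in already_started
    ]


if __name__ == "__main__":
    started = set()
    for progress in (10, 30, 45, 60, 100):
        for tenant in jobs_to_start(progress, started):
            started.add(tenant)
            print(f"first job at {progress}%: start DI job on {tenant}")
```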
Test 4. To define the optimal value for RECORDS_PER_SPLIT_FILE (500, 1K, 2K, 5K), data-import jobs with the PTF - Create 2 profile were run with 25K records on 1 tenant, on 2 tenants concurrently, and on 3 tenants concurrently.
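The combinations covered by Test 4 can be enumerated as a small matrix. The sketch below uses the RECORDS_PER_SPLIT_FILE values and tenant counts from the description; the number of split files per job is computed under the assumption that a 25K-record file is cut into ceil(records / RECORDS_PER_SPLIT_FILE) parts, which the report does not state explicitly. All names are illustrative.

```python
import math
from itertools import product

RPSF_VALUES = [500, 1000, 2000, 5000]  # 500, 1K, 2K, 5K from the Test 4 description
CONCURRENT_TENANTS = [1, 2, 3]
RECORDS_PER_JOB = 25_000


def build_matrix():
    """Enumerate every (RECORDS_PER_SPLIT_FILE, tenant count) combination."""
    return [
        {
            "records_per_split_file": rpsf,
            "tenants": tenants,
            # Assumption: a job is cut into ceil(records / RPSF) split files.
            "split_files_per_job": math.ceil(RECORDS_PER_JOB / rpsf),
        }
        for rpsf, tenants in product(RPSF_VALUES, CONCURRENT_TENANTS)
    ]


if __name__ == "__main__":
    for row in build_matrix():
        print(row)
```

This yields 12 combinations; each was run as a separate DI job configuration according to the description above.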