MARC Authorities Update + Create [Orchid]
Overview
Within the scope of PERF-386, tests were run to answer the following questions:
- Determine the time it takes to complete an import
- Determine the main modules involved in the process (if obvious or known)
- Test specific scenarios: Data Import (DI) while Check-in and Checkout (CICO) is in progress with 5 concurrent users, and concurrent DI jobs from multiple tenants in the same cluster.
Summary
- The time to successfully complete a data import for tenant ptf-ncp5-00 is approximately 15 sec for 1k records, 1 min for 5k records, 2 min for 10k records, 4 min 30 sec for 22.7k records, and 9 min 37 sec for 50k records.
- Main modules that are involved in the process:
  - mod-quick-marc
  - mod-source-record-storage
  - mod-inventory
  - mod-source-record-manager
  - mod-data-import
  - mod-di-converter-storage
  - mod-search
  - nginx-okapi
  - mod-inventory-storage
  - okapi
  - mod-entities-links
- DI with CI/CO: data import time did not degrade, but Check-in and Checkout times increased by up to 3 times during data import.
- Multitenant testing: consecutive jobs from tenants ptf-ncp5-01 and ptf-ncp5-02 completed with errors and all records were discarded; occasionally a single run per DI file succeeded. Jobs run simultaneously for 2 or 3 tenants never finished because of the error described in MODSOURCE-581*. Because the user stopped these jobs (about 10-15% complete after 5 hours), their results are not meaningful.
- Memory utilization grows for 3 modules: mod-source-record-manager:3.6.2 from 83% to 105%, mod-source-record-storage:5.6.5 from 86% to 89%, mod-inventory-storage:26.0.0 from 42% to 54%. Jira ticket PERF-541 was opened. All other modules behaved stably during data import.
17/05/2023: In accordance with the description of PERF-541, a series of tests was performed. The memory growth for mod-source-record-manager was not significant and stabilized after some time. Heap dump analysis was performed for all modules and did not reveal memory leaks.
- Most CPU-consuming modules: mod-quick-marc - 79%, mod-source-record-storage - 74%, mod-inventory - 69%, mod-source-record-manager - 67%; all other modules used less than 30%.
* MODSOURCE-581 (SPIKE: Multiple tenant DI testing - import jobs are hanging; status CLOSED) is reproducible on the Orchid release with the following module configuration:
mod-source-record-storage: cpu:1024 memory:4096/3688, DB_MAXPOOLSIZE=30, DB_CONNECTION_TIMEOUT=40
mod-source-record-manager: cpu:1024 memory:4096/3688, DB_MAXPOOLSIZE=30
A retest is planned with an increased database size (PERF-544) and with all required trigger functions (PERF-547).
Recommendations & Jiras (Optional)
Jiras
- MODSOURMAN-982: Do not process chunks when the DI job is completed
- PERF-541: Investigate potential memory leak for DI modules
- MODDATAIMP-809: Investigate why records are discarded for jobs completed with errors
Test Runs & Results
Job Profile "KG Create authority" - https://bugfest-nolana.int.aws.folio.org/settings/data-import/job-profiles/view/d3271c74-97ec-4dd9-9470-97b2154d63fd?query=KG&sort=name
Baseline test
Test # | # of records | % with updates | % creates | File | Time it takes to complete import |
---|---|---|---|---|---|
1 | 1,000 | 0 | 100 | https://folio-org.atlassian.net/wiki/download/attachments/1385982/1k_marc_authority.mrc?api=v2 | 14 sec |
2 | 5,000 | 0 | 100 | https://folio-org.atlassian.net/wiki/download/attachments/1385982/LC_SUBJ_msplit00000000.mrc?api=v2 | 55 sec |
3 | 10,000 | 0 | 100 | https://folio-org.atlassian.net/wiki/download/attachments/1385982/msplit00000000.mrc?api=v2 | 1 min 59 sec |
4 | 22,778 | 0 | 100 | https://folio-org.atlassian.net/wiki/download/attachments/1385982/msplit00000013.mrc?api=v2 | 4 min 31 sec |
5 | 50,000 | 0 | 100 | https://folio-org.atlassian.net/wiki/download/attachments/1385982/50000_authorityrecords.mrc?api=v2 | 9 min 48 sec |
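To compare the baseline runs on a common scale, the import times can be converted to seconds and divided into the record counts. The following is a minimal sketch, not part of the original test tooling: the helper names `parse_duration` and `throughput` are hypothetical, and the records-per-second figures it prints are derived from the table above rather than measured separately.

```python
import re


def parse_duration(text: str) -> int:
    """Convert a duration such as '9 min 48 sec' or '14 sec' into seconds."""
    match = re.fullmatch(r"\s*(?:(\d+)\s*min)?\s*(?:(\d+)\s*sec)?\s*", text)
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return minutes * 60 + seconds


def throughput(records: int, duration: str) -> float:
    """Average number of records imported per second for one run."""
    return records / parse_duration(duration)


# (number of records, import time) copied from the baseline table
BASELINE_RUNS = [
    (1_000, "14 sec"),
    (5_000, "55 sec"),
    (10_000, "1 min 59 sec"),
    (22_778, "4 min 31 sec"),
    (50_000, "9 min 48 sec"),
]

if __name__ == "__main__":
    for records, duration in BASELINE_RUNS:
        rate = throughput(records, duration)
        print(f"{records:>6} records in {duration:<13} ~ {rate:.1f} records/sec")
```

Expressing each run as a rate makes it easier to see whether import time scales roughly linearly with file size.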
Test with CICO 5 concurrent users
Test # | # of records | Time it takes to complete import | CI time Avg | CI Avg delta vs baseline | CI time 95th pct | CI 95th pct delta vs baseline | CO time Avg | CO Avg delta vs baseline | CO time 95th pct | CO 95th pct delta vs baseline |
---|---|---|---|---|---|---|---|---|---|---|
1 | 1,000 | 14 sec | 0.585 | +21% | 0.778 | +37% | 1.012 | +34% | 1.426 | +62% |
2 | 5,000 | 56 sec | 0.914 | +90% | 1.467 | +157% | 1.305 | +73% | 2.403 | +173% |
3 | 10,000 | 1 min 54 sec | 0.907 | +89% | 1.759 | +209% | 1.408 | +86% | 2.721 | +209% |
4 | 22,778 | 4 min 32 sec | 0.853 | +78% | 1.616 | +184% | 1.425 | +89% | 2.497 | +183% |
5 | 50,000 | 9 min 37 sec | 0.862 | +80% | 1.471 | +158% | 1.510 | +100% | 2.403 | +173% |
Baseline | Avg | 95th pct |
---|---|---|
CI | 0.480 | 0.569 |
CO | 0.755 | 0.881 |
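The delta columns in the CICO table express each measurement as a percentage increase over the baseline values listed here. A minimal sketch of that arithmetic follows, assuming the deltas are computed as (measured - baseline) / baseline; the function name `percent_delta` is hypothetical. Results may differ from the published table by about one percentage point, presumably because the report was calculated from unrounded timings.

```python
def percent_delta(measured: float, baseline: float) -> float:
    """Percentage change of a measured value relative to its baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (measured - baseline) / baseline * 100


# Baseline Check-in (CI) and Checkout (CO) times from the table above
BASELINE = {"CI": {"avg": 0.480, "p95": 0.569}, "CO": {"avg": 0.755, "p95": 0.881}}

# Test 5 (50,000 records) from the CICO table
TEST_5 = {"CI": {"avg": 0.862, "p95": 1.471}, "CO": {"avg": 1.510, "p95": 2.403}}

if __name__ == "__main__":
    for operation, metrics in TEST_5.items():
        for metric, value in metrics.items():
            delta = percent_delta(value, BASELINE[operation][metric])
            print(f"{operation} {metric}: {value:.3f} ({delta:+.0f}% vs baseline)")
```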
Multitenant testing
- test 1-5: testing DI on each tenant consecutively (5 jobs from 3 tenants = 15 test runs)
- test 6-8: testing DI jobs from two tenants simultaneously with 1 min ramp-up.
- test 9: testing DI jobs from 3 tenants simultaneously with 1 min ramp-up.
Test # | # of records | Tenant ptf-ncp5-00 time | Comment | Tenant ptf-ncp5-01 time | Comment | Tenant ptf-ncp5-02 time | Comment |
---|---|---|---|---|---|---|---|
1 | 1,000 | 15 sec | COMMITTED | 56 sec / 17 sec | 1 time COMMITTED / other ERROR | 13 sec - 30 min | ERROR; one of the jobs was stuck for 30 min* |
2 | 5,000 | 1 min | COMMITTED | 58 sec | 1 time COMMITTED / other ERROR | 47 sec - 55 min | 1 time COMMITTED / other ERROR |
3 | 10,000 | 2 min 02 sec | COMMITTED | 1 min 36 sec | 1 time COMMITTED | 19 min 22 sec | ERROR |
4 | 22,778 | 4 min 20 sec | COMMITTED | 11 min 52 sec | ERROR | - | |
5 | 50,000 | 9 min 53 sec | COMMITTED | 3 min 56 sec | ERROR | - | |
6 | Tenant-00 + Tenant-01, 50,000 records | Stopped by user | | | | | |
7 | Tenant-01 + Tenant-02, 50,000 records | Stopped by user | | | | | |
8 | Tenant-00 + Tenant-02, 50,000 records | Stopped by user | | | | | |
9 | Tenant-00 + Tenant-01 + Tenant-02, 50,000 records | Stopped by user | | | | | |
Jobs were always successful for tenant ptf-ncp5-00. For the other 2 tenants, jobs finished with the status Completed with errors and all records were discarded; occasionally a single run per DI file succeeded.
Jobs run simultaneously for 2 or 3 tenants never finished because of the error described in MODSOURCE-581. Because the user stopped these jobs (about 10-15% complete after 5 hours), their results are not meaningful.
Multitenant testing errors and warnings:
11:16:50 [] [] [] [] ERROR KafkaConsumerWrapper businessHandlerCompletionHandler:: Error while processing a record - id: 2 subscriptionPattern: SubscriptionDefinition(eventType=DI_PARSED_RECORDS_CHUNK_SAVED, subscriptionPattern=ncp5\.Default\.\w{1,}\.DI_PARSED_RECORDS_CHUNK_SAVED) offset: 1947
io.vertx.core.impl.NoStackTraceThrowable: Timeout
11:16:50 [] [] [] [] WARN taImportKafkaHandler handle:: Error with database during collecting of deduplication info for handlerId: 6713adda-72ce-11ec-90d6-0242ac120003 , eventId: e4a75577-b3b0-4404-bd2f-f9586fd412c3.
io.vertx.core.impl.NoStackTraceThrowable: Timeout
11:16:50 [] [] [] [] ERROR KafkaConsumerWrapper businessHandlerCompletionHandler:: Error while processing a record - id: 3 subscriptionPattern: SubscriptionDefinition(eventType=DI_COMPLETED, subscriptionPattern=ncp5\.Default\.\w{1,}\.DI_COMPLETED) offset: 68691
io.vertx.core.impl.NoStackTraceThrowable: Timeout
11:16:50 [] [] [] [] WARN tHandlingServiceImpl handle:: Failed to handle DI_COMPLETED event
io.vertx.core.impl.NoStackTraceThrowable: Timeout
11:16:50 [] [] [] [] WARN rdChunksKafkaHandler handle:: RecordsBatchResponse processing has failed with errors chunkId: f5a92a02-86ce-4afa-aeab-7931f1fd13c6 chunkNumber: 742 jobExecutionId: ff56fb28-5c8a-4109-95eb-33bb0dbed57c
io.vertx.core.impl.NoStackTraceThrowable: Timeout
12:07:05 [] [] [] [] ERROR KafkaConsumerWrapper businessHandlerCompletionHandler:: Error while processing a record - id: 13 subscriptionPattern: SubscriptionDefinition(eventType=DI_SRS_MARC_AUTHORITY_RECORD_CREATED, subscriptionPattern=ncp5\.Default\.\w{1,}\.DI_SRS_MARC_AUTHORITY_RECORD_CREATED) offset: 9181
io.vertx.core.impl.NoStackTraceThrowable: handle:: Failed to process data import event payload from topic 'ncp5.Default.fs07000002.DI_SRS_MARC_AUTHORITY_RECORD_CREATED' by jobExecutionId: '719bcf8f-0017-4b92-93b8-b85e46566634' with recordId: 'f3360e49-908e-4bbf-9c0e-45d811b9863a' and chunkId: '1a556829-8290-4e8c-ab8e-eb15d19af624'
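When a failed run produces many messages like those in the excerpt above, grouping them by log level, logger, and Kafka event type can show which handlers time out most often. The sketch below is illustrative only: the regular expressions are assumptions based on the line layout visible in this excerpt, and the `summarize` helper is hypothetical rather than part of any FOLIO module.

```python
import re
from collections import Counter

# Layout assumed from the excerpt: time, bracketed context fields, level, logger, message
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2})\s+(?:\[[^\]]*\]\s+)*"
    r"(?P<level>ERROR|WARN)\s+(?P<logger>\S+)\s+(?P<message>.*)$"
)
EVENT_RE = re.compile(r"eventType=(\w+)")


def summarize(lines):
    """Count log entries per (level, logger, eventType); eventType may be None."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # continuation lines, e.g. the exception text
        event = EVENT_RE.search(match["message"])
        key = (match["level"], match["logger"], event.group(1) if event else None)
        counts[key] += 1
    return counts


# Three lines taken from the excerpt
SAMPLE = [
    r"11:16:50 [] [] [] [] ERROR KafkaConsumerWrapper businessHandlerCompletionHandler:: Error while processing a record - id: 3 subscriptionPattern: SubscriptionDefinition(eventType=DI_COMPLETED, subscriptionPattern=ncp5\.Default\.\w{1,}\.DI_COMPLETED) offset: 68691",
    "io.vertx.core.impl.NoStackTraceThrowable: Timeout",
    "11:16:50 [] [] [] [] WARN tHandlingServiceImpl handle:: Failed to handle DI_COMPLETED event",
]

if __name__ == "__main__":
    for (level, logger, event), count in summarize(SAMPLE).items():
        print(f"{count} x {level} {logger} eventType={event}")
```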