MARC Authorities Update + Create [Orchid]

In the scope of PERF-386, tests need to be run to answer the following questions:

  • Determine the time it takes to complete the import.
  • Determine the main modules involved in the process (if obvious or known).
  • Test specific settings, items, or scenarios: data import while Check-in and Check-out (CICO) is in progress with 5 concurrent users, and concurrent DI jobs from multiple tenants in the same cluster.


  • Time to successfully complete a data import for tenant ptf-ncp5-00: approximately 15 sec for 1k records, 1 min for 5k records, 2 min for 10k records, 4 min 30 sec for 22.7k records, and 9 min 37 sec for 50k records.
  • Main modules involved in the process:
    1. mod-quick-marc
    2. mod-source-record-storage
    3. mod-inventory
    4. mod-source-record-manager
    5. mod-data-import
    6. mod-di-converter-storage
    7. mod-search
    8. nginx-okapi
    9. mod-inventory-storage
    10. okapi
    11. mod-entities-links
  • DI with CICO: data import time does not degrade, but Check-in and Check-out response times degrade by up to 3 times during data import. In multitenant testing, consecutive jobs on tenants ptf-ncp5-01 and ptf-ncp5-02 completed with errors and all records were discarded; occasionally a single run per DI file succeeded. Concurrent jobs for 2 or 3 tenants were tested but never finished because of error MODSOURCE-581*. The jobs were stopped by the user after about 5 hours with only 10-15% done, so those results are not relevant.
  • Memory utilization grows for 3 modules: mod-source-record-manager:3.6.2 from 83% to 105%, mod-source-record-storage:5.6.5 from 86% to 89%, and mod-inventory-storage:26.0.0 from 42% to 54%. Jira ticket PERF-541 has been opened. All other modules are stable during data import.
    On 17/05/2023, a series of tests was performed in accordance with the description of PERF-541. The memory growth for mod-source-record-manager was not significant and stabilized after some time. Heap dump analysis was performed for all modules and did not reveal memory leaks.
  • Most CPU-consuming modules: mod-quick-marc 79%, mod-source-record-storage 74%, mod-inventory 69%, mod-source-record-manager 67%; all others below 30%.

* MODSOURCE-581 (SPIKE: Multiple tenant DI testing - import jobs are hanging; status: CLOSED) is reproducible for the Orchid release with the following module configuration:
mod-source-record-storage: cpu:1024, memory:4096/3688, DB_MAXPOOLSIZE=30, DB_CONNECTION_TIMEOUT=40
mod-source-record-manager: cpu:1024, memory:4096/3688, DB_MAXPOOLSIZE=30
It is planned to be retested with an increased size of the database (PERF-544) and with all required trigger functions (PERF-547).
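Converting the summary timings into an average import rate makes it easier to compare them across file sizes. This minimal sketch assumes the summary durations are wall-clock times for the whole job. For the 22.7k file it uses 22,778, the exact record count from the results table. The summary values are rounded, so the computed rates are approximate.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert durations such as '9 min 37 sec', '1 min' or '15 sec' to seconds."""
    minutes = re.search(r"(\d+)\s*min", duration)
    seconds = re.search(r"(\d+)\s*sec", duration)
    total = int(minutes.group(1)) * 60 if minutes else 0
    if seconds:
        total += int(seconds.group(1))
    return total


def records_per_second(records: int, duration: str) -> float:
    """Average import rate over the whole job (records / wall-clock seconds)."""
    return records / to_seconds(duration)


# Approximate timings for tenant ptf-ncp5-00 taken from the summary.
SUMMARY_TIMINGS = [
    (1_000, "15 sec"),
    (5_000, "1 min"),
    (10_000, "2 min"),
    (22_778, "4 min 30 sec"),
    (50_000, "9 min 37 sec"),
]

if __name__ == "__main__":
    for records, duration in SUMMARY_TIMINGS:
        rate = records_per_second(records, duration)
        print(f"{records:>6} records in {duration:<13} ~{rate:.1f} records/sec")
```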

Recommendations & Jiras (Optional)


MODSOURMAN-982: Do not process chunks when the DI job is completed

PERF-541: Investigate potential memory leak for DI modules

MODDATAIMP-809: Investigate why records are discarded for jobs completed with errors

Test Runs & Results

Job Profile "KG Create authority": https://bugfest-nolana.int.aws.folio.org/settings/data-import/job-profiles/view/d3271c74-97ec-4dd9-9470-97b2154d63fd?query=KG&sort=name

Baseline test

Test with CICO (5 concurrent users)

Test # | # of records | Time to complete import | CI time Avg | CI Avg delta vs baseline | CI time 95th pct | CI 95th pct delta vs baseline | CO time Avg | CO Avg delta vs baseline | CO time 95th pct | CO 95th pct delta vs baseline
1 | 1,000 | 14 sec | 0.585 | +21% | 0.778 | +37% | 1.012 | +34% | 1.426 | +62%
2 | 5,000 | 56 sec | 0.914 | +90% | 1.467 | +157% | 1.305 | +73% | 2.403 | +173%
3 | 10,000 | 1 min 54 sec | 0.907 | +89% | 1.759 | +209% | 1.408 | +86% | 2.721 | +209%
4 | 22,778 | 4 min 32 sec | 0.853 | +78% | 1.616 | +184% | 1.425 | +89% | 2.497 | +183%
5 | 50,000 | 9 min 37 sec | 0.862 | +80% | 1.471 | +158% | 1.510 | +100% | 2.403 | +173%

Baseline | Avg | 95th pct
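This sketch reads the delta columns as the relative change of each CICO response time versus the baseline run. That reading is an assumption, because the baseline values themselves are not listed in the table. Under that assumption, the sketch does three things:

  • computes a delta from a measured value and a baseline;
  • back-calculates the baseline implied by a measured value and its reported (rounded) delta;
  • converts a delta into a "times slower" factor.

```python
def delta_pct(measured: float, baseline: float) -> float:
    """Relative change of a measured response time versus its baseline, in percent."""
    return (measured - baseline) / baseline * 100


def implied_baseline(measured: float, delta: float) -> float:
    """Baseline implied by a measured value and a reported delta in percent."""
    return measured / (1 + delta / 100)


def degradation_factor(delta: float) -> float:
    """How many times the baseline a response time with the given delta (%) is."""
    return 1 + delta / 100


# (CI time Avg, reported delta %) for tests 1-5 in the table above.
CI_AVG = [(0.585, 21), (0.914, 90), (0.907, 89), (0.853, 78), (0.862, 80)]

if __name__ == "__main__":
    for test, (measured, delta) in enumerate(CI_AVG, start=1):
        # Reported deltas are rounded, so the implied baselines are approximate.
        print(f"Test {test}: implied CI Avg baseline ~{implied_baseline(measured, delta):.3f}")
    print(f"Largest reported delta (+209%): {degradation_factor(209):.2f}x baseline")
```

The largest delta in the table is +209%, which corresponds to 3.09 times the baseline. This is consistent with the "up to 3 times" degradation stated in the summary.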

Multitenant testing

  • test 1-5: testing DI on each tenant consecutively (5 jobs from 3 tenants = 15 test runs)
  • test 6-8: testing DI jobs from two tenants simultaneously with 1 min ramp-up.
  • test 9: testing DI jobs from 3 tenants simultaneously with 1 min ramp-up.
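The multitenant test matrix above can be enumerated as follows. The tenant IDs and file sizes come from this page. The `start_offsets` helper is hypothetical: it assumes "1 min ramp-up" spreads job starts the way a JMeter thread group spreads thread starts (each start delayed by ramp-up / number of jobs). The page does not state how the ramp-up was actually applied.

```python
from itertools import combinations

TENANTS = ("ptf-ncp5-00", "ptf-ncp5-01", "ptf-ncp5-02")
FILE_SIZES = (1_000, 5_000, 10_000, 22_778, 50_000)


def consecutive_runs():
    """Tests 1-5: each file size is imported on every tenant, one job at a time."""
    return [(size, tenant) for size in FILE_SIZES for tenant in TENANTS]


def concurrent_groups():
    """Tests 6-9: every pair of tenants, then all three tenants together."""
    return list(combinations(TENANTS, 2)) + [TENANTS]


def start_offsets(group, ramp_up_sec=60.0):
    """Hypothetical schedule: job i starts i * ramp_up / len(group) seconds in."""
    step = ramp_up_sec / len(group)
    return {tenant: i * step for i, tenant in enumerate(group)}


if __name__ == "__main__":
    print(f"{len(consecutive_runs())} consecutive runs")
    for group in concurrent_groups():
        print(group, start_offsets(group))
```

`combinations` yields the three tenant pairs in a different order from the numbering of tests 6-8; only the set of pairs matters here.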

Test # | # of records | Tenant ptf-ncp5-00 time | Tenant ptf-ncp5-01 time | Tenant ptf-ncp5-02 time
1 | 1,000 | 15 sec, COMMITTED | 56 sec / 17 sec / other ERROR | 13 sec - 30 min, one of the jobs stuck for 30 min*
2 | 5,000 | 1 min, COMMITTED | 58 sec / other ERROR | 47 sec - 55 min / other ERROR, one of the jobs stuck for 30 min
3 | 10,000 | 2 min 02 sec, COMMITTED | 1 min 36 sec | 1 time COMMITTED, 19 min 22 sec
4 | 22,778 | 4 min 20 sec, COMMITTED | 11 min 52 sec, ERROR | -
5 | 50,000 | 9 min 53 sec, COMMITTED | 3 min 56 sec, ERROR | -
6 | 50,000 | Tenant-00 + Tenant-01: stopped by user (MODSOURCE-581)
7 | 50,000 | Tenant-01 + Tenant-02: stopped by user (MODSOURCE-581)
8 | 50,000 | Tenant-00 + Tenant-02: stopped by user (MODSOURCE-581)
9 | 50,000 | Tenant-00 + Tenant-01 + Tenant-02: stopped by user (MODSOURCE-581)

Jobs were always successful for tenant ptf-ncp5-00. For the other 2 tenants, jobs completed with errors and all records were discarded; occasionally a single run per DI file succeeded.
Concurrent jobs for 2 or 3 tenants were tested but never finished because of error MODSOURCE-581. The jobs were stopped by the user after about 5 hours with only 10-15% done, so those results are not relevant.

Multitenant testing errors and warnings:


11:16:50 [] [] [] [] ERROR KafkaConsumerWrapper businessHandlerCompletionHandler:: Error while processing a record - id: 2 subscriptionPattern: SubscriptionDefinition(eventType=DI_PARSED_RECORDS_CHUNK_SAVED, subscriptionPattern=ncp5\.Default\.\w{1,}\.DI_PARSED_RECORDS_CHUNK_SAVED) offset: 1947

io.vertx.core.impl.NoStackTraceThrowable: Timeout

11:16:50 [] [] [] [] WARN  taImportKafkaHandler handle:: Error with database during collecting of deduplication info for handlerId: 6713adda-72ce-11ec-90d6-0242ac120003 , eventId: e4a75577-b3b0-4404-bd2f-f9586fd412c3. 

io.vertx.core.impl.NoStackTraceThrowable: Timeout

11:16:50 [] [] [] [] ERROR KafkaConsumerWrapper businessHandlerCompletionHandler:: Error while processing a record - id: 3 subscriptionPattern: SubscriptionDefinition(eventType=DI_COMPLETED, subscriptionPattern=ncp5\.Default\.\w{1,}\.DI_COMPLETED) offset: 68691 

io.vertx.core.impl.NoStackTraceThrowable: Timeout

11:16:50 [] [] [] [] WARN  tHandlingServiceImpl handle:: Failed to handle DI_COMPLETED event 

io.vertx.core.impl.NoStackTraceThrowable: Timeout

11:16:50 [] [] [] [] WARN  rdChunksKafkaHandler handle:: RecordsBatchResponse processing has failed with errors chunkId: f5a92a02-86ce-4afa-aeab-7931f1fd13c6 chunkNumber: 742 jobExecutionId: ff56fb28-5c8a-4109-95eb-33bb0dbed57c 

io.vertx.core.impl.NoStackTraceThrowable: Timeout


12:07:05 [] [] [] [] ERROR KafkaConsumerWrapper businessHandlerCompletionHandler:: Error while processing a record - id: 13 subscriptionPattern: SubscriptionDefinition(eventType=DI_SRS_MARC_AUTHORITY_RECORD_CREATED, subscriptionPattern=ncp5\.Default\.\w{1,}\.DI_SRS_MARC_AUTHORITY_RECORD_CREATED) offset: 9181 

io.vertx.core.impl.NoStackTraceThrowable: handle:: Failed to process data import event payload from topic 'ncp5.Default.fs07000002.DI_SRS_MARC_AUTHORITY_RECORD_CREATED' by jobExecutionId: '719bcf8f-0017-4b92-93b8-b85e46566634' with recordId: 'f3360e49-908e-4bbf-9c0e-45d811b9863a' and chunkId: '1a556829-8290-4e8c-ab8e-eb15d19af624' 

12:07:05 [] [] [] [] WARN  KafkaConsumerWrapper businessHandlerCompletionHandler:: Error handler has not been implemented for subscriptionPattern: SubscriptionDefinition(eventType=DI_SRS_MARC_AUTHORITY_RECORD_CREATED, subscriptionPattern=ncp5\.Default\.\w{1,}\.DI_SRS_MARC_AUTHORITY_RECORD_CREATED) failures

12:07:05 [] [] [] [] ERROR KafkaConsumerWrapper businessHandlerCompletionHandler:: Error while processing a record - id: 13 subscriptionPattern: SubscriptionDefinition(eventType=DI_SRS_MARC_AUTHORITY_RECORD_CREATED, subscriptionPattern=ncp5\.Default\.\w{1,}\.DI_SRS_MARC_AUTHORITY_RECORD_CREATED) offset: 9181 

io.vertx.core.impl.NoStackTraceThrowable: handle:: Failed to process data import event payload from topic 'ncp5.Default.fs07000002.DI_SRS_MARC_AUTHORITY_RECORD_CREATED' by jobExecutionId: '719bcf8f-0017-4b92-93b8-b85e46566634' with recordId: 'f3360e49-908e-4bbf-9c0e-45d811b9863a' and chunkId: '1a556829-8290-4e8c-ab8e-eb15d19af624' 

12:07:05 [] [] [] [] WARN  AbstractConfig       These configurations '[ssl.protocol, ssl.keystore.location, ssl.truststore.type, ssl.keystore.type, ssl.truststore.location, ssl.keystore.password, ssl.key.password, ssl.truststore.password, ssl.endpoint.identification.algorithm]' were supplied but are not used yet. 


12:05:46 [636406/data-import-profiles] [fs07000002] [90aad488-be59-4879-b63b-2f8f13b08e85] [mod_di_converter_storage] WARN  CQL2PgJSON           Doing LIKE search without index for job_profiles.jsonb->>'hidden', CQL >>> SQL: hidden == false >>> lower(f_unaccent(job_profiles.jsonb->>'hidden')) LIKE lower(f_unaccent('false')) 
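The subscription patterns in these logs suggest how one consumer covers all tenants. The topic names appear to have the form `<env>.<namespace>.<tenant>.<event type>`, for example `ncp5.Default.fs07000002.DI_SRS_MARC_AUTHORITY_RECORD_CREATED`, and the tenant segment is matched by `\w{1,}`.

This Python sketch rebuilds that pattern and applies it as a full match, which is assumed to mirror how the Kafka Java client checks pattern subscriptions. `fs07000003` is a hypothetical tenant ID used only for illustration.

```python
import re


def subscription_pattern(env: str, namespace: str, event_type: str) -> re.Pattern:
    """Rebuild the logged pattern, e.g. ncp5\\.Default\\.\\w{1,}\\.DI_COMPLETED."""
    return re.compile(
        rf"{re.escape(env)}\.{re.escape(namespace)}\.\w{{1,}}\.{re.escape(event_type)}"
    )


def tenant_of(topic: str) -> str:
    """Third dot-separated segment of the topic name (the tenant ID in these logs)."""
    return topic.split(".")[2]


if __name__ == "__main__":
    pattern = subscription_pattern("ncp5", "Default", "DI_SRS_MARC_AUTHORITY_RECORD_CREATED")
    topics = [
        "ncp5.Default.fs07000002.DI_SRS_MARC_AUTHORITY_RECORD_CREATED",
        "ncp5.Default.fs07000003.DI_SRS_MARC_AUTHORITY_RECORD_CREATED",  # hypothetical tenant
        "ncp5.Default.fs07000002.DI_COMPLETED",  # different event type: no match
    ]
    for topic in topics:
        matched = pattern.fullmatch(topic) is not None
        print(f"{tenant_of(topic)} {topic.split('.')[3]}: matched={matched}")
```

If the modules subscribe this way, then a module's consumer for a given event type receives that event from every tenant in the cluster.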

Memory Utilization

Memory utilization grows for 3 modules:

  • mod-source-record-manager:3.6.2 from 83% to 105%.
  • mod-source-record-storage:5.6.5 from 86% to 89%.
  • mod-inventory-storage:26.0.0 from 42% to 54%.

Jira ticket PERF-541 has been opened.

All other modules are stable during data import.

*This test was performed after running the same set of jobs (1k, 5k, 10k, 22.7k, and 50k records) twice.

Service CPU Utilization 

*On the chart below, each small spike corresponds to one DI job.

**Some spikes are shorter than others because of differences in the number of records imported.

**Test #1 has higher CPU usage because of background activity (CICO with 5 users + DI).

Most CPU-consuming modules: 

  • mod-quick-marc - 79%
  • mod-source-record-storage - 74%
  • mod-inventory - 69%
  • mod-source-record-manager - 67%
  • others - usage less than 30%

Instance CPU Utilization

RDS CPU Utilization 

As expected, each DI job consumes a lot of DB CPU (each spike corresponds to one DI job).

DB CPU usage is approximately 96%.



PTF environment ncp3

  • m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
  • 2 db.r6.xlarge database instances: one reader and one writer
  • MSK ptf-kakfa-3
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0
    • EBS storage volume per broker 300 GiB
    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3
  • Kafka topics partitioning: 

Modules memory and CPU parameters



Task Definition

Running Tasks 








JMeter scripts were used to test baseline DI and DI with CICO (5 concurrent users).


