Data Import Create MARC holdings records [Morning Glory]

Follow-up testing found that the actual import durations were about two times longer than those reported here. The PTF environment was missing a DB trigger; once the trigger was restored, the import durations doubled.

Overview 

This document contains the results of testing Data Import Create MARC holdings records in pre-lotus to detect performance trends.

Software versions

  • mod-data-import:2.5.0
  • mod-data-import-converter-storage:1.14.1
  • mod-source-record-storage:5.4.0
  • mod-source-record-manager:3.4.1
  • mod-inventory:18.2.2
  • mod-inventory-storage:24.1.0
  • mod-search:1.7.4
  • mod-quick-marc:2.4.1

Infrastructure

  • 10 m6i.2xlarge EC2 instances (changed; in Lotus it was m5.xlarge)
  • 2 db.r6.xlarge database instances, one reader and one writer
  • MSK
    • 4 m5.2xlarge brokers in 2 zones
    • auto.create-topics.enable = true
    • log.retention.minutes=120
    • 2 partitions per DI topic
  • mod-inventory
    • 1024 CPU units, 2592MB mem
    • inventory.kafka.DataImportConsumerVerticle.instancesNumber=10
    • inventory.kafka.MarcBibInstanceHridSetConsumerVerticle.instancesNumber=10
    • kafka.consumer.max.poll.records=10
  • mod-inventory-storage
    • 1024 CPU units, 1684MB mem
  • mod-source-record-storage
    • 1024 CPU units, 1296MB mem
  • mod-source-record-manager
    • 1024 CPU units, 1844MB mem
  • mod-data-import
    • 256 CPU units, 1844MB mem
  • mod-data-import-cs 
    • 128 CPU units, 896MB mem


Results


Test    File    Duration
1       1k      28 s
2       5k      1 m 48 s
3       10k     4 m 4 s
4       80k     29 m 6 s
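
The reported durations can be converted into throughput and into rough corrected durations using the "about two times" note at the top of this page. The following is a minimal sketch, not part of the original test tooling. It assumes that the file sizes 1k to 80k are record counts and that the correction factor is exactly 2.0, which the note only gives as an approximation. The names REPORTED, CORRECTION_FACTOR, to_seconds and summarize are hypothetical.

```python
import re

# Reported results from the table above: (records in file, reported duration).
# Assumption: "1k" .. "80k" are record counts.
REPORTED = [
    (1_000, "28s"),
    (5_000, "1 m 48s"),
    (10_000, "4 m 4s"),
    (80_000, "29 m 6 s"),
]

# Assumption: the note says "about 2 times longer", so 2.0 is only approximate.
CORRECTION_FACTOR = 2.0


def to_seconds(text: str) -> int:
    """Parse strings such as '1 m 48s' or '28s' into whole seconds."""
    match = re.fullmatch(r"\s*(?:(\d+)\s*m)?\s*(?:(\d+)\s*s)?\s*", text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return minutes * 60 + seconds


def summarize(rows=REPORTED, factor=CORRECTION_FACTOR):
    """Return (records, reported_s, estimated_s, reported records/s) per row."""
    summary = []
    for records, duration in rows:
        reported = to_seconds(duration)
        estimated = reported * factor
        summary.append((records, reported, estimated, records / reported))
    return summary


if __name__ == "__main__":
    for records, reported, estimated, rate in summarize():
        print(f"{records:>6} records: reported {reported:>4}s, "
              f"estimated ~{estimated:.0f}s, {rate:.1f} records/s reported")
```

Under these assumptions the reported throughput stays roughly between 35 and 46 records per second across all four file sizes, so the corrected figures would be about half of that.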


Resource usage

CPU usage did not exceed 60% for any of the related modules.

Concerning behavior was observed in:

  • mod-source-record-manager: memory usage grew from 55% to 93%
  • mod-source-record-storage: memory usage grew from 41% to 64%

However, the last test (80k) did not show memory growth in any module, so the earlier growth may simply reflect these modules reaching their normal working memory level.
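
To put the memory growth noted above into absolute terms, the percentages can be combined with the container memory settings from the Infrastructure section. This is only a rough sketch: it assumes the reported percentages are relative to each module's configured memory allocation, which this page does not state explicitly. The names ALLOCATED_MB, OBSERVED_PCT and usage_mb are hypothetical.

```python
# Container memory allocations from the Infrastructure section, in MB.
ALLOCATED_MB = {
    "mod-source-record-manager": 1844,
    "mod-source-record-storage": 1296,
}

# Observed memory utilization (start %, end %) from the list above.
OBSERVED_PCT = {
    "mod-source-record-manager": (55, 93),
    "mod-source-record-storage": (41, 64),
}


def usage_mb(module: str) -> tuple[float, float, float]:
    """Return (start MB, end MB, headroom MB) for a module.

    Assumption: the percentages are fractions of the configured allocation.
    """
    limit = ALLOCATED_MB[module]
    start_pct, end_pct = OBSERVED_PCT[module]
    start = limit * start_pct / 100
    end = limit * end_pct / 100
    return start, end, limit - end


if __name__ == "__main__":
    for name in ALLOCATED_MB:
        start, end, headroom = usage_mb(name)
        print(f"{name}: {start:.0f} MB -> {end:.0f} MB, "
              f"{headroom:.0f} MB headroom")
```

If the assumption holds, mod-source-record-manager would have ended within roughly 130 MB of its allocation, which is why its growth deserves more attention than that of mod-source-record-storage.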



RDS CPU usage peaked at 80% during the tests.