Data Import Create MARC holdings records [Nolana]

After testing, it was found that the actual durations of the imports were about two times longer than reported. The PTF environment was missing a DB trigger; once the trigger was restored, the import durations doubled.

Overview 

This document contains the results of testing Data Import Create MARC holdings records in pre-Nolana to detect performance trends.

PERF-343

Software versions

  • mod-data-import:2.6.1
  • mod-data-import-converter-storage:1.15.1
  • mod-source-record-storage:5.5.2
  • mod-source-record-manager:3.5.4
  • mod-inventory:19.0.1
  • mod-inventory-storage:25.0.1
  • mod-search:1.8.0
  • mod-quick-marc:2.5.0

Infrastructure

  • 10 m6i.2xlarge EC2 instances 
  • 2 'db.r6g.xlarge' database instances: one reader and one writer
  • MSK
    • Broker type: kafka.m5.2xlarge
    • Total number of brokers: 4
    • Number of zones: 2
    • Brokers per zone: 2 
    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3
    • 2 partitions per DI topic
  • mod-data-import:2.6.1
    • 256 CPU Units
    • 2048/1844 Hard/Soft memory limits (MiB)
  • mod-data-import-converter-storage:1.15.1
    • 128 CPU Units
    • 1024/896 Hard/Soft memory limits (MiB)
  • mod-source-record-storage:5.5.2
    • 1024 CPU Units
    • 1536/1440 Hard/Soft memory limits (MiB)
  • mod-source-record-manager:3.5.4
    • 1024 CPU Units
    • 4096/3688 Hard/Soft memory limits (MiB)
  • mod-inventory:19.0.1
    • mod-inventory-b
      • 1024 CPU Units
      • 2880/2592 Hard/Soft memory limits (MiB)
      • inventory.kafka.DataImportConsumerVerticle.instancesNumber=10
      • inventory.kafka.MarcBibInstanceHridSetConsumerVerticle.instancesNumber=10
      • kafka.consumer.max.poll.records=10
    • mod-inventory-x
      • 1024 CPU Units
      • 2880/2592 Hard/Soft memory limits (MiB)
      • inventory.kafka.DataImportConsumerVerticle.instancesNumber=10
      • inventory.kafka.MarcBibInstanceHridSetConsumerVerticle.instancesNumber=10
      • kafka.consumer.max.poll.records=10
  • mod-inventory-storage:25.0.1
  • mod-search:1.8.0
    • 400 CPU Units
    • 2592/2480 Hard/Soft memory limits (MiB)
  • mod-quick-marc:2.5.0
    • 128 CPU Units
    • 2288/2176 Hard/Soft memory limits (MiB)
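The CPU and memory figures above look like ECS container settings. Assuming the usual ECS convention that 1024 CPU units equal one vCPU (an assumption; the report does not say how these values were configured), the Python sketch below converts each module's allocation to vCPUs and shows how much of the hard memory limit the soft limit reserves. The names ALLOCATIONS, vcpus and soft_to_hard_ratio are illustrative only.

```python
"""Summarize per-module resource allocations from the Infrastructure list.

Assumption: CPU figures are ECS CPU units (1024 units = 1 vCPU) and memory
figures are hard/soft limits in MiB.
"""
from typing import NamedTuple

CPU_UNITS_PER_VCPU = 1024  # ECS convention, assumed for this report


class Allocation(NamedTuple):
    cpu_units: int
    hard_mib: int
    soft_mib: int


# Values copied from the Infrastructure section (none listed for mod-inventory-storage).
ALLOCATIONS = {
    "mod-data-import": Allocation(256, 2048, 1844),
    "mod-data-import-converter-storage": Allocation(128, 1024, 896),
    "mod-source-record-storage": Allocation(1024, 1536, 1440),
    "mod-source-record-manager": Allocation(1024, 4096, 3688),
    "mod-inventory-b": Allocation(1024, 2880, 2592),
    "mod-inventory-x": Allocation(1024, 2880, 2592),
    "mod-search": Allocation(400, 2592, 2480),
    "mod-quick-marc": Allocation(128, 2288, 2176),
}


def vcpus(alloc: Allocation) -> float:
    """Convert ECS CPU units to vCPUs."""
    return alloc.cpu_units / CPU_UNITS_PER_VCPU


def soft_to_hard_ratio(alloc: Allocation) -> float:
    """Fraction of the hard memory limit that the soft limit reserves."""
    return alloc.soft_mib / alloc.hard_mib


if __name__ == "__main__":
    for name, alloc in ALLOCATIONS.items():
        print(f"{name:36} {vcpus(alloc):5.3f} vCPU  "
              f"soft/hard = {soft_to_hard_ratio(alloc):.0%}")
    total_units = sum(a.cpu_units for a in ALLOCATIONS.values())
    print(f"total: {total_units / CPU_UNITS_PER_VCPU:.3f} vCPU")
```

Under that assumption, each mod-inventory instance gets one full vCPU, while mod-data-import-converter-storage and mod-quick-marc get an eighth of a vCPU each.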

Summary

  • For the biggest file (80k), the Nolana 'MARC Holdings' data import was faster than in Morning Glory by 7 m 43 s (27%).
  • However, Nolana's peak average CPU usage is higher: 73.9% for mod-inventory, whereas in Morning Glory no related module exceeded 60%.

Results

Detailed data import durations are given in the following table:

Test  File  Duration: Morning Glory  Duration: Nolana  Diff (absolute)  Diff (%)
1     1k    28 s                     32.7 s            +5 s             +18%
2     5k    1 m 48 s                 4 m 20.8 s        +2 m 33 s        +142%
3     10k   4 m 4 s                  3 m 24.9 s        -39 s            -16%
4     80k   29 m 6 s                 21 m 22.6 s       -7 m 43 s        -27%
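The difference columns follow from the two duration columns: the absolute difference is the Nolana duration minus the Morning Glory duration, and the percentage is that difference relative to the Morning Glory duration. The Python sketch below recomputes both from the table values so the rounding can be checked; the helper names diff and fmt_seconds are illustrative only.

```python
"""Recompute the difference columns of the duration table."""
from typing import Tuple

# (file, Morning Glory duration in s, Nolana duration in s), copied from the table.
DURATIONS = [
    ("1k", 28.0, 32.7),
    ("5k", 1 * 60 + 48.0, 4 * 60 + 20.8),
    ("10k", 4 * 60 + 4.0, 3 * 60 + 24.9),
    ("80k", 29 * 60 + 6.0, 21 * 60 + 22.6),
]


def diff(morning_glory_s: float, nolana_s: float) -> Tuple[int, float]:
    """Return (Nolana - Morning Glory in whole seconds, that diff as % of Morning Glory)."""
    absolute = round(nolana_s - morning_glory_s)  # round to whole seconds first
    return absolute, 100.0 * absolute / morning_glory_s


def fmt_seconds(seconds: int) -> str:
    """Format a signed whole-second duration as e.g. '+2 m 33 s' or '-39 s'."""
    sign = "+" if seconds >= 0 else "-"
    minutes, secs = divmod(abs(seconds), 60)
    return f"{sign}{minutes} m {secs} s" if minutes else f"{sign}{secs} s"


if __name__ == "__main__":
    for size, mg, nolana in DURATIONS:
        absolute, percent = diff(mg, nolana)
        print(f"{size:>4}: {fmt_seconds(absolute):>10}  {percent:+.0f}%")
```

This reproduces the table's difference columns when the percentage is computed from the difference rounded to whole seconds. Computing it from the unrounded durations instead gives 17% for the 1k file and 141% for the 5k file.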


Resource usage

Nolana's resource usage is compared against Morning Glory's.


CPU

Morning Glory: CPU usage did not exceed 60% for any related module.

Nolana: the highest average CPU usage was 73.9% at peak (mod-inventory).

Memory

Morning Glory: concerning behaviour was observed for:

  • mod-source-record-manager: memory usage grew from 55% up to 93%
  • mod-source-record-storage: memory usage grew from 41% up to 64%

However, the last test (80k) showed no memory growth in any module, so this growth may be explained by the working conditions of these modules.

Nolana: increased memory usage was detected for:

  • mod-source-record-storage-b: from 40.2% up to 48.5%

RDS CPU

RDS CPU usage peaked at 80% during the test.

Database cluster: ncp3-db-3fv7zu8sfdn5-auroracluster-nysenxyhpwrd