Data Import MARC Authorities (Nolana)

Later testing showed that the actual import durations were about two times longer than those reported here. The PTF environment was missing a DB trigger; once the trigger was restored, the import durations doubled.

Overview

This document contains the results of testing Data Import for MARC Authorities in the Nolana release to detect performance trends (PERF-344).

Infrastructure

  • 10 m6i.2xlarge EC2 instances  
  • 2 db.r6.xlarge database instances: one reader and one writer
  • MSK
    • 4 m5.2xlarge brokers in 2 zones 
    • auto.create-topics.enable = true
    • log.retention.minutes=480
    • 2 partitions per DI topic
    • default.replication.factor=3
  • mod-inventory memory
    • 1024 CPU units, 2592MB mem
    • inventory.kafka.DataImportConsumerVerticle.instancesNumber=10
    • inventory.kafka.MarcBibInstanceHridSetConsumerVerticle.instancesNumber=10
    • kafka.consumer.max.poll.records=10
  • mod-inventory-storage
    • 1024 CPU units, 1962MB mem
  • mod-source-record-storage
    • 1024 CPU units, 1440MB mem
  • mod-source-record-manager
    • 1024 CPU units, 3688MB mem
  • mod-data-import
    • 256 CPU units, 1844MB mem
  • mod-data-import-cs 
    • 128 CPU units, 896MB mem

Software versions

  • mod-data-import v2.6.1
  • mod-data-import-converter-storage v1.15.1
  • mod-source-record-manager v3.5.4
  • mod-source-record-storage v5.5.2
  • mod-inventory v19.0.1
  • mod-inventory-storage v25.0.1


Results

Summary

The MARC Authorities import tests were run on warmed-up modules (several MARC Bib imports were performed before the test set).

All tests completed successfully, without errors or issues. For most tests the import duration is shorter than or about equal to the Morning Glory release, with the largest gains at 25K and 50K; only the 1K import was slightly slower (27 s vs. 24 s).

No memory leaks found.

  • R/W split enabled:

MARC Authority import durations are approximately the same with R/W split enabled and disabled.

RDS CPU usage on the reader node was only 7%, which may explain why the import durations are almost the same.



Profile                                   Duration (Nolana)   Duration (Morning Glory)
1K Default - Create SRS MARC Authority    27 s                24 s
5K Default - Create SRS MARC Authority    1 min 15 s          1 min 21 s
10K Default - Create SRS MARC Authority   2 min 31 s          2 min 32 s
25K Default - Create SRS MARC Authority   7 min 7 s           11 min 14 s
50K Default - Create SRS MARC Authority   11 min 24 s         22 min
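
The release-to-release comparison in the table above can be expressed as a relative change per file size. The following is a minimal sketch, not part of the original test tooling: the helper names `parse_duration` and `relative_change` are hypothetical, and the durations are transcribed as reported (that is, before the roughly twofold correction noted at the top of this page).

```python
import re

def parse_duration(text: str) -> int:
    """Convert a duration such as '1 min 15 s' or '22 min' to seconds."""
    minutes = re.search(r"(\d+)\s*min", text)
    seconds = re.search(r"(\d+)\s*s\b", text)
    total = int(minutes.group(1)) * 60 if minutes else 0
    total += int(seconds.group(1)) if seconds else 0
    return total

def relative_change(new_seconds: int, old_seconds: int) -> float:
    """Percentage change of new vs. old; negative means the new run is faster."""
    return (new_seconds - old_seconds) / old_seconds * 100

# (file size, Nolana, Morning Glory), copied from the table above.
RESULTS = [
    ("1K", "27 s", "24 s"),
    ("5K", "1 min 15 s", "1 min 21 s"),
    ("10K", "2 min 31 s", "2 min 32 s"),
    ("25K", "7 min 7 s", "11 min 14 s"),
    ("50K", "11 min 24 s", "22 min"),
]

for size, nolana, morning_glory in RESULTS:
    change = relative_change(parse_duration(nolana), parse_duration(morning_glory))
    print(f"{size}: {change:+.1f}% vs. Morning Glory")
```

Because the same missing-trigger caveat may not apply equally to both releases, the percentages are best read as indicative of the trend rather than as exact figures.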



Resource Usages


Service CPU usage


Service Memory usage



DB CPU usage


Note: Each spike on this chart corresponds to one of the DI MARC Authorities tests performed.



Instance level CPU usage


Read Write Split enabled 



Profile                                   Duration (Nolana, R/W split enabled)   Duration (Nolana)
1K Default - Create SRS MARC Authority    25 s                                   27 s
5K Default - Create SRS MARC Authority    1 min 20 s                             1 min 15 s
10K Default - Create SRS MARC Authority   2 min 38 s                             2 min 31 s
25K Default - Create SRS MARC Authority   6 min 3 s                              7 min 7 s
50K Default - Create SRS MARC Authority   12 min 36 s                            11 min 24 s
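
One way to compare the two configurations is by throughput in records per second. The sketch below is illustrative only: it assumes that "1K" means 1,000 records, "5K" means 5,000, and so on, and it uses the durations as reported in the table above, converted to seconds by hand. The name `throughput` is a hypothetical helper.

```python
# Durations in seconds as (R/W split enabled, R/W split disabled),
# transcribed from the table above.
# Assumption: the file size label gives the record count (1K = 1,000 records).
DURATIONS = {
    1_000: (25, 27),
    5_000: (80, 75),
    10_000: (158, 151),
    25_000: (363, 427),
    50_000: (756, 684),
}

def throughput(records: int, seconds: int) -> float:
    """Average number of records imported per second."""
    return records / seconds

for records, (split_on, split_off) in DURATIONS.items():
    on_rate = throughput(records, split_on)
    off_rate = throughput(records, split_off)
    print(f"{records:>6} records: {on_rate:6.1f} rec/s with R/W split, "
          f"{off_rate:6.1f} rec/s without")
```

Under these assumptions neither configuration is consistently ahead across file sizes, which is in line with the summary that R/W split has little effect on this workload.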


Resource Usages

Note: Resource usage is roughly the same with and without R/W split (about 30-40% for the most heavily used modules). RDS CPU usage was about 7% with R/W split and 3-4% without it.





Note: The load on the reader node was only 7%. This explains why the durations are almost the same as they were without R/W split.

Note: Each spike on this chart corresponds to one of the DI MARC Authorities tests performed.