Data Import test report (Nolana)

Follow-up testing found that the actual import durations were about two times longer than the durations reported here. The PTF environment was missing a DB trigger; once the trigger was restored, import durations doubled.



Overview

This document contains the results of testing Data Import for MARC Bibliographic records in the Nolana release to detect performance trends (PERF-341).

The figures achieved in PTF performance testing have not been achieved in Nolana Bugfest. Developers are reviewing the results to determine the causes of the differences in MODDATAIMP-752.

Infrastructure

  • 10 m6i.2xlarge EC2 instances  
  • 2 db.r6.xlarge database instances, one reader and one writer
  • MSK
    • 4 m5.2xlarge brokers in 2 zones 
    • auto.create-topics.enable = true
    • log.retention.minutes=480
    • 2 partitions per DI topic
    • default.replication.factor=3
  • mod-inventory
    • 1024 CPU units, 2592MB mem
    • inventory.kafka.DataImportConsumerVerticle.instancesNumber=10
    • inventory.kafka.MarcBibInstanceHridSetConsumerVerticle.instancesNumber=10
    • kafka.consumer.max.poll.records=10
  • mod-inventory-storage
    • 1024 CPU units, 1962MB mem
  • mod-source-record-storage
    • 1024 CPU units, 1440MB mem
  • mod-source-record-manager
    • 1024 CPU units, 3688MB mem
  • mod-data-import
    • 256 CPU units, 1844MB mem
  • mod-data-import-cs 
    • 128 CPU units, 896MB mem

Software versions

  • mod-data-import v2.6.1
  • mod-data-import-converter-storage v1.15.1
  • mod-source-record-manager v3.5.0
  • mod-source-record-storage v5.5.2
  • mod-inventory v19.0.1
  • mod-inventory-storage v25.0.1

Summary

  • Data Import in Nolana has roughly the same durations as in Morning Glory. For 10K records, creation is about 20 seconds slower and updates are about 40 seconds faster; for 50K records, creation is about 2 minutes slower and updates are about 2 minutes faster.
  • One issue was detected: MODSOURMAN-908. Deadlocks in the database slow DI down (when the issue occurs on a 50K import, the duration increases to as much as 6 hours).
    • After MODSOURMAN-908 was fixed, we were not able to reproduce the deadlock.
  • R/W Split Enabled:
    • Most tests show an improvement in DI duration. For example, 10K create takes 3 min 43 s with R/W split and 4 min 55 s without it.
    • RDS CPU usage on the writer node is even higher than it was without read/write split enabled.
    • With R/W split, the reader node took on 15-17% of the DB load for data import creates/updates.
  • MARC BIB Update and Create take less time in Nolana with the new version of the DI modules*. PERF-388 was reproduced for a 10K MARC BIB Update without Check-In/Check-Out, for a job started after a 5K MARC BIB Update with a pause of less than a minute between jobs.
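The comparisons quoted in the summary can be checked against the durations in the Results section. The snippet below is a minimal sketch of that arithmetic; the helper name `to_seconds` and the dictionary layout are illustrative assumptions, and the figures are copied from this report.

```python
def to_seconds(minutes, seconds):
    """Convert a 'min s' duration to whole seconds."""
    return minutes * 60 + seconds

# (Nolana, Morning Glory) durations copied from the Results section.
durations = {
    "10K create": (to_seconds(4, 55), to_seconds(4, 33)),
    "10K update": (to_seconds(4, 50), to_seconds(5, 29)),
    "50K create": (to_seconds(23, 43), to_seconds(21, 37)),
    "50K update": (to_seconds(24, 5), to_seconds(26, 10)),
}

# Positive delta: Nolana is slower than Morning Glory; negative: faster.
deltas = {name: nolana - mg for name, (nolana, mg) in durations.items()}
for name, delta in deltas.items():
    print(f"{name}: {delta:+d} s")

# 10K create with and without R/W split, as quoted in the summary.
rw_split = to_seconds(3, 43)
no_split = to_seconds(4, 55)
rw_saving_pct = round(100 * (no_split - rw_split) / no_split, 1)
print(f"R/W split saving on 10K create: {rw_saving_pct}%")
```

The deltas come out close to the rounded values in the summary (about 20 s and 40 s for 10K, about 2 minutes for 50K).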

Results


| Test | Profile | Duration: Nolana with new version of modules* | Duration: Nolana | Duration: Morning Glory | Duration: Lotus |
| --- | --- | --- | --- | --- | --- |
| 1K MARC Create | PTF - Create 2 | 40 s | 46 s | 50 s | 1 min 9 s |
| 1K MARC Update | PTF - Updates Success - 1 | 35 s | 50 s | 39 s | 1 min 30 s |
| 2K MARC Create | PTF - Create 2 | 56 s | 1 min 13 s | 1 min 2 s | 1 min 34 s |
| 2K MARC Update | PTF - Updates Success - 1 | 58 s | 1 min 7 s | 1 min 11 s | 1 min 54 s |
| 5K MARC Create | PTF - Create 2 | 2 min 8 s | 2 min 51 s | 2 min 20 s | 3 min 54 s |
| 5K MARC Update | PTF - Updates Success - 1 | 2 min 10 s | 2 min 27 s | 3 min 4 s | 4 min 12 s |
| 10K MARC Create | PTF - Create 2 | 4 min 20 s | 4 min 55 s | 4 min 33 s | 6 min 45 s |
| 10K MARC Update | PTF - Updates Success - 1 | 4 min 8 s | 4 min 50 s | 5 min 29 s | 8 min 4 s |
| 25K MARC Create | PTF - Create 2 | 10 min 41 s | 11 min 56 s | 10 min 55 s | 16 min 8 s |
| 25K MARC Update | PTF - Updates Success - 1 | 10 min 40 s | 12 min 20 s | 13 min 37 s | 19 min 50 s |
| 50K MARC Create | PTF - Create 2 | 21 min 11 s | 23 min 43 s | 21 min 37 s | 32 min 28 s |
| 50K MARC Update | PTF - Updates Success - 1 | 20 min 57 s | 24 min 5 s | 26 min 10 s | 39 min 5 s |
| 100K MARC Create | PTF - Create 2 | 42 min 35 s | 49 min 40 s | 44 min 4 s | 1 hr 11 min |
| 100K MARC Update | PTF - Updates Success - 1 | 41 min 56 s | 51 min 15 s | 55 min 33 s | 1 hr 19 min |
| 500K MARC Create | PTF - Create 2 | DNR | 4 hr 2 min (Completed with errors)* | 3 hr 55 min (Completed with errors)* | 7 hr 4 min (Completed with errors)* |

  • The 500K MARC Create import failed because the 500K-record file was corrupted.
  • So far we can only compare results for the PTF-Create-2 job profile, because Update-success-2 is not available in our Morning Glory environment.
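To compare columns of the results numerically, the duration strings first need to be converted to seconds. The following is a minimal sketch, not part of the test tooling: `parse_duration` and `reduction_pct` are hypothetical helper names, and the sample values are copied from the 100K rows above.

```python
import re

# Seconds per unit for the unit spellings used in this report.
UNIT_SECONDS = {"hr": 3600, "min": 60, "m": 60, "s": 1}

def parse_duration(text):
    """Convert strings such as '1 hr 11 min', '2m 8 s' or '39s' to seconds."""
    parts = re.findall(r"(\d+)\s*(hr|min|m|s)\b", text)
    if not parts:
        raise ValueError(f"no duration found in {text!r}")
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in parts)

def reduction_pct(new, old):
    """Percentage by which `new` is shorter than `old` (negative if longer)."""
    new_s, old_s = parse_duration(new), parse_duration(old)
    return round(100 * (old_s - new_s) / old_s, 1)

# 100K rows: Nolana with new module versions vs. Nolana.
create_100k = reduction_pct("42 min 35 s", "49 min 40 s")
update_100k = reduction_pct("41 min 56 s", "51 min 15 s")
print(f"100K create: {create_100k}% shorter, 100K update: {update_100k}% shorter")
```

Values such as "DNR" or entries with no numeric duration raise `ValueError`, so they have to be filtered out before calling the parser.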

* - Modules and versions comparison table

| Nolana | new version | old version |
| --- | --- | --- |
| mod-data-import | 2.6.2 | 2.6.1 |
| mod-data-import-converter-storage | 1.15.2 | 1.15.1 |
| mod-source-record-manager | 3.5.6 | 3.5.0 |
| mod-source-record-storage | 5.5.2 | 5.5.2 |
| mod-inventory | 19.0.2 | 19.0.1 |
| mod-inventory-storage | 25.0.3 | 25.0.1 |


Resources usage MARC Create