Data Import Test Report (Kiwi)
Note: Later testing found that the actual import durations were about two times longer than reported. The PTF environment was missing a DB trigger; once the trigger was restored, the imports' durations doubled.
Overview
This document contains the results of testing Data Import in Kiwi and compares them against Hot Fix 3's results to detect performance trends.
Infrastructure
- 6 m5.xlarge EC2 instances
- 2 db.r6.xlarge database instances, one reader and one writer
- MSK
  - 4 m5.2xlarge brokers in 2 zones
  - auto.create.topics.enable=true
  - log.retention.minutes=120
- mod-inventory
  - 256 CPU units, 1814MB mem
  - inventory.kafka.DataImportConsumerVerticle.instancesNumber=10
  - inventory.kafka.MarcBibInstanceHridSetConsumerVerticle.instancesNumber=10
  - kafka.consumer.max.poll.records=10
- mod-inventory-storage
  - 128 CPU units, 544MB mem
- mod-source-record-storage
  - 128 CPU units, 908MB mem
- mod-source-record-manager
  - 128 CPU units, 1292MB mem
- mod-data-import
  - 128 CPU units, 1024MB mem
Software versions
- mod-data-import v2.2.0
- mod-source-record-manager v3.2.3
- mod-source-record-storage v5.2.1
- mod-inventory v18.0.1
- mod-inventory-storage v22.0.1
Results
Import durations are recorded in the following table. Some imports were run more than once; all of their durations are listed for completeness' sake, separated by commas.
Import Durations
Import | Profile | KIWI | KIWI (with OL) |
---|---|---|---|
5K MARC Create | PTF - Create 2 | 5 min, 8 min | 8 min |
5K MARC Update | PTF - Updates Success - 1 | 11 min, 13 min | 6 min |
10K MARC Create | PTF - Create 2 | 11 min, 14 min | 12 min |
10K MARC Update | PTF - Updates Success - 1 | 22 min, 24 min | 15 min |
20K MARC Create | PTF - Create 2 | 20 min, 28 min | 21 min |
20K MARC Update | PTF - Updates Success - 1 | 43 min, 50 min | 27 min |
25K MARC Create | PTF - Create 2 | 23 min, 25 min, 26 min | 24 min |
25K MARC Update | PTF - Updates Success - 1 | 1 hr 20 min (completed with errors) *, 56 min | 40 min |
50K MARC Create | PTF - Create 2 | Completed with errors, 1 hr 40 min | 43 min |
50K MARC Update | PTF - Updates Success - 1 | 2 hr 32 min (job stuck at 76% completion) | 1 hr 4 min |
\* This run used the faulty MARC file from the 25K Create import (which did not finish successfully). The test was meant to gauge how long a successful 25K UPDATE import might take.
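To make the comparison between the two result columns easier to read, the durations can be reduced to a single ratio per import. The following is a minimal sketch, not part of the original test tooling: the names `to_minutes`, `RUNS`, and `ol_ratios` are hypothetical, the figures are copied from the table above, and only runs that completed without errors are included (the 50K rows are left out for that reason).

```python
import re
from statistics import mean


def to_minutes(text):
    """Convert a duration such as '1 hr 40 min' or '23 min' to whole minutes."""
    hours = re.search(r"(\d+)\s*(?:hr|hour)", text)
    minutes = re.search(r"(\d+)\s*min", text)
    total = int(hours.group(1)) * 60 if hours else 0
    total += int(minutes.group(1)) if minutes else 0
    return total


# Hypothetical structure: import name -> (successful runs without OL, run with OL).
# Values are copied from the Import Durations table.
RUNS = {
    "5K MARC Create": (["5 min", "8 min"], "8 min"),
    "5K MARC Update": (["11 min", "13 min"], "6 min"),
    "10K MARC Create": (["11 min", "14 min"], "12 min"),
    "10K MARC Update": (["22 min", "24 min"], "15 min"),
    "20K MARC Create": (["20 min", "28 min"], "21 min"),
    "20K MARC Update": (["43 min", "50 min"], "27 min"),
    "25K MARC Create": (["23 min", "25 min", "26 min"], "24 min"),
    "25K MARC Update": (["56 min"], "40 min"),
}


def ol_ratios(runs):
    """Return mean duration without OL divided by duration with OL, per import."""
    return {
        name: mean([to_minutes(d) for d in plain]) / to_minutes(with_ol)
        for name, (plain, with_ol) in runs.items()
    }


ratios = ol_ratios(RUNS)
for name, ratio in ratios.items():
    # A ratio above 1 means the run with OL was faster.
    print(f"{name}: {ratio:.2f}x")
```

With these inputs the UPDATE ratios fall between 1.4 and 2.0, while the CREATE ratios stay between roughly 0.8 and 1.1. The ratios are only meaningful if both columns were measured under the same environment conditions.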
High Level Summary
- Consistent CREATE and UPDATE imports were achieved for up to 25K MARC files. The import durations were consistent as well. All expected SRS records, instances, holdings, and items were created.
- No CREATE or UPDATE import was achieved with 25K MARC files due to various errors. See MODSOURCE-417.
- *Update 12/1 - Further testing shows that even with a 5K import, the issues in MODSOURCE-417 prevented CREATE imports from creating all expected holdings and items.
- After restarting ALL DI modules, DI jobs consistently completed successfully again.
- *Update 1/10/22 - Further testing shows that SQL errors (MODSOURCE-438) also kill the current job and fail subsequent jobs, whether creates or updates.
- The events_cache topic is (still) present in Kiwi, and whenever it spikes, for whatever reason, it disrupts data import, leading to prolonged imports that rarely complete without errors. See MODINV-444, which was created when this behavior was observed in the Iris/Juniper releases.
- With Optimistic Locking enabled, import jobs seemed to finish faster than without, especially with updates.
Resource Usages
For the most part, the CPU utilization picture is consistent among the various CREATE imports. The following describes the "spikes" that took place in the first 10 minutes of the imports.
- mod-source-record-manager spikes up to around 500% for CREATE imports of up to 20K records, and 800% up to 25K records. The more records, the harder mod-source-record-manager works.
- mod-source-record-storage spikes up to around 400% for up to 20K CREATE, and 570% for 25K.
- mod-inventory hovers around 300% for all CREATE jobs, up to 25K.
- mod-inventory-storage uses about 200%, rising gradually over the duration of the import.
- mod-data-import-cs spikes to around 400% but averages 140%.
- mod-data-import has a quick spike to 80%.
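Percentages above 100% are easier to interpret once converted to vCPUs. The sketch below assumes, and the report does not state this, that the "CPU units" listed under Infrastructure are ECS-style units (1024 units = 1 vCPU) and that the utilization figures are measured relative to each module's allocation. The names `UNITS_PER_VCPU`, `ALLOCATED_UNITS`, and `vcpus_used` are hypothetical.

```python
# Assumption: ECS-style CPU units, where 1024 units correspond to one vCPU.
UNITS_PER_VCPU = 1024

# CPU units per module, copied from the Infrastructure section.
ALLOCATED_UNITS = {
    "mod-inventory": 256,
    "mod-inventory-storage": 128,
    "mod-source-record-storage": 128,
    "mod-source-record-manager": 128,
    "mod-data-import": 128,
}


def vcpus_used(module, utilization_pct):
    """Translate a utilization percentage into vCPUs, relative to the allocation."""
    return ALLOCATED_UNITS[module] * utilization_pct / 100 / UNITS_PER_VCPU


# Spike values quoted in the list above.
for module, pct in [
    ("mod-source-record-manager", 500),
    ("mod-source-record-manager", 800),
    ("mod-source-record-storage", 400),
    ("mod-inventory", 300),
    ("mod-inventory-storage", 200),
]:
    print(f"{module} at {pct}%: {vcpus_used(module, pct):.3f} vCPU")
```

Under these assumptions, the 800% spike of mod-source-record-manager corresponds to about one full vCPU, and the 300% level of mod-inventory to about 0.75 vCPU.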
5K-10K Imports
EC2 Instance CPU Utilization
We can see here that the EC2 instances approached 80% CPU utilization during the initial 10-minute spikes and averaged about 70%.
Service Memory Usage
Memory is stable during short/small-dataset imports.