Data Import Test Report (Kiwi)
Note: after testing it was found that the actual durations of the imports were about two times longer than what is reported below. The PTF environment was missing a DB trigger; once that trigger was restored, the imports' durations roughly doubled.
Overview
This document contains the results of testing Data Import in Kiwi and compares them against Hot Fix 3's results to detect performance trends.
Infrastructure
- 6 m5.xlarge EC2 instances
- 2 db.r6.xlarge database instances, one reader and one writer
- MSK
  - 4 m5.2xlarge brokers in 2 zones
  - auto.create.topics.enable = true
  - log.retention.minutes = 120
- mod-inventory
  - 256 CPU units, 1814 MB memory
  - inventory.kafka.DataImportConsumerVerticle.instancesNumber=10
  - inventory.kafka.MarcBibInstanceHridSetConsumerVerticle.instancesNumber=10
  - kafka.consumer.max.poll.records=10
- mod-inventory-storage
  - 128 CPU units, 544 MB memory
- mod-source-record-storage
  - 128 CPU units, 908 MB memory
- mod-source-record-manager
  - 128 CPU units, 1292 MB memory
- mod-data-import
  - 128 CPU units, 1024 MB memory
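To illustrate what the consumer-side settings above control, here is a minimal sketch of a Kafka consumer with the same per-poll batch limit. It uses the Python kafka-python client rather than the modules' actual Java/Vert.x consumers, and the broker address, topic name, and group id are placeholders, not the environment's real values.

```python
from kafka import KafkaConsumer

# Minimal sketch of a consumer honoring the same batch limit as
# kafka.consumer.max.poll.records=10 above. Broker address, topic name,
# and group id are placeholders.
consumer = KafkaConsumer(
    "di-example-topic",                  # hypothetical topic name
    bootstrap_servers="msk-broker:9092",
    group_id="mod-inventory-di-consumers",
    enable_auto_commit=False,
    max_poll_records=10,                 # at most 10 records per poll
)

for message in consumer:
    # Each DataImportConsumerVerticle instance processes events in a loop like
    # this; running 10 instances (instancesNumber=10) raises consumer parallelism.
    print(message.topic, message.offset)
```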
Software versions
- mod-data-import v2.2.0
- mod-source-record-manager v3.2.3
- mod-source-record-storage v5.2.1
- mod-inventory v18.0.1
- mod-inventory-storage v22.0.1
Results
Import durations are recorded in the following table. Note that some imports were run more than once; for completeness' sake, each run's duration is recorded, separated by commas.
Import Durations
| Import | Profile | KIWI | KIWI (with OL) |
|---|---|---|---|
| 5K MARC Create | PTF - Create 2 | 5 min, 8 min | 8 min |
| 5K MARC Update | PTF - Updates Success - 1 | 11 min, 13 min | 6 min |
| 10K MARC Create | PTF - Create 2 | 11 min, 14 min | 12 min |
| 10K MARC Update | PTF - Updates Success - 1 | 22 min, 24 min | 15 min |
| 20K MARC Create | PTF - Create 2 | 20 min, 28 min | 21 min |
| 20K MARC Update | PTF - Updates Success - 1 | 43 min, 50 min | 27 min |
| 25K MARC Create | PTF - Create 2 | 23 min, 25 min, 26 min | 24 min |
| 25K MARC Update | PTF - Updates Success - 1 | 1 hr 20 min (completed with errors)*, 56 min | 40 min |
| 50K MARC Create | PTF - Create 2 | Completed with errors, 1 hr 40 min | 43 min |
| 50K MARC Update | PTF - Updates Success - 1 | 2 hr 32 min (job stuck at 76% completion) | 1 hr 4 min |
\* This run used the faulty MARC file from a 25K CREATE import that did not finish successfully; it was performed to gauge how long a successful 25K UPDATE import might take.
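As a rough way to compare runs, the following sketch converts the table's durations into records-per-minute throughput. The record counts and durations are taken from the table above; the 2x correction factor reflects the missing-DB-trigger note at the top of this report, and everything else is illustrative.

```python
# Rough throughput calculation for a few runs from the table above.
# Durations are in minutes.
runs = {
    "25K MARC Create (KIWI, run 1)": (25_000, 23),
    "25K MARC Create (KIWI with OL)": (25_000, 24),
    "50K MARC Create (KIWI with OL)": (50_000, 43),
}

# Per the note at the top, actual durations were about twice what was reported;
# set to 1 to use the reported values as-is.
CORRECTION_FACTOR = 2

for name, (records, reported_minutes) in runs.items():
    actual_minutes = reported_minutes * CORRECTION_FACTOR
    print(f"{name}: ~{records / actual_minutes:.0f} records/min "
          f"(~{records / reported_minutes:.0f} records/min as reported)")
```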
High Level Summary
- CREATE and UPDATE imports completed consistently for files of up to 25K MARC records, with consistent durations as well. All expected SRS records, instances, holdings, and items were created.
- Neither CREATE nor UPDATE imports succeeded with 50K MARC files due to various errors. See MODSOURCE-417.
- Update 12/1: further testing shows that even with a 5K import, the issues in MODSOURCE-417 prevented CREATE imports from creating all expected holdings and items.
- After restarting all DI modules, DI jobs again completed successfully and consistently.
- Update 1/10/22: further testing shows that the SQL errors in MODSOURCE-438 also kill the current job and fail subsequent jobs, whether creates or updates.
- The events_cache topic is (still) present in Kiwi, and whenever it spikes, for whatever reason, it disrupts data import, leading to prolonged imports that rarely complete without errors. See MODINV-444, which was created when this behavior was observed in the Iris/Juniper releases.
- With Optimistic Locking enabled, import jobs seemed to finish faster than without, especially with updates.
Resource Usages
For the most part the CPU utilization picture is consistent across the various CREATE imports. The following describes the spikes that took place in the first 10 minutes of the imports.
- mod-source-record-manager spikes to around 500% for CREATE imports of up to 20K records, and to 800% for 25K records; the more records, the harder mod-source-record-manager works.
- mod-source-record-storage spikes to around 400% for CREATE imports of up to 20K records, and to 570% for 25K.
- mod-inventory hovers around 300% for all CREATE jobs of up to 25K records.
- mod-inventory-storage uses about 200%, rising gradually over the duration of the import.
- mod-data-import-cs spikes to around 400% but averages 140%.
- mod-data-import has a quick spike to 80%.
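The CPU figures above come from the environment's monitoring graphs. As one hedged example of how similar numbers could be pulled programmatically, here is a sketch using CloudWatch's ECS service metrics; the cluster and service names are assumptions, not the environment's actual identifiers.

```python
import datetime
import boto3

# Sketch: fetch average/maximum CPU utilization for one module's ECS service
# over an import window. Cluster and service names are placeholders.
cloudwatch = boto3.client("cloudwatch")

end = datetime.datetime.utcnow()
start = end - datetime.timedelta(hours=1)

response = cloudwatch.get_metric_statistics(
    Namespace="AWS/ECS",
    MetricName="CPUUtilization",
    Dimensions=[
        {"Name": "ClusterName", "Value": "ptf-cluster"},                # assumed name
        {"Name": "ServiceName", "Value": "mod-source-record-manager"},  # assumed name
    ],
    StartTime=start,
    EndTime=end,
    Period=60,                       # one data point per minute
    Statistics=["Average", "Maximum"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"], point["Maximum"])
```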
5K-10K Imports
EC2 Instance CPU Utilization
We can see here that the EC2 instances approached 80% CPU utilization during the initial 10-minute spikes and averaged about 70%.
Service Memory Usage
Memory is stable during short/small-dataset imports.
RDS CPU Utilization
CPU utilization spikes approach 90%, averaging around 70%.
20K Imports
Database
25K CREATE Import
In the 50K import we see a flat, stepped load across instances, similar to the 25K test, with a shape corresponding to the Java VM's dynamic load distribution.
MSK/Kafka Resources
25K UPDATE Import
EC2 Instance CPU Utilization
No visible memory issues during the update.
Database Metrics
No unusual CPU or memory utilization spikes were seen during the 25K UPDATE imports.
MSK/Kafka Resources
The following graphs show the Kafka topics' message rates. Note that events_cache spiked toward the end of the import and dwarfs the other topics' rates. This disruptive spike also corresponds to the mod-inventory spike toward the end of the import. See MODINV-444 for more details on the general disruptiveness of such spikes.
Without the events_cache spike there are usually about 100-200 messages per second across the topics.
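The per-topic message rates in the graphs come from MSK monitoring. A rough client-side way to approximate such a rate is to sample a topic's end offsets twice and diff them, as in this sketch using the Python kafka-python client; the broker address and sampling window are placeholders, while events_cache is one of the topics discussed above.

```python
import time
from kafka import KafkaConsumer, TopicPartition

# Sketch: estimate a topic's message rate by sampling end offsets twice.
# Broker address is a placeholder.
consumer = KafkaConsumer(bootstrap_servers="msk-broker:9092")
topic = "events_cache"
partitions = [TopicPartition(topic, p) for p in consumer.partitions_for_topic(topic)]

def total_end_offset():
    # Sum of the latest offsets across all partitions of the topic.
    return sum(consumer.end_offsets(partitions).values())

first = total_end_offset()
time.sleep(30)                        # sampling window in seconds
second = total_end_offset()

print(f"{topic}: ~{(second - first) / 30:.1f} messages/sec")
```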
Other Observations
There were times when CREATE jobs finished "too soon": the jobs were officially marked as "Completed with errors" when in fact there were no errors, but the counts of instances, holdings, and item records had not yet reached the expected totals. In these cases, a little more time later, about 1-5 minutes, all records had been created successfully and all counts matched the expected values. MODSOURMAN-622 was created to address this issue.
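Since the record counts caught up within a few minutes of the job being marked done, one way to double-check a run is to poll the inventory counts until they reach the expected totals. Here is a minimal sketch of that idea, assuming an Okapi-style endpoint that returns totalRecords; the URL, tenant, token, and baseline value are placeholders, not part of the test setup.

```python
import time
import requests

# Sketch: poll the instance count until it reaches the expected total or a
# timeout expires. URL, tenant, and token are placeholders.
OKAPI_URL = "https://okapi.example.org"
HEADERS = {"x-okapi-tenant": "diku", "x-okapi-token": "<token>"}

def instance_count():
    # Assumes an inventory-storage search that reports totalRecords for limit=0.
    resp = requests.get(f"{OKAPI_URL}/instance-storage/instances",
                        params={"limit": 0}, headers=HEADERS)
    resp.raise_for_status()
    return resp.json()["totalRecords"]

def wait_for_count(expected, timeout_sec=600, poll_sec=30):
    deadline = time.time() + timeout_sec
    count = instance_count()
    while count < expected and time.time() < deadline:
        time.sleep(poll_sec)
        count = instance_count()
    if count < expected:
        raise TimeoutError(f"count still {count} after {timeout_sec}s, expected {expected}")
    return count

# Example: after a 25K CREATE job, wait for the instance count to grow by
# 25,000 over the pre-import baseline (baseline value is illustrative).
# wait_for_count(expected=baseline + 25_000)
```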