Data Import (Iris)

Overview

Compared to Honeysuckle, Data Import in Iris is more robust, predictable, and reliable. Up to 50K MARC records can be imported reliably in about 2 hours, and up to 100K records can be imported in less than 8 hours. These imports create new instances, holdings, and items. Imports that update existing records were also tested, but that workflow still has unresolved issues.
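
To put the durations above in perspective, the following minimal Python sketch converts them into average throughput. The function name and the list of runs are illustrative only; the figures are the upper bounds quoted in the overview, so the 100K result should be read as a lower bound on throughput.

```python
# Import sizes and durations quoted in the overview.
RUNS = [
    (50_000, 2.0),   # 50K records in about 2 hours
    (100_000, 8.0),  # 100K records in less than 8 hours
]


def records_per_second(records: int, hours: float) -> float:
    """Average throughput implied by a record count and a duration."""
    return records / (hours * 3600)


for records, hours in RUNS:
    rate = records_per_second(records, hours)
    print(f"{records:>7,} records / {hours:g} h = {rate:.1f} records/s")
```

This works out to roughly 6.9 records per second for the 50K import and at least 3.5 records per second for the 100K import, so average throughput does not appear to scale linearly with import size.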

Profiles used for PTF testing

PTF Test Results

Environment

Data import tests were performed successfully on the following combination of hardware specs and software configuration.

No. Instances  Container                                  CPU Units (AWS)  Soft Memory Limit  Hard Memory Limit
1              mod-data-import v2.0.2                     128              1024 MB            2048 MB
2              mod-source-record-manager v3.0.5           128               896 MB            1440 MB
2              mod-source-record-storage v5.1.0-Snapshot  128               896 MB            1440 MB
2              mod-inventory v16.4.0-Snapshot             256              1440 MB            1872 MB
2              mod-inventory-storage v20.3.0-Snapshot     128               536 MB             864 MB
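
As a rough illustration of how the values above might be expressed, the sketch below builds the resource-related part of a container definition for each module. It assumes the modules run on Amazon ECS, where `cpu` holds CPU units, `memoryReservation` is the soft limit, and `memory` is the hard limit; the helper name and the omission of image, port, and environment settings are simplifications, not the actual PTF deployment.

```python
import json

# (instances, module, cpu_units, soft_limit_mb, hard_limit_mb) from the table above.
MODULES = [
    (1, "mod-data-import", 128, 1024, 2048),
    (2, "mod-source-record-manager", 128, 896, 1440),
    (2, "mod-source-record-storage", 128, 896, 1440),
    (2, "mod-inventory", 256, 1440, 1872),
    (2, "mod-inventory-storage", 128, 536, 864),
]


def container_definition(name, cpu, soft_mb, hard_mb):
    """Build the resource fields of an ECS container definition (assumed target)."""
    if soft_mb > hard_mb:
        raise ValueError(f"{name}: soft limit exceeds hard limit")
    return {
        "name": name,
        "cpu": cpu,                    # CPU units (1024 units = 1 vCPU)
        "memoryReservation": soft_mb,  # soft limit; ECS reads this as MiB
        "memory": hard_mb,             # hard limit; ECS reads this as MiB
    }


definitions = [container_definition(n, c, s, h) for _, n, c, s, h in MODULES]
print(json.dumps(definitions[0], indent=2))

# Worst-case memory if every instance reaches its hard limit.
total_hard_mb = sum(count * hard for count, _, _, _, hard in MODULES)
print(f"Total hard memory limit across all instances: {total_hard_mb} MB")
```

Summing the hard limits over all nine container instances gives a worst-case memory footprint of 13280 MB for the data import modules listed here.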

Environment Variables

mod-inventory was configured with the following variables:

  • inventory.kafka.DataImportConsumerVerticle.instancesNumber=10
  • inventory.kafka.MarcBibInstanceHridSetConsumerVerticle.instancesNumber=10
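
The page does not say how these settings were passed to the module. One plausible mechanism, shown here purely as an assumption, is to supply them as JVM system properties (`-Dkey=value`) through a `JAVA_OPTIONS` environment variable. The helper below is hypothetical and only renders the two settings in that form.

```python
# Settings listed above for mod-inventory.
SETTINGS = {
    "inventory.kafka.DataImportConsumerVerticle.instancesNumber": 10,
    "inventory.kafka.MarcBibInstanceHridSetConsumerVerticle.instancesNumber": 10,
}


def to_jvm_flags(settings):
    """Render each setting as a -Dkey=value JVM flag, separated by spaces."""
    return " ".join(f"-D{key}={value}" for key, value in settings.items())


# Assumption: the container reads JVM flags from JAVA_OPTIONS.
print(f"JAVA_OPTIONS={to_jvm_flags(SETTINGS)}")
```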

AWS MSK Configurations:

  • Apache Kafka v2.7.0
  • 2 Brokers: kafka.m5.large, 500 GiB each
  • auto.create.topics.enable=true
  • log.retention.minutes=300
  • Data Import Topics
    • Number of partitions: 1
    • Replication factor: 2
  • mod-inventory's topics (inventory.instance, inventory.holdings-record, inventory.item)
    • Number of partitions: 50
    • Replication factor: 2
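
For reference, the mod-inventory topic settings above could be applied by creating the topics explicitly with the standard `kafka-topics.sh` tool. The sketch below only generates the command lines; the bootstrap address is a hypothetical placeholder, the topic names are taken as written on this page and may differ in an actual deployment, and this is not necessarily how the PTF environment was provisioned.

```python
BROKERS = 2                               # broker count listed above
BOOTSTRAP = "broker-1.example.com:9092"   # hypothetical address
INVENTORY_TOPICS = [
    "inventory.instance",
    "inventory.holdings-record",
    "inventory.item",
]


def create_topic_command(topic, partitions, replication_factor):
    """Return a kafka-topics.sh command that creates one topic."""
    # Kafka cannot place more replicas of a partition than there are brokers.
    if replication_factor > BROKERS:
        raise ValueError("replication factor exceeds broker count")
    args = [
        "kafka-topics.sh", "--create",
        "--bootstrap-server", BOOTSTRAP,
        "--topic", topic,
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
    ]
    return " ".join(args)


commands = [create_topic_command(t, 50, 2) for t in INVENTORY_TOPICS]
for command in commands:
    print(command)
```

With 50 partitions and a replication factor of 2, each of these topics places 100 partition replicas across the two brokers.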

Troubleshooting