Data Import (Iris)

Overview

Compared to Honeysuckle, Data Import in Iris is more robust, predictable, and reliable. Up to 50K MARC records can be imported reliably in about 2 hours, and up to 100K records can be imported in less than 8 hours. These imports create new instances, holdings, and items. Imports that update existing records were also tested, but that workflow still has open issues.
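To put the durations above in perspective, the following minimal sketch converts them into average throughput. It uses only the two figures quoted in the overview; the function name and labels are illustrative, and because the 100K import finished in less than 8 hours, its rate is a lower bound rather than a measured value.

```python
def records_per_second(records: int, hours: float) -> float:
    """Average throughput implied by importing `records` in `hours`."""
    return records / (hours * 3600)


# Figures quoted in the overview: (record count, duration in hours).
runs = {
    "50K create": (50_000, 2.0),
    "100K create (upper-bound duration)": (100_000, 8.0),
}

for label, (records, hours) in runs.items():
    rate = records_per_second(records, hours)
    print(f"{label}: {rate:.1f} records/s ({rate * 60:.0f} records/min)")
```

This works out to roughly 6.9 records per second for the 50K import and at least 3.5 records per second for the 100K import, which suggests that throughput does not scale linearly with file size.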

Environment

Data import tests were performed successfully on the following combination of hardware specifications and software configuration.

No. Instances   Container                                   CPU Units (AWS)   Soft Memory Limit   Hard Memory Limit
1               mod-data-import v2.0.2                      128               1024 MB             2048 MB
2               mod-source-record-manager v3.0.5            128               896 MB              1440 MB
2               mod-source-record-storage v5.1.0-Snapshot   128               896 MB              1440 MB
2               mod-inventory v16.4.0-Snapshot              256               1440 MB             1872 MB
2               mod-inventory-storage v20.3.0-Snapshot      128               536 MB              864 MB
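For capacity planning, the per-container figures listed above can be summed into a total footprint. The sketch below assumes that the CPU units and memory limits apply to each container instance separately, which is how AWS ECS task definitions treat them; the source does not state this explicitly. The class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    name: str
    instances: int
    cpu_units: int  # AWS CPU units; in ECS, 1024 units equal 1 vCPU
    soft_mb: int    # soft memory limit per instance, in MB
    hard_mb: int    # hard memory limit per instance, in MB


# Values copied from the environment listing above.
CONTAINERS = [
    Container("mod-data-import", 1, 128, 1024, 2048),
    Container("mod-source-record-manager", 2, 128, 896, 1440),
    Container("mod-source-record-storage", 2, 128, 896, 1440),
    Container("mod-inventory", 2, 256, 1440, 1872),
    Container("mod-inventory-storage", 2, 128, 536, 864),
]


def totals(containers):
    """Return (cpu_units, soft_mb, hard_mb) summed over all instances."""
    cpu = sum(c.instances * c.cpu_units for c in containers)
    soft = sum(c.instances * c.soft_mb for c in containers)
    hard = sum(c.instances * c.hard_mb for c in containers)
    return cpu, soft, hard


cpu, soft, hard = totals(CONTAINERS)
print(f"CPU units: {cpu} (~{cpu / 1024:.2f} vCPU)")
print(f"Soft memory: {soft} MB, hard memory: {hard} MB")
```

Under that assumption, the five modules together reserve 1408 CPU units, with 8560 MB of soft and 13280 MB of hard memory limits.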

Environment Variables

mod-inventory was configured with the following variables:

  • inventory.kafka.DataImportConsumerVerticle.instancesNumber=10

  • inventory.kafka.MarcBibInstanceHridSetConsumerVerticle.instancesNumber=10

AWS MSK Configurations:

  • Apache Kafka v2.7.0

  • 2 Brokers: kafka.m5.large, 500 GiB each

  • auto.create.topics.enable=true

  • log.retention.minutes=300

  • Data Import Topics

    • Number of partitions: 1

    • Replication factor: 2

  • mod-inventory's topics (inventory.instance, inventory.holdings-record, inventory.item)

    • Number of partitions: 50

    • Replication factor: 2
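The topic settings above can be applied explicitly instead of relying on auto-creation. As a sketch, the following script renders `kafka-topics.sh --create` commands for the three mod-inventory topics with 50 partitions and a replication factor of 2. The bootstrap address is a hypothetical placeholder, and the topic names are taken as listed above; an actual deployment may use different or prefixed names, so check the names on the cluster before running anything.

```python
import shlex

# Hypothetical placeholder; replace with the MSK cluster's bootstrap brokers.
BOOTSTRAP = "broker-1.example.com:9092"

# Topic names as listed in this section (assumed to match the cluster).
TOPICS = ["inventory.instance", "inventory.holdings-record", "inventory.item"]


def create_topic_command(topic, partitions=50, replication_factor=2,
                         bootstrap=BOOTSTRAP):
    """Render a kafka-topics.sh command that creates one topic."""
    args = [
        "kafka-topics.sh",
        "--bootstrap-server", bootstrap,
        "--create",
        "--topic", topic,
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
    ]
    return shlex.join(args)


for topic in TOPICS:
    print(create_topic_command(topic))
```

The script only prints the commands, so they can be reviewed before being run against a cluster. With only 2 brokers, a replication factor of 2 is the maximum Kafka will accept.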

Troubleshooting