Data Import (Iris)
Overview
Compared to the Honeysuckle release, Data Import in Iris is more robust, predictable, and reliable. Up to 50K MARC records can be imported reliably in 2 hours, and up to 100K records can be imported in less than 8 hours. These imports create new instances, holdings, and items. Imports that update existing records were also tested, but that workflow still has open issues.
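To put the figures above in perspective, the following minimal sketch converts them into an average throughput. The record counts and durations are the ones quoted in this overview; the names `IMPORT_RUNS` and `records_per_second` are illustrative only and do not come from any FOLIO module.

```python
# Figures quoted in the overview: (number of records, duration bound in hours).
IMPORT_RUNS = {
    "50K create": (50_000, 2),
    "100K create": (100_000, 8),
}


def records_per_second(records: int, hours: float) -> float:
    """Average throughput implied by importing `records` within `hours`."""
    return records / (hours * 3600)


THROUGHPUT = {name: records_per_second(*run) for name, run in IMPORT_RUNS.items()}

for name, rate in THROUGHPUT.items():
    print(f"{name}: about {rate:.1f} records/s")
# 50K create: about 6.9 records/s
# 100K create: about 3.5 records/s
```

Because the 100K import finished in less than 8 hours, its figure is a lower bound on the average rate rather than a measured value.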
Environment
Data import tests were performed successfully on the following combination of hardware specifications and software configuration.
Number of Instances | Container | CPU Units (AWS) | Soft Memory Limit | Hard Memory Limit |
---|---|---|---|---|
1 | mod-data-import v2.0.2 | 128 | 1024 MB | 2048 MB |
2 | mod-source-record-manager v3.0.5 | 128 | 896 MB | 1440 MB |
2 | mod-source-record-storage v5.1.0-Snapshot | 128 | 896 MB | 1440 MB |
2 | mod-inventory v16.4.0-Snapshot | 256 | 1440 MB | 1872 MB |
2 | mod-inventory-storage v20.3.0-Snapshot | 128 | 536 MB | 864 MB |
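As a rough aid for sizing a comparable environment, the sketch below totals the resources in the table by multiplying each per-instance figure by its instance count. It assumes the CPU and memory values are per container instance, which the table does not state explicitly; the `Container` type and `cluster_totals` function are hypothetical helpers.

```python
from typing import NamedTuple


class Container(NamedTuple):
    name: str
    instances: int
    cpu_units: int
    soft_mb: int
    hard_mb: int


# Rows copied from the environment table (assumed to be per-instance values).
CONTAINERS = [
    Container("mod-data-import", 1, 128, 1024, 2048),
    Container("mod-source-record-manager", 2, 128, 896, 1440),
    Container("mod-source-record-storage", 2, 128, 896, 1440),
    Container("mod-inventory", 2, 256, 1440, 1872),
    Container("mod-inventory-storage", 2, 128, 536, 864),
]


def cluster_totals(rows):
    """Sum each per-instance figure multiplied by its instance count."""
    return {
        "cpu_units": sum(r.instances * r.cpu_units for r in rows),
        "soft_mb": sum(r.instances * r.soft_mb for r in rows),
        "hard_mb": sum(r.instances * r.hard_mb for r in rows),
    }


TOTALS = cluster_totals(CONTAINERS)
print(TOTALS)
# {'cpu_units': 1408, 'soft_mb': 8560, 'hard_mb': 13280}
```

If the CPU units follow the Amazon ECS convention of 1024 units per vCPU, the total of 1408 units is just under 1.4 vCPU across all nine containers.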
Environment Variables
mod-inventory was configured with the following variables:
- inventory.kafka.DataImportConsumerVerticle.instancesNumber=10
- inventory.kafka.MarcBibInstanceHridSetConsumerVerticle.instancesNumber=10
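The sketch below estimates how many consumer verticles these settings yield across the deployment. It assumes each `instancesNumber` value applies to every mod-inventory container separately, which this page does not confirm, and it takes the count of 2 mod-inventory instances from the environment table. The helper names are illustrative.

```python
# The two settings listed above, as key=value lines.
SETTINGS = """\
inventory.kafka.DataImportConsumerVerticle.instancesNumber=10
inventory.kafka.MarcBibInstanceHridSetConsumerVerticle.instancesNumber=10
"""

# Number of mod-inventory containers, from the environment table.
MOD_INVENTORY_INSTANCES = 2


def parse_settings(text: str) -> dict:
    """Parse key=value lines into a dict with integer values."""
    pairs = (line.split("=", 1) for line in text.splitlines() if line.strip())
    return {key.strip(): int(value) for key, value in pairs}


def verticles_across_deployment(settings: dict, containers: int) -> dict:
    """Assumption: each container deploys its own set of verticle instances."""
    # The verticle class name is the third dot-separated segment of the key.
    return {key.split(".")[2]: count * containers for key, count in settings.items()}


VERTICLES = verticles_across_deployment(parse_settings(SETTINGS), MOD_INVENTORY_INSTANCES)
print(VERTICLES)
# {'DataImportConsumerVerticle': 20, 'MarcBibInstanceHridSetConsumerVerticle': 20}
```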
AWS MSK Configurations:
- Kafka v2.7.0
- 2 Brokers: kafka.m5.large, 500 GiB each
- auto.create.topics.enable=true
- log.retention.minutes=300
- Data Import Topics
  - Number of partitions: 1
  - Replication Factor: 2
- mod-inventory's topics (inventory.instance, inventory.holdings-record, inventory.item)
  - Number of partitions: 50
  - Replication Factor: 2
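The topic settings above can be sanity-checked with a small sketch like the one below. It encodes the Kafka rule that a topic's replication factor cannot exceed the number of brokers, and reports how many partition replicas each topic places on the cluster. `TopicGroup` and `replicas_per_topic` are hypothetical names used for illustration; they are not part of Kafka or any FOLIO module.

```python
from dataclasses import dataclass

BROKER_COUNT = 2  # from the MSK configuration above


@dataclass(frozen=True)
class TopicGroup:
    name: str
    partitions: int
    replication_factor: int


# Values copied from the MSK configuration list.
TOPIC_GROUPS = [
    TopicGroup("Data Import topics", partitions=1, replication_factor=2),
    TopicGroup("mod-inventory topics", partitions=50, replication_factor=2),
]


def replicas_per_topic(group: TopicGroup, brokers: int) -> int:
    """Return partition replicas per topic, rejecting an impossible replication factor."""
    if group.replication_factor > brokers:
        raise ValueError(
            f"{group.name}: replication factor {group.replication_factor} "
            f"exceeds broker count {brokers}"
        )
    return group.partitions * group.replication_factor


REPLICAS = {g.name: replicas_per_topic(g, BROKER_COUNT) for g in TOPIC_GROUPS}
print(REPLICAS)
# {'Data Import topics': 2, 'mod-inventory topics': 100}
```

With two brokers and a replication factor of 2, every partition has a copy on each broker, so each of the three mod-inventory topics keeps 50 partition replicas per broker.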