This page is used to brainstorm observations and issues with the current implementation of Iris (as of Hotfix 1).
Stability
- Large UPDATE jobs often get stuck at 99%. mod-inventory has been observed to crash several times with out-of-memory (OOM) errors while such a job was running.
- Could be due to mod-inventory's memory usage?
- Could be due to under-deployed brokers?
- While an import is running, if a module (mod-srm/srs or mod-inventory) gets restarted for any reason, the job may not finish with all records created as expected.
- Jobs get stuck intermittently, and recovering requires restarting all DI modules. The cause is unknown.
Scalability/Performance
- Currently a single CREATE import job of 50K records runs reliably and takes around 2 hours. Multiple CREATE jobs can be run, but depending on when the second (or n-th) job starts, the first job slows down and the second or n-th job takes a very long time to start and finish. This is because of the overwhelming number of messages that the first job has queued up in the Kafka topics.
- Consequently, running DI jobs concurrently in a multi-tenant cluster is almost impossible when each job contains tens of thousands of records.
- To date, no 100K-record CREATE import has completed successfully; each attempt created either more or fewer records than expected.
- The current hardware resources cannot accommodate both DI and circulation load successfully.
- During CREATE imports, CPU utilization is 600% for mod-srm, 400% for mod-srs, and over 250% for mod-inventory.
- During UPDATE imports, CPU utilization is 500% for mod-srm, 400% for mod-srs, and around 700% for mod-inventory for a long duration.
- As a result, response times for circulation activities increase by one third.
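As a rough back-of-the-envelope check on the figures above, the sketch below converts them into throughput and core counts. It assumes that 100% CPU utilization corresponds to one fully used core, which is a common convention in container metrics but is not stated on this page; the function names are illustrative only.

```python
def records_per_second(records: int, hours: float) -> float:
    """Average throughput of an import job."""
    return records / (hours * 3600)


def cores_used(cpu_percent: float) -> float:
    """Convert CPU utilization to cores, ASSUMING 100% == 1 core."""
    return cpu_percent / 100


# Single 50K CREATE job taking around 2 hours (figures from this page).
create_rate = records_per_second(50_000, 2)

# mod-srm, mod-srs, mod-inventory during CREATE imports.
# mod-inventory is reported as "over 250%", so the sum is a lower bound.
create_cores = sum(cores_used(p) for p in (600, 400, 250))

print(f"{create_rate:.1f} records/s, at least {create_cores:.1f} cores")
# prints "6.9 records/s, at least 12.5 cores"
```

At roughly 7 records per second, a 100K job would need about 4 hours even if throughput stayed constant, and the observations above suggest it does not.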
Functionality
- Cleaning up old Kafka topics after upgrading DI modules to a new FOLIO release is very time-consuming and requires a lot of manual effort. A migration script that cleans up old topics would greatly simplify upgrades.
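A minimal sketch of what such a cleanup script might look like is shown below. It is not an existing FOLIO tool. It assumes DI topics follow a `<env>.<namespace>.<tenant>.DI_<EVENT>` naming layout, that the list of topics expected by the new release is supplied by the operator, and that the kafka-python package is available for the admin calls. The helper names and the sample topic names are hypothetical.

```python
import re

# ASSUMED topic layout "<env>.<namespace>.<tenant>.DI_<EVENT>"; adjust per deployment.
DI_TOPIC = re.compile(r"^[^.]+\.[^.]+\.[^.]+\.DI_[A-Z0-9_]+$")


def stale_di_topics(existing, expected):
    """Return DI topics present on the cluster but not expected by the new release."""
    keep = set(expected)
    return sorted(t for t in existing if DI_TOPIC.match(t) and t not in keep)


def delete_stale_di_topics(bootstrap_servers, expected, dry_run=True):
    """List topics via kafka-python and delete the stale DI ones (dry run by default)."""
    from kafka.admin import KafkaAdminClient  # requires the kafka-python package

    admin = KafkaAdminClient(bootstrap_servers=bootstrap_servers)
    try:
        stale = stale_di_topics(admin.list_topics(), expected)
        if stale and not dry_run:
            admin.delete_topics(stale)
        return stale
    finally:
        admin.close()


# Illustrative topic names only; real names depend on the environment.
existing = [
    "folio.Default.diku.DI_COMPLETED",
    "folio.Default.diku.DI_OLD_EVENT",
    "folio.Default.diku.circulation.loan",
]
expected = ["folio.Default.diku.DI_COMPLETED"]
print(stale_di_topics(existing, expected))
```

Keeping the selection logic in a pure function makes it possible to review the list of topics before anything is deleted; `dry_run=True` returns that list without calling `delete_topics`.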