Data Import Observations for Improvements

This page collects observations and issues with the current Data Import (DI) implementation in Iris (as of Hotfix 1).

Stability

  • Huge UPDATE jobs often get stuck at 99%. mod-inventory has been observed to crash several times with out-of-memory (OOM) errors while such a job is running.
    • Could this be due to mod-inventory's memory usage?
    • Could this be due to under-deployed Kafka brokers?
  • If a module (mod-srm, mod-srs, or mod-inventory) restarts for any reason while an import is running, the job may not complete with all of the expected records created.
  • Jobs get stuck intermittently, and recovering requires restarting all DI modules. The cause is unknown.

Scalability/Performance

  • Currently, a single CREATE import job of 50K records runs reliably and takes around 2 hours. Multiple CREATE jobs can be run, but depending on when the second (or n-th) job starts, the first job slows down and the second (or n-th) job takes a very long time to start and to finish. This is because the first job queues an overwhelming number of messages in the Kafka topics.
    • Consequently, running DI jobs concurrently in a multi-tenant cluster is almost impossible when each job contains tens of thousands of records.
    • To date, no 100K-record CREATE import has completed successfully; either more or fewer records than expected were created.
  • The current hardware resources cannot accommodate DI and circulation load at the same time.
    • During CREATE imports, CPU utilization is 600% for mod-srm, 400% for mod-srs, and over 250% for mod-inventory.
    • During UPDATE imports, CPU utilization is 500% for mod-srm and 400% for mod-srs, and mod-inventory stays around 700% for a long duration.
    • As a result, response times for circulation activities increase by one third.
  • The polling mechanism that fetches job statuses for the DI landing page is slow.
    • The polling mechanism executes a slow query on the DB side. The query needs to be improved.
    • Polling may not be necessary if we move to a push model with WebSockets.
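
To make the backlog effect in the first bullet concrete, here is a rough back-of-the-envelope sketch. It uses only the figures quoted above (50K records, around 2 hours) and assumes, for illustration only, that the first job's messages are all queued up front and are consumed in strict FIFO order at a constant rate. Real Kafka consumption is partitioned and the rate varies, so treat the numbers as an estimate, not a measurement. The function names are hypothetical.

```python
# Figures taken from the observations above.
RECORDS_PER_JOB = 50_000
JOB_DURATION_SECONDS = 2 * 3600  # "around 2 hours"


def throughput(records: int, seconds: float) -> float:
    """Average records processed per second."""
    return records / seconds


def wait_behind_backlog(first_job_records: int, processed_so_far: int, rate: float) -> float:
    """Seconds before a second job's messages reach the head of the queue.

    Assumption: one FIFO queue, constant processing rate, and the first
    job's messages were all enqueued before the second job started.
    """
    remaining = first_job_records - processed_so_far
    return remaining / rate


rate = throughput(RECORDS_PER_JOB, JOB_DURATION_SECONDS)
print(f"{rate:.2f} records/s")  # 6.94 records/s

# Second job starts when the first job has processed 10K of its 50K records.
wait = wait_behind_backlog(RECORDS_PER_JOB, 10_000, rate)
print(f"{wait / 60:.0f} minutes")  # 96 minutes
```

Under these simplifying assumptions, a second job started early in the first job's run would wait well over an hour before any of its records are processed, which is consistent with the long start-up delays described above.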

Functionality

  • Cleaning up old Kafka topics after upgrading DI modules to a new FOLIO release is very time-consuming and requires a lot of manual effort. A migration script that cleans up old topics would make upgrades much easier.
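
As a starting point for such a migration script, the selection step could look like the sketch below. It only decides which topic names look stale; it does not talk to Kafka. The topic layout `<env>.<namespace>.<tenant>.<event type>`, the `DI_` prefix, the example names, and the function name `find_stale_di_topics` are all assumptions made for illustration and would need to be checked against the actual deployment.

```python
from typing import Iterable, List, Set


def find_stale_di_topics(
    all_topics: Iterable[str],
    current_event_types: Set[str],
    env: str = "folio",
) -> List[str]:
    """Return DI topics whose event type is no longer used by the new release.

    Assumed (hypothetical) topic layout: <env>.<namespace>.<tenant>.<event type>
    """
    stale = []
    for topic in all_topics:
        parts = topic.split(".")
        # Skip anything that does not match the assumed four-part layout
        # or belongs to a different environment.
        if len(parts) != 4 or parts[0] != env:
            continue
        event_type = parts[3]
        if event_type.startswith("DI_") and event_type not in current_event_types:
            stale.append(topic)
    return sorted(stale)


# Illustrative input only; these names are made up for the example.
topics = [
    "folio.Default.diku.DI_COMPLETED",
    "folio.Default.diku.DI_LEGACY_EVENT",
    "folio.Default.diku.inventory.instance",
    "other.Default.diku.DI_LEGACY_EVENT",
]
stale_topics = find_stale_di_topics(topics, current_event_types={"DI_COMPLETED"})
print(stale_topics)  # ['folio.Default.diku.DI_LEGACY_EVENT']
```

The resulting list could then be reviewed by an operator and passed to whatever Kafka admin tooling the deployment uses to delete topics. Keeping selection separate from deletion makes a dry run possible before anything is removed.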