Data Import NFR

Background: Imports of MARC Bib files in production take much longer than expected and sometimes hang or fail with errors that point to infrastructure issues.

Identified problems:

Data Import Issues and possible improvements (WIP)

SPIKE: Multiple tenant DI testing - import jobs are hanging (MODSOURCE-581)

FOLIO Production Library Import Statistics

Main problems to be addressed: pressure on the DB during large imports or multiple parallel imports (multi-tenant setup), and storing data in jsonb

Possible improvements:

  1. Optimize Insert & Update of marc_records_lb in mod-source-record-storage (MODSOURCE-601) - to be finished in Poppy
    • Implementation
    • Some additional changes and merging
    • Testing: preliminary results - a basic import of 50k MARC Bibs with Create Inventory Instances takes 34 min (54 min in Orchid)

                                                   

  2. Enable R/W split for mod-source-record-manager (MODSOURMAN-966)
    • Preliminary investigation: the read instance does not synchronize with the main instance fast enough, which causes errors in single-record operations
    • Implement a short-term solution: route the "get job execution by id" request to the main instance of the DB - can be done in Poppy
    • Test to make sure no other places are affected by the R/W split
    • Investigate more thoroughly and define a more elaborate solution
  3. Configure external storage for uploading files and schedule their processing (ARCH-19, MODDATAIMP-392)
  4. Refactor mod-source-record-storage to move away from storing data in jsonb (MODSOURCE-627)
  5. Optimize DI import flow (possibly remove post-processing step for Create Instance action) (MODDATAIMP-744)
    • Explore the possibility of removing post-processing for the Create Instance action - to be done in Poppy
    • Define approach
    • Create and estimate implementation stories
    • Implement
  6. Create indexes for 010 and 035 MARC fields in mod-source-record-storage
    • Implementation done in Orchid
      • Note: this change affects Update jobs that use MARC-to-MARC matching on the 010 or 035 fields
  7. Configure MAX connection pool size and timeout
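
The preliminary test figures in item 1 can be turned into throughput numbers with a little arithmetic. The sketch below uses only the values quoted there (50k records, 34 min, 54 min in Orchid); the variable names are illustrative. It works out to roughly 1,471 versus 926 records per minute, or about 37% less wall-clock time.

```python
# Figures quoted in item 1 (preliminary results, not a benchmark).
RECORDS = 50_000
OPTIMIZED_MINUTES = 34   # with the marc_records_lb insert/update optimization
ORCHID_MINUTES = 54      # Orchid baseline


def throughput(records: int, minutes: float) -> float:
    """Return records processed per minute."""
    return records / minutes


optimized_rate = throughput(RECORDS, OPTIMIZED_MINUTES)
orchid_rate = throughput(RECORDS, ORCHID_MINUTES)

# Share of the Orchid wall-clock time that the optimization removes.
time_saved_pct = 100 * (ORCHID_MINUTES - OPTIMIZED_MINUTES) / ORCHID_MINUTES
# How many times faster the optimized run is.
speedup = ORCHID_MINUTES / OPTIMIZED_MINUTES

print(f"optimized: {optimized_rate:.0f} records/min")
print(f"orchid:    {orchid_rate:.0f} records/min")
print(f"time saved: {time_saved_pct:.1f}%  speedup: {speedup:.2f}x")
```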
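
The short-term solution in item 2 amounts to a routing rule: reads go to the read instance unless the operation is known to need data that was just written, in which case it goes to the main instance. The following is a minimal sketch of that idea only. The class, the operation names and the pool labels are hypothetical and are not taken from mod-source-record-manager.

```python
from dataclasses import dataclass, field


@dataclass
class ReadWriteRouter:
    """Chooses a DB target per operation (all names are illustrative)."""

    main: str = "main-instance"
    read_replica: str = "read-instance"
    # Reads that must see freshly written rows and so cannot tolerate
    # replication lag. "get_job_execution_by_id" is a hypothetical label
    # for the request mentioned in item 2.
    main_only_reads: set = field(
        default_factory=lambda: {"get_job_execution_by_id"}
    )

    def target_for(self, operation: str, is_write: bool) -> str:
        # Writes always go to the main instance.
        if is_write:
            return self.main
        # Lag-sensitive reads are pinned to the main instance as well.
        if operation in self.main_only_reads:
            return self.main
        # Every other read may be served by the read instance.
        return self.read_replica


router = ReadWriteRouter()
print(router.target_for("get_job_execution_by_id", is_write=False))
print(router.target_for("list_job_executions", is_write=False))
print(router.target_for("update_job_execution", is_write=True))
```

Keeping the lag-sensitive reads in an explicit set makes the testing step in item 2 concrete: any other operation found to be affected can be added to the set until a more elaborate solution is defined.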
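
To illustrate why the indexes in item 6 matter for matching on 010 or 035, the sketch below uses an in-memory SQLite database as a stand-in. The table layout, the index name and the sample values are assumptions made for the example; they do not reflect the actual mod-source-record-storage schema or its PostgreSQL indexes.

```python
import sqlite3


def build_store() -> sqlite3.Connection:
    """Create a hypothetical table of extracted MARC field values."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE marc_field_values ("
        "record_id TEXT NOT NULL, field_no TEXT NOT NULL, value TEXT NOT NULL)"
    )
    # Index covering the columns used by a match on a field value.
    conn.execute(
        "CREATE INDEX idx_marc_field_value "
        "ON marc_field_values (field_no, value)"
    )
    return conn


def add_record(conn: sqlite3.Connection, record_id: str, fields) -> None:
    """Store (field_no, value) pairs for one record."""
    conn.executemany(
        "INSERT INTO marc_field_values VALUES (?, ?, ?)",
        [(record_id, field_no, value) for field_no, value in fields],
    )


def match(conn: sqlite3.Connection, field_no: str, value: str) -> list:
    """Return ids of records whose given field equals the value."""
    rows = conn.execute(
        "SELECT record_id FROM marc_field_values "
        "WHERE field_no = ? AND value = ? ORDER BY record_id",
        (field_no, value),
    )
    return [row[0] for row in rows]


def query_plan(conn: sqlite3.Connection, field_no: str, value: str) -> str:
    """Return SQLite's plan text for the match lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT record_id FROM marc_field_values "
        "WHERE field_no = ? AND value = ?",
        (field_no, value),
    )
    return " ".join(row[3] for row in rows)


conn = build_store()
add_record(conn, "rec-1", [("010", "2001000001"), ("035", "(OCoLC)1111")])
add_record(conn, "rec-2", [("035", "(OCoLC)2222")])
print(match(conn, "035", "(OCoLC)2222"))
print(query_plan(conn, "035", "(OCoLC)2222"))
```

The plan text shows whether the lookup is served by the index rather than a full table scan, which is the effect an Update job with MARC-to-MARC matching on these fields would rely on.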
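
Item 7 names two settings: a maximum pool size and a timeout. As a rough illustration of how they interact, the sketch below implements a tiny bounded pool in which callers wait up to the timeout for a free connection and fail fast afterwards. The class and parameter names are hypothetical; the actual FOLIO modules configure their own pool implementation.

```python
import queue


class BoundedPool:
    """A fixed-size pool with an acquire timeout (illustrative only)."""

    def __init__(self, factory, max_size: int, acquire_timeout_s: float):
        self._timeout = acquire_timeout_s
        self._idle = queue.Queue(maxsize=max_size)
        # Pre-create exactly max_size connections; no more are ever made,
        # which caps the load this client can put on the database.
        for _ in range(max_size):
            self._idle.put(factory())

    def acquire(self):
        """Take a connection, waiting at most the configured timeout."""
        try:
            return self._idle.get(timeout=self._timeout)
        except queue.Empty:
            raise TimeoutError(
                f"no connection available within {self._timeout}s"
            ) from None

    def release(self, conn) -> None:
        """Return a connection so that waiting callers can proceed."""
        self._idle.put(conn)


# Usage: plain objects stand in for real DB connections.
pool = BoundedPool(factory=object, max_size=2, acquire_timeout_s=0.05)
first = pool.acquire()
second = pool.acquire()
try:
    pool.acquire()  # pool exhausted, times out after 0.05 s
except TimeoutError as exc:
    print("timed out:", exc)
pool.release(first)
third = pool.acquire()  # succeeds once a connection is released
```

A smaller maximum protects the database during parallel imports at the cost of more waiting in the module, so the timeout decides whether that waiting shows up as latency or as an explicit error.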