
Data Import NFR

Background: Imports of MARC Bib files in production take much longer than expected and sometimes hang or fail with errors that point to infrastructure issues.

Identified problems:

Data Import Issues and possible improvements (WIP)

SPIKE: Multiple tenant DI testing - import jobs are hanging (MODSOURCE-581)

FOLIO Production Library Import Statistics

Main problems to be addressed: pressure on the DB during large imports or multiple parallel imports (multi-tenant setup), and storing data in jsonb

Possible improvements:

  1. Optimize Insert & Update of marc_records_lb in mod-source-record-storage - to be finished in Poppy (MODSOURCE-601)
    • Implementation
    • Some additional changes and merging
    • Testing (preliminary results): a basic import of 50k MARC Bibs with Create Inventory Instances takes 34 min (54 min in Orchid)
  2. Enable R/W split for mod-source-record-manager (MODSOURMAN-966)
    • Preliminary investigation: synchronization of the read instance with the main instance is not fast enough, causing errors for single-record operations
    • Implement a short-term solution: rewrite the get job execution by id request so that it goes to the main instance of the DB - can be done in Poppy
    • Test to make sure there are no other places that could be affected by the R/W split
    • Investigate more thoroughly and define a more elaborate solution
  3. Configure external storage for uploaded files and schedule their processing (ARCH-19, MODDATAIMP-392)
  4. Refactor mod-source-record-storage to move away from storing data in jsonb (MODSOURCE-627)
  5. Optimize DI import flow (possibly remove post-processing step for Create Instance action) (MODDATAIMP-744)
    • Explore the possibility of removing post-processing for the Create Instance action - to be done in Poppy
    • Define approach
    • Create and estimate implementation stories
    • Implement
  6. Create indexes for 010 and 035 MARC fields in mod-source-record-storage
    • Implementation done in Orchid
      • Note: this change affects Update jobs that use MARC-to-MARC matching on 010 or 035 fields
  7. Configure MAX connection pool size and timeout
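
The short-term R/W split approach from item 2 can be illustrated with a minimal routing sketch. This is not the mod-source-record-manager implementation: the operation names, the `READ_FROM_MAIN` set and the `choose_target` function are hypothetical and exist only to show the idea that writes and lag-sensitive single-record reads go to the main DB instance, while all other reads may go to the read instance.

```python
from enum import Enum


class Target(Enum):
    MAIN = "main"        # primary (read/write) DB instance
    REPLICA = "replica"  # read-only instance, may lag behind main


# Hypothetical set of read operations that must see the latest writes.
# "get_job_execution_by_id" stands in for the request named in item 2.
READ_FROM_MAIN = {"get_job_execution_by_id"}


def choose_target(operation: str, is_write: bool) -> Target:
    """Pick the DB instance for a request (illustrative only)."""
    if is_write:
        return Target.MAIN      # writes always go to the main instance
    if operation in READ_FROM_MAIN:
        return Target.MAIN      # read that is sensitive to replication lag
    return Target.REPLICA       # every other read can use the read instance


if __name__ == "__main__":
    for op, write in [
        ("get_job_execution_by_id", False),
        ("list_job_executions", False),   # hypothetical operation name
        ("update_job_execution", True),   # hypothetical operation name
    ]:
        print(op, "->", choose_target(op, write).value)
```

Keeping the exceptions in one explicit set makes it easy to add further operations if testing shows other places affected by the R/W split.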






