Data Import NFR

Background: Imports of MARC Bib files in production take much longer than expected, and sometimes hang or fail with errors that indicate infrastructure issues.

Identified problems:

Data Import Issues and possible improvements (WIP)

SPIKE: Multiple tenant DI testing - import jobs are hanging (MODSOURCE-581)

FOLIO Production Library Import Statistics

Main problems to be addressed: pressure on the DB during large imports or multiple parallel imports (multi-tenant setup), and storing data in jsonb

Possible improvements:

  1. Optimize Insert & Update of marc_records_lb in mod-source-record-storage - to be finished in Poppy https://folio-org.atlassian.net/browse/MODSOURCE-601

    Implementation
    Some additional changes and merging

    Testing: preliminary results - basic import of 50k MARC Bibs + Create Inventory Instances takes 34 min (Orchid results - 54 min)

                                                 

  2. Enable R/W split for mod-source-record-manager https://folio-org.atlassian.net/browse/MODSOURMAN-966

    Preliminary investigation: the read instance does not synchronize with the main instance fast enough, which causes errors for single-record operations
    Implement a short-term solution: route the request that gets a job execution by id to the main DB instance - can be done in Poppy
    Test to make sure no other places are affected by the R/W split

    Investigate more thoroughly and define a more elaborate solution

  3. Configure external storage for uploading files and schedule its processing https://folio-org.atlassian.net/browse/ARCH-19 https://folio-org.atlassian.net/browse/MODDATAIMP-392 

    Define approach
    Create and estimate implementation stories

    Implement

  4. Refactor mod-source-record-storage to move away from storing data in jsonb https://folio-org.atlassian.net/browse/MODSOURCE-627

    Define approach
    Create and estimate implementation stories
    Implement

    Test

  5. Optimize DI import flow (possibly remove post-processing step for Create Instance action) https://folio-org.atlassian.net/browse/MODDATAIMP-744

    Explore the possibility of removing post-processing for the Create Instance action - to be done in Poppy
    Define approach
    Create and estimate implementation stories

    Implement

  6. Create indexes for 010 and 035 MARC fields in mod-source-record-storage

    Implementation done in Orchid

     

  7. Configure the maximum connection pool size and timeout
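
Illustration for item 2 (R/W split): the short-term solution described there, sending the get-job-execution-by-id lookup to the main DB instance while lag-tolerant reads stay on the read instance, can be sketched roughly as below. This is a minimal in-memory model and not the mod-source-record-manager implementation; `FakeDb`, `ReadWriteRouter`, and all method names are hypothetical, and replication is simulated by an explicit `replicate()` call.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeDb:
    """In-memory stand-in for one database instance (hypothetical)."""
    name: str
    rows: dict = field(default_factory=dict)


class ReadWriteRouter:
    """Sends writes to the main instance and lag-tolerant reads to the read instance."""

    def __init__(self, main: FakeDb, replica: FakeDb):
        self.main = main
        self.replica = replica

    def save_job_execution(self, job_id: str, status: str) -> None:
        # Writes always go to the main instance.
        self.main.rows[job_id] = status

    def replicate(self) -> None:
        # Simulates the read instance catching up with the main instance.
        self.replica.rows = dict(self.main.rows)

    def get_job_execution_by_id(self, job_id: str, use_main: bool = True) -> Optional[str]:
        # A single-record lookup may directly follow a write, so by default
        # it reads from main and is not affected by replication lag.
        db = self.main if use_main else self.replica
        return db.rows.get(job_id)

    def list_job_executions(self) -> list:
        # Bulk reads that tolerate slightly stale data stay on the read instance.
        return sorted(self.replica.rows.items())


if __name__ == "__main__":
    router = ReadWriteRouter(FakeDb("main"), FakeDb("replica"))
    router.save_job_execution("job-1", "NEW")
    # Before replication the read instance does not know the record yet.
    print(router.get_job_execution_by_id("job-1", use_main=False))
    # The same lookup against main succeeds immediately.
    print(router.get_job_execution_by_id("job-1"))
```

The sketch only covers one endpoint; the follow-up step in item 2 (checking for other places affected by the R/W split) would amount to finding every other read that can directly follow a write to the same record.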