Background: Imports of MARC Bib files in production take much longer than expected, and sometimes hang or fail with errors that point to infrastructure issues.

Identified problems:

Data Import Issues and possible improvements (WIP)

...

Main problems to be addressed: pressure on the DB during large imports or multiple parallel imports (multi-tenant setup), and storing data in jsonb.

Possible improvements:

  •  Optimize Insert & Update of marc_records_lb in mod-source-record-storage - to be finished in Poppy
    Jira: MODSOURCE-601
    •  Implementation
    •  Some additional changes and merging
    •  Testing: preliminary results - a basic import of 50k MARC Bibs with Create Inventory Instances takes 34 min (vs. 54 min in Orchid)
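For scale, the preliminary timings above work out as follows. This is plain arithmetic on the two reported durations for the same 50k-record job; it is not an additional measurement.

```latex
\begin{aligned}
\text{time reduction} &= \frac{54 - 34}{54} = \frac{20}{54} \approx 37\% \\
\text{throughput}_{\text{Orchid}} &= \frac{50\,000}{54\ \text{min}} \approx 926\ \text{records/min} \\
\text{throughput}_{\text{optimized}} &= \frac{50\,000}{34\ \text{min}} \approx 1\,471\ \text{records/min} \\
\text{speedup} &= \frac{54}{34} \approx 1.59
\end{aligned}
```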

...

  •  Enable R/W split for mod-source-record-manager
    Jira: MODSOURMAN-966
    •  Preliminary investigation: synchronization of the read instance with the main instance is not fast enough, which causes errors for single-record operations
    •  Implement a short-term solution: route the "get job execution by id" request to the main DB instance - can be done in Poppy
    •  Test to make sure no other places are affected by the R/W split
    •  Investigate more thoroughly and define a more elaborate solution
  •  Configure external storage for uploading files and schedule processing of the uploaded files
    Jira: ARCH-19
    Jira: MODDATAIMP-392
  •  Refactor mod-source-record-storage to move away from storing data in jsonb
    Jira: MODSOURCE-627
  •  Optimize the DI import flow (possibly remove the post-processing step for the Create Instance action)
    Jira: MODDATAIMP-744
    •  Explore the possibility of removing post-processing for the Create Instance action - to be done in Poppy
    •  Define the approach
    •  Create and estimate implementation stories
    •  Implement
  •  Create indexes for 010 and 035 MARC fields in mod-source-record-storage
    •  Implementation done in Orchid
      • Note: this change affects Update jobs that use MARC-to-MARC matching on the 010 or 035 fields
  •  Configure the maximum connection pool size and timeout
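The short-term R/W split workaround for mod-source-record-manager comes down to a routing rule: writes and a small set of "pinned" reads go to the main instance, while all other reads go to the read replica, which may lag behind. The following is a minimal Python sketch of that rule under stated assumptions. The class, the operation names (other than the "get job execution by id" case named in this page), and the primary/replica labels are hypothetical and do not reflect mod-source-record-manager's actual implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Target(Enum):
    """Which database instance a query should be sent to."""
    PRIMARY = "primary"  # main read/write instance
    REPLICA = "replica"  # read-only instance that may lag behind the primary


@dataclass
class QueryRouter:
    """Hypothetical router for a DB setup with a read/write split.

    Writes always go to the primary. Reads go to the replica unless the
    operation is listed in ``pinned_reads``: those reads must see data that
    was just written, so replica lag could make them fail.
    """
    # Assumption: only the job-execution lookup by id is pinned for now.
    pinned_reads: frozenset = field(
        default_factory=lambda: frozenset({"get_job_execution_by_id"})
    )

    def route(self, operation: str, is_write: bool) -> Target:
        if is_write or operation in self.pinned_reads:
            return Target.PRIMARY
        return Target.REPLICA


if __name__ == "__main__":
    router = QueryRouter()
    # Hypothetical operation names. A freshly created job execution is read
    # back by id right away, so that lookup is served by the primary.
    print(router.route("create_job_execution", is_write=True).value)
    print(router.route("get_job_execution_by_id", is_write=False).value)
    print(router.route("get_job_execution_list", is_write=False).value)
```

Keeping the pinned set explicit makes it easy to extend if testing turns up other single-record reads that fail because of replica lag.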

...