Data Import NFR
Background: Imports of MARC Bib files in production take much longer than expected and sometimes hang or fail with errors that indicate infrastructure issues.
Identified problems:
Data Import Issues and possible improvements (WIP)
SPIKE: Multiple tenant DI testing - import jobs are hanging (MODSOURCE-581)
FOLIO Production Library Import Statistics
Main problems to be addressed: pressure on DB during large imports or multiple imports in parallel (multi-tenant setup), storing data in jsonb
Possible improvements:
Optimize Insert & Update of marc_records_lb in mod-source-record-storage - to be finished in Poppy https://folio-org.atlassian.net/browse/MODSOURCE-601
Implementation
Some additional changes and merging
Testing: preliminary results - basic import of 50k MARC Bibs + Create Inventory Instances takes 34 min (Orchid results - 54 min)
Enable R/W split for mod-source-record-manager https://folio-org.atlassian.net/browse/MODSOURMAN-966
Preliminary investigation: synchronization of the read instance with main is not fast enough, causing errors for single-record operations
Implement a short-term solution: rewrite the request for get job execution by id to go to the main instance of the DB - can be done in Poppy
Test to make sure there are no other places that could be affected by R/W split
Investigate more thoroughly and define a more elaborate solution
Configure external storage for uploading files and schedule its processing https://folio-org.atlassian.net/browse/ARCH-19 https://folio-org.atlassian.net/browse/MODDATAIMP-392
Define approach
Create and estimate implementation stories
Implement
Refactor mod-source-record-storage to move away from storing data in jsonb https://folio-org.atlassian.net/browse/MODSOURCE-627
Define approach
Create and estimate implementation stories
Implement
Test
Optimize DI import flow (possibly remove post-processing step for Create Instance action) https://folio-org.atlassian.net/browse/MODDATAIMP-744
Explore possibility to remove post-processing for Create Instance action - to be done in Poppy
Define approach
Create and estimate implementation stories
Implement
Create indexes for 010 and 035 MARC fields in mod-source-record-storage
Implementation done in Orchid
Configure MAX connection pool size and timeout
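
To illustrate what the last item controls, here is a minimal sketch of a bounded connection pool with an acquisition timeout. It is not the modules' actual configuration or code: `BoundedPool`, `PoolTimeout`, `max_size`, and `acquire_timeout_s` are hypothetical names chosen for this example, and `connect` stands in for whatever opens a real DB connection. The point is that a capped pool limits pressure on the DB, while the timeout makes a starved caller fail fast instead of hanging.

```python
import queue
import threading
from contextlib import contextmanager


class PoolTimeout(Exception):
    """Raised when no connection becomes free within the timeout."""


class BoundedPool:
    """Hypothetical pool: at most max_size connections, bounded wait."""

    def __init__(self, connect, max_size, acquire_timeout_s):
        self._connect = connect              # factory that opens one connection
        self._max_size = max_size            # hard cap on open connections
        self._acquire_timeout_s = acquire_timeout_s
        self._created = 0
        self._idle = queue.Queue()           # connections ready for reuse
        self._lock = threading.Lock()

    @contextmanager
    def connection(self):
        conn = self._acquire()
        try:
            yield conn
        finally:
            self._idle.put(conn)             # always hand the connection back

    def _acquire(self):
        # 1. Reuse an idle connection if one is available right now.
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            pass
        # 2. Otherwise open a new one, but only while below the cap.
        with self._lock:
            can_create = self._created < self._max_size
            if can_create:
                self._created += 1
        if can_create:
            try:
                return self._connect()
            except Exception:
                with self._lock:
                    self._created -= 1       # free the slot on failure
                raise
        # 3. Pool is exhausted: wait for a release, but not forever.
        try:
            return self._idle.get(timeout=self._acquire_timeout_s)
        except queue.Empty:
            raise PoolTimeout(
                f"no connection free after {self._acquire_timeout_s}s"
            ) from None
```

With `max_size=2`, a third concurrent caller waits up to `acquire_timeout_s` and then gets `PoolTimeout`. Suitable values depend on the DB's own connection limit and on how many module instances and tenants share it, so they would have to come from load testing rather than from this sketch.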