Data Import NFR
Background: Imports of MARC Bib files in production take much longer than expected, and sometimes hang or fail with errors that point to infrastructure issues.
Identified problems:
Data Import Issues and possible improvements (WIP)
SPIKE: Multiple tenant DI testing - import jobs are hanging (MODSOURCE-581)
FOLIO Production Library Import Statistics
Main problems to be addressed: pressure on the DB during large imports or multiple parallel imports (multi-tenant setup), and storing data in jsonb
Possible improvements:
- Optimize Insert & Update of marc_records_lb in mod-source-record-storage - to be finished in Poppy
- MODSOURCE-601
- Implementation
- Some additional changes and merging
- Testing (preliminary results): a basic import of 50k MARC Bibs + Create Inventory Instances takes 34 min (Orchid result: 54 min)
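The preliminary figures above can be restated as throughput with a short calculation. This is only a sketch: it uses the record count and the two durations quoted above, and the constant and function names are illustrative.

```python
# Throughput comparison for the 50k MARC Bib import quoted above.
RECORDS = 50_000
OPTIMIZED_MINUTES = 34  # preliminary result with the insert/update optimization
ORCHID_MINUTES = 54     # Orchid baseline


def records_per_second(records: int, minutes: float) -> float:
    """Average throughput over the whole job."""
    return records / (minutes * 60)


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction in job duration, in percent."""
    return (before - after) / before * 100


if __name__ == "__main__":
    print(f"Orchid:    {records_per_second(RECORDS, ORCHID_MINUTES):.1f} records/s")
    print(f"Optimized: {records_per_second(RECORDS, OPTIMIZED_MINUTES):.1f} records/s")
    print(f"Duration reduced by {percent_reduction(ORCHID_MINUTES, OPTIMIZED_MINUTES):.0f}%")
```

On these numbers the job goes from roughly 15.4 to 24.5 records per second, about a 37% reduction in duration.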
- Enable R/W split for mod-source-record-manager
- MODSOURMAN-966
- Preliminary investigation: the read instance does not synchronize with the main instance fast enough, which causes errors for single-record operations
- Implement a short-term solution: route the "get job execution by id" request to the main instance of the DB - can be done in Poppy
- Test to make sure there are no other places that could be affected by R/W split
- Investigate more thoroughly and define a more elaborate solution
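The short-term solution described above can be sketched as a small routing rule: writes always go to the main instance, reads go to the read instance, and reads known to be sensitive to replication lag are pinned to main. This is a hypothetical illustration, not the module's actual code; `DbRouter` and the operation names are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical operation names; the module's real queries are not listed here.
READ_AFTER_WRITE_SENSITIVE = {"get_job_execution_by_id"}


@dataclass
class DbRouter:
    """Chooses between the main (read/write) and read-only DB instances."""

    rw_split_enabled: bool = True
    force_main: set = field(default_factory=lambda: set(READ_AFTER_WRITE_SENSITIVE))

    def target(self, operation: str, is_write: bool) -> str:
        # Writes, and everything when the split is disabled, go to main.
        if is_write or not self.rw_split_enabled:
            return "main"
        # The read instance may lag behind main, so a record that was just
        # written might not be visible there yet: pin these reads to main.
        if operation in self.force_main:
            return "main"
        return "replica"


if __name__ == "__main__":
    router = DbRouter()
    print(router.target("get_job_execution_by_id", is_write=False))
    print(router.target("list_job_executions", is_write=False))
```

Keeping the pinned operations in one set makes it easy to add further entries if testing finds other places affected by the R/W split.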
- Configure external storage for uploading files and schedule its processing
- ARCH-19
- MODDATAIMP-392
- Define approach
- Create and estimate implementation stories
- Implement
- Refactor mod-source-record-storage to move away from storing data in jsonb
- MODSOURCE-627
- Define approach
- Create and estimate implementation stories
- Implement
- Test
- Optimize DI import flow (possibly remove post-processing step for Create Instance action)
- MODDATAIMP-744
- Explore the possibility of removing post-processing for the Create Instance action - to be done in Poppy
- Define approach
- Create and estimate implementation stories
- Implement
- Notes: 1. Create MARC Bib, create Instance, Holdings, Item (see diagram) - steps 25-30 can potentially be simplified
- Create indexes for 010 and 035 MARC fields in mod-source-record-storage
- Implementation done in Orchid
- Notes: This change affects Update jobs with MARC-to-MARC matching on 010 or 035 fields
- Configure MAX connection pool size and timeout
- Implementation done in Orchid
- Notes: The DB has been identified as a bottleneck for the Data Import process; increasing the number of connections allows large files to be imported without issues. See Recommended Maximum File Sizes and Configuration
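To illustrate how a maximum pool size and a timeout interact under load, here is a toy bounded pool. It is a sketch only, not the pool the FOLIO modules use: `BoundedPool` and its parameters are invented for the example and do not correspond to real configuration keys.

```python
import queue


class BoundedPool:
    """Toy connection pool with a maximum size and an acquire timeout."""

    def __init__(self, max_size: int, timeout_seconds: float):
        self._timeout = timeout_seconds
        self._free = queue.Queue(maxsize=max_size)
        for i in range(max_size):
            self._free.put(f"conn-{i}")  # stand-ins for real DB connections

    def acquire(self) -> str:
        """Wait up to the timeout for a free connection, then give up."""
        try:
            return self._free.get(timeout=self._timeout)
        except queue.Empty:
            raise TimeoutError("no free connection within timeout") from None

    def release(self, conn: str) -> None:
        """Return a connection to the pool."""
        self._free.put(conn)


if __name__ == "__main__":
    pool = BoundedPool(max_size=2, timeout_seconds=0.1)
    first, second = pool.acquire(), pool.acquire()
    try:
        pool.acquire()  # pool exhausted: times out instead of hanging forever
    except TimeoutError as exc:
        print(f"third request failed: {exc}")
    pool.release(first)
```

With a pool that is too small, concurrent import work queues up and starts timing out; a larger pool moves that limit, at the cost of more load on the DB.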