Data Import Issues and possible improvements (WIP)
Background: Imports of MARC Bib files in production take much longer than expected, and sometimes hang or fail with errors that point to infrastructure issues.
Identified problems (main problems to be addressed):
- pressure on the DB during large imports or multiple parallel imports (multi-tenant setup)
- storing data in jsonb
Possible improvements:
- Optimize inserts and updates of marc_records_lb in mod-source-record-storage (to be finished in Poppy)
  Jira: MODSOURCE-601
  - Implementation
  - Some additional changes and merging
  - Testing: preliminary results show that a basic import of 50k MARC Bibs with Create Inventory Instances takes 34 min, compared with 54 min in Orchid
- Enable R/W split for mod-source-record-manager
  Jira: MODSOURMAN-966
  - Preliminary investigation: the read instance is not synchronized with the main instance fast enough, which causes errors for single record operations
  - Implement a short-term solution: route the "get job execution by id" request to the main DB instance (can be done in Poppy)
  - Test to make sure no other places are affected by the R/W split
  - Investigate more thoroughly and define a more elaborate solution
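As a rough illustration of the short-term fix, the sketch routes reads to the read instance unless the operation is pinned to the main (writer) instance. This means that a job execution read right after it is written does not hit replication lag. It is a minimal Python sketch under that assumption, not mod-source-record-manager code. `DbRouter`, the operation names and the URLs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DbRouter:
    """Hypothetical R/W-split router: reads go to the read instance unless pinned to main."""

    writer_url: str
    reader_url: str
    # Read operations that must see their own writes immediately and
    # therefore cannot tolerate replication lag on the read instance.
    pinned_to_writer: set[str] = field(default_factory=set)

    def url_for(self, operation: str, is_write: bool) -> str:
        # Writes, and reads pinned to the writer, always use the main instance.
        if is_write or operation in self.pinned_to_writer:
            return self.writer_url
        return self.reader_url


# Short-term fix (hypothetical names): pin "get job execution by id" to the main instance.
router = DbRouter(
    writer_url="postgres://db-main:5432/folio",
    reader_url="postgres://db-read:5432/folio",
    pinned_to_writer={"getJobExecutionById"},
)
```

Keeping the pinned reads in an explicit allow-list means that any other lag-sensitive read found during testing can be added to `pinned_to_writer` without changing its call sites.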
- Configure external storage for uploading files and schedule their processing
  Jira: ARCH-19, MODDATAIMP-392
  - Define approach
  - Create and estimate implementation stories
  - Implement
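One possible reading of "schedule their processing" is to decouple upload from processing. Under this reading, files are first uploaded to external storage, and their imports start only while a concurrency limit allows. Parallel imports (for example, across tenants) then do not all hit the DB at once. The sketch assumes this interpretation; the actual approach is still to be defined under MODDATAIMP-392. `ImportScheduler`, `max_concurrent` and the storage keys are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ImportScheduler:
    """Hypothetical scheduler: files already uploaded to external storage are
    processed only while fewer than max_concurrent imports are running."""

    max_concurrent: int
    queued: deque = field(default_factory=deque)  # storage keys waiting to start (FIFO)
    running: set = field(default_factory=set)     # storage keys currently being imported

    def submit(self, storage_key: str) -> None:
        # Only the key of the uploaded file is queued, not the file content.
        self.queued.append(storage_key)
        self._start_ready()

    def finish(self, storage_key: str) -> None:
        # A completed import frees its slot for the next queued file.
        self.running.discard(storage_key)
        self._start_ready()

    def _start_ready(self) -> None:
        # In a real system, this is where processing of the file would be triggered.
        while self.queued and len(self.running) < self.max_concurrent:
            self.running.add(self.queued.popleft())


scheduler = ImportScheduler(max_concurrent=2)
for key in ["uploads/a.mrc", "uploads/b.mrc", "uploads/c.mrc"]:
    scheduler.submit(key)  # the third file waits for a free slot
scheduler.finish("uploads/a.mrc")  # uploads/c.mrc starts now
```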
- Refactor mod-source-record-storage to move away from storing data in jsonb
  Jira: MODSOURCE-627
  - Define approach
  - Create and estimate implementation stories
  - Implement
  - Test
- Optimize the Data Import (DI) flow (possibly remove the post-processing step for the Create Instance action)
  Jira: MODDATAIMP-744
  - Explore the possibility of removing post-processing for the Create Instance action (to be done in Poppy)
  - Define approach
  - Create and estimate implementation stories
  - Implement
  - Notes: 1. For the flow Create MARC Bib, create Instance, Holdings, Item (see diagram), steps 25-30 can potentially be simplified
- Create indexes for the 010 and 035 MARC fields in mod-source-record-storage
  - Implementation done in Orchid
  - Notes: This change affects Update jobs that use MARC-to-MARC matching on the 010 or 035 fields
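The sketch shows why indexes on 010 and 035 matter for MARC-to-MARC matching. It builds an in-memory index from $a values to record ids over records in MARC-in-JSON form, so matching an incoming record becomes a lookup instead of a scan over all records. This only illustrates the idea: the real change creates database indexes in mod-source-record-storage. The helper names, record ids and the whitespace trimming used as normalization are assumptions.

```python
from collections import defaultdict


def subfield_values(record: dict, tag: str, code: str = "a") -> list[str]:
    """Collect the $<code> values of every <tag> data field in a MARC-in-JSON record."""
    values = []
    for entry in record.get("fields", []):
        data = entry.get(tag)
        if isinstance(data, dict):  # data fields hold indicators and subfields
            for subfield in data.get("subfields", []):
                if code in subfield:
                    # Assumed normalization: trim surrounding whitespace (e.g. padded LCCNs).
                    values.append(subfield[code].strip())
    return values


def build_index(records: dict[str, dict], tag: str) -> dict[str, set[str]]:
    """Map each $a value of `tag` to the ids of the records that contain it."""
    index = defaultdict(set)
    for record_id, record in records.items():
        for value in subfield_values(record, tag):
            index[value].add(record_id)
    return index


records = {
    "rec-1": {"fields": [{"035": {"ind1": " ", "ind2": " ", "subfields": [{"a": "(OCoLC)123"}]}}]},
    "rec-2": {"fields": [{"010": {"ind1": " ", "ind2": " ", "subfields": [{"a": "  2001012345 "}]}}]},
}
index_035 = build_index(records, "035")
index_010 = build_index(records, "010")
# Matching an incoming record on 035 $a is now a dictionary lookup, not a scan.
matches = index_035.get("(OCoLC)123", set())
```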
- Configure the maximum connection pool size and timeout
  - Implementation done in Orchid
  - Notes: The DB has been identified as a bottleneck for the Data Import process; increasing the number of connections allows large files to be imported without issues. See: Recommended Maximum File Sizes and Configuration
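The two settings work together. The maximum pool size caps how many concurrent connections a module instance opens against the DB. Assuming the timeout is the wait for a free connection, it bounds how long a request waits before failing instead of hanging. The minimal Python sketch that follows models a bounded pool with an acquisition timeout. `BoundedPool` and its parameters are hypothetical and are not the configuration options of the FOLIO modules, whose exact names are not given here.

```python
import queue
import threading
from contextlib import contextmanager


class BoundedPool:
    """Hypothetical pool: at most max_size connections are in use at once, and a
    caller waits at most timeout_s seconds for a free one."""

    def __init__(self, connect, max_size: int, timeout_s: float):
        self._connect = connect                   # factory that opens a new connection
        self._idle = queue.LifoQueue()            # returned connections kept for reuse
        self._slots = threading.BoundedSemaphore(max_size)
        self._timeout_s = timeout_s

    @contextmanager
    def connection(self):
        # Wait for a free slot, but fail fast instead of hanging indefinitely.
        if not self._slots.acquire(timeout=self._timeout_s):
            raise TimeoutError(f"no DB connection available within {self._timeout_s}s")
        try:
            try:
                conn = self._idle.get_nowait()    # reuse an idle connection if any
            except queue.Empty:
                conn = self._connect()            # otherwise open a new one
            try:
                yield conn
            finally:
                self._idle.put(conn)              # return the connection for reuse
        finally:
            self._slots.release()


# object() stands in for a real DB connection factory.
pool = BoundedPool(connect=object, max_size=2, timeout_s=0.1)
timed_out = False
with pool.connection(), pool.connection():
    try:
        with pool.connection():  # a third concurrent request exceeds max_size
            pass
    except TimeoutError:
        timed_out = True
```

Raising `max_size` lets more work proceed in parallel, but every extra connection adds load on the DB. The size therefore has to stay within what the DB can serve across all module instances and tenants.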