Data Import Issues and possible improvements (WIP)
Oct 11, 2023
Steps
Gather and group existing issues (@Kateryna Senchenko, @Olamide Kolawole)
Create new Spikes and features for possible solutions (@Kateryna Senchenko, @Olamide Kolawole)
Review with Spitfire to ensure there are no open questions or issues related to their Data Import work (@Kateryna Senchenko, @Ann-Marie Breaux (Deactivated), @Khalilah Gambrell, @Pavlo Smahin)
Provide feature dependencies (@Kateryna Senchenko, @Olamide Kolawole)
Align to the timeline, assign to the appropriate Jira feature, and review Jira issue priorities (@Kateryna Senchenko, @Ivan Kryzhanovskyi, @Ann-Marie Breaux (Deactivated))
Priorities
High, Med, Low
Complexity
S, M, L, XL, XXL
Problem definition
Business impact
Steps (Proposed Solution)
Priority
Complexity
Existing Jira issues
Comments
1
DI relies on internal identifiers for SRS records
DI cannot distinguish records by external identifiers (such as ISBNs or barcode numbers).
The only criterion we have for determining whether a MARC record already exists in SRS is the UUID stored in the 999 ff field. If an incoming record has no 999 ff field, we consider it new, save it, and assign a new UUID. If the 999 ff field is present, we increment the generation and save the record as the new, current version. The problem is that incoming records sometimes lack the 999 ff field even though they already exist in SRS and are linked to inventory instances.
Gather requirements: which fields of the incoming MARC Bib record contain external identifiers? Should MARC Holdings and MARC Authority records be changed as well?
Design a new record-versioning mechanism based on external identifiers
Consider performance implications
Decide what to do with duplicates that already exist in SRS
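A minimal sketch of how matching could fall back to an external identifier when 999 ff is missing. It assumes, for illustration only, that ISBNs (for example from MARC 020 $a) are the chosen identifier; `StoredRecord`, `RecordStore`, `upsert`, and `AmbiguousMatchError` are hypothetical names, not existing SRS APIs, and the requirements step may settle on different fields.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StoredRecord:
    """Hypothetical, simplified view of a MARC Bib record stored in SRS."""
    record_id: str
    generation: int
    isbns: List[str] = field(default_factory=list)


class AmbiguousMatchError(Exception):
    """An external identifier matched more than one stored record (existing duplicates)."""


def normalize_isbn(raw: str) -> str:
    # Keep digits and X from the first token, e.g. "0-306-40615-2 (pbk.)" -> "0306406152".
    tokens = raw.split()
    source = tokens[0].upper() if tokens else ""
    return "".join(ch for ch in source if ch.isdigit() or ch == "X")


class RecordStore:
    def __init__(self) -> None:
        self.records: Dict[str, StoredRecord] = {}

    def find_by_isbn(self, isbns: List[str]) -> List[StoredRecord]:
        # Linear scan for clarity; a real design would need an indexed lookup.
        wanted = {normalize_isbn(i) for i in isbns} - {""}
        return [r for r in self.records.values()
                if wanted & {normalize_isbn(i) for i in r.isbns}]

    def upsert(self, srs_uuid: Optional[str], isbns: List[str]) -> StoredRecord:
        existing: Optional[StoredRecord] = None
        if srs_uuid is not None and srs_uuid in self.records:
            # Current behaviour: the UUID from 999 ff identifies the record.
            existing = self.records[srs_uuid]
        else:
            # Assumed fallback: match on the external identifier instead.
            matches = self.find_by_isbn(isbns)
            if len(matches) > 1:
                raise AmbiguousMatchError([m.record_id for m in matches])
            existing = matches[0] if matches else None

        if existing is None:
            created = StoredRecord(str(uuid.uuid4()), 1, list(isbns))
            self.records[created.record_id] = created
            return created
        existing.generation += 1  # saved as the new current version
        return existing
```

Raising on multiple matches is one possible answer to the open question about duplicates already in SRS; the linear scan also hints at why performance needs separate consideration.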
DI profile actions can sometimes trigger other, implicit actions (incoming records that should be disposable get saved)
We don’t have an explicit action to save the SRS MARC record; saving is implicit and happens for each incoming file (almost: a couple of exceptional cases were already added later as "bug fixes"). When this was designed, we thought of an incoming file as a new, valid record that should be saved before any other action and serve as the single source of truth. In practice, some incoming records should indeed be saved in SRS and referenced by the entities derived from them. However, there are also multiple use cases (usually updates or creates of Holdings and/or Items) where the incoming MARC record is disposable: it might contain only partial data, and saving it results either in lost data (when the original record is overwritten) or in broken links to the corresponding inventory entities (when the record is saved as a new one).
Consider making Create/Update SRS MARC Bib an explicit, separate step in the profile
Alternatively, add a checkbox to the profile editor that specifies whether the MARC Bib record should be saved
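A small sketch contrasting the current implicit save with the two proposed options. The action names and the `JobProfile` shape are hypothetical, chosen only to show how the decision to save moves from hard-coded behaviour into the profile.

```python
from dataclasses import dataclass, field
from typing import List

SAVE_MARC_BIB = "CREATE_OR_UPDATE_SRS_MARC_BIB"  # hypothetical action name


@dataclass
class JobProfile:
    """Hypothetical, simplified job profile."""
    actions: List[str] = field(default_factory=list)
    save_marc_bib: bool = True  # option B: checkbox set when the profile is built


def legacy_plan(profile: JobProfile) -> List[str]:
    # Current behaviour: the SRS save is implicit and always runs first.
    return [SAVE_MARC_BIB] + profile.actions


def explicit_step_plan(profile: JobProfile) -> List[str]:
    # Option A: the save runs only if the profile lists it as its own step.
    return list(profile.actions)


def checkbox_plan(profile: JobProfile) -> List[str]:
    # Option B: the profile's flag decides whether the save is prepended.
    return ([SAVE_MARC_BIB] if profile.save_marc_bib else []) + profile.actions
```

With either option, a Holdings/Item update profile can treat the incoming MARC record as disposable instead of overwriting or duplicating the stored record.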
Performance results in production environments differ from those on PTF
Slower imports for large files, especially for updates
Gather information on configurations and resources allocated to DI modules, background activity, and other factors
The slowdown might be caused by profile complexity or complex matching conditions. Obtain example profiles (and, if possible, files), run them on the PTF environment, and compare the results with those measured for our base cases
This point refers to the post-processing step when an Instance is created (see the linked issue). Post-processing during Orders import is a separate topic.
8
Remove the incomplete data import job monitoring process from mod-source-record-manager, or implement working monitoring if there is a business need. We are currently incurring the cost without the benefits.
It adds database round trips for every Kafka event, which affects data import performance, database utilization, and database storage.
Update the job_execution_progress table in mod-source-record-manager without a SELECT FOR UPDATE followed by an UPDATE. SELECT FOR UPDATE locks the row, which causes contention between multiple SRM instances.
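One common alternative to read-lock-write is a single UPDATE that increments the counters in place. The sketch below uses Python's `sqlite3` only as a runnable stand-in (SQLite has no SELECT FOR UPDATE), and the column names are assumptions, not the actual mod-source-record-manager schema. In PostgreSQL the same statement shape applies: the UPDATE still takes a row lock, but only until its short transaction commits, with no separate read round trip in between.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical columns; the real table layout may differ.
    conn.execute("""
        CREATE TABLE job_execution_progress (
            job_execution_id TEXT PRIMARY KEY,
            succeeded_records_count INTEGER NOT NULL DEFAULT 0,
            error_records_count INTEGER NOT NULL DEFAULT 0,
            total_records_count INTEGER NOT NULL
        )""")


def record_progress(conn: sqlite3.Connection, job_id: str,
                    succeeded: int, failed: int) -> bool:
    # One statement: the database applies the increment atomically, so the
    # service never reads the row, holds a lock, and writes it back.
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            """UPDATE job_execution_progress
                  SET succeeded_records_count = succeeded_records_count + ?,
                      error_records_count = error_records_count + ?
                WHERE job_execution_id = ?""",
            (succeeded, failed, job_id),
        )
    return cur.rowcount == 1  # False if the job row does not exist
```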
Data Import Processing Core needs to be refactored. The refactoring should provide a clear, concise API through which FOLIO developers in other module areas can hook into the data import system cleanly. For example, Inventory mapping should be stored in mod-inventory instead of Data Import Processing Core.
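A rough sketch of the kind of boundary this refactoring could aim for: the core only routes events to registered handlers, and each module owns its own mapping. `ProcessingCore`, `register`, `dispatch`, and the `"245a"` payload key are all hypothetical; this is not the current data-import-processing-core API.

```python
from typing import Callable, Dict

# Hypothetical contract: a handler receives an event payload and returns the mapped entity.
Handler = Callable[[dict], dict]


class ProcessingCore:
    """Hypothetical core that routes events but holds no domain mapping itself."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, entity_type: str, handler: Handler) -> None:
        if entity_type in self._handlers:
            raise ValueError(f"handler already registered for {entity_type}")
        self._handlers[entity_type] = handler

    def dispatch(self, entity_type: str, payload: dict) -> dict:
        try:
            handler = self._handlers[entity_type]
        except KeyError:
            raise LookupError(f"no handler registered for {entity_type}") from None
        return handler(payload)


# In this sketch the Instance mapping would live in the inventory module
# and merely be registered with the core.
def map_marc_to_instance(payload: dict) -> dict:
    return {"title": payload.get("245a", ""), "source": "MARC"}
```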
mod-data-import can run only as a single instance in a FOLIO cluster because of its interaction with file storage. As a result, responsibilities it might otherwise have had were moved to mod-source-record-manager.
Limited availability of API endpoints served by mod-data-import
Generic backend error messages are returned to the user when data import fails. Data Import should use error codes and specific error messages for frequently occurring issues.
Troubleshooting is harder for data import users as well as developers.
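A minimal sketch of a catalogue of error codes with user-facing message templates, keeping the raw backend detail separately for developers. The codes, names, and messages below are hypothetical examples, not an agreed list.

```python
from dataclasses import dataclass
from enum import Enum


class ImportErrorCode(Enum):
    """Hypothetical catalogue of frequent data import failures."""
    MULTIPLE_MATCHES = ("DI-001", "Match criteria returned {count} records; expected at most one.")
    MISSING_REQUIRED_FIELD = ("DI-002", "Required field {field} is missing from the incoming record.")
    RECORD_LOCKED = ("DI-003", "Record {record_id} was modified by another process; retry the import.")

    def __init__(self, code: str, template: str) -> None:
        self.code = code
        self.template = template


@dataclass(frozen=True)
class ImportFailure:
    code: str     # stable identifier users can search for or report
    message: str  # specific, user-facing explanation
    detail: str   # raw backend text kept for developers


def describe(error: ImportErrorCode, detail: str = "", **params: object) -> ImportFailure:
    return ImportFailure(error.code, error.template.format(**params), detail)
```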
Functional or performance issues sometimes occur in production environments that are not easily reproducible in lower environments. Having job profiles and import files from production that can be executed in a lower environment would help greatly.
Create a tool that allows import and export of data import profiles.
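A sketch of one possible export/import round trip: the job profile and the profiles it references are bundled into one JSON document, and on import every profile gets a fresh id with references rewritten, so the bundle does not clash with ids in the target environment. The field names (`formatVersion`, `childIds`, `linkedProfiles`) are hypothetical and do not reflect the real FOLIO profile schema.

```python
import json
import uuid
from typing import Dict, List


def export_profiles(job_profile: dict, linked: List[dict]) -> str:
    """Bundle a job profile and the profiles it references into one JSON document."""
    bundle = {"formatVersion": 1, "jobProfile": job_profile, "linkedProfiles": linked}
    return json.dumps(bundle, indent=2, sort_keys=True)


def import_profiles(document: str) -> dict:
    """Load a bundle, giving every profile a fresh id and rewriting references.

    Assumes every id in childIds refers to a profile included in the bundle.
    """
    bundle = json.loads(document)
    id_map: Dict[str, str] = {p["id"]: str(uuid.uuid4()) for p in bundle["linkedProfiles"]}
    linked = [dict(p, id=id_map[p["id"]]) for p in bundle["linkedProfiles"]]
    job = dict(bundle["jobProfile"], id=str(uuid.uuid4()))
    job["childIds"] = [id_map[c] for c in job.get("childIds", [])]
    return {"jobProfile": job, "linkedProfiles": linked}
```

Remapping ids is a design choice for replaying production profiles in a lower environment; a tool that must update existing profiles in place would instead keep the original ids.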