SPIKE - Design approach for differentiating incoming MARC Bibs that should/shouldn't be saved in SRS

Spike Overview

Jira link: MODDATAIMP-744

Spike Status: COMPLETED 

Objective: Investigate whether the DI flow can be changed to remove the first step of saving incoming records in SRS; define how and when SRS records should be created and persisted; reflect the changes in sequence diagrams; create stories for refactoring.

Background and Problem Statement

There is no explicit action to save the SRS MARC record. Saving is implicit and happens for each incoming file (with a couple of exceptions implemented as "bug fixes", e.g. when the Job Profile contains an action for Update Instance or Update Holdings). According to the original design, a DI record from the incoming file is considered a new and valid record that should be saved prior to any other actions and serve as a single source of truth. There are indeed scenarios where incoming records should be saved in SRS and referenced by other entities derived from them. However, there are also multiple use cases (usually some kind of update or create on Holdings and/or Items, or creating Orders and Invoices) where the incoming record is considered disposable and might contain only partial data. If such a record is saved, we end up either with lost data (when the original record is overridden) or with broken links to the corresponding inventory entities (when the record is saved as a new one).

In addition, SRS contains a lot of clutter: records that are not used after the import is completed, as well as broken records that are not linked to any FOLIO entity as a result of past failed imports. The post-processing mechanism is redundant and can be avoided if saving the records up front is not mandatory and the problem of generating identifiers for Instances (Holdings and Authority?) is resolved. Removing the mandatory step of saving the MARC in SRS prior to any other actions would significantly simplify the DI flow. Addressing these problems should improve DI performance: create scenarios would be simplified, and update scenarios should benefit from quicker search once the SRS DB no longer accumulates clutter.

Scope

In Scope

Main focus is on importing MARC Bibs, but flows for MARC Holdings and MARC Authority should also be reviewed and either left as is for now, or same changes applied as for MARC Bibs (if applicable). 

Out of Scope

Cleaning up clutter from the SRS DB is out of scope. Deleting OLD and broken records should be addressed in a separate spike. The accumulation of OLD records as a result of multiple updates on SRS MARC, and the overall versioning mechanism, are also out of scope of this spike.

Research Questions

  1. What DI scenarios require saving SRS MARC Bib other than Create Instance action? 
  2. What if batches of incoming records are not saved in SRS by default? Incoming MARC json will not be accessible in the DI logs, but we could save parsed content in SRM as part of a journal record.
  3. If the SRS MARC record is saved in SRS only as an implicit part of the Create Instance action, should it happen before or after creating the Instance? If before, can the Instance UUID and HRID be generated prior to creating the Instance, to get rid of the Post-Processing step? At what point should manipulations on 001+003 → 035 fields be done: in SRM before saving the record in SRS, or before mapping of the Instance in the Create Instance action?
  4. How should the MARC Modify action fit in the changed flow?
  5. MARC Holdings and MARC Authority - should they follow the same flow as MARC Bib?
  6. Linking MARC Bibs and Authority - any adjustments/risks there?

Deliverables

Updated diagrams of DI flow for main scenarios. New feature and stories for refactoring in Jira.

Simplified* diagram of DI flow for creating SRS MARC Bib and Instance

Source

*Error handling is omitted, DI_ERROR event can be sent at any step in case of errors.

Option 1


Remove step when initial records are saved in SRS (in batches).

Save incoming parsed content in SRM (it will be required for DI log) - it should be cleaned up when JobExecution is deleted

Provide endpoint in mod-inventory-storage(?) to generate ids for the Instance (Holdings/Authority?) prior to creating those records in inventory (in batches?)

Revisit 001 + 003 → 035 logic

Save MARC (Edifact records are also not needed in SRS) in SRS only as implicit part of Create instance (Holdings/Authority?) action. Save one by one?

Move on to inventory - basically finish the action there. Post-processing won't be required as ids are generated already, and underlying MARC contains them.
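
The id pre-generation step in Option 1 could look roughly like the sketch below. It is only an illustration under stated assumptions: `HridSequence`, `ReservedIds`, the `in` prefix and the 11-digit padding are hypothetical stand-ins, not the actual mod-inventory-storage API or HRID settings, and a real implementation would reserve the range from the database sequence rather than from memory.

```python
import itertools
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ReservedIds:
    """Identifiers assigned to an Instance before it is created (hypothetical type)."""
    instance_id: str
    hrid: str


class HridSequence:
    """In-memory stand-in for the HRID sequence kept by inventory storage."""

    def __init__(self, prefix: str = "in", start: int = 1, width: int = 11):
        # Prefix and padding width are assumptions made for this example.
        self._prefix = prefix
        self._width = width
        self._counter = itertools.count(start)

    def reserve(self, batch_size: int) -> list[ReservedIds]:
        """Reserve `batch_size` consecutive HRIDs, each paired with a fresh UUID."""
        return [
            ReservedIds(
                instance_id=str(uuid.uuid4()),
                hrid=f"{self._prefix}{next(self._counter):0{self._width}d}",
            )
            for _ in range(batch_size)
        ]
```

With ids reserved per batch, the MARC saved in SRS can already carry the Instance UUID and HRID, which is what makes post-processing unnecessary in this option.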


Pros

  • Simplified flow, removed Post-Processing step for Create Instance action (see updated diagram below)
  • Declutter SRS (incoming records for the jobs that do not require SRS MARC to be created and linked with other entities, will not be saved)

Source

Cons

  • Need to generate inventory identifiers (reserve the hrid sequence) before creating inventory entities (step 16 on the diagram)
  • Allow saving Inventory entities with already assigned identifiers (uuid and hrid), step 25
  • For other flows (that do not require saving SRS MARC records) we'll need to add storage for initial records to be referenced in DI logs
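
A minimal sketch of such storage is shown below, assuming records are keyed by JobExecution id so that they can be removed together with the JobExecution. `IncomingRecordStore` and its method names are hypothetical; the real storage would be a table in SRM, not an in-memory dictionary.

```python
from collections import defaultdict


class IncomingRecordStore:
    """In-memory stand-in for SRM storage of incoming parsed records."""

    def __init__(self):
        # job_execution_id -> {record_id: parsed_content}
        self._by_job = defaultdict(dict)

    def save(self, job_execution_id, record_id, parsed_content):
        """Keep the incoming parsed content so the DI log can reference it."""
        self._by_job[job_execution_id][record_id] = parsed_content

    def get(self, job_execution_id, record_id):
        """Return the parsed content for a DI log entry, or None if absent."""
        return self._by_job.get(job_execution_id, {}).get(record_id)

    def delete_job(self, job_execution_id):
        """Drop everything stored for a JobExecution; return the number of records removed."""
        return len(self._by_job.pop(job_execution_id, {}))
```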

Option 2 - SELECTED

Remove step when initial records are saved in SRS (in batches).

Save incoming parsed content in SRM (it will be required for DI log) - it should be cleaned up when JobExecution is deleted

Move on to the action in the profile. Create MARC would be an implicit step of the Create Instance action

001 + 003 → 035 logic should be done in mod-inventory before creating Instance

After the inventory entity is created, make a POST request to SRS instead of sending a message
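
The 001 + 003 → 035 step could be sketched as follows. This is an assumption-laden illustration, not the mod-inventory implementation: it assumes the parsed content uses a MARC-in-JSON style list of single-key field objects, that the incoming 001 is moved to 035 $a prefixed with the 003 value in parentheses when 003 is present, and that the new 001 receives the Instance HRID. The function name `move_001_003_to_035` is hypothetical.

```python
def move_001_003_to_035(fields, hrid):
    """Return a new field list with 001/003 folded into 035 and 001 set to `hrid`."""
    value_001 = None
    value_003 = None
    kept = []
    for field in fields:
        # Each field is a single-key dict, e.g. {"001": "12345"}.
        tag, value = next(iter(field.items()))
        if tag == "001":
            value_001 = value
        elif tag == "003":
            value_003 = value
        else:
            kept.append(field)

    result = [{"001": hrid}]
    result.extend(kept)
    if value_001 is not None:
        subfield_a = f"({value_003}){value_001}" if value_003 else value_001
        # Appended at the end for simplicity; this sketch does not re-sort fields by tag.
        result.append(
            {"035": {"ind1": " ", "ind2": " ", "subfields": [{"a": subfield_a}]}}
        )
    return result
```

Because the function returns a new list, the incoming parsed content kept in SRM for the DI log stays unchanged.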


Pros

  • Straightforward decision about action in SRM - following the profile (see diagram below)
  • Simplified flow for scenarios that do need the SRS entity to be created
  • No clutter in SRS (incoming records for the jobs that do not require SRS MARC to be created and linked with other entities, will not be saved)
  • Removed post-processing step
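  • Performance benefits (simpler create scenarios; quicker search in update scenarios once SRS no longer accumulates clutter)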

Source

Cons

  • Error handling - if the MARC Bib was not created, we end up with an Instance record (source=MARC) but no underlying SRS MARC. Need to make sure such an Instance is editable
  • Store all incoming records in SRM to be referenced from DI logs
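
The error-handling concern can be illustrated with a small sketch. It assumes the Instance is created first and the SRS record is posted afterwards, so a failed POST must be surfaced rather than swallowed. `CreateOutcome`, `create_instance_then_marc` and the two injected callables are hypothetical names used only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CreateOutcome:
    instance_id: str
    srs_record_id: Optional[str]
    error: Optional[str] = None

    @property
    def needs_follow_up(self) -> bool:
        """True when an Instance exists without an underlying SRS MARC."""
        return self.srs_record_id is None


def create_instance_then_marc(
    marc: dict,
    create_instance: Callable[[dict], str],
    post_to_srs: Callable[[dict, str], str],
) -> CreateOutcome:
    """Create the Instance, then POST the MARC to SRS; report a partial result on failure."""
    instance_id = create_instance(marc)
    try:
        srs_record_id = post_to_srs(marc, instance_id)
    except Exception as exc:  # broad on purpose: any SRS failure leaves an orphan Instance
        return CreateOutcome(instance_id, None, str(exc))
    return CreateOutcome(instance_id, srs_record_id)
```

An outcome with `needs_follow_up` set is the case described above: the Instance has source=MARC but no SRS record, so it has to remain editable or be repaired.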

Option 3


Make Create SRS MARC action explicit - either additional action in the profile, or a checkbox - Save (or not) SRS MARC Bib

Pros

  • Shifts decision making to the user
  • Straightforward flow

Cons

  • Error prone - profiles would have to be validated thoroughly; responsibility shifts to the profile creation step, which basically hardcodes the same principles, only earlier


Conclusion

Option 2 should simplify the DI flow significantly, prevent clutter from accumulating in SRS, allow removal of the post-processing step for the Create Instance (Holdings, Authority) action, and improve overall DI performance.

Implementation stories

MODSOURMAN-1019

MODINV-849

MODINV-850

MODSOURCE-672

MODSOURMAN-1020

MODSOURMAN-1021

MODSOURMAN-1023

MODSOURMAN-1022