MODSOURCE-631 SPIKE: Define ways to clean up records created by a cancelled job

Ticket: MODSOURCE-631
Status: In Progress

Overview:

This spike investigated how to clean up records created or updated by data import jobs that were cancelled mid-process.

Data import is a distributed process across multiple modules, and cancellation can lead to several situations:

  1. Created Records: Records were created successfully and are valid, even though the job is marked as "Cancelled".

  2. Partially Processed Records: Some records from the imported files were processed, while others were skipped or not linked correctly.

  3. Updated Records: Some records were updated while others were left untouched, resulting in partial updates.

  4. Combined Jobs: Jobs that both create and update records can end up in a mix of the states above, which compounds the inconsistency.

The decision considers the technical limitations, existing capabilities, and broader implications of any potential solution.


Decision:

Do not implement an automated solution to delete records created by cancelled data import jobs. Any cleanup approach would either be incomplete, leaving the data even more inconsistent, or require significant architectural re-engineering with a major impact on the performance and overall maintainability of the system.

Instead, focus efforts on preventing cancelled jobs from running unnecessary background processes, as planned under UXPROD-4704. Use existing manual scripts and processes to address inconsistencies on demand when necessary.


Justification:

  1. Complexity & Risk of Deletion for Created Records:

    • Records created by cancelled jobs are often perfectly valid and may not require deletion. For example, all records may have been created successfully before the job was cancelled for user-specific reasons.

    • Records that were created some time ago may already have been updated, linked, or otherwise used by other processes or users. Deleting such records could lead to more inconsistencies or data loss.

    • Records created by a job could be identified using the journal_records table in mod-source-record-manager, but this approach is better suited to immediate cleanup right after cancellation. Records from older jobs might not be retrievable if the logs are no longer available.

  2. Challenges with Rollback of Updates:

    • Rolling back partially or fully updated records is not feasible because FOLIO currently has no rollback mechanism. Only mod-source-record-storage stores old record versions (subject to change under UXPROD-5505), whereas other modules lack the historical data needed to revert changes.

    • Implementing FOLIO-wide rollback functionality would require a significant architectural overhaul, would significantly affect performance, and is out of scope for this investigation.

  3. Inconsistencies for Combined Jobs:

    • In combined jobs, records might be both created and updated. Deleting only created records while leaving updated records intact would exacerbate inconsistencies rather than resolve them.

  4. Improved Prevention Strategy with UXPROD-4704:

    • The core issue of cancelled jobs creating unintended records will be addressed by UXPROD-4704, which aims to implement a mechanism to stop background processing upon job cancellation. This will minimize the creation of unwanted or inconsistent data in the first place.

  5. Existing Manual Mitigation via Scripts:

    • For cases where inconsistent data is already present (from both fresh and older jobs), existing manual cleanup scripts are available. These scripts provide targeted solutions to remove or clean up specific types of records and can be executed on demand.

    • All available scripts are documented here: Scripts for Inventory, Source Record Storage, and Data Import Cleanup.
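
To illustrate the journal_records lookup mentioned in point 1, the following is a minimal sketch only. It runs against an in-memory SQLite table that stands in for journal_records; the column names (job_execution_id, entity_type, entity_id, action_type, action_status) and the values 'CREATE' and 'COMPLETED' are assumptions made for this example and should be checked against the actual mod-source-record-manager schema. The helper created_entities is hypothetical and is not one of the existing cleanup scripts.

```python
import sqlite3

# Hypothetical, simplified stand-in for journal_records. The real table in
# mod-source-record-manager may use different column names and values.
SCHEMA = """
CREATE TABLE journal_records (
    job_execution_id TEXT,
    entity_type      TEXT,
    entity_id        TEXT,
    action_type      TEXT,
    action_status    TEXT
)
"""


def created_entities(conn, job_execution_id):
    """Return {entity_type: [entity_id, ...]} for records the job created."""
    rows = conn.execute(
        """
        SELECT entity_type, entity_id
        FROM journal_records
        WHERE job_execution_id = ?
          AND action_type = 'CREATE'
          AND action_status = 'COMPLETED'
          AND entity_id IS NOT NULL
        ORDER BY entity_type, entity_id
        """,
        (job_execution_id,),
    ).fetchall()
    result = {}
    for entity_type, entity_id in rows:
        result.setdefault(entity_type, []).append(entity_id)
    return result


def demo():
    """Load a few made-up journal rows and list what job-1 created."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO journal_records VALUES (?, ?, ?, ?, ?)",
        [
            ("job-1", "INSTANCE", "inst-1", "CREATE", "COMPLETED"),
            ("job-1", "HOLDINGS", "hold-1", "CREATE", "COMPLETED"),
            ("job-1", "INSTANCE", "inst-2", "UPDATE", "COMPLETED"),  # update
            ("job-1", "ITEM", None, "CREATE", "ERROR"),  # failed create
            ("job-2", "INSTANCE", "inst-3", "CREATE", "COMPLETED"),  # other job
        ],
    )
    return created_entities(conn, "job-1")


if __name__ == "__main__":
    print(demo())
```

In this sketch the updated record (inst-2) is deliberately absent from the result: a lookup like this can only list what a job created, not restore what it changed, which is the limitation described in points 2 and 3.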


Final Recommendation:

Instead of building a complex and limited mechanism to delete records created by cancelled jobs, efforts should focus on:

  1. Ensuring UXPROD-4704 is implemented, which will minimize the number of unwanted records created by cancelled jobs in the future.

  2. Using documented manual scripts to handle inconsistencies caused by existing cancelled jobs on a case-by-case basis.

By preventing future data inconsistencies and managing existing issues reactively via scripts, we minimize disruption and avoid introducing unnecessary complexity into the system.