MODSOURMAN-857: SPIKE: investigate deleting old versions of records from SRS

Ticket: MODSOURMAN-857
Status: In Progress

Problem Statement

When SRS (Source Record Storage) records are updated, the prior version of the record is marked as "OLD," and the latest version is marked as "ACTUAL." Over time, these accumulated OLD versions may cause storage bloat and potential performance degradation. There is currently no automatic mechanism for cleaning up these outdated records.
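
To make the accumulation concrete, here is a minimal in-memory sketch of the versioning behaviour described above. It is not the SRS data model: the class name VersionedStoreSketch, the fields (id, matchedId, generation), and the choice to reuse the first version's id as the stable matched id are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical in-memory model of SRS-style versioning; names are illustrative only.
public class VersionedStoreSketch {
  public enum State { ACTUAL, OLD }

  public static final class RecordVersion {
    public final String id;         // row id, new for every version (assumption)
    public final String matchedId;  // stable id shared by all versions (assumption)
    public final int generation;    // version counter, starting at 0
    public final String content;
    public State state = State.ACTUAL;

    RecordVersion(String id, String matchedId, int generation, String content) {
      this.id = id;
      this.matchedId = matchedId;
      this.generation = generation;
      this.content = content;
    }
  }

  private final List<RecordVersion> rows = new ArrayList<>();

  // Stores the first version of a record and returns its matched id.
  public String create(String content) {
    String id = UUID.randomUUID().toString();
    rows.add(new RecordVersion(id, id, 0, content));
    return id;
  }

  // Marks the current ACTUAL version as OLD and appends a new ACTUAL version.
  public void update(String matchedId, String content) {
    RecordVersion current = null;
    for (RecordVersion row : rows) {
      if (row.matchedId.equals(matchedId) && row.state == State.ACTUAL) {
        current = row;
        break;
      }
    }
    if (current == null) {
      throw new IllegalArgumentException("No ACTUAL record for " + matchedId);
    }
    current.state = State.OLD;
    rows.add(new RecordVersion(UUID.randomUUID().toString(), matchedId,
        current.generation + 1, content));
  }

  // Counts the stored rows in the given state.
  public long count(State state) {
    return rows.stream().filter(row -> row.state == state).count();
  }
}
```

In this model every update leaves one more OLD row behind, so a record updated n times occupies n + 1 rows, of which only one is ACTUAL.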


Proposed Solutions

  1. Provide On-Demand Cleanup Script (DONE)

    • A script already exists to manually clean up OLD records from the database (https://folio-org.atlassian.net/wiki/spaces/FOLIJET/pages/686719173)

      • Advantages:

        • Quick and easy to implement.

        • Provides an immediate performance boost.

      • Disadvantages:

        • Requires manual execution on a recurring basis.

        • Does not prevent OLD records from continuously piling up over time during regular module usage.


  2. Implement Background Job for Periodic Cleanup

    • Extend the MarcIndexersVersionDeletionVerticle, which is currently responsible for periodically deleting MARC Indexers, so that it also deletes OLD records.

    • The MarcIndexersVersionDeletionVerticle allows the clean-up interval to be configured using the srs.marcIndexers.delete.interval.seconds system property.

    • Update the RecordDao.deleteRecords function (not currently in use) to support deletion of OLD versions of records.

    • Considerations:

      • After implementing this functionality, the script from Solution #1 will need to be run once immediately after upgrading to the new version. This ensures that the periodic background job doesn’t take excessive time during its initial execution.

    • Advantages:

      • Automated maintenance with no manual intervention required post-implementation.

      • Keeps storage usage and performance consistent.

    • Disadvantages:

      • Requires additional development and testing time.
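
The periodic cleanup described in Solution #2 could look roughly like the sketch below. It uses a plain ScheduledExecutorService and an in-memory list in place of the real verticle and database, so it shows only the control flow. The property name srs.marcIndexers.delete.interval.seconds comes from the text above. The class name, the batch-wise deletion, and the 1800-second fallback are assumptions and do not reflect the actual MarcIndexersVersionDeletionVerticle or RecordDao.deleteRecords code.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a periodic OLD-record cleanup; not the real verticle.
public class OldRecordCleanupSketch {
  public enum State { ACTUAL, OLD }

  public static final class Row {
    public final String id;
    public final State state;

    public Row(String id, State state) {
      this.id = id;
      this.state = state;
    }
  }

  static final String INTERVAL_PROPERTY = "srs.marcIndexers.delete.interval.seconds";
  static final long FALLBACK_INTERVAL_SECONDS = 1800; // assumed placeholder value

  // Reads the interval from the system property, falling back on missing or invalid input.
  public static long intervalSeconds() {
    String raw = System.getProperty(INTERVAL_PROPERTY);
    if (raw == null) {
      return FALLBACK_INTERVAL_SECONDS;
    }
    try {
      long value = Long.parseLong(raw.trim());
      return value > 0 ? value : FALLBACK_INTERVAL_SECONDS;
    } catch (NumberFormatException e) {
      return FALLBACK_INTERVAL_SECONDS;
    }
  }

  // Deletes at most batchSize OLD rows and returns how many were deleted.
  public static int deleteOldBatch(List<Row> table, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    int deleted = 0;
    Iterator<Row> it = table.iterator();
    while (it.hasNext() && deleted < batchSize) {
      if (it.next().state == State.OLD) {
        it.remove();
        deleted++;
      }
    }
    return deleted;
  }

  // Schedules the cleanup; each run deletes in batches until a batch comes back short.
  public static ScheduledExecutorService schedule(List<Row> table, int batchSize) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    long interval = intervalSeconds();
    executor.scheduleWithFixedDelay(() -> {
      synchronized (table) {
        while (deleteOldBatch(table, batchSize) == batchSize) {
          // keep going until fewer than batchSize rows were removed
        }
      }
    }, interval, interval, TimeUnit.SECONDS);
    return executor;
  }
}
```

Deleting in bounded batches is one possible way to keep each run short, which is also why the one-off script from Solution #1 is suggested before the first scheduled run. The caller is responsible for shutting down the returned executor.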


  3. Remove the Versioning Mechanism (New Feature Required)

    • Eliminate the entire mechanism for maintaining OLD and ACTUAL versions of SRS records.

    • Replace the current update mechanism with a new methodology involving record overlaying (directly updating existing records rather than creating versions).

    • Key Considerations:

      • Revise SRS Record Entities:

        • Consolidate all information so that only one record version is stored per entry.

        • Revise the Record and SourceRecord entities accordingly.

        • Evaluate the necessity of removing associated entities such as RawRecord, ErrorRecord, and Snapshot.

          • Information stored in RawRecord and ErrorRecord is already handled in SRM (Source Record Manager), making their continued existence redundant.

          • The Snapshot entity mirrors information about JobExecution that can also be retrieved from SRM, suggesting it may no longer be required.

      • Remove the Generation Calculation Mechanism:

        • Stop processing and storing historical versions of records when making updates.

        • Update records in place, as overlays, while retaining the original record ID.

      • Match Record ID with 999 ff $s:

        • Ensure that the matched ID (subfield 999 ff $s) aligns with the actual record ID in the database.

      • Migration of Existing Data:

        • Perform incremental migration steps to:

          • Delete OLD versions.

          • Potentially remove Snapshot, ErrorRecord, and RawRecord entities.

          • Migrate data to the new structure of the Record entity (id values should match matched_id and 999 ff $s).

    • Advantages:

      • Removes complexity related to versioning and cleanup, eliminating the need for periodic maintenance scripts or background jobs.

      • Aligns SRS functionality with a more unified approach across FOLIO modules.

      • Significant long-term storage savings and reduced code complexity.

    • Disadvantages:

      • Introduces a breaking change, requiring extensive updates and testing of all related modules (e.g., SRM, inventory modules, etc.).

      • Requires significant development effort, including data migration and refactoring.
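
As a rough illustration of the overlay approach in Solution #3, the sketch below keeps exactly one stored record per id and replaces its content in place. The class OverlayStoreSketch and its fields are hypothetical. In particular, field999s only stands in for the value of subfield 999 ff $s and is not parsed from real MARC data.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical sketch of overlay-style updates; not the real SRS entities.
public class OverlayStoreSketch {
  public static final class StoredRecord {
    public final String id;         // the only identifier; no generations
    public final String content;
    public final String field999s;  // stand-in for 999 ff $s, kept equal to id

    StoredRecord(String id, String content) {
      this.id = id;
      this.content = content;
      this.field999s = id;
    }
  }

  private final Map<String, StoredRecord> byId = new HashMap<>();

  // Stores a new record under the given id.
  public void create(String id, String content) {
    if (byId.containsKey(id)) {
      throw new IllegalArgumentException("Record already exists: " + id);
    }
    byId.put(id, new StoredRecord(id, content));
  }

  // Replaces the content of an existing record; the id does not change.
  public void overlay(String id, String newContent) {
    if (!byId.containsKey(id)) {
      throw new NoSuchElementException("No record with id " + id);
    }
    byId.put(id, new StoredRecord(id, newContent));
  }

  public StoredRecord get(String id) {
    return byId.get(id);
  }

  public int size() {
    return byId.size();
  }

  // True when every stored record's 999 ff $s stand-in equals its id.
  public boolean idsAligned() {
    return byId.values().stream().allMatch(r -> r.id.equals(r.field999s));
  }
}
```

However many overlays are applied, the store holds one row per record, so there is nothing left for a cleanup script or background job to remove.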


Evaluation of Approaches

Criteria                       | On-Demand Script                 | Background Job for Periodic Cleanup | Remove Versioning Mechanism
-------------------------------|----------------------------------|-------------------------------------|--------------------------------------
Ease of Implementation         | High                             | Medium                              | Low
Impact on Performance          | Medium (temporary boost, manual) | High (consistent, periodic cleanup) | High (permanent performance increase)
Change Scope                   | Minimal                          | Moderate                            | Significant (breaking change)
Maintenance Effort             | High                             | Low                                 | None (beyond initial development)
Alignment with Long-Term Goals | Low                              | Medium                              | High


Recommendations

  1. Short-Term Solution:

    • Implement Solution #2 (Background Job for Periodic Cleanup) to address performance concerns in a sustainable manner while avoiding the need for manual intervention.

    • Extend the existing MarcIndexersVersionDeletionVerticle to clean up OLD records automatically at configurable intervals.

  2. Long-Term Vision:

    • Invest in Solution #3 (Remove Versioning Mechanism) to align with the broader goals of unifying and simplifying FOLIO modules.

    • Carefully plan the removal of redundant entities (e.g., Snapshot, RawRecord, ErrorRecord) and migration steps to ensure a smooth transition and minimal disruption to users.


Next Steps

  • For Solution #2:

    1. Define required updates to MarcIndexersVersionDeletionVerticle to support the cleanup of OLD records.

    2. Update RecordDao.deleteRecords to delete OLD records.

    3. Conduct testing to ensure the background job operates as expected and does not interfere with system performance.

  • For Solution #3:

    1. Define a comprehensive migration plan, including the removal of OLD record versions and associated redundant entities (RawRecord, ErrorRecord, Snapshot).

    2. Coordinate with other teams to address the impacts of breaking changes.

    3. Develop and thoroughly test the overlay update mechanism and conduct full-scale validation after migration.
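
The core of the migration listed for Solution #3 (drop OLD versions, then re-key the surviving rows so that id equals matched_id) can be sketched as a pure transformation. The row types below are simplified assumptions with only two states. A real migration would run incrementally against the database and would have to handle any other record states and related entities.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the data transformation behind the migration; not a real migration script.
public class MigrationSketch {
  public enum State { ACTUAL, OLD }

  public static final class LegacyRow {
    public final String id;
    public final String matchedId;
    public final State state;
    public final String content;

    public LegacyRow(String id, String matchedId, State state, String content) {
      this.id = id;
      this.matchedId = matchedId;
      this.state = state;
      this.content = content;
    }
  }

  public static final class MigratedRow {
    public final String id;       // equals the legacy matchedId
    public final String content;

    MigratedRow(String id, String content) {
      this.id = id;
      this.content = content;
    }
  }

  // Keeps only ACTUAL rows and re-keys each of them by its matched id.
  public static List<MigratedRow> migrate(List<LegacyRow> legacy) {
    List<MigratedRow> migrated = new ArrayList<>();
    for (LegacyRow row : legacy) {
      if (row.state != State.ACTUAL) {
        continue; // OLD versions are dropped
      }
      migrated.add(new MigratedRow(row.matchedId, row.content));
    }
    return migrated;
  }
}
```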

By addressing the immediate need (Solution #2) and aligning with long-term goals (Solution #3), we can ensure efficient, sustainable, and simplified record management in SRS while preserving system performance.