MODSOURMAN-857: SPIKE: investigate deleting old versions of records from SRS
Ticket: MODSOURMAN-857
Status: In Progress
Problem Statement
When SRS (Source Record Storage) records are updated, the prior version of the record is marked as "OLD," and the latest version is marked as "ACTUAL." Over time, these accumulated OLD versions may cause storage bloat and potential performance degradation. There is currently no automatic mechanism for cleaning up these outdated records.
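To make the accumulation concrete, the following is a minimal in-memory sketch of the versioning behaviour described above. It is not the SRS schema or API: the class `SrsVersioningSketch`, its fields (`matchedId`, `generation`) and its methods are hypothetical names chosen for illustration, and the assumption that each update creates a new row with a new ID and an incremented generation is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/** Hypothetical in-memory model of record versioning; not the real SRS schema or API. */
class SrsVersioningSketch {

    enum State { ACTUAL, OLD }

    /** Simplified stand-in for one stored version of a record. */
    static final class RecordVersion {
        final String id;         // unique per version (assumption)
        final String matchedId;  // shared by all versions of the same record (assumption)
        final int generation;    // incremented on every update (assumption)
        final String content;
        State state;

        RecordVersion(String id, String matchedId, int generation, String content, State state) {
            this.id = id;
            this.matchedId = matchedId;
            this.generation = generation;
            this.content = content;
            this.state = state;
        }
    }

    private final List<RecordVersion> table = new ArrayList<>();

    /** Stores the first version of a record as ACTUAL. */
    RecordVersion create(String content) {
        String id = UUID.randomUUID().toString();
        RecordVersion version = new RecordVersion(id, id, 0, content, State.ACTUAL);
        table.add(version);
        return version;
    }

    /** Marks the current ACTUAL version as OLD and appends a new ACTUAL version. */
    RecordVersion update(String matchedId, String newContent) {
        RecordVersion current = table.stream()
            .filter(v -> v.matchedId.equals(matchedId) && v.state == State.ACTUAL)
            .findFirst()
            .orElseThrow(() -> new IllegalArgumentException("No ACTUAL record for " + matchedId));
        current.state = State.OLD; // the prior version is kept, never deleted
        RecordVersion next = new RecordVersion(
            UUID.randomUUID().toString(), matchedId, current.generation + 1, newContent, State.ACTUAL);
        table.add(next);
        return next;
    }

    /** Counts stored versions in the given state. */
    long count(State state) {
        return table.stream().filter(v -> v.state == state).count();
    }
}
```

In this model every update leaves one more OLD row behind, so a record updated n times occupies n + 1 rows, of which only one is ACTUAL.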
Proposed Solutions
Provide On-Demand Cleanup Script (DONE)
A script already exists to manually clean up OLD records from the database (https://folio-org.atlassian.net/wiki/spaces/FOLIJET/pages/686719173)
Advantages:
Quick and easy to implement.
Provides an immediate performance boost.
Disadvantages:
Requires manual execution on a recurring basis.
Does not prevent OLD records from continuously piling up over time during regular module usage.
Implement Background Job for Periodic Cleanup
Extend the MarcIndexersVersionDeletionVerticle, which is currently responsible for periodically deleting MARC Indexers, to clean "OLD" records.
The MarcIndexersVersionDeletionVerticle allows the clean-up interval to be configured using the srs.marcIndexers.delete.interval.seconds system property.
Update the RecordDao.deleteRecords function (not currently in use) to support deletion of OLD versions of records.
Considerations:
After implementing this functionality, the script from Solution #1 will need to be run once immediately after upgrading to the new version. This ensures that the periodic background job doesn’t take excessive time during its initial execution.
Advantages:
Automated maintenance with no manual intervention required post-implementation.
Keeps storage usage and performance consistent.
Disadvantages:
Requires additional development and testing time.
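A rough sketch of what such a periodic cleanup could look like is shown below. It is not the existing verticle code. It uses a plain JDK scheduler instead of Vert.x so that it runs standalone, and the `OldRecordStore` interface, the batch size, and the default interval are assumptions made for illustration. Only the srs.marcIndexers.delete.interval.seconds property name comes from the description above; whether OLD record cleanup should share that property or get its own is an open design question.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical periodic cleanup of OLD records; not the actual verticle implementation. */
class OldRecordCleanupJobSketch {

    /** Hypothetical storage abstraction; the real change would go through RecordDao. */
    interface OldRecordStore {
        /** Deletes up to batchSize records in state OLD and returns how many were deleted. */
        int deleteOldRecords(int batchSize);
    }

    static final String INTERVAL_PROPERTY = "srs.marcIndexers.delete.interval.seconds";
    // Assumed fallback for this sketch only; the module's real default is not stated here.
    static final long ASSUMED_DEFAULT_INTERVAL_SECONDS = 1800L;

    private final OldRecordStore store;
    private final int batchSize;

    OldRecordCleanupJobSketch(OldRecordStore store, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.store = store;
        this.batchSize = batchSize;
    }

    /** One cleanup pass: delete in batches until a batch comes back short. */
    int runOnce() {
        int total = 0;
        int deleted;
        do {
            deleted = store.deleteOldRecords(batchSize);
            total += deleted;
        } while (deleted == batchSize);
        return total;
    }

    /** Reads the interval from the system property, falling back to the assumed default. */
    static long intervalSeconds() {
        return Long.getLong(INTERVAL_PROPERTY, ASSUMED_DEFAULT_INTERVAL_SECONDS);
    }

    /** Schedules runOnce() with a fixed delay between the end of one pass and the next. */
    ScheduledExecutorService start() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        long interval = intervalSeconds();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                runOnce();
            } catch (RuntimeException e) {
                // Swallow so that one failed pass does not cancel future passes.
                System.err.println("OLD record cleanup failed: " + e.getMessage());
            }
        }, interval, interval, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Deleting in bounded batches keeps each statement short, which is also why running the on-demand script once after the upgrade helps: the first scheduled pass then starts from a small backlog.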
Remove the Versioning Mechanism (New Feature Required)
Eliminate the entire mechanism for maintaining OLD and ACTUAL versions of SRS records.
Replace the current update mechanism with a new methodology involving record overlaying (directly updating existing records rather than creating versions).
Key Considerations:
Revise SRS Record Entities:
Consolidate all information so that only one record version is stored per entry.
Revise Record and SourceRecord entities
Evaluate the necessity of removing associated entities such as RawRecord, ErrorRecord, and Snapshot.
Information stored in RawRecord and ErrorRecord is already handled in SRM (Source Record Manager), making their continued existence redundant.
The Snapshot entity mirrors information about JobExecution that can also be retrieved from SRM, suggesting it may no longer be required.
Remove the Generation Calculation Mechanism:
Stop processing and storing historical versions of records when making updates.
Update records as overlays, writing changes to the existing record while retaining the original record ID.
Match Record ID with 999 ff $s:
Ensure that the matched ID (subfield 999 ff $s) aligns with the actual record ID in the database.
Migration of Existing Data:
Perform incremental migration steps to:
Delete OLD versions.
Potentially remove Snapshot, ErrorRecord, and RawRecord entities.
Migrate data to a different structure of the Record entity (id values should match matched_id and 999 ff $s).
Advantages:
Removes complexity related to versioning and cleanup, eliminating the need for periodic maintenance scripts or background jobs.
Aligns SRS functionality with a more unified approach across FOLIO modules.
Significant long-term storage savings and reduced code complexity.
Disadvantages:
Introduces a breaking change, requiring extensive updates and testing of all related modules (e.g., SRM and the inventory modules).
Requires significant development effort, including data migration and refactoring.
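The overlay approach can be illustrated with a small in-memory sketch. This is only one possible shape of the idea, not a proposed implementation: `OverlayStoreSketch`, `StoredRecord`, and the `matchedIdIn999s` field (standing in for subfield 999 ff $s) are hypothetical names, and the parsed MARC content is reduced to a plain string.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of overlay updates; not the real SRS entities. */
class OverlayStoreSketch {

    /** Single stored version of a record; no state or generation is kept. */
    static final class StoredRecord {
        final String id;         // stable across updates
        String content;          // stand-in for the parsed record content
        String matchedIdIn999s;  // stand-in for the value of 999 ff $s

        StoredRecord(String id, String content) {
            this.id = id;
            this.content = content;
            this.matchedIdIn999s = id;
        }
    }

    private final Map<String, StoredRecord> recordsById = new HashMap<>();

    /** Stores a new record; its 999 ff $s value is set to its own ID. */
    StoredRecord create(String id, String content) {
        if (recordsById.containsKey(id)) {
            throw new IllegalStateException("Record already exists: " + id);
        }
        StoredRecord record = new StoredRecord(id, content);
        recordsById.put(id, record);
        return record;
    }

    /** Overwrites the existing record in place instead of adding a new version. */
    StoredRecord overlay(String id, String incomingContent) {
        StoredRecord existing = recordsById.get(id);
        if (existing == null) {
            throw new IllegalArgumentException("No record with id " + id);
        }
        existing.content = incomingContent;
        // Keep 999 ff $s aligned with the record ID after every overlay.
        existing.matchedIdIn999s = existing.id;
        return existing;
    }

    /** Number of stored records; stays constant under overlay updates. */
    int size() {
        return recordsById.size();
    }
}
```

Because each record occupies exactly one entry regardless of how often it is updated, there is nothing left for a cleanup script or background job to remove.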
Evaluation of Approaches
| Criteria | On-Demand Script | Background Job for Periodic Cleanup | Remove Versioning Mechanism |
|---|---|---|---|
| Ease of Implementation | High | Medium | Low |
| Impact on Performance | Medium (temporary boost, manual) | High (consistent, periodic cleanup) | High (permanent performance increase) |
| Change Scope | Minimal | Moderate | Significant (breaking change) |
| Maintenance Effort | High | Low | None (beyond initial development) |
| Alignment with Long-Term Goals | Low | Medium | High |
Recommendations
Short-Term Solution:
Implement Solution #2 (Background Job for Periodic Cleanup) to address performance concerns in a sustainable manner while avoiding the need for manual intervention.
Extend the existing MarcIndexersVersionDeletionVerticle to clean up OLD records automatically at configurable intervals.
Long-Term Vision:
Invest in Solution #3 (Remove Versioning Mechanism) to align with the broader goals of unifying and simplifying FOLIO modules.
Carefully plan the removal of redundant entities (e.g., Snapshot, RawRecord, ErrorRecord) and migration steps to ensure a smooth transition and minimal disruption to users.
Next Steps
For Solution #2:
Define required updates to MarcIndexersVersionDeletionVerticle to support the cleanup of OLD records.
Update RecordDao.deleteRecords to delete OLD records.
Conduct testing to ensure the background job operates as expected and does not interfere with system performance.
For Solution #3:
Define a comprehensive migration plan, including the removal of OLD record versions and associated redundant entities (RawRecord, ErrorRecord, Snapshot).
Coordinate with other teams to address the impacts of breaking changes.
Develop and thoroughly test the overlay update mechanism and conduct full-scale validation after migration.
By addressing the immediate need (Solution #2) and aligning with long-term goals (Solution #3), we can ensure efficient, sustainable, and simplified record management in SRS while preserving system performance.