Executive Summary
LoC requires an authority deletion feature. The implementation must not degrade the performance of operations on active records.
Requirements
Functional Requirements
- Allow the library to export deleted MARC authority records via API.
- Support providing deleted authority record UUIDs via API
- Support authority record purge configuration to be set to Never per tenant
Non-functional Requirements
Performance (expected volume of deleted records):
- 10k weekly
- 50k monthly
- Very rarely more than 100k (KG will confirm with LoC)
Data consistency:
- There should be no invalid links to deleted authorities in instances
- Authority statistics should be aligned after the authority record is deleted.
Data retention period:
- Support authority record purge configuration to be set to Never per tenant
Assumptions
- Authority record restoration is not planned in future releases
Baseline Architecture
The current implementation of authority record deletion consists of the following steps:
- The "soft-delete" flag is set to true
- A domain event for the deleted record is sent through Kafka
- After events are sent and processed, records are physically deleted from the authority table
Target Architecture
The target solution must support exporting deleted authority records, so deleted records should remain in the database and be physically removed only after the retention period elapses. This requires both "soft" delete and "hard" delete operations. Soft-delete should move deleted records into an archive table, which keeps information about them available for export. Keeping them in a separate table also minimizes the impact on the performance of active records.
The approach for the archive table:
- Same tablespace as the active table. Table name pattern: <original_table_name>_archive
- Trigger on delete should be implemented to move the record to the archive table.
- Domain events should be distinct for soft- and hard-delete.
- Update lastUpdatedDate and updatedByUserId in metadata.
- Tenant-level retention policy should be provided through configuration parameters. Tenant-level configuration of the retention period for the deleted records should be kept in mod-settings.
- The projected transaction volume may produce a large number of deleted records. To address this, the API for listing deleted records should support export in both plain text and JSON formats. It should also provide filtering by date and pagination, so that the list of deleted records can be fetched in chunks.
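The archive-table approach above can be illustrated with a small sketch. It is not the module's implementation: it uses an in-memory SQLite database as a stand-in for the real database, and the table names, column names, offset-based pagination, and the JSON shape {"ids": [...]} are all assumptions made for illustration. It shows a delete trigger that copies the row into the archive table, and a listing query filtered by date and fetched in chunks.

```python
import json
import sqlite3

# Hypothetical, simplified schema; the real authority table has more columns.
SCHEMA = """
CREATE TABLE authority (
    id TEXT PRIMARY KEY,
    heading TEXT NOT NULL,
    updated_date TEXT NOT NULL,
    updated_by_user_id TEXT
);
CREATE TABLE authority_archive (
    id TEXT PRIMARY KEY,
    heading TEXT NOT NULL,
    updated_date TEXT NOT NULL,
    updated_by_user_id TEXT
);
-- On delete, copy the row being removed into the archive table.
CREATE TRIGGER authority_to_archive
BEFORE DELETE ON authority
BEGIN
    INSERT INTO authority_archive (id, heading, updated_date, updated_by_user_id)
    VALUES (OLD.id, OLD.heading, OLD.updated_date, OLD.updated_by_user_id);
END;
"""


def soft_delete(conn, record_id, user_id, deleted_at):
    """Stamp the metadata, then delete; the trigger archives the row."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "UPDATE authority SET updated_date = ?, updated_by_user_id = ? WHERE id = ?",
            (deleted_at, user_id, record_id),
        )
        conn.execute("DELETE FROM authority WHERE id = ?", (record_id,))


def list_deleted_ids(conn, deleted_from, deleted_to, limit, offset=0):
    """Return one page of archived ids deleted in [deleted_from, deleted_to)."""
    rows = conn.execute(
        "SELECT id FROM authority_archive "
        "WHERE updated_date >= ? AND updated_date < ? "
        "ORDER BY updated_date, id LIMIT ? OFFSET ?",
        (deleted_from, deleted_to, limit, offset),
    ).fetchall()
    return [row[0] for row in rows]


def render(ids, fmt):
    """Render a page as JSON (assumed shape) or as plain text, one id per line."""
    if fmt == "json":
        return json.dumps({"ids": ids})
    return "\n".join(ids)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO authority VALUES (?, ?, ?, ?)",
    [
        ("a1", "Twain, Mark", "2024-01-01T00:00:00Z", "u0"),
        ("a2", "Austen, Jane", "2024-01-01T00:00:00Z", "u0"),
    ],
)
conn.commit()
soft_delete(conn, "a1", "u1", "2024-03-05T10:00:00Z")
page = list_deleted_ids(conn, "2024-03-01T00:00:00Z", "2024-04-01T00:00:00Z", limit=100)
print(render(page, "json"))  # {"ids": ["a1"]}
```

Timestamps are stored as ISO 8601 strings in a single format, so plain string comparison orders them correctly. For very large archives, keyset pagination on (updated_date, id) would likely scale better than OFFSET; the offset form is used here only to keep the sketch short.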
Process:
- Soft-delete:
  - Update the soft-delete flag in the original table and send events.
  - After the events are processed, move the deleted record to the archive table.
- Hard-delete:
  - If the retention policy is set, the archive table should be cleaned on a schedule by the lastUpdatedDate field. A reasonable schedule frequency is once a day. The retention period for deleted records should be fetched from mod-settings. Each job run should then remove all deleted records that are older than the retention period.
  - A "delete" event should be sent to mod-source-record-storage to remove the related source records. To enable consuming "delete" events, it is required to add an operation to the Kafka message header and filter out other operations.
  - If the retention policy for the tenant is set to never, no cleanup is required.
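The hard-delete steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the planned implementation: the archive is modelled as an in-memory dict, the retention period is passed in as a number of days rather than fetched from mod-settings, and the header name "operation" and its value HARD_DELETE are hypothetical placeholders for whatever the Kafka message header will actually carry.

```python
from datetime import datetime, timedelta, timezone

NEVER = "never"                 # retention value that disables cleanup
OPERATION_HEADER = "operation"  # hypothetical header name
HARD_DELETE = b"HARD_DELETE"    # hypothetical header value


def purge_expired(archive, retention_days, now):
    """Remove archived records older than the retention period.

    `archive` maps record id -> lastUpdatedDate (timezone-aware datetime).
    Returns the Kafka-style messages to send for the purged records.
    """
    if retention_days == NEVER:
        return []  # policy "never": no cleanup is performed
    cutoff = now - timedelta(days=retention_days)
    expired = sorted(rid for rid, updated in archive.items() if updated < cutoff)
    messages = []
    for rid in expired:
        del archive[rid]
        # Tag the event with its operation so consumers can filter on it.
        messages.append({"key": rid, "headers": [(OPERATION_HEADER, HARD_DELETE)]})
    return messages


def is_hard_delete(message):
    """Consumer-side filter: accept only messages tagged as hard-delete."""
    return (OPERATION_HEADER, HARD_DELETE) in message["headers"]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
archive = {
    "a1": datetime(2024, 1, 15, tzinfo=timezone.utc),  # older than 90 days
    "a2": datetime(2024, 5, 20, tzinfo=timezone.utc),  # within retention
}
messages = purge_expired(archive, retention_days=90, now=now)
print([m["key"] for m in messages])  # ['a1']
print(sorted(archive))               # ['a2']
```

A daily scheduler would call purge_expired once per tenant with that tenant's retention setting. A consumer such as mod-source-record-storage would apply a check like is_hard_delete before processing a message and skip every other operation.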
Open Questions
# | Question | Answer |
---|---|---|
1 | Do we have to implement an "undo" action for delete (restore record)? | Khalilah Gambrell: No |
2 | Are the performance requirement numbers for peak moments, or is every week expected to have such a load? | |