Summary
This technical design addresses the need of catalogers to track the history of changes to different entities in FOLIO. The solution follows the common FOLIO approach to auditable events, similar to the audit of events in the circulation and acquisition domains.
Requirements
Functional Requirements
- UXPROD-4125
- UXPROD-4126
NFR
UXPROD-4125/UXPROD-4125 NFR Scorecard
Assumptions
For ECS environments: Shared entities' version history should be tracked only in the central tenant.
All changes in the system related to inventory entities (instances, items, holdings, bibs) generate Domain events.
Baseline Architecture
In the existing architecture, mod-inventory-storage is responsible for persisting such entities as instances, holdings, and items. mod-entities-links is responsible for authorities. Both modules produce domain events on create/update/delete actions from different sources.
Target Architecture
The existing architecture allows the reuse of the capabilities of the domain events approach to persist audit log events.
Audit Consumers Sequence Diagram
Audit Consumers with Outbox Sequence Diagram
The implementation can follow the Transactional Outbox pattern. This approach gives a stronger guarantee that the audit event is persisted, but the trade-off is that it will negatively affect the performance of the flows related to domain events.
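As a rough illustration of the pattern, the sketch below uses an in-memory store in place of a database. The class and method names (`InMemoryOutbox`, `saveWithAudit`, `relay`) are hypothetical and not taken from any FOLIO module. In a real module the entity row and the outbox row would be written in one database transaction, and a separate relay would publish the pending rows to the message broker.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for a module database that holds both
// the entity table and the outbox table.
final class InMemoryOutbox {
    private final Map<String, String> entities = new HashMap<>();
    private final List<String> outbox = new ArrayList<>();

    // Business write and audit write happen together. Inputs are validated
    // before anything is mutated, so either both rows are stored or neither is.
    synchronized void saveWithAudit(String entityId, String entityJson, String auditEventJson) {
        if (entityId == null || entityJson == null || auditEventJson == null) {
            throw new IllegalArgumentException("entity and audit event are both required");
        }
        entities.put(entityId, entityJson);
        outbox.add(auditEventJson);
    }

    // Relay step: a row is removed only after the publisher returns normally,
    // so a failed publish leaves the row queued for the next attempt.
    synchronized int relay(Consumer<String> publisher) {
        int published = 0;
        while (!outbox.isEmpty()) {
            publisher.accept(outbox.get(0));
            outbox.remove(0);
            published++;
        }
        return published;
    }

    // Number of audit events that have not been published yet.
    synchronized int pending() {
        return outbox.size();
    }
}
```

The extra outbox write inside the business operation is where the performance cost mentioned above comes from.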
Solution Summary
1. Add a new consumer to inventory domain events
2. Persist audit events in separate event storage. A single table in DB per entity type
3. Create REST API
   - to provide information on a list of changes related to a particular entity
   - to provide detailed information on the particular change - this API should use the Object diff library to return a verbose description of the difference between current and previous snapshots of the entity
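To make the idea of a snapshot difference concrete, here is a minimal sketch that compares two flat snapshots. It deliberately does not use an object diff library: it is a plain-Java stand-in, and the class name `SnapshotDiff`, the flat `Map` representation, and the wording of the output lines are all assumptions made for illustration. A real implementation would also have to handle nested objects and collections.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

// Hypothetical helper: describes field-level differences between the
// previous and the current snapshot of an entity.
final class SnapshotDiff {
    private SnapshotDiff() {
    }

    static List<String> describe(Map<String, ?> previous, Map<String, ?> current) {
        // Union of field names, sorted so the output order is stable.
        TreeSet<String> fields = new TreeSet<>(previous.keySet());
        fields.addAll(current.keySet());

        List<String> changes = new ArrayList<>();
        for (String field : fields) {
            Object oldValue = previous.get(field);
            Object newValue = current.get(field);
            if (!previous.containsKey(field)) {
                changes.add("'" + field + "' added = '" + newValue + "'");
            } else if (!current.containsKey(field)) {
                changes.add("'" + field + "' removed (was '" + oldValue + "')");
            } else if (!Objects.equals(oldValue, newValue)) {
                changes.add("'" + field + "' changed: '" + oldValue + "' -> '" + newValue + "'");
            }
        }
        return changes;
    }
}
```

Unchanged fields produce no output, so the result lists only what differs between the two snapshots.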
ERD
Given the data size implications, it might be reasonable to create a separate table for each entity type.
# | Column | Type | Required | Unique | Description |
---|---|---|---|---|---|
1 | EventID | UUID | y | y | unique event identifier |
2 | EventDate | timestamp | y | n | date when the event appeared in the event log |
3 | Source | varchar | y | n | source of the event: data-import, batch-update, user, etc. |
4 | Action | varchar | y | n | what action was performed |
5 | ActionDate | timestamp | y | n | when action was performed |
6 | EntityType | varchar | y | n | what entity: instance/authority/etc |
7 | EntityID | UUID | y | n | entity identifier |
8 | UserId | UUID | y | n | user who did the action, fixed UUID for anonymized user |
9 | Snapshot | jsonb | y | n | body of the entity |
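The columns above could map to a Java type along the following lines. This is only a sketch: the record name `InventoryAuditRecord`, the Java field types, the factory method `of`, and the all-zero UUID used for the anonymized user are assumptions, not part of the design.

```java
import java.time.Instant;
import java.util.Objects;
import java.util.UUID;

// One row of the audit table. Components follow the ERD columns;
// the Java names and types are assumptions made for this sketch.
record InventoryAuditRecord(
        UUID eventId,
        Instant eventDate,
        String source,
        String action,
        Instant actionDate,
        String entityType,
        UUID entityId,
        UUID userId,
        String snapshot) {

    // Hypothetical fixed identifier for an anonymized user; the real value is not defined here.
    static final UUID ANONYMIZED_USER = new UUID(0L, 0L);

    // Every column is marked as required in the ERD, so nulls are rejected.
    InventoryAuditRecord {
        Objects.requireNonNull(eventId, "eventId");
        Objects.requireNonNull(eventDate, "eventDate");
        Objects.requireNonNull(source, "source");
        Objects.requireNonNull(action, "action");
        Objects.requireNonNull(actionDate, "actionDate");
        Objects.requireNonNull(entityType, "entityType");
        Objects.requireNonNull(entityId, "entityId");
        Objects.requireNonNull(userId, "userId");
        Objects.requireNonNull(snapshot, "snapshot");
    }

    // Builds a record for a consumed domain event: generates the event id,
    // stamps the event date, and substitutes the anonymized user when none is given.
    static InventoryAuditRecord of(String source, String action, Instant actionDate,
                                   String entityType, UUID entityId, UUID userId, String snapshot) {
        return new InventoryAuditRecord(UUID.randomUUID(), Instant.now(), source, action,
                actionDate, entityType, entityId,
                userId == null ? ANONYMIZED_USER : userId, snapshot);
    }
}
```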
Risks and concerns
# | Risk | Description | Probability | Impact | Mitigation |
---|---|---|---|---|---|
1 | Long period for audit records retention | The number of records could overwhelm the capability of the Postgres database both from a computational point of view and cost | High | High | Introduce separate storage for audit-events |
2 | Cascade Updates will create redundant copies in the audit log | The update to holdings causes updates to all related items. Some holdings may contain ~15000 records | High | Medium | Collapse or filter out events that only change parent entity |
3 | Some flows could update inventory entities without using the Domain-events mechanism | Given the different capabilities of the system (UI, data import, bulk edit, etc.), some flows might skip sending Domain events and/or edit entities directly | Low | Medium | List those cases and add domain events to the flows that lack this capability |
4 | Linked data | The flow and integration with inventory are not clear for the BIBFRAME format | Low | Low | Adjust BIBFRAME flow to follow the proposed solution for other inventory entities |
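The mitigation for risk 2 could be approached as a filter in the audit consumer. The sketch below is one possible reading of "filter out events that only change parent entity": it skips an event when every changed field belongs to a configured set of fields that are derived from the parent. The class name `CascadeFilter` and the flat `Map` snapshots are assumptions, and which fields count as parent-derived would have to be defined per entity type.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical filter that decides whether an update event deserves an audit record.
final class CascadeFilter {
    private final Set<String> parentDerivedFields;

    CascadeFilter(Set<String> parentDerivedFields) {
        this.parentDerivedFields = Set.copyOf(parentDerivedFields);
    }

    // Returns true when at least one field outside the parent-derived set changed.
    // Returns false when only parent-derived fields changed, or when nothing changed.
    boolean shouldAudit(Map<String, ?> oldSnapshot, Map<String, ?> newSnapshot) {
        Set<String> fields = new HashSet<>(oldSnapshot.keySet());
        fields.addAll(newSnapshot.keySet());
        for (String field : fields) {
            boolean changed = !Objects.equals(oldSnapshot.get(field), newSnapshot.get(field));
            if (changed && !parentDerivedFields.contains(field)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, with `effectiveLocationId` configured as a parent-derived field (an illustrative choice), an item event that changes only that field would be skipped, while an event that also changes the barcode would still be audited.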
Questions
[Arch] The acquisition log solution uses the Transactional Outbox pattern to ensure atomicity of the business and audit operations. This is done because this log contains data related to financial operations. Do we need to use the Transactional Outbox pattern for the Inventory case?
We need an outbox pattern if we REALLY need to track those changes
[Arch] Do we use mod-audit or just the source module? (The acquisition log uses the mod-audit module but with specific tables.)
Use mod-audit with a separate audit table for inventory events
Retention period: not yet addressed
[Product] There is a library for the comparison of Java objects that outputs the difference between two objects in the following form:
* changes on com.epam.instancehistory.model.instance.Instance/ :
  - 'notes' collection changes :
     1. 'com.epam.instancehistory.model.instance.Instance/#notes/1' added
  - 'notes/1.note' = 'Test note'
  - 'notes/1.staffOnly' = 'false'
Is that okay, or do we need a more verbose description of the changes?
[Product] Do we need to ignore history for "shared" entities?
Links
Acquisition event log - data retention period is 20 years
Javers - Java object diff library