...

Audit Consumers Sequence Diagram

[Diagram: Spitfire-Inventory-Audit-target-sequence.drawio]

...

The implementation can follow the Transactional Outbox pattern. This approach gives a stronger guarantee that each audit event is persisted, at the cost of reduced performance in the flows that emit domain events.
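The trade-off can be made concrete with a minimal sketch. It is not the mod-audit implementation: it uses an in-memory SQLite database in place of Postgres, and the table names (`instance`, `outbox`) and function names are assumptions made for illustration. The entity write and the audit event write share one transaction, so either both are persisted or neither is; a separate relay step then publishes the outbox rows.

```python
import json
import sqlite3
import uuid


def create_schema(conn):
    # Hypothetical tables: one for the business entity, one for pending audit events.
    conn.execute("CREATE TABLE instance (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        "event_id TEXT PRIMARY KEY, entity_id TEXT NOT NULL, payload TEXT NOT NULL)"
    )


def save_instance_with_outbox(conn, entity_id, body, fail_after_write=False):
    """Write the entity and its audit event atomically."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT OR REPLACE INTO instance (id, body) VALUES (?, ?)",
            (entity_id, json.dumps(body)),
        )
        conn.execute(
            "INSERT INTO outbox (event_id, entity_id, payload) VALUES (?, ?, ?)",
            (str(uuid.uuid4()), entity_id, json.dumps(body)),
        )
        if fail_after_write:
            # Simulated failure: both inserts above are rolled back together.
            raise RuntimeError("simulated failure")


def drain_outbox(conn, publish):
    """Relay step: publish each pending event, then delete it from the outbox."""
    rows = conn.execute(
        "SELECT event_id, entity_id, payload FROM outbox ORDER BY rowid"
    ).fetchall()
    for event_id, entity_id, payload in rows:
        publish(event_id, entity_id, json.loads(payload))
        with conn:
            conn.execute("DELETE FROM outbox WHERE event_id = ?", (event_id,))
    return len(rows)
```

If the relay crashes after publishing but before deleting a row, that event is published again on the next run, so consumers still have to tolerate duplicates. The extra outbox insert on every business write is where the performance cost comes from.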

Solution Summary

The process is split into two main parts:

  1. Persistence: The audit database should persist a snapshot of the entity. Queries are made mostly by the entity's unique identifier, so the audit tables can be partitioned by UUID.

  2. Version history display: This should be computed on demand by comparing each snapshot of the entity with the previous one.
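As an illustration of the on-demand comparison, here is a minimal sketch that diffs consecutive snapshots. It assumes snapshots are flat dictionaries and does not use the object diff library mentioned elsewhere on this page; the function names and the tuple layout are hypothetical.

```python
def diff_snapshots(previous, current):
    """Return (kind, field, old_value, new_value) tuples for top-level fields."""
    changes = []
    for key in sorted(set(previous) | set(current)):
        if key not in previous:
            changes.append(("added", key, None, current[key]))
        elif key not in current:
            changes.append(("removed", key, previous[key], None))
        elif previous[key] != current[key]:
            changes.append(("changed", key, previous[key], current[key]))
    return changes


def version_history(snapshots):
    """Compare each snapshot with the one before it, oldest first."""
    return [diff_snapshots(a, b) for a, b in zip(snapshots, snapshots[1:])]
```

Because only full snapshots are stored, nothing has to be computed at write time; the cost of diffing is paid only when a user opens the version history.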

Key implementation aspects:

  1. Kafka's default delivery semantics are “AT_LEAST_ONCE”. Ensure that domain events carry unique identifiers so that messages can be consumed idempotently.

  2. Add new consumers in mod-audit for inventory domain events.

  3. Persist audit events in a separate event storage: a single DB table per entity type, partitioned by UUID.

  4. Create a REST API:
    • to provide a list of changes related to a particular entity
    • to provide detailed information on a particular change. This API should use the object diff library to return a verbose description of the difference between the current and previous snapshots of the entity
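The idempotency requirement in the first item can be sketched as follows. This is an in-memory illustration only, not the mod-audit consumer: the class name and the `eventId` field are assumptions, and in a real deployment a unique constraint on the event identifier column in the audit table would play the role of the in-memory set.

```python
class IdempotentAuditConsumer:
    """Stores each event at most once, keyed by its unique event identifier."""

    def __init__(self):
        self._seen_event_ids = set()
        self.stored = []

    def handle(self, event):
        """Return True if the event was stored, False if it was a redelivery."""
        event_id = event["eventId"]  # hypothetical field name
        if event_id in self._seen_event_ids:
            return False  # duplicate delivery under at-least-once semantics
        self._seen_event_ids.add(event_id)
        self.stored.append(event)
        return True
```

Without a unique identifier on the event itself, a redelivered message could not be told apart from a genuine second update that produced the same snapshot.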

...

Given the data size implications, a separate table per entity type is required. The default table structure is listed below:

| # | Column | Type | Required | Unique | Description |
|---|--------|------|----------|--------|-------------|
| 1 | EventID | UUID | y | y | unique event identifier |
| 2 | EventDate | timestamp | y | n | date when the event appeared in the event log |
| 3 | Source | varchar | y | n | source of the event: data-import, batch-update, user, etc. |
| 4 | Action | varchar | y | n | what action was performed |
| 5 | ActionDate | timestamp | y | n | when action was performed |
| 6 | EntityType | varchar | y | n | what entity: instance/authority/etc |
| 7 | EntityID | UUID | y | n | entity identifier |
| 8 | UserId | UUID | y | n | user who did the action, fixed UUID for anonymized user |
| 9 | Snapshot | jsonb | y | n | body of the entity |
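For readers who prefer code to tables, the same structure can be mirrored as a record type. This is only a sketch: the snake_case field names, the `ANONYMIZED_USER_ID` value, and the validation rule are assumptions for illustration, not part of the proposed schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict
from uuid import UUID

# Hypothetical placeholder for the fixed UUID used for anonymized users.
ANONYMIZED_USER_ID = UUID(int=0)


@dataclass(frozen=True)
class AuditEvent:
    event_id: UUID            # unique event identifier
    event_date: datetime      # when the event appeared in the event log
    source: str               # data-import, batch-update, user, etc.
    action: str               # what action was performed
    action_date: datetime     # when the action was performed
    entity_type: str          # instance, authority, etc.
    entity_id: UUID           # entity identifier (partitioning key)
    user_id: UUID             # acting user, or ANONYMIZED_USER_ID
    snapshot: Dict[str, Any]  # body of the entity

    def __post_init__(self):
        # Every column in the table is marked as required.
        for name, value in vars(self).items():
            if value is None:
                raise ValueError(f"{name} is required")
```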

...

| # | Risk | Description | Probability | Impact | Mitigation |
|---|------|-------------|-------------|--------|------------|
| 1 | Long period for audit records retention | The number of records could overwhelm the capability of the Postgres database, both computationally and in cost | High | High | Introduce separate storage for audit events |
| 2 | Cascade updates will create redundant copies in the audit log | An update to holdings causes updates to all related items. Some holdings may contain ~15000 records | High | Medium | Collapse or filter out events that only change the parent entity |
| 3 | Some flows could update inventory entities without using the domain-events mechanism | The system offers several ways to change entities (UI, data import, bulk edit, etc.), and some of these flows might skip sending domain events and/or edit entities directly | Low | Medium | List those cases and add domain events to the flows that lack this capability |
| 4 | Linked data | The flow and integration with inventory are not clear for the BIBFRAME format | Low | Low | Adjust the BIBFRAME flow to follow the proposed solution for other inventory entities |


Questions

  1. [Arch] Acquisition log solution uses a Transactional Outbox pattern to ensure atomicity of
    business and audit operation. This is done because this log contains data related to financial
    operations. Do we need to use the Transactional Outbox pattern for the Inventory case?

    • We need an outbox pattern if we REALLY need to track those changes

  2. [Arch] Do we use mod-audit or just the source module? (The acquisition log uses the mod-audit module but with specific tables.)

    • Use mod-audit with a separate audit table for inventory events

    • Retention period: not yet addressed

  3. [Product] There is a library for the comparison of Java objects that outputs the difference between two objects in the following form:

Code Block
* changes on com.epam.instancehistory.model.instance.Instance/ :
  - 'notes' collection changes :
     1. 'com.epam.instancehistory.model.instance.Instance/#notes/1' added
  - 'notes/1.note' = 'Test note'
  - 'notes/1.staffOnly' = 'false'

Is that okay, or do we need a more verbose description of the changes?

...

Question

Answer

Comment

1

Should failure in sending audit message block the create/update/delete operation?

The question is related to transactional outbox pattern implementation

2

What would be the period of retention for Audit records?

The storage options depend on this question:

  • Postgres for ~1-3 years

  • Postgres with partitioning by UUID for 5-15 years

  • NoSQL options for 20+ years

3

In what form should we show the changes to non-MARC fields (e.g. staffSuppress, administrative notes, etc.) in MARC instances?

4

If only the order of fields in a MARC record is changed, should it be logged?

5

Do we have scenarios where the audit log is exported in batches for some period of dates?

If the solution uses Postgres with partitioning by entity key, such exports would cause significant performance issues.
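The reason behind this concern can be shown with a simplified routing sketch. It assumes a fixed number of partitions and a plain modulo on the UUID value; Postgres hash partitioning uses its own hash function, and `PARTITION_COUNT` and the function names here are hypothetical.

```python
from uuid import UUID

PARTITION_COUNT = 8  # hypothetical number of partitions


def partition_for(entity_id: UUID) -> int:
    """Route an entity to a partition based on its identifier only."""
    return entity_id.int % PARTITION_COUNT


def partitions_for_entity_lookup(entity_id: UUID) -> list:
    # A query by entity identifier touches exactly one partition.
    return [partition_for(entity_id)]


def partitions_for_date_range_export() -> list:
    # Dates play no part in routing, so a date-range export
    # has to scan every partition.
    return list(range(PARTITION_COUNT))
```

A lookup by entity stays cheap as the table grows, while a batch export by date grows with the total size of the audit data unless a separate date-based index or storage layout is added.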

  1. Acquisition event log - data retention period is 20 years

  2. Transactional outbox pattern

  3. Orders Event Log

  4. Javers - Java object diff library