Inventory Audit log

Summary

This technical design addresses catalogers' need to track the history of changes to entities in FOLIO. The solution follows the common FOLIO approach to auditable events, similar to the event audit in the circulation and acquisition domains.

Requirements

Functional Requirements

https://folio-org.atlassian.net/browse/UXPROD-4125

https://folio-org.atlassian.net/browse/UXPROD-4126

NFR

UXPROD-4125/UXPROD-4126 NFR Scorecard

Assumptions

  • For ECS environments: Shared entities' version history should be tracked only in the central tenant.

  • All changes in the system related to inventory entities (instances, items, holdings, bibs) generate Domain events.

  • Domain events for update actions carry both the old and the new version of the entity.

Baseline Architecture

In the existing architecture, mod-inventory-storage persists instances, holdings, and items, and mod-entities-links is responsible for authorities. Both modules produce domain events on create/update/delete actions from different sources.

 

Target Architecture

The existing domain-event mechanism can be reused to persist audit log events.

Audit Consumers Sequence Diagram

Audit Consumers with Outbox Sequence Diagram

The implementation can follow the transactional outbox pattern. This gives a stronger guarantee that each audit event is persisted, but the trade-off is slower flows that produce domain events.
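A minimal sketch of the outbox idea, using SQLite as a stand-in for the module database. The table names `instance` and `outbox`, the event fields, and the `publish` callback are hypothetical and exist only for illustration. The entity change and its event row are committed in one transaction, and a separate relay publishes the pending rows afterwards.

```python
import json
import sqlite3
import uuid


def create_schema(conn):
    # Hypothetical tables: one for the entity, one for pending events.
    conn.execute("CREATE TABLE instance (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
    conn.execute("CREATE TABLE outbox (event_id TEXT PRIMARY KEY, payload TEXT NOT NULL)")


def update_instance(conn, instance_id, new_body):
    """Write the entity and its domain event in a single transaction."""
    row = conn.execute(
        "SELECT body FROM instance WHERE id = ?", (instance_id,)
    ).fetchone()
    old_body = json.loads(row[0]) if row else None
    event = {
        "eventId": str(uuid.uuid4()),
        "type": "UPDATE" if row else "CREATE",
        "old": old_body,
        "new": new_body,
    }
    with conn:  # commits both writes together, rolls back both on error
        conn.execute(
            "INSERT OR REPLACE INTO instance (id, body) VALUES (?, ?)",
            (instance_id, json.dumps(new_body)),
        )
        conn.execute(
            "INSERT INTO outbox (event_id, payload) VALUES (?, ?)",
            (event["eventId"], json.dumps(event)),
        )
    return event["eventId"]


def relay_outbox(conn, publish):
    """Publish pending events; a row is deleted only after publish returns."""
    rows = conn.execute(
        "SELECT event_id, payload FROM outbox ORDER BY rowid"
    ).fetchall()
    for event_id, payload in rows:
        publish(json.loads(payload))  # e.g. a Kafka producer send
        with conn:
            conn.execute("DELETE FROM outbox WHERE event_id = ?", (event_id,))
    return len(rows)
```

If the relay stops between publishing and deleting a row, the event is published again on the next run, so delivery stays at-least-once. The extra write and the relay step are where the performance cost mentioned above comes from.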

 

Solution Summary

The solution is split into the following parts:

1. Persistence: Two storage options can be implemented.

Option

Description

Pros & Cons

1

RDBMS

The audit database should persist a diff of the entity. Queries are made mostly by the entity's unique identifier, so the audit tables can be partitioned by UUID and subpartitioned by date range.

Pros:

  • allows flexible access to versioning data

Cons:

  • limited scaling options

  • negative impact on the Postgres instance that is used by all other modules

2

Object Storage

AWS S3-like storage can be used to persist diffs because audit events can be stored as plain-text (JSON) documents

Pros:

  • allows scaling almost indefinitely

Cons:

  • requires an additional solution for complex queries

  • might incur high storage costs because of the large number of operations on small files

2. Version history display: This should be built on demand by comparing each diff of the entity to the previous one.

3. Changes to be tracked:

  • Changes in entity fields

  • Changes related to an update of the parent entity, e.g. a holdings update that cascades to the effective shelving order of its items

  • Changes in metadata should be ignored
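The tracking rules above can be illustrated with a small diff function. This is only a sketch under stated assumptions: it compares top-level fields, treats `metadata` as the single ignored key, and uses the "added", "removed", and "modified" lists named in the acceptance criteria of this design. The function name and the shape of each list entry are hypothetical, and the actual implementation is expected to rely on an object diff library.

```python
IGNORED_FIELDS = frozenset({"metadata"})  # assumption: metadata is a top-level key


def entity_diff(old, new, ignored=IGNORED_FIELDS):
    """Compare two entity bodies field by field, skipping ignored keys."""
    old = {k: v for k, v in (old or {}).items() if k not in ignored}
    new = {k: v for k, v in (new or {}).items() if k not in ignored}
    added = [
        {"field": k, "new": new[k]} for k in sorted(new.keys() - old.keys())
    ]
    removed = [
        {"field": k, "old": old[k]} for k in sorted(old.keys() - new.keys())
    ]
    modified = [
        {"field": k, "old": old[k], "new": new[k]}
        for k in sorted(old.keys() & new.keys())
        if old[k] != new[k]
    ]
    return {"added": added, "removed": removed, "modified": modified}
```

Passing `None` as the old body models a "create" event, which then yields only "added" entries. A pair of bodies that differ only in `metadata` yields three empty lists, so such an update could be skipped rather than persisted.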

Key implementation aspects:

  1. The default Kafka delivery semantics is “AT_LEAST_ONCE”. Ensure that each domain event has a unique identifier so that consumers can handle messages idempotently.

  2. Add new consumers in mod-audit to inventory domain events for instances, items, and holdings.

  3. Add new consumers in mod-audit to authority domain events for authorities.

  4. Add new consumers in mod-audit to source record domain events for marc-bib records (see: Source Record Domain Eventing).

  5. mod-audit should support the following configurations on the tenant level:

    1. Retention period in years (with default value - 0 for indefinite retention)

    2. Feature flag to enable the audit capability. When the audit is disabled, consumers should not process events and no audit logs should be persisted.

    3. Anonymization flag that indicates whether records should be anonymized before they are persisted to the database (to be confirmed).

  6. mod-audit should have the following scheduled jobs:

    1. Daily: to remove records that exceed the retention period

    2. Monthly: to create subpartitions for audit tables

  7. Persist audit events in an event storage: one database table per entity type, partitioned by UUID (hash) and subpartitioned by date range.

  8. Create REST API

    1. to provide information on a list of changes related to a particular entity

    2. to provide detailed information on the particular change - this API should use the Object diff library to return a verbose description of the changes related to the entity
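The idempotent handling called for in the first item can be sketched as follows. SQLite stands in for the audit database, and the table name `instance_audit`, the column names, and the event fields are assumptions made for this example only. A uniqueness constraint on the event identifier turns a redelivered message into a no-op.

```python
import json
import sqlite3


def create_audit_table(conn):
    # Hypothetical audit table; the primary key enforces one row per event.
    conn.execute(
        "CREATE TABLE instance_audit ("
        " event_id TEXT PRIMARY KEY,"
        " entity_id TEXT NOT NULL,"
        " diff TEXT NOT NULL)"
    )


def persist_audit_event(conn, event):
    """Persist an event once; return False if it was already stored."""
    with conn:
        cursor = conn.execute(
            "INSERT OR IGNORE INTO instance_audit (event_id, entity_id, diff)"
            " VALUES (?, ?, ?)",
            (event["eventId"], event["entityId"], json.dumps(event["diff"])),
        )
    # rowcount is 0 when the insert was ignored because the id already exists
    return cursor.rowcount == 1
```

Because the duplicate check lives in the database rather than in consumer memory, it also holds when several consumer instances receive the same message.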

ERD

Given the expected data size, a separate table is required for each entity type. The default table structure is listed below:

# | Column | Type | required | unique | Description
1 | EventID | UUID | y | y | unique event identifier
2 | EventDate | timestamp | y | n | date when the event appeared in the event log
3 | Origin | varchar | y | n | Origin of the event: data-import, batch-update, user, etc.
4 | Action | varchar | y | n | what action was performed
5 | ActionDate | timestamp | y | n | when action was performed
6 | EntityID | UUID | y | n | entity identifier
7 | UserId | UUID | y | n | user who did the action, fixed UUID for anonymized user
8 | Diff | jsonb | y | n | Difference between “old” and “new” body of the entity
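The date-range subpartitioning and the retention period described in this design both reduce to simple date arithmetic. The sketch below assumes calendar quarters as the date range and a retention value in whole years, with 0 meaning indefinite. The function names and the subpartition naming scheme are hypothetical.

```python
from datetime import date


def quarter_bounds(day):
    """Return (start, end_exclusive) of the calendar quarter containing day."""
    first_month = 3 * ((day.month - 1) // 3) + 1
    start = date(day.year, first_month, 1)
    if first_month == 10:
        end = date(day.year + 1, 1, 1)
    else:
        end = date(day.year, first_month + 3, 1)
    return start, end


def subpartition_name(table, day):
    """Build a hypothetical subpartition name such as instance_audit_2025_q1."""
    start, _ = quarter_bounds(day)
    return f"{table}_{start.year}_q{(start.month - 1) // 3 + 1}"


def retention_cutoff(today, retention_years):
    """Return the date before which records expire, or None if indefinite."""
    if retention_years == 0:
        return None
    try:
        return today.replace(year=today.year - retention_years)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year
        return today.replace(year=today.year - retention_years, day=28)
```

A scheduled job could call `quarter_bounds` for the upcoming quarter to decide which subpartition to create, and compare `retention_cutoff` with a subpartition's end date to decide whether the whole subpartition can be dropped instead of deleting rows one by one.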

WBS

Title

Description

Module

Link

1

[Instance (FOLIO) Audit] Extend domain event with source FOLIO

Create/Update/Delete operations in the module should generate related events with the new/old body for the entity
Acceptance Criteria:

  1. The domain event has new/old fields

  2. The domain event has a unique identifier for the message

mod-inventory-storage

https://folio-org.atlassian.net/browse/UXPROD-4125

2

[Item Audit] Extend domain event with source FOLIO

Create/Update/Delete operations in the module should generate related events with the new/old body for the entity
Acceptance Criteria:

  1. The domain event has new/old fields

  2. The domain event has a unique identifier for the message

mod-inventory-storage

https://folio-org.atlassian.net/browse/UXPROD-4125

3

[Holding Audit] Extend domain event with source FOLIO

Create/Update/Delete operations in the module should generate related events with the new/old body for the entity
Acceptance Criteria:

  1. The domain event has new/old fields

  2. The domain event has a unique identifier for the message

mod-inventory-storage

https://folio-org.atlassian.net/browse/UXPROD-4125

4

[Instance (MARC) Audit] Extend domain event with source MARC

Create/Update/Delete operations in the module should generate related events with the new/old body for the entity
Acceptance Criteria:

  1. The domain event has new/old fields

  2. The domain event has a unique identifier for the message

mod-source-record-storage

https://folio-org.atlassian.net/browse/UXPROD-4125

5

[Authority Audit] Extend domain event for Authority

Create/Update/Delete operations in the module should generate related events with the new/old body for the entity
Acceptance Criteria:

  1. The domain event has new/old fields

  2. The domain event has a unique identifier for the message

mod-entities-links

https://folio-org.atlassian.net/browse/UXPROD-4126

6

[Instance(FOLIO) Audit] Consume domain event

To allow efficient access to the data in the audit table it is required to:

  1. Create a table partitioned by UUID and subpartitioned by date range (a quarter of a year); see Inventory Audit log | ERD

  2. Create a scheduled nightly job that creates additional partitions when necessary

  3. Create Kafka consumer for domain event

  4. Persist entity diff


Acceptance Criteria:

  1. Entity diff object should contain "added", "removed", and "modified" lists with old-new values for any affected field

  2. Changes in metadata should be ignored

  3. Audit records related to the "create" event should contain only "added" sub-object without previous values

  4. Audit records related to "delete" events should contain no body

mod-audit

https://folio-org.atlassian.net/browse/UXPROD-4125

7

[Instance(MARC) Audit] Consume domain event

To allow efficient access to the data in the audit table it is required to:

  1. Create a table partitioned by UUID and subpartitioned by date range (a quarter of a year); see Inventory Audit log | ERD

  2. Create a scheduled nightly job that creates additional partitions when necessary

  3. Create Kafka consumer for domain event

  4. Persist entity diff


Acceptance Criteria:

  1. Entity diff object should contain "added", "removed", and "modified" lists with old-new values for any affected field

  2. Changes in metadata should be ignored

  3. Audit records related to the "create" event should contain only "added" sub-objects without previous values

  4. Audit records related to "delete" events should contain no body

mod-audit

https://folio-org.atlassian.net/browse/UXPROD-4125

8

[Holding Audit] Consume domain event

To allow efficient access to the data in the audit table it is required to:

  1. Create a table partitioned by UUID and subpartitioned by date range (a quarter of a year); see Inventory Audit log | ERD

  2. Create a scheduled nightly job that creates additional partitions when necessary

  3. Create Kafka consumer for domain event

  4. Persist entity diff


Acceptance Criteria:

  1. Entity diff object should contain "added", "removed", and "modified" lists with old-new values for any affected field

  2. Changes in metadata should be ignored

  3. Audit records related to the "create" event should contain only "added" sub-object without previous values

  4. Audit records related to "delete" events should contain no body

mod-audit

https://folio-org.atlassian.net/browse/UXPROD-4125

9

[Item Audit] Consume domain event

To allow efficient access to the data in the audit table it is required to:

  1. Create a table partitioned by UUID and subpartitioned by date range (a quarter of a year); see Inventory Audit log | ERD

  2. Create a scheduled nightly job that creates additional partitions when necessary

  3. Create Kafka consumer for domain event

  4. Persist entity diff


Acceptance Criteria:

  1. Entity diff object should contain "added", "removed", and "modified" lists with old-new values for any affected field

  2. Changes in metadata should be ignored

  3. Audit records related to the "create" event should contain only "added" sub-objects without previous values

  4. Audit records related to "delete" events should contain no body

mod-audit

https://folio-org.atlassian.net/browse/UXPROD-4125

10

[Authority Audit] Consume domain event

To allow efficient access to the data in the audit table it is required to:

  1. Create a table partitioned by UUID and subpartitioned by date range (a quarter of a year); see Inventory Audit log | ERD

  2. Create a scheduled nightly job that creates additional partitions when necessary

  3. Create Kafka consumer for domain event

  4. Persist entity diff


Acceptance Criteria:

  1. Entity diff object should contain "added", "removed", and "modified" lists with old-new values for any affected field

  2. Changes in metadata should be ignored

  3. Audit records related to the "create" event should contain only "added" sub-objects without previous values

  4. Audit records related to "delete" events should contain no body

mod-audit

https://folio-org.atlassian.net/browse/UXPROD-4126

11

[Instance/Item/Holding/Bib Audit] Configuration

Provide configuration parameter to enable/disable audit log on tenant level
Acceptance Criteria:

  1. The local configuration table is present

  2. API for enabling/disabling the feature is available with the correct permissions/capabilities

mod-audit

https://folio-org.atlassian.net/browse/UXPROD-4125

12

[Instance/Item/Holding/Bib Audit] Configuration

Anonymize events
Acceptance Criteria:

  1. The local configuration table is present

  2. API for enabling/disabling the feature is available with the correct permissions/capabilities

mod-audit

https://folio-org.atlassian.net/browse/UXPROD-4125

13

[Instance/Item/Holding/Bib Audit] Configuration

Provide configuration parameter to set the retention period for audit records on tenant level
Acceptance Criteria:

  1. Retention period in years (with default value - 0 for indefinite retention)

  2. The local configuration table is present

  3. API for enabling/disabling the feature is available with the correct permissions/capabilities

mod-audit

https://folio-org.atlassian.net/browse/UXPROD-4125

14

[Instance/Item/Holding/Bib Audit] Configuration

Create a scheduled nightly job to clean up the records that exceed the retention period
Acceptance Criteria:

  1. The job execution should be skipped if the configuration for the retention period is 0 (indefinite period)

  2. If the related subpartition of the table is empty, then the subpartition should also be dropped

mod-audit

https://folio-org.atlassian.net/browse/UXPROD-4125

15

[Instance(FOLIO) Audit] Rest Endpoint for history

Query the list of diffs for an entity from the database by entity identifier, paginated by quarter of the year. Return the list of diff records.

mod-audit

https://folio-org.atlassian.net/browse/UXPROD-4125

16

[Instance(MARC) Audit] Rest Endpoint for history

Query the list of diffs for an entity from the database by entity identifier, paginated by quarter of the year. Return the list of diff records.

mod-audit

https://folio-org.atlassian.net/browse/UXPROD-4125

17

[Holding Audit] Rest Endpoint for history

Query the list of diffs for an entity from the database by entity identifier, paginated by quarter of the year. Return the list of diff records.

mod-audit

https://folio-org.atlassian.net/browse/UXPROD-4125

18

[Item Audit] Rest Endpoint for history

Query the list of diffs for an entity from the database by entity identifier, paginated by quarter of the year. Return the list of diff records.

mod-audit

https://folio-org.atlassian.net/browse/UXPROD-4125

19

[Authority Audit] Rest Endpoint for history

Query the list of diffs for an entity from the database by entity identifier, paginated by quarter of the year. Return the list of diff records.

mod-audit

https://folio-org.atlassian.net/browse/UXPROD-4126

20

[Instance/Item/Holding/Bib Audit] Show history Pane in inventory

Call the related API to show the most recent updates. Older records should be fetched only when the user clicks the “Show more” button.

ui-inventory

https://folio-org.atlassian.net/browse/UXPROD-4125

Risks and concerns

Risk

Description

Probability

Impact

Mitigation

1

Long period for audit records retention

The number of records could overwhelm the Postgres database in terms of both compute and cost.

High

High

Introduce separate storage for audit-events

2

Cascade Updates will create redundant copies in the audit log

An update to a holdings record causes updates to all related items. Some holdings may contain ~15000 records.

High

Medium

Collapse or filter out events that only change the parent entity

3

Some flows could update inventory entities without using the Domain-events mechanism

Among the different capabilities of the system (UI, data import, bulk edit, etc.), some flows might skip sending domain events and/or edit entities directly.

Low

Medium

List those cases and add domain events to the flows that lack this capability.

4

Linked data

The flow and integration with inventory are not clear for the BIBFRAME format

Low

Low

Adjust BIBFRAME flow to follow the proposed solution for other inventory entities

Product Questions

 

Question

Answer

Comment

1

Should failure in sending audit message block the create/update/delete operation?

Hey @Kalibek Turgumbayev - what happens today when an update is made and the create/update date and time stamp is not updated?

The question is related to transactional outbox pattern implementation

2

What would be the period of retention for Audit records?

@Dennis Bridges has this requirement come up for you with respect to Acq’s change tracker?

The storage options depend on this question:

  • Postgres for ~1-3 years

  • Postgres with partitioning by UUID for 5-15 years

  • NoSQL options for 20+ years

3

In what form should we show the changes to non-MARC fields (e.g. staffSuppress, administrative Notes, etc.) in MARC instances?

@Kalibek Turgumbayev - I am unsure I understand this question. Can you review this mockup of how to display updates made to a FOLIO instance record?

An instance with source FOLIO or MARC in inventory is a separate object from the SRS record and should be tracked separately.

4

If only the order of fields in a MARC record is changed, should it be logged?

@Kalibek Turgumbayev - Good question - I need to ask users, but unless it is a significant effort to implement, the answer is Yes.

@Dennis Bridges has this requirement come up for you with respect to Acq’s change tracker?

 

5

Do we have scenarios where the audit log is exported in batches for some period of dates?

It is possible that a library may want to do so but I do not think it is a requirement for Sunflower. @Dennis Bridges has this requirement come up for you with respect to Acq’s change tracker?

If the solution uses Postgres with partitioning by entity key, such exports would cause significant performance issues.

6

Do we need to have remapping or other technical updates as auditable events?

Not for Sunflower

We might need a list of actions leading to an event in the audit log.

7

Can we show only the last 20 records (last 3 months)?

Yes. Older records can be fetched with a “see more” button, with an alert that this can take time.

Display of the whole history of the entity will impact performance negatively.

8

What should be tracked with BIBFRAME? Should we track the original BIBFRAME record, or is tracking the related MARC records enough?

For Sunflower only track related MARC records.

 

Links

  1. Acquisition event log - data retention period is 20 years

  2. Transactional outbox pattern

  3. Orders Event Log

  4. Javers - Java object diff library