  1. ElasticSearch should be used to store Audit data because:
         - Audit logs may contain hundreds of millions of records.
         - Audit data is immutable: Audit records should never be updated, only deleted or archived in batches once they expire.
         - Full-text search is required to find Audit records.
         - Each search response must include the total number of records matching the query, and the query should return within a few seconds.

  2. mod-audit-log-interceptor should be implemented as a separate module and deployed as a sidecar alongside the modules that contain Audit data. It should work as a proxy that intercepts the requests and responses of these modules. This approach allows the interceptor to be added without modifying the target modules.
    However, it should also be possible to use the Audit logs interceptor library as an alternative. The library should be built into the Vert.x request-response pipeline. As a result, there will be two different tools for collecting Audit data.

  3. mod-audit-log-interceptor should only capture Audit data and transfer it as-is via Kafka to mod-circulation-audit-logs-aggregator. mod-circulation-audit-logs-aggregator should in turn convert the source data into the target schema and format and post it to ElasticSearch. As a result, the interceptor should spend little time capturing data.
  4. mod-audit-log-interceptor and mod-circulation-audit-logs-aggregator should leverage the new Kafka messaging library developed by Taras Spashchenko (a wrapper around the native Kafka client library). The library should provide better scalability and message delivery guarantees.
  5. The mod-circulation-audit-logs-provider and mod-circulation-audit-logs-aggregator modules should be implemented using the standard set of Folio module technologies:
    - Java
    - RMB
    - Docker
  6. The ui-audit-logs module should leverage Stripes.
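
The split of responsibilities in decision 3 can be illustrated with a minimal sketch. It is not the Folio implementation: an in-memory queue stands in for the Kafka topic, a list stands in for the ElasticSearch index, and the class name, the method names and the `key=value;key=value` raw payload format are all assumptions made for the example. The point it shows is that the interceptor only forwards the payload unchanged, while parsing and conversion happen on the aggregator side.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: every name here is illustrative, not a Folio API.
class AuditPipelineSketch {

    // Stand-in for the Kafka topic between interceptor and aggregator.
    final BlockingQueue<String> topic = new LinkedBlockingQueue<>();

    // Stand-in for the ElasticSearch index holding converted records.
    final List<Map<String, String>> index = new ArrayList<>();

    // Interceptor side: forward the captured payload as-is, with no parsing.
    void intercept(String rawPayload) {
        topic.add(rawPayload);
    }

    // Aggregator side: convert an assumed "key=value;key=value" payload
    // into an assumed target schema (a flat map of field name to value).
    Map<String, String> convert(String rawPayload) {
        Map<String, String> record = new LinkedHashMap<>();
        for (String pair : rawPayload.split(";")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2) {
                record.put(kv[0].trim(), kv[1].trim());
            }
        }
        return record;
    }

    // Aggregator side: drain what is currently on the topic, convert each
    // payload and append it to the index. Returns the number stored.
    int aggregate() {
        List<String> batch = new ArrayList<>();
        topic.drainTo(batch);
        for (String raw : batch) {
            index.add(convert(raw));
        }
        return batch.size();
    }

    public static void main(String[] args) {
        AuditPipelineSketch pipeline = new AuditPipelineSketch();
        pipeline.intercept("action=CHECK_OUT;userId=u1;itemId=i1");
        pipeline.intercept("action=CHECK_IN;userId=u1;itemId=i1");
        System.out.println("stored: " + pipeline.aggregate());
        System.out.println(pipeline.index);
    }
}
```

Keeping `intercept` free of parsing is what lets the interceptor add very little latency to the intercepted request; the cost of conversion is paid asynchronously by the aggregator.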

However, the Target architecture cannot be fully implemented immediately because:

  • The new Kafka messaging library hasn't been fully added to Folio yet.
  • The ElasticSearch PoC hasn't been finished yet.
  • Time is limited: the first version of Circulation Audit logs should be released by the end of Q3 or the beginning of Q4.

As a result, the first version of Circulation Audit logs should use the Transient architecture described below.

Transient architecture

[Diagram: Circulation Audit logs (Transient) - High level solution structure]


The following table shows how entities from the Conceptual diagram (see the Conceptual model page) are mapped to the solution components:

Conceptual diagram entity | Solution component
Audit Logs interceptor    | Audit logs interceptor library
Audit Log storage         | mod-pubsub, Kafka
Audit Logs aggregator     | mod-circulation-audit-logs-aggregator
Materialized view         | PostgreSQL
Audit Reports provider    | mod-circulation-audit-logs-provider
Folio UI                  | Folio UI

Key decisions

  1. PostgreSQL should be used to store Audit records instead of ElasticSearch because other Folio modules already use it.
    It will later be replaced with ElasticSearch.
  2. The Audit logs interceptor library should be used to capture Audit data and send it to mod-circulation-audit-logs-aggregator via Kafka. The library should be built into the Vert.x request-response pipeline and intercept requests and responses that contain Audit data.
  3. mod-pubsub should be used by the mod-circulation-audit-logs-aggregator module and the Audit logs interceptor library to interact with Kafka (until the new Kafka messaging library is fully added to Folio).
  4. The mod-circulation-audit-logs-provider and mod-circulation-audit-logs-aggregator modules should be implemented using the standard set of Folio module technologies:
    - Java
    - RMB
    - Docker
  5. The ui-audit-logs module should leverage Stripes.
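
Because PostgreSQL is expected to be replaced with ElasticSearch later, one option is to keep the storage behind a narrow interface. The sketch below is only an illustration under that assumption: `AuditStore`, `AuditRecord`, `SearchResult` and `InMemoryAuditStore` are hypothetical names, the in-memory list stands in for the real database, and a case-insensitive substring match stands in for real full-text search. The interface deliberately has no update operation (Audit records are immutable), returns the total match count together with each page, and deletes expired records in a batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical immutable audit record.
final class AuditRecord {
    final long timestamp;  // epoch milliseconds of the audited event
    final String text;     // searchable description of the event

    AuditRecord(long timestamp, String text) {
        this.timestamp = timestamp;
        this.text = text;
    }
}

// One page of results plus the total number of matches for the query.
final class SearchResult {
    final int totalRecords;
    final List<AuditRecord> page;

    SearchResult(int totalRecords, List<AuditRecord> page) {
        this.totalRecords = totalRecords;
        this.page = page;
    }
}

// Hypothetical storage contract: append, search, batch delete. No update.
interface AuditStore {
    void append(AuditRecord record);
    SearchResult search(String query, int offset, int limit);
    int deleteOlderThan(long cutoffTimestamp);
}

// In-memory stand-in for a PostgreSQL- or ElasticSearch-backed store.
class InMemoryAuditStore implements AuditStore {
    private final List<AuditRecord> records = new ArrayList<>();

    @Override
    public void append(AuditRecord record) {
        records.add(record);
    }

    @Override
    public SearchResult search(String query, int offset, int limit) {
        // Substring match is a placeholder for real full-text search.
        String needle = query.toLowerCase(Locale.ROOT);
        List<AuditRecord> matches = new ArrayList<>();
        for (AuditRecord r : records) {
            if (r.text.toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(r);
            }
        }
        // Clamp the requested window to the available matches.
        int from = Math.min(Math.max(offset, 0), matches.size());
        int to = (int) Math.min((long) from + Math.max(limit, 0), matches.size());
        return new SearchResult(matches.size(), new ArrayList<>(matches.subList(from, to)));
    }

    @Override
    public int deleteOlderThan(long cutoffTimestamp) {
        int before = records.size();
        records.removeIf(r -> r.timestamp < cutoffTimestamp);
        return before - records.size();
    }
}
```

With a contract like this, mod-circulation-audit-logs-provider would depend only on the interface, so switching the backing store later would not need to change the callers.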