Apache Kafka Messaging System

Kafka Topic Naming Convention

The environment name (ENV) and the tenant id should both be part of every topic name. This separates the data of different customers into different Kafka topics. It also allows one Kafka instance to be shared by multiple environments that have the same tenant id.

A topic name should be built by concatenating the following parts, separated by dots, in exactly this order:

  1. Environment name (from ENV environment variable)
  2. Tenant id (placed second because that makes it convenient to use a wildcard in ACLs for Kafka users)
  3. Producer module name (the "storage" postfix should be omitted)
    1. (question) RA: I'd propose to explicitly specify the full module name here to avoid any ambiguity, e.g. bugfest.fs09000000.mod-inventory-storage.instance (mod-, as a mandatory part of any module name, could potentially be skipped if everyone agrees on that)
  4. Domain entity name in singular form (if it is not a domain event, the process name or just the event name should be used)

For example, in bugfest (ENV == bugfest) the Kafka topic for inventory instances of tenant fs09000000 should have the following name:

bugfest.fs09000000.inventory.instance

We should also consider cases with many producers and a single consumer. The best example is mod-audit: many modules can push their events for audit, and only mod-audit consumes them. We can follow exactly the same convention as stated above, e.g. bugfest.fs09000000.mod-inventory-storage.audit, or we can specify the consumer module name instead of the producer module name, e.g. bugfest.fs09000000.mod-audit (this might be more confusing).

If the ENV variable is not defined for the environment, the module should fall back to omitting it from the topic name.
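
The naming rules above can be sketched as a small helper. This is only an illustration, not code from any FOLIO module: the function name build_topic_name and its parameters are hypothetical, and stripping the "mod-" prefix is an assumption taken from the bugfest.fs09000000.inventory.instance example (the open question above proposes keeping the full module name instead).

```python
import os
from typing import Optional


def build_topic_name(tenant_id: str, module: str, entity: str,
                     env: Optional[str] = None) -> str:
    """Join ENV, tenant id, producer module and entity name with dots."""
    # Read ENV from the environment unless the caller passed a value.
    if env is None:
        env = os.environ.get("ENV")

    # Assumption: "mod-inventory-storage" is shortened to "inventory",
    # i.e. the "mod-" prefix and the "-storage" postfix are both dropped.
    if module.startswith("mod-"):
        module = module[len("mod-"):]
    if module.endswith("-storage"):
        module = module[:-len("-storage")]

    # Fallback: leave ENV out when it is undefined or empty.
    parts = ([env] if env else []) + [tenant_id, module, entity]
    return ".".join(parts)


print(build_topic_name("fs09000000", "mod-inventory-storage", "instance",
                       env="bugfest"))
# prints "bugfest.fs09000000.inventory.instance"
```

Passing env explicitly keeps the helper easy to test; in a deployed module the value would normally come from the ENV environment variable.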

Topic partitioning

To avoid consistency problems caused by race conditions when concurrent writes hit the database, messages sent to Kafka topics should have an appropriate partition_key. It could be the id of the record or some other value that allows events to be segregated between consumer instances.

When multiple instances of a consumer module are deployed, the same consumer group should be set for all of them.
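
To illustrate why a stable partition key helps, the sketch below maps a key to a partition with a deterministic hash. It is a simplified stand-in and not Kafka's actual default partitioner (which uses a different hash function); the function name choose_partition and the partition count are hypothetical. The point is only that all events for the same record id land in the same partition, so a single consumer instance in the group processes them in order.

```python
import hashlib


def choose_partition(partition_key: str, num_partitions: int) -> int:
    """Map a key to a partition index deterministically (illustrative only)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    # Use the first 4 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:4], "big") % num_partitions


record_id = "a1b2c3d4-0000-0000-0000-000000000001"  # hypothetical record id
first = choose_partition(record_id, num_partitions=6)
second = choose_partition(record_id, num_partitions=6)
# The same key always yields the same partition, so a CREATE and a later
# UPDATE for this record cannot be processed concurrently by two consumers.
assert first == second
```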

Domain events json schema

A module sends notifications whenever a domain entity (e.g. instance, holding, item, loan) changes.

The pattern means that every time an entity is created, updated, or removed, a message is posted to a Kafka topic.

The event payload should have the following structure:

{
  "old": {...}, // the entity state before update or delete
  "new": {...}, // the entity state after update or create
  "type": "UPDATE|DELETE|CREATE|DELETE_ALL", // type of the event
  "tenant": "diku" // tenant id
}

The X-Okapi-Url and X-Okapi-Tenant headers can be copied from the incoming request to the headers of the Kafka message.
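
The payload structure above can be sketched as a small builder. This is a hedged illustration, not an existing FOLIO API: the names build_domain_event, build_headers and REQUIRED_STATES are hypothetical, and the rule for which of "old" and "new" each event type requires is an assumption inferred from the comments in the structure above. Headers are shown as (name, bytes) pairs, a shape many Kafka clients accept, but no specific client library is implied.

```python
from typing import Optional

# Assumption: which entity states each event type must carry.
REQUIRED_STATES = {
    "CREATE": ("new",),
    "UPDATE": ("old", "new"),
    "DELETE": ("old",),
    "DELETE_ALL": (),
}


def build_domain_event(event_type: str, tenant: str,
                       old: Optional[dict] = None,
                       new: Optional[dict] = None) -> dict:
    """Build a domain event payload with old/new/type/tenant fields."""
    if event_type not in REQUIRED_STATES:
        raise ValueError(f"unknown event type: {event_type}")
    states = {"old": old, "new": new}
    for name in REQUIRED_STATES[event_type]:
        if states[name] is None:
            raise ValueError(f"{event_type} event requires '{name}'")
    # Only include the states that were actually provided.
    event = {name: state for name, state in states.items() if state is not None}
    event["type"] = event_type
    event["tenant"] = tenant
    return event


def build_headers(okapi_url: str, okapi_tenant: str) -> list:
    """Copy the two request headers into Kafka-style (name, bytes) pairs."""
    return [
        ("X-Okapi-Url", okapi_url.encode("utf-8")),
        ("X-Okapi-Tenant", okapi_tenant.encode("utf-8")),
    ]


event = build_domain_event(
    "UPDATE", "diku",
    old={"id": "1", "title": "Old title"},
    new={"id": "1", "title": "New title"},
)
```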

(question) RA: I'm also thinking about versioning of such schema, just to cover potential extension in it and avoid version compatibility issues.

Domain events for delete all APIs

To clean up the data of a tenant, there may be delete-all APIs for records. For such APIs a special domain event is issued:

  • Partition key: 00000000-0000-0000-0000-000000000000
  • Event payload:

{ "type": "DELETE_ALL", "tenant": "<the tenant name>" }
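
A minimal sketch of assembling this special event follows. The function name build_delete_all_record is hypothetical, and returning a (key, JSON string) pair is just one convenient shape for handing the record to a producer.

```python
import json

# Fixed partition key for delete-all events, as specified above.
DELETE_ALL_PARTITION_KEY = "00000000-0000-0000-0000-000000000000"


def build_delete_all_record(tenant: str) -> tuple:
    """Return the (partition key, serialized payload) for a DELETE_ALL event."""
    payload = {"type": "DELETE_ALL", "tenant": tenant}
    return DELETE_ALL_PARTITION_KEY, json.dumps(payload)


key, value = build_delete_all_record("diku")
```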

Open items and questions

Below is the list of other items to be clarified:

  • Retention policy recommendations - should we recommend a cross-platform standard for that?
  • Obsolete topic deletion - should there be a mechanism for automated deletion of obsolete topics? How should obsolescence be determined?
  • Event type cataloging - as per our separate discussion with Jakub Skoczen - "it would be great ... to specify how the Producers (and possibly Consumers) will catalog what kind of events they publish. I am assuming that the number of events will grow quite quickly once this proposal becomes a feature in FOLIO. I would propose two alternatives:
    • extend the ModuleDescriptor with a section on "events" or "messages", ensure that this section is kept up to date with the type of events published in a specific version of a given module
    • introduce a new type of descriptor, e.g MessageDescriptor or EventDescriptor, that will capture this information"