...
Requirements for the mechanism of interaction between modules
Can we refer to "PubSub. The second round"? The requirements need to be discussed in more detail. This can also be linked to the results of the recent QAW.
Reliability
- Delivery must be guaranteed with at-least-once semantics
- Firebird - check whether they can handle at-least-once delivery while processing the circulation log (deduplication by a UUID field, or by a hash over a set of key fields)
Performance
- AFAIK, PubSub performance was an issue for data import

The requirements below are based on the current understanding and on requirements from specific teams and modules that use PubSub.
Reliability
Message delivery must be guaranteed. Message loss is unacceptable.
Performance
To my knowledge, PubSub's performance was an issue for Data Import. The Vega team didn't have any issues with PubSub performance, because the team mainly uses it for events triggered manually by users (check-out, manual fee/fine charge, etc.). However, it can be a potential issue for libraries that use fixed schedules and have thousands of loans "ageing to lost" or being charged a fine at the same time.
Folijet - what are the performance requirements?
Retention policy
Nothing specific seems to be required; the default retention policy should be enough.
Versioning
A versioning approach for event payloads should be defined here.
Payload size
Assumption: up to 100 KB.
- Vega (Circulation flows): large payloads are not expected; these are small JSON structures (less than 1 KB)
- Folijet: the same, small JSON structures
- Firebird (Remote Storage): ...
Therefore, one can assume that the payload size is expected to be up to 100 KB.
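For reference, Kafka's defaults already accommodate this assumption: brokers accept messages of roughly 1 MB by default (`message.max.bytes`), and the producer default `max.request.size` is 1 MB, so a 100 KB ceiling needs no special tuning. A minimal sketch of making the limit explicit on the producer side (the class name and broker address are illustrative, not from any FOLIO module):

```java
import java.util.Properties;

public class ProducerLimits {
    /** Producer properties with an explicit request-size cap; 100 KB payloads fit comfortably. */
    static Properties producerProperties() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092"); // hypothetical broker address
        // 1 MB is the Kafka producer default; it is set explicitly here only to
        // document the ceiling against the assumed 100 KB payload size.
        props.put("max.request.size", String.valueOf(1024 * 1024));
        return props;
    }
}
```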
The existing scheme of module interaction through PubSub
...
The description of known issues is based on production experience with PubSub in mod-circulation, mod-feesfines, and mod-patron-blocks, as well as on the results of mod-pubsub performance testing (is this still valid? there could have been some work to improve performance and reliability; it should also be noted that this testing was conducted some time ago, and apparently there is no more recent data).
Note: it looks like I (Raman A) don't have access to the mod-pubsub project in Jira.
The most common PubSub issues the Vega team has faced:
...
[Drawio diagram]
Addressing the requirements
How Kafka will help solve the known issues and meet the requirements
...
Below the key benefits are listed:
- Guaranteed, at-least-once delivery provided by Kafka addresses the reliability concern
- Improved data consistency, since Kafka does not deliver newer messages until older ones are acknowledged
- Better performance, by eliminating the overhead of multiple HTTP calls per event dispatch (though performance was not a problem in the Vega case)
- Good HA, since every new Event Consumer instance connects to Kafka within a consumer group, so that the load is distributed evenly
- Improved manageability, because of easier investigation capabilities, less data inconsistency, and following the fail-fast approach - this is good for customers, because bugs (even if not fewer in number) will be easier to investigate and won't cause data inconsistency
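The HA and at-least-once points can be illustrated with a minimal consumer sketch using the plain `kafka-clients` Java API (the broker address, group id, and topic name are hypothetical; a real FOLIO module would take these from its configuration):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class LoanEventConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");     // hypothetical broker address
        // Every instance started with the same group.id joins one consumer group;
        // Kafka spreads the topic's partitions across the instances, which gives
        // both HA and even load distribution.
        props.put("group.id", "mod-patron-blocks-group"); // hypothetical group id
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        // Auto-commit is disabled so offsets are committed only after successful
        // processing: a crash before commitSync() means the events are redelivered,
        // which is exactly the at-least-once guarantee.
        props.put("enable.auto.commit", "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("circulation.loan-events")); // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    handle(record.value()); // business logic; must tolerate duplicates
                }
                consumer.commitSync(); // acknowledge only after the batch is processed
            }
        }
    }

    private static void handle(String payload) {
        System.out.println("handled: " + payload); // placeholder for module logic
    }
}
```

This is a sketch, not a drop-in implementation: it requires the `kafka-clients` library and a running broker, and error handling is omitted.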
Known challenges:
- Configuration (including Kafka itself, topics, consumer groups, and authorization) is more complicated than with PubSub
- While Kafka supports exactly-once delivery, the at-least-once implementation is simpler and more manageable. In turn, at-least-once means that the Event Consumer must be prepared to handle potential duplicate events.
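Because duplicates are possible under at-least-once delivery, consumers need an idempotency guard. A minimal sketch (the class name and the bounded in-memory set are illustrative; a persistent store keyed by event UUID, or by a hash over key fields as suggested for the circulation log, would follow the same pattern):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Remembers recently seen event IDs so redelivered events can be skipped. */
public class EventDeduplicator {
    private final Set<String> seen;

    public EventDeduplicator(int capacity) {
        // Bounded insertion-ordered set: the oldest IDs are evicted once capacity is exceeded.
        this.seen = Collections.newSetFromMap(new LinkedHashMap<String, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        });
    }

    /** Returns true if the event is new (process it), false if it is a duplicate (skip it). */
    public boolean markIfNew(String eventId) {
        return seen.add(eventId);
    }
}
```

A consumer would call `markIfNew(event.getId())` before processing and silently skip the event when it returns false.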
Limitations, Risks and Assumptions
- All modules involved will have a Kafka client and "know" that Kafka is being used as the transport mechanism. As a result, if it is necessary to move to another transport mechanism in the indefinite future, changes will be required in all the modules involved.
- This risk can be partially mitigated by placing all the logic required to work through Direct Kafka in a separate library with designated interfaces. In this case, the logic of interaction through Direct Kafka will, in a sense, still be hidden from the business logic of the modules involved. Note: there is folio-kafka-wrapper, which provides some useful functionality; for Spring-way modules it should be even easier.
- At the moment, there is no implemented approach to address security concerns (including event authorization) for Kafka - it will be necessary to follow some general solution when it is available...
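The "separate library with designated interfaces" mitigation can be sketched as follows (the interface and class names are hypothetical, not taken from folio-kafka-wrapper): business logic depends only on a transport-agnostic contract, so switching transports means replacing one implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Transport-agnostic contract that module business logic depends on. */
interface EventPublisher {
    void publish(String eventType, String payload);
}

/**
 * In-memory stand-in used here to show that callers never touch Kafka APIs;
 * in the shared library, a Kafka-backed implementation would replace it
 * without any change to the business logic.
 */
class InMemoryEventPublisher implements EventPublisher {
    final List<String> sent = new ArrayList<>();

    @Override
    public void publish(String eventType, String payload) {
        sent.add(eventType + ":" + payload);
    }
}
```

Business code would receive an `EventPublisher` via dependency injection, so only the shared library knows which transport is actually configured.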
Modules affected
Below is the list of modules participating in Circulation flows where refactoring will be required:
Module name | Owning team | Is it a Producer or Consumer of events?
---|---|---
mod-circulation | Vega | Producer (in a number of flows), Consumer (for events from mod-feesfines)
mod-feesfines | Vega | Producer
mod-patron-blocks | Vega | Consumer
mod-audit | Firebird | Consumer
mod-remote-storage | Firebird | Consumer
Wouldn't it make sense to tune the PubSub instead of switching to Direct Kafka?
A legitimate question is whether it is possible to refine and expand the capabilities of the PubSub in order to address the problems listed before, and whether this will be more (or less) efficient than switching to the Direct Kafka approach.
Most likely, such an improvement of the PubSub is possible. Although detailed elaboration has not been carried out, it can be assumed that the following will be required: persistent storage of events, reliable management of this storage, tracking of delivered/undelivered status, processing of delivery confirmations, and a number of other capabilities.
In fact, this means a lot of the same functionality that Kafka already provides. At the same time, the PubSub is not a key product for Folio, but only a transport mechanism.
Therefore, it seems more appropriate and efficient to use existing solutions from the market (in this case, Apache Kafka), and focus FOLIO development efforts on business value.
Time and effort estimates
It is also necessary to think about how the behavior after implementation can be tested and validated.
Is it possible to tune PubSub in order to cover all the needs? No: there is no business value in making another Kafka, and it is not a trivial task to implement guaranteed delivery over HTTP.