
Switching from PubSub to Direct Kafka approach


DRAFT! DRAFT! DRAFT! THIS IS NOT A PROPOSAL AT THE MOMENT

Introduction

This page is intended to analyze the current experience of using the PubSub mechanism, in particular in the Circulation application, and to study the feasibility of moving to the Direct Kafka approach.

Additional information on the topic

Requirements for the mechanism of interaction between modules

(question) Can we refer to the "PubSub. The second round." page? The requirements need to be discussed in more detail; this can also be linked to the results of the recent QAW.

Reliability

  • Delivery must be guaranteed with an at-least-once (question) approach
    • Firebird - check if they can handle at-least-once delivery while processing the circulation log (deduplication by a UUID field, or by a hash over a set of key fields)
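The deduplication idea mentioned above can be sketched as follows. This is a minimal, hypothetical illustration (class, method, and field names are not from any FOLIO module): an event's UUID is preferred as the dedup key, and a hash over a set of key fields is the fallback. A real module would persist the keys (e.g. via a DB unique index) rather than keep them in memory.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of deduplicating at-least-once deliveries while
// writing the circulation log. All names are hypothetical.
public class CirculationLogDeduplicator {
    // In production this would be a persistent store (e.g. a unique index);
    // an in-memory set is used here only to demonstrate the idea.
    private final Set<String> processedKeys = new HashSet<>();

    /** Returns true if the event is new and was recorded; false for a duplicate. */
    public boolean process(String eventId, String userId, String itemId,
                           String action, String timestamp) {
        // Prefer a stable event UUID; otherwise hash a set of key fields.
        String key = (eventId != null)
                ? eventId
                : hashKeyFields(userId, itemId, action, timestamp);
        return processedKeys.add(key); // add() is false if the key was already seen
    }

    private static String hashKeyFields(String... fields) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (String field : fields) {
                digest.update(field.getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator avoids ambiguous concatenation
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
    }
}
```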

Performance

AFAIK, PubSub's performance was an issue for data import. To my knowledge, Vega didn't have any issues with PubSub performance because we mainly use it for events triggered manually by users (check-out, manual fee/fine charge, etc.). But it can become an issue for libraries that use fixed schedules and have thousands of loans "ageing to lost" or being charged a fine at the same time.

Folijet - what are the performance requirements? (question)

Retention policy

Nothing specific; the default retention should be enough.

Versioning

A versioning approach should be in place here (details TBD).

Payload size

Assumption: up to 100 KB

Vega - small JSONs (less than 1 KB)

Folijet - small JSONs

Firebird (Remote Storage) - (question)

The existing scheme of modules interaction through PubSub

The figure below shows the participants in the transmission of a single event between two modules, Publisher and Subscriber, using the PubSub approach. (warning) More detail can be provided here

Benefits

This is the list of benefits the PubSub approach provides:

  • module decoupling - standard Okapi calls are used for interaction and message transmission
  • the potential to replace the underlying transport mechanism (Kafka) with something else without having to refactor the client modules (i.e. the modules that use PubSub)
  • the ability to use FOLIO permissions to control access

Known limitations and issues

The description of known issues is based on production experience with PubSub in mod-circulation, mod-feesfines, and mod-patron-blocks, as well as on the results of mod-pubsub performance testing ((question) is this still valid? some work may have been done to improve performance and reliability).
(warning) Note: it looks like I (Raman A) don't have access to the mod-pubsub project in Jira.

Most common pubsub issues Vega has faced:

  • missing Okapi permissions during calls from/to PubSub
  • issues with the special user PubSub creates and uses for its purposes (missing user, missing credentials, missing permissions, etc.)
  • missing modules' publisher/subscriber registration in PubSub
  • after failing to deliver an event (for any reason, including a consumer-side fault), PubSub keeps delivering subsequent events from the same topic, and modules keep consuming them, which ruins data consistency. Such issues often go unnoticed for months, after which it can be hard to reproduce the initial delivery failures or find their cause. On top of that, additional work is required to create syncing mechanisms to fix the resulting data inconsistencies.

PubSub issues are notoriously time-consuming and hard to investigate, mostly because they are usually invisible to the end user. When an event or a series of events fails to reach the intended subscriber, libraries rarely notice immediately; they notice later, when data inconsistencies caused by the undelivered events manifest themselves elsewhere. Consider the following real-life scenario:

  • maximum number of loans per user is limited to 10 by automated patron blocks configuration
  • a user has no open loans at the moment
  • user checks out an item, but ITEM_CHECKED_OUT event does NOT reach mod-patron-blocks (which keeps count of loans for every user)
  • over the next few months user checks out 10 more items, each time a corresponding event reaches mod-patron-blocks successfully
  • library notices that the user has 11 open loans, while the limit is 10
  • library reports a bug in mod-patron-blocks - the most likely culprit from user's perspective
  • during investigation a developer discovers that the block was not imposed because of a failed event delivery that took place months ago

The proposed scheme of modules interaction through Direct Kafka

With the Direct Kafka approach, Okapi and PubSub are no longer required; modules A and B interact directly with Kafka:

Requirements Addressing

(question) How will Kafka help solve the issues and meet the requirements?

  • It won't deliver newer messages until older ones are acknowledged, which helps with data consistency.
  • Kafka's at-least-once semantics address the reliability requirement.
  • Better performance, though performance wasn't a problem in our case.
  • Good HA: every new instance connects to Kafka within a consumer group, with events distributed evenly across instances.
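As a rough sketch, the consumer-group and at-least-once behavior described above maps onto standard Kafka consumer configuration along these lines (the host name and group id are illustrative, not a recommendation):

```properties
# Illustrative Kafka consumer settings for at-least-once processing.
bootstrap.servers=kafka:9092
# All instances of a module share one group id, so Kafka distributes
# partitions (and thus events) across them and rebalances on scale-up/down.
group.id=mod-patron-blocks
# Disable auto-commit: the module commits offsets only after an event has
# been fully processed, which yields at-least-once semantics.
enable.auto.commit=false
# On first start (no committed offset), begin from the earliest retained event.
auto.offset.reset=earliest
```

With auto-commit disabled, a crash between processing and commit means the event is redelivered, which is exactly why consumers must be idempotent (see the deduplication note under Reliability).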

(question) It will be good for customers because there will be fewer bugs (and even if not fewer, they will be easier to investigate and won't cause data inconsistency) - fail fast!


Configuration is more complex

Limitations, Risks and Assumptions

  • All modules involved will have a Kafka client and will "know" that Kafka is being used as the transport mechanism. As a result, if it becomes necessary to move to another transport mechanism at some point, changes will be required in all the modules involved. This risk can be mitigated by placing all the logic required to work with Direct Kafka into a separate library with designated interfaces; the interaction logic would then, in a sense, still be hidden from the business logic of the modules involved. folio-kafka-wrapper - is it for RMB only? For Spring-based modules it should be much easier.
  • There is no implemented approach to authorizing events in Kafka - it will be necessary to follow a general solution once one is available.
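The mitigation described above (hiding Direct Kafka behind designated interfaces in a separate library) could look roughly like this. The interface and class names are hypothetical, and an in-memory implementation stands in for the real Kafka-backed one; the point is only that business logic depends on the interface, not on Kafka.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical transport abstraction: module business logic depends only on
// EventTransport, so replacing Kafka later means swapping the implementation,
// not refactoring every client module.
interface EventTransport {
    void publish(String topic, String payload);
    void subscribe(String topic, Consumer<String> handler);
}

// In-memory stand-in for a Kafka-backed implementation; also useful in tests.
class InMemoryEventTransport implements EventTransport {
    private final Map<String, List<Consumer<String>>> handlers = new HashMap<>();

    @Override
    public void publish(String topic, String payload) {
        // Deliver the payload to every handler subscribed to this topic.
        for (Consumer<String> handler : handlers.getOrDefault(topic, List.of())) {
            handler.accept(payload);
        }
    }

    @Override
    public void subscribe(String topic, Consumer<String> handler) {
        handlers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }
}
```

A Kafka-backed `EventTransport` (producer in `publish`, consumer loop behind `subscribe`) would live in the shared library, keeping the Kafka client out of the modules' own code.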

Modules affected

(question) Need to list the Circulation modules where refactoring will be required.

  • mod-circulation - SOURCE / CONSUMER (from mod-feesfines)
  • mod-feesfines (Vega) - SOURCE
  • mod-patron-blocks (Vega) - CONSUMER
  • mod-audit (Firebird) - CONSUMER
  • mod-remote-storage (Firebird) - CONSUMER (question)

Time and effort estimates

(warning) Also need to think about how the behavior can be tested/validated after implementation.


Is it possible to tune PubSub to cover all the needs? - No: there is no business value in building another Kafka, and guaranteed delivery over HTTP is not a trivial task.


