Info |
---|
DRAFT! DRAFT! DRAFT! THIS IS NOT A PROPOSAL AT THE MOMENT |
...
Message delivery must be guaranteed. Message loss is unacceptable.
Performance
PubSub's performance was an issue for Data Import. The Vega team has not had issues with PubSub performance because it mainly uses PubSub for events triggered manually by users (check-out, manual fee/fine charge, etc.). However, performance can become an issue for libraries that use fixed schedules and have thousands of loans "ageing to lost" or being charged a fine at the same time.
Folijet - what are the performance requirements? PubSub's performance was an issue for Data Import. However, the requirements do not appear to have been explicitly recorded anywhere. According to the development team, "the requirements were once voiced a long time ago", but I have not yet been able to find any documents. As for current performance, the development team checks each development cycle before release on a perf Rancher environment with no background activity (for Morning Glory, the results are collected here: Folijet - Morning Glory Snapshot Performance testing). In addition, the PTF team measures the performance of each release on its environments with background activity (here is the report: Data Import Test report (Lotus)).
Retention policy
There appear to be no specific retention requirements.
...
Large payload sizes are not expected. For Circulation flows (Vega team) these are small JSON structures (less than 1 KB), and the same holds for Folijet. For Firebird (Remote Storage), each message usually carries information on one item, though in some cases there can be a batch of several items. There are no measurements from production systems, but according to the development team, the size of messages can be up to 10 KB (for one item) or up to 100 KB (for several items in a message).
Therefore, one can assume that the payload size is expected to be up to 100 KB.
...
- maximum number of loans per user is limited to 10 by automated patron blocks configuration
- a user has no open loans at the moment
- user checks out an item, but ITEM_CHECKED_OUT event does NOT reach mod-patron-blocks (which keeps count of loans for every user)
- over the next few months user checks out 10 more items, each time a corresponding event reaches mod-patron-blocks successfully
- library notices that user has 11 open loans, while the limit is 10
- library reports a bug in mod-patron-blocks - the most likely culprit from user's perspective
- during investigation a developer discovers that the block was not imposed because of a failed event delivery which took place months ago
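The drift in this scenario can be reproduced with a few lines of arithmetic. The sketch below is a hypothetical simulation, not mod-patron-blocks code: `LoanCounter`, `on_item_checked_out`, `is_blocked`, and `simulate` are illustrative names, and the block rule (check-outs are refused once the counted loans reach the limit) is an assumption.

```python
from dataclasses import dataclass, field

MAX_LOANS = 10  # limit from the automated patron blocks configuration in the scenario


@dataclass
class LoanCounter:
    """Hypothetical stand-in for the per-user loan count kept by the consumer."""
    open_loans: dict = field(default_factory=dict)

    def on_item_checked_out(self, user_id: str) -> None:
        self.open_loans[user_id] = self.open_loans.get(user_id, 0) + 1

    def is_blocked(self, user_id: str) -> bool:
        # Assumed rule: refuse further check-outs once the count reaches the limit.
        return self.open_loans.get(user_id, 0) >= MAX_LOANS


def simulate(lost_event_indexes: set, attempts: int = 11) -> tuple:
    """Return (actual open loans, loans counted by the consumer)."""
    counter = LoanCounter()
    actual = 0
    for i in range(attempts):
        if counter.is_blocked("user-1"):
            break  # check-out refused by the block
        actual += 1  # the loan itself is always created
        if i not in lost_event_indexes:
            counter.on_item_checked_out("user-1")  # event delivered
    return actual, counter.open_loans.get("user-1", 0)


print(simulate(set()))  # every event delivered: (10, 10)
print(simulate({0}))    # first ITEM_CHECKED_OUT event lost: (11, 10)
```

With a single lost event the consumer's count stays one behind the real number of loans indefinitely, which is why the failure only surfaces months later.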
Consequences of the Push mechanism during Data Import
The existing PubSub is a Push mechanism. Source Record Manager would place large numbers of messages (one per record) into the queue during a large import job. Mod-pubsub would then push these into the callback function provided by mod-inventory. There was no means for mod-inventory to say “enough already”, so it would get overloaded and crash. This was discussed with Folijet previously, and no viable solution was found.
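The overload can be illustrated with a small simulation. This is a sketch under stated assumptions, not the mod-pubsub or mod-inventory implementation: `push_import` and its parameters are hypothetical names, and the rates used in the example are made up.

```python
def push_import(job_size: int, pushed_per_tick: int, processed_per_tick: int) -> int:
    """Simulate a sender pushing records into a receiver's callback.

    Returns the peak number of records the receiver holds in memory.
    The receiver cannot refuse a delivery, so its backlog is unbounded.
    """
    backlog = 0
    peak = 0
    remaining = job_size
    while remaining > 0 or backlog > 0:
        delivered = min(pushed_per_tick, remaining)
        remaining -= delivered
        backlog += delivered  # callback invoked; the record must be accepted
        peak = max(peak, backlog)
        backlog -= min(processed_per_tick, backlog)  # receiver works at its own pace
    return peak


# Sender is five times faster than the receiver: most of the job piles up in memory.
print(push_import(job_size=10_000, pushed_per_tick=500, processed_per_tick=100))  # 8100
```

In this model the peak backlog grows with the size of the import job, so a large enough job will exhaust the receiver regardless of how much memory it has.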
The proposed scheme of modules interaction through Direct Kafka
...
- Guaranteed delivery provided by Kafka allows addressing reliability concern
- Improved data consistency since Kafka does not deliver newer messages until older ones are acknowledged
- Better performance by eliminating the overhead of multiple HTTP calls per event dispatch
- Enabling good HA since every new Event Consumer instance connects to Kafka within a consumer group, so that the load is distributed evenly
- Improved manageability because of easier investigation capabilities, less data inconsistency, and following a fail-fast approach
- the Pull mechanism provided by the Direct Kafka approach (as implemented in Data Import) - this implementation places the consumer code in mod-inventory, which pulls messages from Kafka when it has capacity
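The pull behavior from the last point can be sketched without a broker. The code below is a minimal in-memory model, not the Data Import implementation: `InMemoryTopic`, `pull_import`, and `capacity` are hypothetical names. In a real Kafka consumer the batch size is bounded by consumer settings such as `max.poll.records`; here it is simply a function argument.

```python
class InMemoryTopic:
    """Hypothetical stand-in for one Kafka partition: an append-only log."""

    def __init__(self) -> None:
        self.log = []

    def append(self, message) -> None:
        self.log.append(message)

    def poll(self, offset: int, max_records: int) -> list:
        # Return at most max_records messages starting at the given offset.
        return self.log[offset:offset + max_records]


def pull_import(topic: InMemoryTopic, capacity: int) -> tuple:
    """Consume the whole topic, never holding more than `capacity` records."""
    offset = 0
    peak_in_memory = 0
    processed = []
    while True:
        batch = topic.poll(offset, capacity)  # the consumer asks when it is ready
        if not batch:
            break
        peak_in_memory = max(peak_in_memory, len(batch))
        processed.extend(batch)  # stand-in for real record processing
        offset += len(batch)     # advance only after the batch is processed
    return processed, peak_in_memory


topic = InMemoryTopic()
for record_number in range(10_000):
    topic.append(record_number)

records, peak = pull_import(topic, capacity=100)
print(len(records), peak)  # 10000 100
```

Unprocessed messages wait in the log rather than in the consumer's memory, so the consumer's footprint depends on `capacity` and not on the size of the import job.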
Known challenges:
- Configuration (including Kafka, topics, consumer groups, authorization) is more complicated than with PubSub
- While Kafka supports exactly-once delivery, the at-least-once implementation is simpler and more manageable. In turn, at-least-once means that the Event Consumer must be prepared to handle potential duplicate events
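One common way to tolerate duplicates under at-least-once delivery is to make the consumer idempotent by remembering which event IDs it has already applied. The sketch below assumes that every event carries a unique `id` field, which this document does not confirm; `IdempotentConsumer` and `handle` are hypothetical names. A real module would need to persist the seen IDs, ideally in the same transaction as the side effect, rather than keep them in memory.

```python
class IdempotentConsumer:
    """Wraps an event handler so that redelivered events are applied only once."""

    def __init__(self, handler) -> None:
        self._handler = handler
        self._seen_ids = set()  # assumption: in-memory only, for illustration

    def handle(self, event: dict) -> bool:
        """Apply the event; return False if it was a duplicate and was skipped."""
        event_id = event["id"]  # assumption: events carry a unique "id"
        if event_id in self._seen_ids:
            return False
        self._handler(event)
        self._seen_ids.add(event_id)  # mark as applied only after success
        return True


open_loans = []
consumer = IdempotentConsumer(open_loans.append)

event = {"id": "e-1", "type": "ITEM_CHECKED_OUT", "userId": "user-1"}
first = consumer.handle(event)   # applied
second = consumer.handle(event)  # redelivery of the same event, skipped
print(first, second, len(open_loans))  # True False 1
```

Because the ID is recorded only after the handler succeeds, a handler that raises leaves the event eligible for a retry on redelivery.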
...
Time and effort estimates
We also need to think about how behavior after implementation can be tested / validated.
Required efforts can be divided into two groups:
- switching to folio-kafka-wrapper, reusing its capabilities and independently implementing missing functionality (in terms of creating and configuring topics, for example); a small spike-story will help to better understand the size of this group
- transfer of all modules to the event-approach - a simpler and more understandable activity, because the essence of the process is the same everywhere.
Rough T-shirt estimate: L to XXL.