EUREKA-732 - async entitlement steps status tracking implementation plan
Current logic overview
Kafka messages sent by MTE
Manager Tenant Entitlements (MTE) sends four types of Kafka messages during the Eureka entitlement process, all of which are specific to a module.
Class | Topic | Mandatory | Payload |
---|---|---|---|
CapabilitiesModuleEventPublisher | <env>.<tenant>.capability | Yes | ResourceEvent |
ScheduledJobModuleEventPublisher | <env>.<tenant>.scheduled-job | No | ResourceEvent |
SystemUserModuleEventPublisher | <env>.<tenant>.system-user | No | ResourceEvent, SystemUserEvent |
FolioModuleEventPublisher | <env>.entitlement | Yes | EntitlementEvent |
Each of the *EventPublisher classes is a flow stage. Code-wise they extend the ModuleDatabaseLoggingStage class, which means their execution information is stored in the flow_stage DB table, described below.
Some of the events may not need to be sent. For example, if a module descriptor doesn't specify a system user, no system user event is sent. The same applies to scheduled jobs: a module descriptor may not specify any, in which case no scheduled job event is sent for that module. Therefore these events are not mandatory.
All events except entitlement events trigger an asynchronous part of the entitlement flow, whose completion MTE needs to track. Conveniently, all corresponding *EventPublisher classes except FolioModuleEventPublisher extend a common AbstractModuleEventPublisher parent class and publish a kind of ResourceEvent, a generic resource event class with certain common fields.
Entitlement flow status and stages
Entitlement flow status is stored in the status column of the flow DB table, and is set to FINISHED by FinishedFlowFinalizer, which is the last step of every flow. The primary key of the flow table is flow_id, a GUID that is generated for each new flow. The flow table also has a tenant_id field, which will allow us to find the appropriate flow later when we need to update its status.
Note: For cancellation and error cases there are FailedFlowFinalizer, CancelledFlowFinalizer and CancellationFailedFlowFinalizer, which we don't need to touch or change.
Information about individual flow stages is kept in the flow_stage table, which has a composite primary key of two columns: flow_id (the GUID of a flow from the flow table) and stage, a string (varchar). Examples of stage values:
mod-notes-5.1.0-scheduledJobModuleEventPublisher
mod-notes-5.1.0-capabilitiesModuleEventPublisher
mod-users-19.2.2-folioModuleEventPublisher
mod-password-validator-3.1.0-systemUserModuleEventPublisher
As one can see, stage names are generated as moduleId + "-" + <stage class name>. The implementation is part of the ModuleDatabaseLoggingStage class, which is the parent class of all the above-mentioned *EventPublisher classes.
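The naming rule above can be sketched as follows. This is a minimal illustration, not the actual ModuleDatabaseLoggingStage code: the StageNames class and its methods are hypothetical, and lower-casing the first letter of the class name is an assumption inferred from the example stage values.

```java
import java.util.Optional;

// Hypothetical helper; the real logic lives in ModuleDatabaseLoggingStage.
final class StageNames {
    private StageNames() {}

    // Assumption: the stage suffix is the class name with a lower-case first letter,
    // as suggested by the example stage values.
    private static String decapitalize(String simpleClassName) {
        return Character.toLowerCase(simpleClassName.charAt(0)) + simpleClassName.substring(1);
    }

    // Builds "<moduleId>-<stage class name>".
    static String stageName(String moduleId, String stageClassSimpleName) {
        return moduleId + "-" + decapitalize(stageClassSimpleName);
    }

    // Recovers the module ID from a stage name if it ends with the given stage class suffix.
    static Optional<String> moduleIdOf(String stage, String stageClassSimpleName) {
        String suffix = "-" + decapitalize(stageClassSimpleName);
        if (!stage.endsWith(suffix)) {
            return Optional.empty();
        }
        return Optional.of(stage.substring(0, stage.length() - suffix.length()));
    }
}
```

Matching on the class-name suffix rather than splitting on "-" matters because module IDs such as mod-password-validator-3.1.0 contain hyphens themselves.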
Implementation plan
Overview
In order to track the results of asynchronous steps, we need a Kafka topic onto which downstream modules will push events about the completion of their part of the entitlement process, whether successful or unsuccessful, indicating the event type (e.g. system user creation), module ID and tenant ID.
MTE will listen on this topic and update the corresponding flow information in the DB accordingly. We do not expect more than one entitlement of the same module to happen for the same tenant in parallel, therefore we can identify the corresponding flow by picking one that is awaiting completion and has a matching tenant ID.
Within that flow we can pick a corresponding stage by an event type and module ID.
We should introduce a new table, async_entitlement_task, that will hold records of asynchronous entitlement tasks. A record is added once MTE sends an event, and is marked as completed when MTE receives the corresponding confirmation event. When a given flow has no more records in that table that aren't marked as completed, we can update the flow status to FINISHED.
Given that some events may not be sent by MTE at all (say, if a module does not define a system user there will be no system user event), we need to add a record to the async_entitlement_task table only if the message was actually sent, which is easy to do using the common parent class of the corresponding event publishing classes in MTE.
Implementation steps
1. In Manager-Tenant-Entitlements, when sending Kafka messages for capabilities, scheduled timers or system users (e.g. in AbstractModuleEventPublisher), save a corresponding task into the async_entitlement_task DB table with these fields: flow ID, task type (capabilities, timers, system_user), module ID, completed = false, success = null. Note: don't save a record into the DB if no message is sent, e.g. when createEvent returns an empty Optional.
2. In the successful flow finalizer (FinishedFlowFinalizer), verify if there is at least one non-completed async_entitlement_task (i.e. a record with the corresponding flow ID where completed == false). If yes, set the flow status to AWAITING_COMPLETION. Otherwise set it to FINISHED.
3. In Mod-Roles-Keycloak, Mod-Users-Keycloak and Mod-Scheduler, send Kafka events with the fields module ID, tenant ID, task type, success and error info to a new topic entitlement_task_results (or similar name) once they're done doing their part of the work. More details are provided below.
4. In Manager-Tenant-Entitlements, have a new Kafka event listener that listens to the entitlement_task_results Kafka topic and, upon receiving an event, finds the corresponding in-progress flows for the given tenant. If there is more than one of them, it selects the one that has a flow_stage record with the given module ID and event type (e.g. stage = mod-notes-5.1.0-capabilitiesModuleEventPublisher for module ID mod-notes-5.1.0 and event type capabilities).
   It then finds the corresponding task in the async_entitlement_task table and marks it as completed accordingly (minding the success or error status, and populating error details if necessary).
   Lastly, it verifies if there is at least one non-completed async_entitlement_task (i.e. a record with the corresponding flow ID where completed == false). If there aren't any, it sets the flow status to FINISHED or FAILED, based on whether all tasks with that flow ID were marked as successfully completed or at least one has failed.
5. Extend the flow details API endpoint in Manager-Tenant-Entitlements to return information about async tasks based on records in async_entitlement_task: return info on pending and failed tasks, and optionally also completed tasks (may be controlled via a URL parameter).
Changes to Manager-Tenant-Entitlements
First and foremost we need to add a DB table async_entitlement_task. Here is the proposed table structure:
primary key part: flow ID - UUID
primary key part: module ID - string (e.g. mod-notes-5.1.0)
primary key part: event type - capabilities, timers (or scheduled_jobs), system_user
completed - true/false
success - true/false
details - String, populated with error details
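A possible in-memory shape for one row of this table is sketched below. The type and method names (TaskType, TaskKey, AsyncEntitlementTask, pending, complete) are hypothetical, and the nullable success field reflects the "success = null" initial state mentioned in the implementation steps. Persistence mapping is deliberately left out.

```java
import java.util.UUID;

// Hypothetical event types, mirroring the values proposed for the table.
enum TaskType { CAPABILITIES, TIMERS, SYSTEM_USER }

// Composite primary key: flow ID + module ID + event type.
record TaskKey(UUID flowId, String moduleId, TaskType type) {}

// One row of async_entitlement_task; success stays null until the task completes.
record AsyncEntitlementTask(TaskKey key, boolean completed, Boolean success, String details) {

    // Row created right after MTE has sent the Kafka event.
    static AsyncEntitlementTask pending(UUID flowId, String moduleId, TaskType type) {
        return new AsyncEntitlementTask(new TaskKey(flowId, moduleId, type), false, null, null);
    }

    // Returns the completed version of this row; details are kept only for failures.
    AsyncEntitlementTask complete(boolean succeeded, String errorDetails) {
        return new AsyncEntitlementTask(key, true, succeeded, succeeded ? null : errorDetails);
    }
}
```

Keeping the three key columns in a separate record makes it straightforward to look a task up by flow ID + module ID + event type when a result event arrives.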
Every time we send a ResourceEvent to Kafka we need to create a corresponding record in the async_entitlement_task table, as described above.
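The "record only what was actually sent" rule could look roughly like this. It is a sketch under assumptions: TrackingPublisher and PendingTask are hypothetical names, the Consumer stands in for the real Kafka producer, the list stands in for the async_entitlement_task table, and the Optional parameter mirrors what createEvent is described to return.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.UUID;
import java.util.function.Consumer;

// Hypothetical stand-in for the shared publishing logic in AbstractModuleEventPublisher.
final class TrackingPublisher<E> {
    record PendingTask(UUID flowId, String moduleId, String taskType) {}

    private final Consumer<E> kafkaSender;                          // stands in for the Kafka producer
    private final List<PendingTask> taskTable = new ArrayList<>();  // stands in for async_entitlement_task

    TrackingPublisher(Consumer<E> kafkaSender) {
        this.kafkaSender = kafkaSender;
    }

    // Sends the event if there is one and records a pending task; returns false if nothing was sent.
    boolean publish(UUID flowId, String moduleId, String taskType, Optional<E> event) {
        if (event.isEmpty()) {
            return false; // nothing sent, so nothing to await
        }
        kafkaSender.accept(event.get());
        taskTable.add(new PendingTask(flowId, moduleId, taskType));
        return true;
    }

    List<PendingTask> pendingTasks() {
        return List.copyOf(taskTable);
    }
}
```

In a real implementation the order of sending and saving, and whether both happen in one transaction, would need a deliberate decision, since a result event could in principle arrive before the task record is visible.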
Then we need to modify FinishedFlowFinalizer as described above: verify whether any events were submitted, as recorded in the async_entitlement_task table, and update the flow status accordingly, to AWAITING_COMPLETION if there were some, otherwise to FINISHED.
To finish the event handling part, we need to create a Kafka event listener for the entitlement_task_results Kafka topic. The following fields are proposed for events in this topic:
tenant ID - UUID
module ID - string (e.g. mod-notes-5.1.0)
event type - capabilities, timers (or scheduled_jobs), system_user
success - true/false
details - String, populated with error details
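As an illustration, the proposed event could be modelled as an immutable record. The name EntitlementTaskResult and the two factory methods are hypothetical; serialization to and from Kafka is not shown.

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical payload for the entitlement_task_results topic.
record EntitlementTaskResult(UUID tenantId, String moduleId, String eventType,
                             boolean success, String details) {

    // The three identifying fields are required to locate the flow and the task.
    EntitlementTaskResult {
        Objects.requireNonNull(tenantId, "tenantId");
        Objects.requireNonNull(moduleId, "moduleId");
        Objects.requireNonNull(eventType, "eventType");
    }

    static EntitlementTaskResult succeeded(UUID tenantId, String moduleId, String eventType) {
        return new EntitlementTaskResult(tenantId, moduleId, eventType, true, null);
    }

    static EntitlementTaskResult failed(UUID tenantId, String moduleId, String eventType, String details) {
        return new EntitlementTaskResult(tenantId, moduleId, eventType, false, details);
    }
}
```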
Once such an event is received, we need to find the corresponding flow in the DB: find all flows with status AWAITING_COMPLETION for the given tenant, then look for the one that has a flow_stage record with the given module ID and event type.
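The flow lookup might be sketched as below, using in-memory data instead of DB queries. FlowSelector and its Flow record are hypothetical, and the mapping from event type to stage-name suffix is an assumption derived from the example stage values earlier in this document.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.UUID;

// Hypothetical selector; a real implementation would query the flow and flow_stage tables.
final class FlowSelector {
    // Simplified view of a flow row together with its flow_stage names.
    record Flow(UUID flowId, UUID tenantId, String status, Set<String> stages) {}

    // Assumed mapping from event type to the stage class part of the stage name.
    private static final Map<String, String> STAGE_SUFFIX = Map.of(
        "capabilities", "capabilitiesModuleEventPublisher",
        "timers", "scheduledJobModuleEventPublisher",
        "system_user", "systemUserModuleEventPublisher");

    private FlowSelector() {}

    // Picks the flow awaiting completion for the tenant that contains the matching stage.
    static Optional<Flow> select(List<Flow> flows, UUID tenantId, String moduleId, String eventType) {
        String suffix = STAGE_SUFFIX.get(eventType);
        if (suffix == null) {
            return Optional.empty(); // unknown event type
        }
        String stage = moduleId + "-" + suffix;
        return flows.stream()
            .filter(f -> "AWAITING_COMPLETION".equals(f.status()))
            .filter(f -> f.tenantId().equals(tenantId))
            .filter(f -> f.stages().contains(stage))
            .findFirst();
    }
}
```

Returning an Optional leaves it to the listener to decide what to do with a result event that matches no flow, for example logging and skipping it.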
When the flow is selected, find the record in the async_entitlement_task table by flow ID + module ID + event type, and mark it as completed with success or failure, i.e. update completed to true, and success to true or false correspondingly. In case of an error, also update the details column with the error details. Once that update is done, we need to verify whether the flow is complete, i.e. whether there are still any async_entitlement_task records for this flow with completed == false. If all tasks are completed, we need to check if at least one failed (i.e. success == false), and based on that update the flow status to FINISHED or FAILED.
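The completion check for a single flow can be illustrated with a small in-memory model. FlowTasks, Outcome and FlowStatus are hypothetical names, and the map stands in for the async_entitlement_task rows of one flow. The status rule follows the text: any pending task means AWAITING_COMPLETION, otherwise FAILED if at least one task failed, otherwise FINISHED.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the async tasks that belong to one flow.
final class FlowTasks {
    enum Outcome { PENDING, SUCCEEDED, FAILED }
    enum FlowStatus { AWAITING_COMPLETION, FINISHED, FAILED }

    // Key is moduleId + "/" + eventType; the flow ID is implied by the instance.
    private final Map<String, Outcome> tasks = new HashMap<>();

    // Called when MTE has sent an event and now awaits its result.
    void register(String moduleId, String eventType) {
        tasks.put(moduleId + "/" + eventType, Outcome.PENDING);
    }

    // Marks the task as completed and returns the resulting flow status.
    FlowStatus complete(String moduleId, String eventType, boolean success) {
        String key = moduleId + "/" + eventType;
        if (!tasks.containsKey(key)) {
            throw new IllegalArgumentException("Unknown task: " + key);
        }
        tasks.put(key, success ? Outcome.SUCCEEDED : Outcome.FAILED);
        return status();
    }

    // Pending tasks keep the flow open; otherwise a single failure fails the flow.
    FlowStatus status() {
        if (tasks.containsValue(Outcome.PENDING)) {
            return FlowStatus.AWAITING_COMPLETION;
        }
        return tasks.containsValue(Outcome.FAILED) ? FlowStatus.FAILED : FlowStatus.FINISHED;
    }
}
```

A flow with no registered tasks reports FINISHED here, which matches the finalizer behaviour for flows that sent no asynchronous events.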
Lastly, we need to modify the flow REST API in MTE to return information about flow tasks, so that it is possible to verify which tasks are still pending, which have failed, what the error details are, etc.
Changes to Mod-Roles-Keycloak
Mod-Roles-Keycloak uses the handleCapabilityEvent method of the org.folio.roles.integration.kafka.KafkaMessageListener class to process Kafka events. Before returning, this method should post the corresponding Kafka message.
Notice, however, that there may be some asynchronous parts of the flow in Mod-Roles-Keycloak implemented with ApplicationEventPublisher, whose events are picked up by CapabilityEventHandler and CapabilitySetEventHandler. They seem to be used only for update and delete events so far, but this needs to be investigated.
Worst case, we need to await completion of the handling of those application events, and only then post the entitlement acknowledgement Kafka message. This will be somewhat hard to do, because there can be more than one application event per module entitlement, or none. Thus we will need a unique entitlement ID in the context tied to each application event, and a service that keeps count of those application events per entitlement. We then need to decrement the count once each application event is processed, and post the Kafka message when the count reaches zero.
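The counting service described above could look roughly like this. It is a sketch under assumptions: PendingEventCounter is a hypothetical name, the Consumer stands in for posting the acknowledgement Kafka message, and the number of application events per entitlement is assumed to be known before any of them is handled.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical service that counts outstanding application events per entitlement ID.
final class PendingEventCounter {
    private final Map<String, AtomicInteger> pending = new ConcurrentHashMap<>();
    private final Consumer<String> acknowledge; // stands in for posting the Kafka acknowledgement

    PendingEventCounter(Consumer<String> acknowledge) {
        this.acknowledge = acknowledge;
    }

    // Call before the application events for this entitlement are published.
    void expect(String entitlementId, int eventCount) {
        if (eventCount < 0) {
            throw new IllegalArgumentException("eventCount must not be negative");
        }
        if (eventCount == 0) {
            acknowledge.accept(entitlementId); // nothing to wait for
            return;
        }
        pending.put(entitlementId, new AtomicInteger(eventCount));
    }

    // Call after an application event handler has finished its work.
    void eventProcessed(String entitlementId) {
        AtomicInteger counter = pending.get(entitlementId);
        if (counter == null) {
            throw new IllegalStateException("No pending events for " + entitlementId);
        }
        // decrementAndGet is atomic, so exactly one caller observes zero.
        if (counter.decrementAndGet() == 0) {
            pending.remove(entitlementId);
            acknowledge.accept(entitlementId);
        }
    }
}
```

This in-memory counter does not survive a restart of the module, so a production version would likely need persistence or a timeout on the MTE side.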
Changes to Mod-Users-Keycloak
First and foremost, SystemUserEvent also needs a moduleId field, which MTE populates when sending the event.
Then in Mod-Users-Keycloak, send the entitlement acknowledgement message from the handleSystemUserEvent method of the org.folio.uk.integration.kafka.KafkaMessageListener class.
Changes to Mod-Scheduler
Send the entitlement acknowledgement message from the handleScheduledJobEvent method of the org.folio.scheduler.integration.kafka.KafkaMessageListener class.
Resulting sequence
Happy path:
Error case: