EUREKA-732 - async entitlement steps status tracking implementation plan

EUREKA-732: Spike - Define implementation/approach for async mgr-tenant-entitlements feedback loop (status: Closed)

Current logic overview

Kafka messages sent by MTE

Manager Tenant Entitlements (MTE) sends four types of Kafka messages during the Eureka entitlement process, each of which is specific to a module.

Class: CapabilitiesModuleEventPublisher
Topic: <env>.<tenant>.capability
Mandatory: Yes
Payload: ResourceEvent
  • String id
  • UPDATE | CREATE | DELETE | DELETE_ALL type
  • String tenant
  • String resourceName = Capability
  • CapabilityEventPayload newValue
  • CapabilityEventPayload oldValue

Class: ScheduledJobModuleEventPublisher
Topic: <env>.<tenant>.scheduled-job
Mandatory: No
Payload: ResourceEvent
  • String id
  • UPDATE | CREATE | DELETE | DELETE_ALL type
  • String tenant
  • String resourceName = Scheduled Job
  • ScheduledTimers newValue
  • ScheduledTimers oldValue

Class: SystemUserModuleEventPublisher
Topic: <env>.<tenant>.system-user
Mandatory: No
Payload: ResourceEvent
  • String id
  • UPDATE | CREATE | DELETE | DELETE_ALL type
  • String tenant
  • String resourceName = System user
  • SystemUserEvent newValue
  • SystemUserEvent oldValue
SystemUserEvent:
  • String name - module name that becomes the system user name
  • String type - user type
  • List<String> permission

Class: FolioModuleEventPublisher / EntitlementEventPublisher
Topic: <env>.entitlement
Mandatory: Yes
Payload: EntitlementEvent
  • String type - ENTITLE | REVOKE
  • String moduleId
  • String tenantName
  • UUID tenantId

Each of the *EventPublisher classes is a flow stage. In code they extend the ModuleDatabaseLoggingStage class, which means their execution information is stored in the flow_stage DB table, described below.

Some of the events may not need to be sent. For example, if a module descriptor doesn’t specify a system user, no system user event is sent. The same applies to scheduled jobs: if the module descriptor doesn’t specify any, no scheduled-timer Kafka event is sent for that module. These events are therefore not mandatory.

All events except entitlement events trigger an asynchronous part of the entitlement flow, whose completion needs to be tracked by MTE. Conveniently, all corresponding *EventPublisher classes except FolioModuleEventPublisher extend a common AbstractModuleEventPublisher parent class and publish a ResourceEvent - a generic resource event class with certain common fields.
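As a rough illustration of the shared payload shape, the generic event could be modeled as below. This is a sketch based only on the fields listed on this page and assumes Java 17 or later; the record, enum and factory names are assumptions, and the actual MTE classes may be structured differently.

```java
import java.util.List;

// Event types as listed for ResourceEvent on this page.
enum ResourceEventType { UPDATE, CREATE, DELETE, DELETE_ALL }

// Sketch of the SystemUserEvent payload; field names follow the listing above.
record SystemUserEvent(String name, String type, List<String> permission) {}

// Generic envelope: V is the resource-specific payload
// (capabilities, scheduled timers or system user).
record ResourceEvent<V>(String id, ResourceEventType type, String tenant,
                        String resourceName, V newValue, V oldValue) {

  // Hypothetical convenience factory for a CREATE event, which has no previous value.
  static <V> ResourceEvent<V> created(String id, String tenant, String resourceName, V value) {
    return new ResourceEvent<>(id, ResourceEventType.CREATE, tenant, resourceName, value, null);
  }
}
```

Because every asynchronous event shares this envelope, the tracking logic proposed later only needs to hook into the one place where a ResourceEvent is published.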

Entitlement flow status and stages

The entitlement flow status is stored in the status column of the flow DB table and is set to FINISHED by FinishedFlowFinalizer, which is the last step of every flow. The primary key of the flow table is flow_id, a GUID that is generated for each new flow. The flow table also has a tenant_id field, which will allow us to find the appropriate flow later when we need to update its status.

Note: For cancellation and error cases there are FailedFlowFinalizer, CancelledFlowFinalizer and CancellationFailedFlowFinalizer - which we don’t need to touch or change.

Information about individual flow stages is stored in the flow_stage table, which has a composite primary key of two columns: flow_id (the GUID of a flow from the flow table) and a string (varchar) stage. Examples of stage values:

  • mod-notes-5.1.0-scheduledJobModuleEventPublisher

  • mod-notes-5.1.0-capabilitiesModuleEventPublisher

  • mod-users-19.2.2-folioModuleEventPublisher

  • mod-password-validator-3.1.0-systemUserModuleEventPublisher

As one can see, stage names are generated as moduleId + "-" + <stage class name>. The implementation is part of the ModuleDatabaseLoggingStage class, which is the parent class of all the *EventPublisher classes mentioned above.
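The naming rule can be sketched as follows. This is an illustration only: the helper class is hypothetical, and it assumes the stage part is the simple class name with a lower-case first letter, as the example values suggest. The real ModuleDatabaseLoggingStage may derive the name differently (for instance from a bean name).

```java
// Hypothetical helper mirroring the naming rule described above:
// stage name = moduleId + "-" + stage class name with a lower-case first letter.
final class StageNames {

  private StageNames() {}

  static String stageName(String moduleId, Class<?> stageClass) {
    String simpleName = stageClass.getSimpleName();
    // Lower-case the first character,
    // e.g. CapabilitiesModuleEventPublisher -> capabilitiesModuleEventPublisher.
    String decapitalized = Character.toLowerCase(simpleName.charAt(0)) + simpleName.substring(1);
    return moduleId + "-" + decapitalized;
  }
}

// Stand-in for the real stage class, declared only so the example compiles.
class CapabilitiesModuleEventPublisher {}
```

With these assumptions, stageName("mod-notes-5.1.0", CapabilitiesModuleEventPublisher.class) yields mod-notes-5.1.0-capabilitiesModuleEventPublisher, matching the second example value above.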

Implementation plan

Overview

To track the results of asynchronous steps, we need a Kafka topic onto which downstream modules push events about the completion of their part of the entitlement process, whether successful or unsuccessful, indicating the event type (e.g. system user creation), the module ID and the tenant ID.

MTE will listen on this topic and update the corresponding flow information in the DB accordingly. We do not expect more than one entitlement of the same module to run in parallel for the same tenant, so we can identify the corresponding flow by picking the one that is awaiting completion and has a matching tenant ID.

Within that flow we can pick a corresponding stage by an event type and module ID.

We should introduce a new table, async_entitlement_task, that holds records of async entitlement tasks. A record is created when MTE sends an event, and is marked as completed when MTE receives the corresponding confirmation event. When no records remain for a given flow that aren’t marked as completed, we can update the flow status to FINISHED (or to FAILED if any task failed, as detailed in the steps below).

Because MTE may not send some events at all (for example, a module that does not define a system user produces no system user event), we need to create a record in the async_entitlement_task table only if the message was actually sent. This is easy to do in the common parent class of the corresponding event publishing classes in MTE.

Implementation steps

  1. In Manager-Tenant-Entitlements, when sending Kafka messages for capabilities, scheduled timers or system users (e.g. in AbstractModuleEventPublisher), save a corresponding task into the async_entitlement_task DB table with these fields: flow ID, task type (capabilities, timers, system_user), module ID, completed = false, success = null. Note: don’t save a record into the DB if no message is sent, e.g. when createEvent returns an empty Optional.

  2. In the successful flow finalizer (FinishedFlowFinalizer), check whether there is at least one non-completed async_entitlement_task (i.e. a record with the corresponding flow ID where completed == false). If so, set the flow status to AWAITING_COMPLETION. Otherwise set it to FINISHED.

  3. In Mod-Roles-Keycloak, Mod-Users-Keycloak and Mod-Scheduler, send Kafka events with the fields module ID, tenant ID, task type, success and error info to a new topic entitlement_task_results (or a similar name) once they have finished their part of the work. More details are provided below.

  4. In Manager-Tenant-Entitlements, add a new Kafka event listener for the entitlement_task_results Kafka topic. Upon receiving an event, it finds the in-progress flows for the given tenant; if there is more than one, it selects the one that has a flow_stage record with the given module ID and event type (e.g. stage = mod-notes-5.1.0-capabilitiesModuleEventPublisher for module ID mod-notes-5.1.0 and event type capabilities).
    It then finds the corresponding task in the async_entitlement_task table and marks it as completed (recording the success or error status and populating error details if necessary).
    Lastly, it checks whether there is at least one non-completed async_entitlement_task (i.e. a record with the corresponding flow ID where completed == false). If there are none, it sets the flow status to FINISHED or FAILED, depending on whether all tasks with that flow ID completed successfully or at least one failed.

  5. Extend the flow details API endpoint in Manager-Tenant-Entitlements to return information about async tasks based on records in async_entitlement_task: pending and failed tasks, and optionally also completed tasks (may be controlled via a URL parameter).
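Steps 1 and 2 can be sketched with an in-memory stand-in for the async_entitlement_task table. This is a minimal sketch, assuming Java 17 or later; AsyncTask, AsyncTaskRegistry and the method names are hypothetical, and a real implementation would go through a repository rather than a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.UUID;

enum FlowStatus { FINISHED, AWAITING_COMPLETION }

// One row of the proposed async_entitlement_task table (in-memory stand-in).
record AsyncTask(UUID flowId, String moduleId, String taskType, boolean completed, Boolean success) {}

// Hypothetical replacement for the repository behind async_entitlement_task.
final class AsyncTaskRegistry {
  private final List<AsyncTask> tasks = new ArrayList<>();

  // Step 1: record a task only if an event was actually produced and sent.
  boolean recordIfSent(UUID flowId, String moduleId, String taskType, Optional<?> event) {
    if (event.isEmpty()) {
      return false; // nothing was sent, so there is nothing to wait for
    }
    tasks.add(new AsyncTask(flowId, moduleId, taskType, false, null));
    return true;
  }

  // Step 2: the status the successful-flow finalizer would set for this flow.
  FlowStatus statusOnFinish(UUID flowId) {
    boolean hasPending = tasks.stream()
        .anyMatch(t -> t.flowId().equals(flowId) && !t.completed());
    return hasPending ? FlowStatus.AWAITING_COMPLETION : FlowStatus.FINISHED;
  }
}
```

A flow whose modules produced no asynchronous events therefore still ends in FINISHED immediately, which keeps the current behavior for such flows unchanged.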

Changes to Manager-Tenant-Entitlements

First and foremost we need to add a DB table async_entitlement_task. Here is the proposed table structure:

  • primary key part: flow ID - UUID

  • primary key part: module ID - string (e.g. mod-notes-5.1.0)

  • primary key part: event type - capabilities, timers (or scheduled_jobs), system_user

  • completed - true/false

  • success - true/false

  • details - String, populated with error details

Every time we send a ResourceEvent to Kafka we need to create a corresponding record in async_entitlement_task table - as described above.

Then we need to modify FinishedFlowFinalizer as described above: check whether the async_entitlement_task table contains any non-completed tasks for the flow, and set the flow status to AWAITING_COMPLETION if it does, otherwise to FINISHED.

To finish the event handling part, we need to create a Kafka event listener for the entitlement_task_results Kafka topic. The following fields are proposed for events in this topic:

  • tenant ID - UUID

  • module ID - string (e.g. mod-notes-5.1.0)

  • event type - capabilities, timers (or scheduled_jobs), system_user

  • success - true/false

  • details - String, populated with error details

Once such an event is received, we need to find the corresponding flow in the DB: find all flows with status AWAITING_COMPLETION for the given tenant, then look for the one that has a flow_stage record with the given module ID and event type.
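The flow lookup could be sketched as below, using in-memory lists in place of the flow and flow_stage tables. The record and class names are hypothetical, and the mapping from event type to stage-name suffix is an assumption derived from the example stage values earlier on this page. The sketch assumes Java 17 or later.

```java
import java.util.List;
import java.util.Optional;
import java.util.UUID;

// Minimal stand-ins for rows of the flow and flow_stage tables.
record Flow(UUID flowId, UUID tenantId, String status) {}
record FlowStage(UUID flowId, String stage) {}

final class FlowSelector {

  private FlowSelector() {}

  // Assumed mapping from event type to the stage-name suffix.
  static String stageSuffix(String eventType) {
    return switch (eventType) {
      case "capabilities" -> "capabilitiesModuleEventPublisher";
      case "timers" -> "scheduledJobModuleEventPublisher";
      case "system_user" -> "systemUserModuleEventPublisher";
      default -> throw new IllegalArgumentException("Unknown event type: " + eventType);
    };
  }

  // Picks the flow awaiting completion for the tenant that has a matching stage record.
  static Optional<Flow> findFlow(List<Flow> flows, List<FlowStage> stages,
                                 UUID tenantId, String moduleId, String eventType) {
    String expectedStage = moduleId + "-" + stageSuffix(eventType);
    return flows.stream()
        .filter(f -> f.tenantId().equals(tenantId))
        .filter(f -> "AWAITING_COMPLETION".equals(f.status()))
        .filter(f -> stages.stream().anyMatch(
            s -> s.flowId().equals(f.flowId()) && s.stage().equals(expectedStage)))
        .findFirst();
  }
}
```

In a real implementation this would more likely be a single query joining flow and flow_stage; the stream version only shows the selection criteria.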

Once the flow is selected, find the record in the async_entitlement_task table by flow ID + module ID + event type, and mark it as completed with success or with failure, i.e. set completed to true and success to true or false accordingly. In case of an error, also populate the details column with the error details. Once that update is done, we need to verify whether the flow is complete, i.e. whether any async_entitlement_task records with completed == false remain for this flow. If all tasks are completed, we need to check whether at least one failed (i.e. success == false), and based on that update the flow status to FINISHED or FAILED.
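The completion check can be illustrated with a small in-memory model keyed by the proposed composite key. All type and method names here are hypothetical, and the sketch assumes Java 17 or later; in the real service the state would be read from and written to async_entitlement_task.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

enum FlowStatus { AWAITING_COMPLETION, FINISHED, FAILED }

// Composite key of the proposed table: flow ID + module ID + event type.
record TaskKey(UUID flowId, String moduleId, String eventType) {}

// Mutable task state; success stays null until a result arrives.
final class TaskState {
  boolean completed;
  Boolean success;
  String details;
}

final class TaskResultHandler {
  private final Map<TaskKey, TaskState> tasks = new HashMap<>();

  void register(TaskKey key) {
    tasks.put(key, new TaskState());
  }

  // Applies one result event and returns the status the flow should now have.
  FlowStatus complete(TaskKey key, boolean success, String details) {
    TaskState state = tasks.get(key);
    if (state == null) {
      throw new IllegalStateException("No async task recorded for " + key);
    }
    state.completed = true;
    state.success = success;
    state.details = success ? null : details;

    boolean anyPending = false;
    boolean anyFailed = false;
    for (Map.Entry<TaskKey, TaskState> entry : tasks.entrySet()) {
      if (!entry.getKey().flowId().equals(key.flowId())) {
        continue; // task belongs to another flow
      }
      TaskState s = entry.getValue();
      if (!s.completed) {
        anyPending = true;
      } else if (Boolean.FALSE.equals(s.success)) {
        anyFailed = true;
      }
    }
    if (anyPending) {
      return FlowStatus.AWAITING_COMPLETION;
    }
    return anyFailed ? FlowStatus.FAILED : FlowStatus.FINISHED;
  }
}
```

Note that in this sketch a failed task does not end the flow early: the status only changes once every task of the flow has reported, which matches the rule described above.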

Lastly, we need to modify the flow REST API in MTE to return information about flow tasks, so that it is possible to see which tasks are still pending, which have failed, what the error details are, etc.

Changes to Mod-Roles-Keycloak

Mod-Roles-Keycloak processes the Kafka event in the handleCapabilityEvent method of the org.folio.roles.integration.kafka.KafkaMessageListener class. Before returning, this method should post the corresponding entitlement acknowledgement Kafka message.
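The "post a message before returning" step might look roughly like the wrapper below. It is a sketch under assumptions: EntitlementTaskResult mirrors the fields proposed for the entitlement_task_results topic, AcknowledgingHandler is a hypothetical helper, and the publisher is a plain Consumer standing in for whatever Kafka producer the module uses. It assumes Java 17 or later.

```java
import java.util.UUID;
import java.util.function.Consumer;

// Hypothetical payload for the proposed entitlement_task_results topic.
record EntitlementTaskResult(UUID tenantId, String moduleId, String eventType,
                             boolean success, String details) {}

final class AcknowledgingHandler {

  private AcknowledgingHandler() {}

  // Runs the existing handling logic, then hands a result to the publisher.
  // In a real module the publisher would wrap a Kafka producer.
  static void handleAndAcknowledge(UUID tenantId, String moduleId, String eventType,
                                   Runnable handler, Consumer<EntitlementTaskResult> publisher) {
    EntitlementTaskResult result;
    try {
      handler.run();
      result = new EntitlementTaskResult(tenantId, moduleId, eventType, true, null);
    } catch (RuntimeException e) {
      // Report the failure instead of letting it go unnoticed by MTE.
      result = new EntitlementTaskResult(tenantId, moduleId, eventType, false, e.getMessage());
    }
    publisher.accept(result);
  }
}
```

This sketch swallows the exception after reporting it. Whether the listener should instead rethrow so that Kafka retry handling still applies, and only acknowledge a failure after retries are exhausted, is a design decision this page leaves open.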

Note, however, that some parts of the flow in Mod-Roles-Keycloak may run asynchronously via ApplicationEventPublisher, whose events are picked up by CapabilityEventHandler and CapabilitySetEventHandler. So far they seem to be used only for update and delete events, but this needs to be investigated.

In the worst case, we need to await completion of the handling of those application events, and only then post the entitlement acknowledgement Kafka message. This will be somewhat hard to do, because there can be more than one application event per module entitlement, or none. We would therefore need a unique entitlement ID in the context tied to each application event, and a service that keeps a count of those application events per entitlement. We would then decrement the count each time an event is processed, and post the Kafka message when the count reaches zero.
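The counting service described above could be sketched as follows. PendingEventTracker and its method names are hypothetical, and the entitlement ID is modeled as a plain string; how that ID would be propagated through the application events is not covered here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical tracker: counts in-flight application events per entitlement ID
// and runs a callback once the last one has been processed.
final class PendingEventTracker {
  private final Map<String, AtomicInteger> pending = new ConcurrentHashMap<>();

  // Call once per application event, before it is published.
  void eventPublished(String entitlementId) {
    pending.computeIfAbsent(entitlementId, id -> new AtomicInteger()).incrementAndGet();
  }

  // Call after an application event has been handled;
  // runs onAllDone when the count for this entitlement reaches zero.
  void eventProcessed(String entitlementId, Runnable onAllDone) {
    AtomicInteger counter = pending.get(entitlementId);
    if (counter == null) {
      throw new IllegalStateException("No pending events for " + entitlementId);
    }
    if (counter.decrementAndGet() == 0) {
      pending.remove(entitlementId);
      onAllDone.run(); // e.g. post the entitlement acknowledgement Kafka message
    }
  }
}
```

Two caveats apply to this sketch. All application events of an entitlement must be registered before any of them is processed, otherwise the count can reach zero too early. And when an entitlement produces no application events at all, the acknowledgement has to be posted directly, since the callback would never fire.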

Changes to Mod-Users-Keycloak

First and foremost, a moduleId field needs to be added to SystemUserEvent and populated in MTE when the event is sent.

Then, in Mod-Users-Keycloak, send the entitlement acknowledgement message from the handleSystemUserEvent method of the org.folio.uk.integration.kafka.KafkaMessageListener class.

Changes to Mod-Scheduler

Send the entitlement acknowledgement message from the handleScheduledJobEvent method of the org.folio.scheduler.integration.kafka.KafkaMessageListener class.

Resulting sequence

Happy path:

[Sequence diagram: image-20250415-154156.png]

Error case:

[Sequence diagram: image-20250415-154533.png]