[DRAFT]
Spike Overview
User Story: https://folio-org.atlassian.net/browse/EUREKA-211 Spike - Revisit routing of mod-scheduler requests
Objective: Revisit the decisions made in the wake of EUREKA-89.
See spike findings here: https://folio-org.atlassian.net/wiki/x/BACNCg
The reason to revisit this is that Kong may not know how to route requests to these system interfaces in cases where multiple modules have timers that call the same system interface. It’s also not great that Kong is even aware of system interfaces, given that it only routes traffic from outside the system.
We may want to consider alternative solutions that avoid these problems. For example, we could make the mod-scheduler sidecar aware of all discovery and other relevant information, or let it retrieve that information on an as-needed basis.
Problem Statement
Two things are problematic: Kong cannot reliably route requests to system interfaces, and Kong is aware of these interfaces at all. The routing ambiguity arises when timers of multiple modules call the same system interface, a case that falls outside Kong's intended role of managing external traffic.
Scope
This spike will explore alternative solutions to the current design to improve request routing for system interfaces. Potential approaches include enhancing the mod-scheduler sidecar to be aware of or retrieve necessary discovery and other relevant information dynamically. The goal is to devise a solution that bypasses the limitations faced by Kong in its current role.
Deliverables
The main idea is to work in detail on option 3, described in the EUREKA-89 Spike - Design solution for scheduled system calls.
There are three possible solutions here, but all of them start from the same idea: mgr-applications and mgr-tenant-entitlements contain all the information needed to identify the correct route for a particular “timer call” from mod-scheduler. So these “timer” routes should not be configured in Kong, and the discovery information could be retrieved dynamically or made available when it is needed.
Option 3.1
In general, this option repeats the security ideas and principles described in option 1.2 of the spike https://folio-org.atlassian.net/wiki/x/BACNCg, but removes Kong from the timer flows.
The following diagram displays the main components and actors involved in the flow.
Description
This option introduces a new component, mgr-discovery, that acts as a facade for the Sidecars and provides a single point of interaction between them and the mgr-* components. Sidecar will use mgr-discovery to read all bootstrap info related to the FOLIO module the Sidecar serves and dynamically retrieve discovery information that will be used for timer routings.
Timer Flow
mod-scheduler prepares a request and puts an impersonated token for the system user into the x-okapi-token request header. The request then goes from mod-scheduler to its Sidecar.
Because Sidecar (mod-scheduler) has dynamic routing enabled and the requested route is not known, it calls mgr-discovery to resolve the discovery information.
mgr-discovery calls mgr-tenant-entitlements to resolve the actual module's version.
mgr-discovery calls mgr-applications to retrieve the actual discovery information that points to the Sidecar (Module A).
Sidecar (mod-scheduler) dynamically routes the request to the Sidecar (Module A).
Sidecar (Module A) authorizes the request to Module A:
get a token from a request header.
parse the token.
call Keycloak to evaluate permissions. Since the system user has access to all resources, authorization will be successful.
Sidecar (Module A) forwards the request to Module A for processing.
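The routing decision made by Sidecar (mod-scheduler) in the flow above can be sketched as follows. This is a minimal illustration, not the Sidecar's actual code: the class name, the resolver callback that stands in for the call to mgr-discovery, the tenant ID, and the sidecar URL are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for a call to mgr-discovery:
# (tenant_id, path) -> base URL of the target module's Sidecar, or None.
DiscoveryResolver = Callable[[str, str], Optional[str]]


class SchedulerSidecarRouter:
    """Illustrative router for Sidecar (mod-scheduler); names are assumptions."""

    def __init__(self, known_routes: Dict[str, str], resolve: DiscoveryResolver,
                 dynamic_routing: bool = True) -> None:
        self.known_routes = known_routes        # path -> base URL, known at bootstrap
        self.resolve = resolve                  # stands in for mgr-discovery
        self.dynamic_routing = dynamic_routing

    def target_for(self, tenant_id: str, path: str) -> str:
        # Known routes are served without contacting mgr-discovery.
        base_url = self.known_routes.get(path)
        if base_url is None:
            if not self.dynamic_routing:
                raise LookupError(f"route not known: {path}")
            # Unknown route and dynamic routing enabled: ask mgr-discovery.
            base_url = self.resolve(tenant_id, path)
        if base_url is None:
            raise LookupError(f"no entitled module provides {path} for {tenant_id}")
        return base_url + path


def fake_mgr_discovery(tenant_id: str, path: str) -> Optional[str]:
    # Hard-coded answer used only to make the sketch runnable.
    table = {("tenant1", "/timer-urlA"): "http://module-a-sidecar:8081"}
    return table.get((tenant_id, path))


router = SchedulerSidecarRouter(known_routes={}, resolve=fake_mgr_discovery)
print(router.target_for("tenant1", "/timer-urlA"))
```

The resolver is injected so that the same routing logic works whether the discovery answer comes from a remote call or from a local cache.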
Key Functions
Bootstrap info reading:
When a request for bootstrap data is received, mgr-discovery calls mgr-tenant-entitlements and mgr-applications to collect and compile all the information needed for the Sidecar.
Routes Resolution and Cache with Expiration Policy:
Purpose: The module mgr-discovery maintains a cache of routes that includes information about various system/timer interfaces and their respective routes. This cache has an expiration policy to ensure that the information remains up-to-date and valid.
Functionality: When a request for a specific route (e.g., /timer-urlA) is received, the module mgr-discovery checks its cache for the corresponding route. If the route information is available and valid, it proceeds to use it. If the information is expired or not found, the module queries mgr-tenant-entitlements and mgr-applications to fetch and update the route information.
Lightweight and Simple: The mgr-discovery module does not use any persistent storage and integrates only with mgr-tenant-entitlements and mgr-applications.
Single Integration Point for Sidecars: Sidecars will interact with the mgr-discovery only, which will simplify the Sidecar implementation as it no longer has to integrate with different mgr-* modules.
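The route cache with an expiration policy described above could look roughly like this. It is a sketch under stated assumptions: the class name, the 60-second default TTL, and the fetch callback (which stands in for the queries to mgr-tenant-entitlements and mgr-applications) are hypothetical and not taken from any existing module.

```python
import time
from typing import Callable, Dict, Tuple

RouteKey = Tuple[str, str]  # (tenant_id, path)


class RouteCache:
    """In-memory route cache with per-entry expiration (no persistent storage)."""

    def __init__(self, fetch: Callable[[str, str], str],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # stands in for the mgr-* lookups
        self._ttl = ttl_seconds      # assumed value; the spike does not define one
        self._clock = clock          # injectable so expiry can be tested
        self._entries: Dict[RouteKey, Tuple[str, float]] = {}

    def get(self, tenant_id: str, path: str) -> str:
        key = (tenant_id, path)
        now = self._clock()
        entry = self._entries.get(key)
        # Cache hit: the entry exists and has not expired yet.
        if entry is not None and now < entry[1]:
            return entry[0]
        # Miss or expired: fetch fresh route information and store it.
        route = self._fetch(tenant_id, path)
        self._entries[key] = (route, now + self._ttl)
        return route
```

A short TTL bounds how long a stale route can be served after an entitlement change, at the cost of more frequent lookups against the mgr-* components.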
Pros
Kong is excluded from the timer workflow. The system/timer interfaces are not configured in Kong and are not available from the outside.
The implementation is relatively simple. It will affect only the Sidecar and the new mgr-discovery module.
The Sidecar’s codebase will be simplified, as it will interact with the mgr-discovery module only and the rest of the complex logic will be encapsulated in mgr-discovery.
No need to track the timer interface changes as the discovery info will be resolved dynamically and stored in the cache with a short retention time.
No need for the sidecars to get ALL discovery information, only what’s required, when it’s required.
Protects timer interfaces in a standard way via Keycloak authorization, like all other resources.
Can be extended later to support more accurate resource-based access as opposed to “all resources” access.
Cons
Sidecars end up with multiple ways to get discovery information when “Dynamic Routing” is enabled.
Multiple modules for dynamic routing resolution increase the overall complexity of the system.
Dynamic routing can potentially introduce additional latency due to the extra steps required to resolve discovery info. The usage of the cache should mitigate the issue.
mgr-discovery becomes a critical component. If it fails, it could disrupt the entire routing and bootstrapping mechanism. High availability (HA) and redundancy should be used to mitigate this.
Option 3.2
Description
This option is similar to option 3.1 but removes the Sidecar from the routing process. The mod-scheduler uses mgr-discovery to resolve the actual route based on a provided URL, module name, and tenant ID.
Timer Flow
mod-scheduler prepares a request and puts an impersonated token for the system user into the x-okapi-token request header. The request is then sent directly to mgr-discovery.
mgr-discovery receives the request and checks its cache for the corresponding route information. If not found or expired, it proceeds to resolve the route.
mgr-discovery calls mgr-tenant-entitlements to resolve the actual module's version.
mgr-discovery calls mgr-applications to retrieve the actual discovery information that points to the Sidecar (Module A).
mgr-discovery routes the request directly to the Sidecar (Module A).
Sidecar (Module A) authorizes the request to Module A:
get a token from a request header.
parse the token.
call Keycloak to evaluate permissions. Since the system user has access to all resources, authorization will be successful.
Sidecar (Module A) forwards the request to Module A for processing.
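The two lookups in this flow (tenant entitlement first, then discovery) can be sketched as a single resolution function keyed by URL, module name, and tenant ID. The dictionaries below are hypothetical stand-ins for the data held by mgr-tenant-entitlements and mgr-applications; the module IDs, tenant IDs, and URLs are invented for illustration.

```python
from typing import Dict, Optional

# Stand-in for mgr-tenant-entitlements: tenant -> module name -> entitled module ID.
Entitlements = Dict[str, Dict[str, str]]
# Stand-in for mgr-applications discovery: module ID -> Sidecar base URL.
Discovery = Dict[str, str]


def resolve_route(tenant_id: str, module_name: str, path: str,
                  entitlements: Entitlements, discovery: Discovery) -> Optional[str]:
    """Return the full target URL for a timer call, or None if it cannot be resolved."""
    # Step 1: which version of the module is the tenant entitled to?
    module_id = entitlements.get(tenant_id, {}).get(module_name)
    if module_id is None:
        return None
    # Step 2: where does that exact module version's Sidecar live?
    base_url = discovery.get(module_id)
    if base_url is None:
        return None
    return base_url + path


entitlements = {
    "tenant1": {"mod-a": "mod-a-1.0.0"},
    "tenant2": {"mod-a": "mod-a-2.0.0"},
}
discovery = {
    "mod-a-1.0.0": "http://mod-a-v1-sidecar:8081",
    "mod-a-2.0.0": "http://mod-a-v2-sidecar:8081",
}
print(resolve_route("tenant2", "mod-a", "/timer-urlA", entitlements, discovery))
```

Because the module name is part of the lookup key, two modules whose timers call the same system interface resolve to different targets, which is the ambiguity that motivated this spike.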
Key Functions
Central Routing: mgr-discovery acts as the central routing component for all timer requests, eliminating the need for routing logic in the Sidecars.
Routes Resolution and Cache with Expiration Policy: Similar to option 3.1, mgr-discovery maintains a cache of routes with an expiration policy to ensure up-to-date routing information.
Direct Integration with mod-scheduler: mod-scheduler integrates directly with mgr-discovery for all timer requests, simplifying the request flow.
Lightweight and Simple: The mgr-discovery module does not use any persistent storage and integrates only with mgr-tenant-entitlements and mgr-applications.
Pros
No changes in the Sidecar implementation.
Centralizes routing decisions in mgr-discovery, potentially making it easier to manage and update routing logic.
Reduces the number of hops in the request flow, potentially improving performance.
Maintains the benefits of dynamic routing and caching from option 3.1.
Cons
mod-scheduler becomes aware of mgr-* components, which breaks the separation of concerns and potentially introduces tighter coupling between these components.
Increases complexity in mod-scheduler as it now needs to handle route resolution logic.
Could potentially increase the load on mod-scheduler for frequent route resolutions.
Deviates from the standard pattern where modules interact with the system through their Sidecars.
Option 3.3
Description
This option modifies the Sidecar so that it reads all timer routes (or all routes) during the bootstrap phase, eliminating the need for dynamic route resolution at runtime. The mgr-discovery component still plays a role in providing bootstrap information, including timer routes.
Timer Flow
During the bootstrap phase, the Sidecar reads all timer routes from mgr-discovery, which retrieves this information from mgr-tenant-entitlements and mgr-applications.
mod-scheduler prepares a request and puts an impersonated token for the system user into the x-okapi-token request header. The request is then sent to its Sidecar.
The Sidecar (mod-scheduler) already has the routing information for timer requests and directly forwards the request to the appropriate Sidecar (Module A).
Sidecar (Module A) authorizes the request to Module A:
get a token from a request header.
parse the token.
call Keycloak to evaluate permissions. Since the system user has access to all resources, authorization will be successful.
Sidecar (Module A) forwards the request to Module A for processing.
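The authorization sub-steps performed by Sidecar (Module A), which are the same in all three options, can be sketched as below. This is an illustration only: it decodes the token payload without verifying the signature, and the evaluate callback is a hypothetical stand-in for the permission evaluation call to Keycloak, whose real request format is not covered by this spike.

```python
import base64
import json
from typing import Any, Callable, Dict, Mapping

# Hypothetical stand-in for the Keycloak permission evaluation call:
# (claims, method, path) -> True if access is granted.
PermissionEvaluator = Callable[[Dict[str, Any], str, str], bool]


def parse_token_claims(token: str) -> Dict[str, Any]:
    """Decode the JWT payload. No signature verification; illustration only."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("token is not in header.payload.signature form")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)   # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def authorize(headers: Mapping[str, str], method: str, path: str,
              evaluate: PermissionEvaluator) -> bool:
    # 1. Get a token from the request header (a plain dict lookup here;
    #    a real HTTP stack would match header names case-insensitively).
    token = headers.get("x-okapi-token")
    if not token:
        return False
    # 2. Parse the token.
    claims = parse_token_claims(token)
    # 3. Ask the evaluator (Keycloak in the real flow) to check permissions.
    return evaluate(claims, method, path)
```

Only when authorize returns True would the Sidecar forward the request to Module A.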
Key Functions
Bootstrap Timer Route Loading: During the bootstrap phase, the Sidecar loads all timer routes from mgr-discovery, which compiles this information from mgr-tenant-entitlements and mgr-applications.
Static Routing: The Sidecar uses the pre-loaded routing information for timer requests, eliminating the need for dynamic routing resolution during runtime.
Lightweight mgr-discovery: The mgr-discovery module primarily serves bootstrap information and doesn't need to handle runtime route resolution requests.
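A minimal sketch of the bootstrap-time loading and static routing described above follows. The class name and the load_all callback (standing in for the bootstrap call to mgr-discovery) are assumptions, as are the tenant ID and URLs.

```python
from typing import Callable, Dict, Iterable, Tuple

TimerRoute = Tuple[str, str, str]  # (tenant_id, path, Sidecar base URL)


class StaticTimerRoutes:
    """Timer routes loaded once at bootstrap and never refreshed afterwards."""

    def __init__(self, load_all: Callable[[], Iterable[TimerRoute]]) -> None:
        # load_all stands in for the bootstrap request to mgr-discovery.
        self._routes: Dict[Tuple[str, str], str] = {
            (tenant_id, path): base_url for tenant_id, path, base_url in load_all()
        }

    def target_for(self, tenant_id: str, path: str) -> str:
        # Pure in-memory lookup: no call to mgr-discovery at runtime.
        # Raises KeyError for a route that was not present at bootstrap.
        return self._routes[(tenant_id, path)] + path


source = [("tenant1", "/timer-urlA", "http://module-a-sidecar:8081")]
routes = StaticTimerRoutes(lambda: list(source))

# A later change in the source is not visible to the already bootstrapped table.
source[0] = ("tenant1", "/timer-urlA", "http://module-a-v2-sidecar:8081")
print(routes.target_for("tenant1", "/timer-urlA"))  # still the bootstrap-time URL
```

The last two statements illustrate the trade-off of this option: lookups are cheap, but the table reflects only the state at bootstrap.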
Pros
Simplifies runtime routing by eliminating the need for dynamic resolution.
Potentially improves performance by reducing the number of interactions required for each timer request.
Reduces load on mgr-discovery during normal operation.
Cons
Lacks a clear mechanism for updating routing information when changes occur (e.g., during the entitlements process for different tenants and applications).
May lead to outdated routing information if changes occur after the bootstrap phase.
Increases memory usage in Sidecars as they need to store all timer routes.
Less flexible than dynamic routing options when dealing with frequent changes or a large number of routes.
Gap: The main gap in this option is the lack of a clear mechanism to notify the Sidecar about routing changes made by the mgr-tenant-entitlements module during the entitlements process for different tenants and applications. This is represented by the bold red arrow between mgr-tenant-entitlements and Sidecar in the diagram.
Potential solutions to this gap could include:
Implementing a notification system where mgr-tenant-entitlements pushes updates to affected Sidecars (possibly via Kafka).
Having Sidecars periodically poll for updates to their routing information.
Resolving this gap is crucial for ensuring that the Sidecars always have up-to-date routing information, especially in a dynamic multi-tenant environment where entitlements may change frequently.
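As an illustration of the polling alternative, a single poll step could reload the routes and report which entries changed. This is a sketch under assumptions: the function name and the load_all callback (standing in for a repeated bootstrap-style request to mgr-discovery) are hypothetical, and scheduling the poll at a fixed interval is left out.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

RouteKey = Tuple[str, str]          # (tenant_id, path)
RouteTable = Dict[RouteKey, str]    # key -> Sidecar base URL


def poll_once(current: RouteTable,
              load_all: Callable[[], Iterable[Tuple[RouteKey, str]]]
              ) -> Tuple[RouteTable, Set[RouteKey]]:
    """Reload all routes and return (new table, keys added, removed, or changed)."""
    latest: RouteTable = dict(load_all())
    changed = {
        key for key in current.keys() | latest.keys()
        if current.get(key) != latest.get(key)
    }
    return latest, changed


table: RouteTable = {("tenant1", "/timer-urlA"): "http://module-a-sidecar:8081"}
table, changed = poll_once(
    table,
    lambda: [(("tenant1", "/timer-urlA"), "http://module-a-v2-sidecar:8081")],
)
print(sorted(changed))
```

With polling, the worst-case staleness equals the poll interval; a push-based notification would shorten that window but requires mgr-tenant-entitlements to know which Sidecars are affected.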