[DRAFT]
Spike Overview
User Story: https://folio-org.atlassian.net/browse/EUREKA-211 Spike - Revisit routing of mod-scheduler requests
Objective: Revisit the decisions made in the wake of EUREKA-89.
See spike findings here: https://folio-org.atlassian.net/wiki/x/BACNCg
The reason to revisit this is that Kong may not know how to route requests to these system interfaces in cases where multiple modules have timers that call the same system interface. It’s also not great that Kong is even aware of system interfaces, given that it only routes traffic from outside the system.
We may want to consider alternative solutions that avoid these problems. For example, we could make the mod-scheduler sidecar aware of all discovery and other relevant info or be able to retrieve it on an as-needed basis.
Problem Statement
Kong's inability to route requests correctly to system interfaces and its awareness of these interfaces are problematic. This routing confusion arises when multiple modules' timers call the same system interface, which is beyond Kong's intended functionality of managing external traffic.
Scope
This spike will explore alternative solutions to the current design to improve request routing for system interfaces. Potential approaches include enhancing the mod-scheduler sidecar to be aware of or retrieve necessary discovery and other relevant information dynamically. The goal is to devise a solution that bypasses the limitations faced by Kong in its current role.
Deliverables
The main idea is to work out in detail option 3, described in the EUREKA-89 Spike - Design solution for scheduled system calls.
There are three possible solutions here, but all of them originate from the same idea: mgr-applications and mgr-tenant-entitlements contain all the information needed to identify the correct route for a particular “timer call” from mod-scheduler. So these “timer” routes should not be configured in Kong, and all the discovery information can be retrieved dynamically or made available when it is needed.
Option 3.1
In general, this option repeats the security ideas and principles described in option 1.2 of the spike https://folio-org.atlassian.net/wiki/x/BACNCg, but eliminates Kong from the timer flows.
The following diagram displays the main components and actors involved in the flow.
Description
This option introduces a new component, mgr-discovery, that acts as a facade for the Sidecars and provides a single point of interaction between them and the mgr-* components. A Sidecar will use mgr-discovery to read all bootstrap info related to the FOLIO module it serves and to dynamically retrieve the discovery information used for timer routing.
Timer Flow:
mod-scheduler prepares a request and puts an impersonated token for the system user into the x-okapi-token request header. The request then goes from mod-scheduler to its sidecar.
Because Sidecar (mod-scheduler) has dynamic routing enabled and the requested route is not known, it calls mgr-discovery to resolve the discovery information.
mgr-discovery calls mgr-Tenant-Entitlements to resolve the actual module's version.
mgr-discovery calls mgr-Applications to retrieve the actual discovery information that points to the Sidecar (Module A).
Sidecar (mod-scheduler) dynamically routes the request to the Sidecar (Module A).
Sidecar (Module A) authorizes the request to Module A.
get token from request header
parse token
call Keycloak to evaluate permissions. Since the system user has access to all resources, authorization will be successful
Sidecar (Module A) forwards the request to Module A for processing.
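The routing part of the flow above can be sketched roughly as follows. This is a minimal illustration, not the actual Sidecar or mgr-discovery code: the class and function names, the in-memory dictionaries standing in for mgr-Tenant-Entitlements and mgr-Applications, the tenant name, the module id, and the sidecar URL are all assumptions made for the example, and the Keycloak authorization performed by Sidecar (Module A) is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical in-memory stand-ins for the two manager components.
# (tenant, timer path) -> id of the module version entitled for that tenant
ENTITLEMENTS: Dict[tuple, str] = {("tenant1", "/timer-urlA"): "mod-a-1.0.0"}
# module id -> location of the sidecar that fronts the module
DISCOVERY: Dict[str, str] = {"mod-a-1.0.0": "http://sc-mod-a:8081"}


def resolve_via_mgr_discovery(tenant: str, path: str) -> Optional[str]:
    """Assumed mgr-discovery behaviour: resolve the module version first,
    then look up the discovery info that points to that module's sidecar."""
    module_id = ENTITLEMENTS.get((tenant, path))
    if module_id is None:
        return None
    return DISCOVERY.get(module_id)


@dataclass
class Sidecar:
    dynamic_routing: bool
    resolver: Callable[[str, str], Optional[str]]
    known_routes: Dict[str, str] = field(default_factory=dict)

    def route(self, tenant: str, path: str, headers: Dict[str, str]) -> str:
        """Return the URL the request should be forwarded to."""
        if "x-okapi-token" not in headers:
            raise PermissionError("missing x-okapi-token header")
        target = self.known_routes.get(path)
        # Unknown route: fall back to dynamic resolution if it is enabled.
        if target is None and self.dynamic_routing:
            target = self.resolver(tenant, path)
        if target is None:
            raise LookupError(f"no route for {path}")
        return target + path


scheduler_sidecar = Sidecar(dynamic_routing=True, resolver=resolve_via_mgr_discovery)
url = scheduler_sidecar.route("tenant1", "/timer-urlA", {"x-okapi-token": "token"})
```

In this sketch Kong never appears: the timer route is resolved entirely between the sidecar and the (assumed) mgr-discovery lookup, which is the property this option is after.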
Key Functions:
Bootstrap info reading:
When a request for bootstrap data is received, the module mgr-discovery performs calls to mgr-Tenant-Entitlements and mgr-Applications to collect and compile all the information needed for the Sidecar.
Routes Resolution and Cache with Expiration Policy:
Purpose: The module mgr-discovery maintains a cache of routes that includes information about various system/timer interfaces and their respective routes. This cache has an expiration policy to ensure that the information remains up-to-date and valid.
Functionality: When a request for a specific route (e.g., /timer-urlA) is received, the module mgr-discovery checks its cache for the corresponding route. If the route information is available and valid, it proceeds to use it. If the information is expired or not found, the module queries mgr-Tenant-Entitlements and mgr-Applications to fetch and update the route information.
Lightweight and Simple: the mgr-discovery module does not use any persistent storage and integrates only with mgr-Tenant-Entitlements and mgr-Applications.
Single Integration Point for Sidecars: Sidecars will interact with the mgr-discovery only, which will simplify the Sidecar implementation as it no longer has to integrate with different mgr-* modules.
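The route cache with an expiration policy described under Key Functions could look roughly like the sketch below. It is only an illustration under stated assumptions: the RouteCache class, its fetch callback (standing in for the combined calls to mgr-Tenant-Entitlements and mgr-Applications), and the 30-second retention time are hypothetical and not taken from the spike.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class RouteCache:
    """In-memory route cache with a per-entry time-to-live (no persistent storage)."""

    def __init__(
        self,
        fetch: Callable[[str], Optional[str]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # hypothetical lookup against the mgr-* components
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[str, float]] = {}  # path -> (target, expires_at)

    def resolve(self, path: str) -> Optional[str]:
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and entry[1] > now:
            return entry[0]          # cached and still valid
        # Missing or expired: query the source and refresh the cache.
        target = self._fetch(path)
        if target is None:
            self._entries.pop(path, None)
            return None
        self._entries[path] = (target, now + self._ttl)
        return target


# Usage with a fake clock and a fetch function that records its calls.
fetch_calls = []

def fake_fetch(path: str) -> Optional[str]:
    fetch_calls.append(path)
    return "http://sc-mod-a:8081"

fake_now = [0.0]
cache = RouteCache(fake_fetch, ttl_seconds=30, clock=lambda: fake_now[0])
first = cache.resolve("/timer-urlA")
second = cache.resolve("/timer-urlA")   # served from the cache
fake_now[0] = 31.0
third = cache.resolve("/timer-urlA")    # entry expired, fetched again
```

A short retention time keeps the extra latency of dynamic routing low for repeated timer calls while still picking up timer interface changes without explicit tracking.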
Pros
Kong is excluded from the timer workflow. The system/timer interfaces are not configured in Kong and are not available from the outside.
The implementation is relatively simple. It will affect only the sidecar and a new mgr-discovery module.
The Sidecar’s codebase will be simplified, as it will interact with the mgr-discovery module only and the rest of the complex logic will be encapsulated in mgr-discovery.
No need to track the timer interface changes as the discovery info will be resolved dynamically and stored in the cache with a short retention time.
No need for the sidecars to get ALL discovery information, only what’s required, when it’s required.
Protects timer interfaces in a standard way via Keycloak authorization, like all other resources
Can be extended later to support more accurate resource-based access as opposed to “all resources” access
Cons
Multiple ways for sidecars to get discovery information when “Dynamic Routing” is enabled.
Multiple modules for dynamic routing resolution increase the overall complexity of the system.
Dynamic routing can potentially introduce additional latency due to the extra steps required to resolve discovery info. The usage of the cache should mitigate the issue.
mgr-discovery becomes a critical component. If it fails, it could disrupt the entire routing and bootstrapping mechanism. HA and redundancy should be used to mitigate this.