EUREKA-778 - Support multiple versions of a module in a given deployment
Overview
https://folio-org.atlassian.net/browse/EUREKA-778
Objective: If two tenants have enabled different versions of the same module, Kong and sidecars need to proxy requests based on tenant.
Background
Okapi supports the ability to run multiple versions of a given module within the same Folio instance, and to upgrade tenants independently. Eureka’s original implementation also supported this by registering tenant-specific routes in Kong. This did not scale well, and led to problems in Folio instances with many tenants (e.g. large consortia). More recently, Eureka switched to using tenant-agnostic routes in Kong. This solved the scalability problem and simplified a lot of code, but at the cost of not supporting multiple versions of a given module. See the Additional Information section below for links to JIRAs and other details.
Problem Statement
We need to find an approach to specifying Kong routes that meets both requirements:
Scalability (Many tenants in the same Folio instance)
Support for running multiple versions of a given module within the same Folio instance.
Scope
Solutions which do not meet both requirements above will not be considered.
The following should be considered when evaluating each approach:
feasibility
amount of effort
performance
maintainability
impact on system operators
architectural fit (i.e. is the solution consistent with existing folio architecture?)
resource consumption (e.g. will the solution require significantly more/less resources than other solutions?)
Proposed Solutions
The following options came out of brainstorming and have only been discussed at a high level. As such, confidence in some of them is lower than in others. Additional investigation, prototyping, etc. is required before implementation stories can be created.
Option #1 - Tenant-specific routes w/ feature toggle
Add a feature/behavior toggle in mgr-applications which controls whether routes in Kong are created in a tenant-specific or tenant-agnostic way. The change to make Kong routes tenant-agnostic was made intentionally for scalability reasons (to support cases where there are many tenants, 30+, in a given deployment). One caveat is that you wouldn’t be able to change this configuration once your system is up and running. In other words, switching back and forth will cause problems.
Risky - changing the value after the system is provisioned will make a mess
Route creation/removal logic needs to live in two places: mgr-applications and mgr-tenants
Toggle needs to be consistently specified in multiple places - contributes to the risk
Not a feasible option for large/medium sized consortia due to scalability reasons
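To make the scalability concern concrete, here is a minimal Python sketch of what such a toggle would switch between. The function names, the `/users` endpoint, and the expression strings (written in the style of Kong's expressions router) are illustrative assumptions, not code taken from mgr-applications.

```python
def route_expression(path, method, tenant=None):
    """Build a Kong-style route expression string (illustrative only)."""
    clauses = [f'http.path == "{path}"', f'http.method == "{method}"']
    if tenant is not None:
        # Tenant-specific mode: pin the route to exactly one tenant.
        clauses.append(f'http.headers.x_okapi_tenant == "{tenant}"')
    return " && ".join(clauses)


def routes_for_endpoint(path, method, tenants, tenant_specific):
    """Return the route expressions needed for one endpoint.

    `tenant_specific` stands in for the hypothetical feature toggle.
    """
    if tenant_specific:
        # One route per tenant per endpoint.
        return [route_expression(path, method, t) for t in tenants]
    # One route per endpoint, shared by all tenants.
    return [route_expression(path, method)]


tenants = [f"tenant{i}" for i in range(1, 31)]
specific = routes_for_endpoint("/users", "GET", tenants, tenant_specific=True)
agnostic = routes_for_endpoint("/users", "GET", tenants, tenant_specific=False)
print(len(specific), len(agnostic))
```

In this sketch the tenant-specific mode needs one route per tenant for every endpoint, so the route count grows with the number of tenants, while the tenant-agnostic mode stays at one route per endpoint.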
Option #2 - Version-specific routes w/ clients sending X-Okapi-Module-Id
Continue to create tenant-agnostic routes in Kong, but incorporate x-okapi-module-id into the expressions. Clients will be required to specify x-okapi-module-id.
Strikes a balance between too many and too few routes.
There are many clients out there, and Stripes is only one of them. Tools, scripts, edge modules, etc. would all need to be updated to support this approach.
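As a rough illustration of this option, the Python sketch below builds one version-specific route per module version, keyed on the x-okapi-module-id header. The helper name, the `/users` endpoint, the module IDs, and the exact expression syntax (modeled on Kong's expressions router) are assumptions made for the example.

```python
def versioned_route_expression(path, method, module_id):
    """Build a Kong-style expression that also matches on the module ID header."""
    return (
        f'http.path == "{path}" && http.method == "{method}" '
        f'&& http.headers.x_okapi_module_id == "{module_id}"'
    )


# Hypothetical module versions deployed side by side.
module_ids = ["mod-users-1.0.0", "mod-users-2.0.0"]
expressions = [versioned_route_expression("/users", "GET", m) for m in module_ids]

# Every client would have to send the header on each request, for example:
request_headers = {
    "x-okapi-tenant": "tenantA",
    "x-okapi-module-id": "mod-users-2.0.0",
}

for expression in expressions:
    print(expression)
```

The route count here depends on the number of deployed versions rather than the number of tenants, but the burden of choosing the right version moves to every client.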
Option #3 - Version-specific routes w/ MTE updating expressions upon entitlement changes
Continue to create tenant-agnostic (version-specific) Kong routes. In mgr-tenant-entitlements, during entitlement, update the route expressions to include tenant information (e.g. tenantA is using version 1, tenantB is using version 2, etc.). Kong looks at the x-okapi-tenant header and uses that information to decide whether to route the request to version 1 or version 2. For example: does the current tenant match any of the tenants listed in the version 1 expression? If yes, route to version 1. If not, continue. Does the current tenant match any of the tenants listed in the version 2 expression? And so on.
Strikes a balance between too many and too few routes.
No changes are required on the client side; as long as x-okapi-tenant is specified, clients should be all set.
Route expressions will be more complicated, but not by much. They will look similar to what they did previously, when routes were tenant-specific, only instead of an exact match it’s an “if any” match.
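The “if any” matching described for this option can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names, module IDs, and tenant names are hypothetical, the expression strings imitate Kong's expressions router syntax, and `resolve` merely simulates the routing decision rather than calling Kong.

```python
def any_tenant_clause(tenants):
    """Build an 'if any' clause: true when the tenant header equals any listed tenant."""
    checks = [f'http.headers.x_okapi_tenant == "{t}"' for t in sorted(tenants)]
    return "(" + " || ".join(checks) + ")"


def versioned_route(path, method, module_id, tenants):
    """Describe one version-specific route and the tenants entitled to it."""
    if not tenants:
        # Assumption: a version with no entitled tenants has no route at all.
        raise ValueError("a versioned route needs at least one tenant")
    expression = (
        f'http.path == "{path}" && http.method == "{method}" '
        f"&& {any_tenant_clause(tenants)}"
    )
    return {"module_id": module_id, "tenants": set(tenants), "expression": expression}


def resolve(routes, tenant):
    """Simulate the routing decision: first route whose tenant list contains the tenant."""
    for route in routes:
        if tenant in route["tenants"]:
            return route["module_id"]
    return None


routes = [
    versioned_route("/users", "GET", "mod-users-1.0.0", ["tenantA"]),
    versioned_route("/users", "GET", "mod-users-2.0.0", ["tenantB", "tenantC"]),
]
print(routes[1]["expression"])
print(resolve(routes, "tenantB"))
```

On an entitlement change, only the tenant list inside the affected version's expression would need to be rewritten; the number of routes stays tied to the number of deployed versions.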
Conclusion
Option #3 is the preferred approach, but it needs to be proven via a PoC.
Open Questions
What does the migration path look like for Eureka-based systems which are already up and running? For reference, see https://folio-org.atlassian.net/wiki/spaces/FOLIJET/pages/705232945