EUREKA-755 - Design/PoC solution to the "too many roles" problem

Overview

https://folio-org.atlassian.net/browse/EUREKA-755

Objective: Find a solution to the “too many roles” problem, and flesh out stories for its implementation.

Background

We’ve discovered that the size of access tokens increases with the number of roles assigned to a user. At some point, the x-okapi-token header becomes so large that requests are rejected (e.g. by the load balancer, reverse proxies, etc.).

Problem Statement

We need to find an approach to reduce the size of access tokens in the case where numerous roles are assigned to a user.

Scope

  • Formally enumerate the various approaches, identifying pros/cons and relative effort of each.

  • Creating one or more proofs of concept to help evaluate feasibility and identify pitfalls is optional, but encouraged.

  • The following should be considered when evaluating each approach:

    • feasibility

    • amount of effort

    • performance

    • maintainability

    • impact on system operators

    • architectural fit (i.e. is the solution consistent with existing folio architecture?)

    • resource consumption (e.g. will the solution require significantly more/less resources than other solutions?)

    • migration path

Proposed Solutions

The following options came out of brainstorming and have only been discussed at a high level. As such, confidence in some of them is lower than in others. Additional investigation, prototyping, etc. is required before implementation stories can be created.

Provide Guidance

Document best practices for role creation, including the use of shorter names, and tips to help find the appropriate/ideal granularity for roles.

  • Candidate for short term mitigation

Lightweight Access Tokens

Configure the necessary clients in keycloak to use lightweight access tokens, which do not include a list of roles assigned to the user.

  • Requires changes to keycloak configuration

  • Requires changes to the sidecar - call a different authz/RPT endpoint

  • Candidate for long term solution

  • Questions:

    • What adjustments are required on the keycloak side to support this (lightweight/opaque access tokens)?

    • Is this only applicable to the <tenant>_application client, or are other clients involved here?

  • Performance concerns - keycloak has to do more work

    • Caching can likely help - both on the keycloak and sidecar sides.

  • Increased load on keycloak -> $$?
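
The sidecar-side caching mentioned above could look roughly like the following. This is a minimal sketch, not the sidecar's actual design: `TtlDecisionCache` and the `evaluate` callback are hypothetical names, and the real call to keycloak's authz/RPT endpoint is deliberately left out since it still has to be worked out in the PoC.

```python
import time
from typing import Callable, Dict, Tuple


class TtlDecisionCache:
    """Caches allow/deny decisions per (user, resource) for a short time.

    `evaluate` stands in for the (hypothetical) call the sidecar would make
    to keycloak; it is only invoked on a cache miss or after expiry.
    """

    def __init__(self, evaluate: Callable[[str, str], bool], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._evaluate = evaluate
        self._ttl = ttl_seconds
        self._clock = clock
        # (user_id, resource) -> (expiry timestamp, decision)
        self._entries: Dict[Tuple[str, str], Tuple[float, bool]] = {}

    def is_allowed(self, user_id: str, resource: str) -> bool:
        key = (user_id, resource)
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]  # still fresh, no call to keycloak
        decision = self._evaluate(user_id, resource)
        self._entries[key] = (now + self._ttl, decision)
        return decision

    def invalidate_user(self, user_id: str) -> None:
        """Drop all cached decisions for a user, e.g. after a role change."""
        for key in [k for k in self._entries if k[0] == user_id]:
            del self._entries[key]
```

The TTL bounds how long a revoked permission could remain usable, so it trades load on keycloak against how quickly role changes take effect.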

Synthetic Role Per User

Behind the scenes, create a single (keycloak) role for each user which contains all capabilities assigned to the user via their (folio) roles. Here, a given user might be assigned numerous roles in folio, but from keycloak’s perspective they are assigned only one role. Therefore, the access tokens will only have this one synthetic role listed, avoiding the token scaling problem.

  • Candidate for long term solution

  • Scalability concerns? The number of roles in keycloak grows linearly with the number of users. If you have thousands of users in a given realm, you will also have thousands of roles for that realm.

  • Updating a role in folio may result in many roles needing to be updated on the keycloak side

    • If a Folio role is assigned to many users, adjusting that role would require adjusting many roles on the keycloak side. What happens if one of the many updates fails? This could get messy.
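
To make the fan-out concern concrete, here is a minimal sketch of how a synthetic role could be derived and re-synced after a Folio role changes. All names are hypothetical, the data is held in plain in-memory dictionaries for illustration, and `push` stands in for whatever call would update the synthetic role in keycloak.

```python
from typing import Callable, Dict, List, Set


def synthetic_role_capabilities(user_roles: Dict[str, Set[str]],
                                role_capabilities: Dict[str, Set[str]],
                                user_id: str) -> Set[str]:
    """Union of the capabilities of every Folio role assigned to the user."""
    capabilities: Set[str] = set()
    for role in user_roles.get(user_id, set()):
        capabilities |= role_capabilities.get(role, set())
    return capabilities


def users_affected_by_role_change(user_roles: Dict[str, Set[str]],
                                  role_name: str) -> Set[str]:
    """Every user whose synthetic role must be rebuilt when role_name changes."""
    return {user for user, roles in user_roles.items() if role_name in roles}


def resync_after_role_change(user_roles: Dict[str, Set[str]],
                             role_capabilities: Dict[str, Set[str]],
                             role_name: str,
                             push: Callable[[str, Set[str]], None]) -> List[str]:
    """Rebuild each affected user's synthetic role; return users whose update failed."""
    failed: List[str] = []
    for user_id in sorted(users_affected_by_role_change(user_roles, role_name)):
        capabilities = synthetic_role_capabilities(user_roles, role_capabilities, user_id)
        try:
            push(user_id, capabilities)
        except Exception:
            # Keep going and record the failure so it can be retried later.
            failed.append(user_id)
    return failed
```

Returning the failed users rather than aborting is one possible answer to the partial-failure question; it assumes the updates are idempotent so that retrying is safe.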

Map Folio Roles to Keycloak Roles w/ Short Auto-generated names

Here, roles will continue to be named however users/admins see fit on the Folio side, but when creating the corresponding role in keycloak a short, system-generated name would be used. Since the role name length factors into the access token size, using short names should help significantly.

  • Doesn’t actually solve the underlying issue, just mitigates it. If we can currently support perhaps 40-50 roles assigned to a user, the limit would move to something like 90-100.
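
One way such short names could be generated is from a counter encoded in base 36. This is only a sketch under assumptions: the `ShortRoleNames` class, the `r` prefix, and the in-memory mapping are all hypothetical, and a real implementation would need to persist the mapping alongside the role.

```python
import string
from typing import Dict

_ALPHABET = string.digits + string.ascii_lowercase  # 36 symbols: 0-9, a-z


def to_base36(n: int) -> str:
    """Encode a non-negative integer using digits and lowercase letters."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n:
        n, remainder = divmod(n, 36)
        digits.append(_ALPHABET[remainder])
    return "".join(reversed(digits))


class ShortRoleNames:
    """Maps Folio role names to short, stable, system-generated names."""

    def __init__(self, prefix: str = "r"):
        self._prefix = prefix
        self._by_folio_name: Dict[str, str] = {}
        self._next_id = 0

    def short_name_for(self, folio_name: str) -> str:
        short = self._by_folio_name.get(folio_name)
        if short is None:
            # First time we see this role: allocate the next counter value.
            short = self._prefix + to_base36(self._next_id)
            self._by_folio_name[folio_name] = short
            self._next_id += 1
        return short
```

With this scheme the first 36 roles get two-character names and the first 1296 get names of at most three characters, regardless of how long the Folio-side names are.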

Enforce Limits on Role Name Length

Presently, there are no restrictions on the length of role names. Since the role name length factors into the access token size, using short names should help significantly. Enforcing a maximum would allow us to mitigate the problem as well as make it easier to perform some validation (see next option).

  • Doesn’t actually solve the underlying issue, just mitigates it.

  • Candidate for short term mitigation

Problem Detection During Role Assignment

Perform some validation during assignment, ensuring that a user isn't assigned too many roles. Calculating the size of the access token can be tricky; the number of roles and the length of their names are only part of the equation. Having a maximum role name length (see previous option) would probably help make this calculation/estimation easier, but there will still be some guesswork.

  • Some combination of the options above

  • Candidate for short term mitigation
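
A rough estimate of this kind might be sketched as follows. Every number here is an assumption rather than a measurement: the fixed byte counts for the header, the other claims, and the signature are placeholders, the 8192-byte limit is hypothetical, and the `realm_access.roles` claim layout is assumed and should be checked against real tokens.

```python
import json
from typing import Iterable, List


def b64url_len(raw_len: int) -> int:
    """Length of the unpadded base64url encoding of raw_len bytes."""
    return (raw_len * 4 + 2) // 3


def estimate_token_size(role_names: Iterable[str],
                        fixed_claims_bytes: int = 600,
                        header_bytes: int = 100,
                        signature_bytes: int = 256) -> int:
    """Estimate the size in bytes of a signed JWT listing the given roles.

    Only the roles claim is computed exactly; everything else is covered
    by the assumed fixed sizes passed in as parameters.
    """
    roles_claim = json.dumps({"realm_access": {"roles": sorted(role_names)}},
                             separators=(",", ":"))
    payload_bytes = fixed_claims_bytes + len(roles_claim.encode("utf-8"))
    # header.payload.signature, each base64url-encoded, joined by two dots
    return (b64url_len(header_bytes) + 1
            + b64url_len(payload_bytes) + 1
            + b64url_len(signature_bytes))


def can_assign(current_roles: List[str], new_role: str,
               max_token_bytes: int = 8192) -> bool:
    """True if the estimated token would still fit after adding new_role."""
    return estimate_token_size(current_roles + [new_role]) <= max_token_bytes
```

Because the fixed parts are guesses, a check like this would need a safety margin below the real limit of whichever load balancer or proxy is in front of the system.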

Conclusion

The “Lightweight Access Tokens” approach is the leading contender so far. A PoC is needed to determine feasibility, identify potential pitfalls, estimate the amount of work, etc.

Open Questions

  • What does the migration path look like?

Additional Information