...

We have observed that sidecars are sometimes unable to retrieve parameters from the AWS Parameter Store, such as passwords for system users, and as a result they cannot work properly.

Keycloak Resource and Sidecar Issues

Keycloak struggles to store all resources required for authorization when we have more than 10 realms. Each realm contains approximately 1,500 resources, and the default cache size of 10,000 was insufficient. To address this, we increased the cache size to 80,000 items.
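The cache sizing can be checked with quick arithmetic. The sketch below uses only the figures quoted above (about 1,500 resources per realm, a default of 10,000 items, a new size of 80,000 items) and assumes, as a simplification, that each resource occupies exactly one cache item. The function names are illustrative and not part of Keycloak.

```python
# Figures taken from the text above; one cache item per resource is an assumption.
RESOURCES_PER_REALM = 1_500
DEFAULT_CACHE_SIZE = 10_000
NEW_CACHE_SIZE = 80_000


def resources_needed(realms: int, per_realm: int = RESOURCES_PER_REALM) -> int:
    """Approximate number of cache items needed to hold every realm's resources."""
    return realms * per_realm


def realms_that_fit(cache_size: int, per_realm: int = RESOURCES_PER_REALM) -> int:
    """Number of whole realms whose resources fit in the cache at once."""
    return cache_size // per_realm


if __name__ == "__main__":
    print("Items needed for 10 realms:", resources_needed(10))
    print("Realms fitting in default cache:", realms_that_fit(DEFAULT_CACHE_SIZE))
    print("Realms fitting in enlarged cache:", realms_that_fit(NEW_CACHE_SIZE))
```

Under this simplification, 10 realms need roughly 15,000 items, which already exceeds the default, while the enlarged cache has room for several dozen realms.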

Additionally, we observed excessive overhead caused by requests to Keycloak using incorrect credentials. This occurs because sidecars fail to retrieve passwords from the AWS Parameter Store.

Problem Statement

During our investigation into the sidecar performance issue related to authorization in Keycloak, we observed that some sidecars cannot authorize with Keycloak to retrieve the system user token. Upon further analysis, we discovered that these sidecars fail to retrieve passwords for system users from the AWS Parameter Store because they exceed the allowed rate limit, resulting in a "Rate Exceeded" exception. 

...

because mod-scheduler’s sidecar gets x-okapi-token per timer request

...


...

Fixes Implemented:

  1. SSM Issues:

...

  • Problem: Sidecars are unable to retrieve passwords from SSM due to the rate limit of 40 requests per second. Each sidecar makes password requests every 300 seconds for all system users across all tenants, often at the same time.

  • Example Calculation:

    • Number of sidecars: ~70

    • Number of system users per tenant: ~16

    • Number of tenants: ~15

    • Total requests: ~16,800 requests every 300 seconds
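The example calculation can be reproduced as a short script. It uses only the approximate figures listed above and additionally derives the average request rate, which is an optimistic lower bound on the load because the sidecars often issue their requests at the same moment rather than evenly across the interval. The constant and function names are illustrative.

```python
# Approximate figures from the example calculation above.
SIDECARS = 70
SYSTEM_USERS_PER_TENANT = 16
TENANTS = 15
REFRESH_INTERVAL_SECONDS = 300
SSM_RATE_LIMIT_PER_SECOND = 40


def total_requests(sidecars: int, users_per_tenant: int, tenants: int) -> int:
    """Password requests issued by all sidecars in one refresh interval."""
    return sidecars * users_per_tenant * tenants


def average_rate(requests: int, interval_seconds: int) -> float:
    """Requests per second if the load were spread evenly over the interval."""
    return requests / interval_seconds


if __name__ == "__main__":
    total = total_requests(SIDECARS, SYSTEM_USERS_PER_TENANT, TENANTS)
    rate = average_rate(total, REFRESH_INTERVAL_SECONDS)
    print("Total requests per interval:", total)
    print("Average requests per second:", rate)
    print("Exceeds SSM limit:", rate > SSM_RATE_LIMIT_PER_SECOND)
```

Even in the best case of a perfectly even spread, the average of about 56 requests per second is above the limit of 40, so some requests are throttled regardless of timing.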

Do not attempt to authorize against Keycloak if the sidecar does not have the correct password from SSM.
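A minimal sketch of this guard is shown below. It is not the sidecar's actual code: `get_system_user_token`, `MissingPasswordError`, and the two callables `fetch_password` and `request_token` are hypothetical names standing in for the SSM lookup and the Keycloak token request. The point is only that the token request is skipped entirely when no password was retrieved.

```python
from typing import Callable, Optional


class MissingPasswordError(Exception):
    """Hypothetical error: no system user password is available from SSM."""


def get_system_user_token(
    username: str,
    fetch_password: Callable[[str], Optional[str]],
    request_token: Callable[[str, str], str],
) -> str:
    """Request a token only when a password was actually retrieved.

    fetch_password stands in for the SSM lookup and returns None on failure
    (for example after a "Rate Exceeded" error). request_token stands in for
    the Keycloak token request.
    """
    password = fetch_password(username)
    if not password:
        # Fail fast instead of sending a request that Keycloak must reject.
        raise MissingPasswordError(f"no password available for {username}")
    return request_token(username, password)
```

With this shape, a throttled SSM lookup surfaces as a local error in the sidecar and produces no failed login attempt on the Keycloak side.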

Conclusion

Refining the logic for system user password retrieval and ensuring proper validation will significantly reduce unnecessary calls to the AWS Parameter Store, improving overall efficiency.

...