SPIKE: File stuck in Data Import with Unauthorized error
Token Refresh Mechanism in UI
The token refresh mechanism behaves differently between the Q and R environments when dealing with user sessions. Here's an overview of how token rotation is handled from the UI side:
Q Environment
In Q, token rotation is tied to API requests. If a user session remains idle, the Access Token (AT) will not be automatically refreshed until the next API request is initiated. The following process occurs when an API request is made:
AT is Not Expired: The request proceeds without interruption.
AT is Expired, Refresh Token (RT) is Not Expired: Stripes makes a rotation request to the /authn/refresh endpoint and then passes the original request through.
AT and RT are Expired: An error event is emitted, causing a redirection to /logout to prompt re-authentication.
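A minimal sketch of this per-request decision flow is below. The /authn/refresh and /logout paths come from this document; the helper names, token shape, and expiry bookkeeping are illustrative assumptions, not the actual stripes-core implementation.

```typescript
// Sketch of the Q-side flow: rotation happens only at request time.
type Tokens = { atExpiresAt: number; rtExpiresAt: number };

async function passRequestThrough(request: Request, tokens: Tokens): Promise<Response> {
  const now = Date.now();
  if (now < tokens.atExpiresAt) {
    // AT is not expired: the request proceeds without interruption.
    return fetch(request);
  }
  if (now < tokens.rtExpiresAt) {
    // AT expired, RT still valid: rotate first via /authn/refresh, then pass
    // the original request through. The tokens travel in HttpOnly cookies.
    await fetch('/authn/refresh', { method: 'POST', credentials: 'include' });
    return fetch(request);
  }
  // Both expired: take the error path, redirecting to /logout for re-authentication.
  window.location.assign('/logout');
  throw new Error('Session expired: both AT and RT are invalid');
}
```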
R Environment
In R, the rotation process runs automatically on a timer in a separate thread, ensuring that the AT is continually refreshed, even during inactivity. When an API request comes in, Stripes handles it as follows:
Rotation Request In-Flight: If a rotation request is already in progress, the original request waits for it to complete before being passed through.
No Rotation Request In-Flight: The original request is passed through without delay.
This difference ensures that in R, even if the user is inactive, the session remains valid as long as the refresh process runs successfully.
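A hedged sketch of this timer-driven rotation is below, using a shared in-flight promise to gate requests. The interval value and all names are assumptions for illustration; only the /authn/refresh endpoint is taken from this document.

```typescript
// Sketch of the R-side flow: a background timer keeps the AT fresh, and
// requests wait for any rotation that is already in flight.
let rotationInFlight: Promise<void> | null = null;

function rotateTokens(): Promise<void> {
  if (!rotationInFlight) {
    rotationInFlight = fetch('/authn/refresh', { method: 'POST', credentials: 'include' })
      .then(() => undefined)
      .finally(() => {
        rotationInFlight = null;
      });
  }
  return rotationInFlight;
}

// Rotate on a timer, independent of user activity (interval is an assumption,
// chosen slightly below the AT TTL).
const ROTATION_INTERVAL_MS = 50_000;
setInterval(rotateTokens, ROTATION_INTERVAL_MS);

async function passRequestThrough(request: Request): Promise<Response> {
  if (rotationInFlight) {
    // Rotation request in-flight: let it complete first.
    await rotationInFlight;
  }
  // No rotation in-flight: pass the original request through without delay.
  return fetch(request);
}
```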
Token Refresh Details
Each time /authn/refresh is called to refresh tokens, a new Access Token (AT) and Refresh Token (RT) are issued.
If a user logs out, the Refresh Token in their cookies at that time is invalidated.
Additionally, if a new AT is obtained using a Refresh Token, the old Refresh Token is invalidated.
If a separate request (e.g., via Postman or a backend module) is made to /authn/refresh with a valid Refresh Token to obtain a new AT and RT pair, logging out the user will not affect this new pair; it will remain valid until its TTL expires.
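The single-use behaviour can be checked with a small Node-side script like the sketch below. The folioRefreshToken cookie name and the host are assumptions; the endpoint is /authn/refresh as above.

```typescript
// Calling /authn/refresh twice with the same RT should succeed once and be
// rejected the second time, because the first call invalidates the old RT.
async function refreshWith(refreshToken: string): Promise<number> {
  const res = await fetch('https://okapi.example.org/authn/refresh', {
    method: 'POST',
    headers: { Cookie: `folioRefreshToken=${refreshToken}` },
  });
  return res.status; // success on the first use; an error status on reuse
}

async function demonstrateSingleUse(oldRt: string): Promise<void> {
  console.log(await refreshWith(oldRt)); // first use: new AT+RT pair issued
  console.log(await refreshWith(oldRt)); // reuse: rejected, the RT was already rotated
}
```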
Observations from OKAPI Environments
folijet-dev Rancher Environment: With the configuration TOKEN_EXPIRATION_SECONDS=accessToken:60,refreshToken:3600 and LEGACY_TOKEN_TENANTS=null, data imports that took longer than the access token's lifespan completed without errors. However, if the session was active in the browser, the following logs were observed:
okapi logs (2024-10-24T14:33:16,851):
INFO ? 083332/authn REQ 10.0.59.111:39366 diku POST /authn/refresh mod-authtoken-2.16.0-SNAPSHOT.153 mod-login-7.12.0-SNAPSHOT.151
mod-authtoken logs:
14:33:16 [083332/authn] [diku] [] [] DEBUG FilterApi handleAuthorize path=/authn/refresh
14:33:16 [083332/authn] [diku] [] [mod-authtoken] DEBUG FilterApi Final authToken is eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJkaWt1X2FkbWluIiwidXNlcl9pZCI6IjhiMGUwODU4LTJlZDMtNGRhNi05MGNjLWM2ZTkyZjNjZmFlMCIsInR5cGUiOiJhY2Nlc3MiLCJleHAiOjE3Mjk3ODA0MDgsImlhdCI6MTcyOTc4MDM0OCwidGVuYW50IjoiZGlrdSJ9.fEGSicXidnLlu42siESQk6SBNd_qy-xy12nsbneIgpI
14:33:16 [083332/authn] [diku] [] [mod-authtoken] DEBUG FilterApi Validated token of type: access
14:33:16 [083332/authn] [diku] [] [mod-authtoken] DEBUG FilterApi payload{ "sub": "diku_admin", "user_id": "8b0e0858-2ed3-4da6-90cc-c6e92f3cfae0", "type": "access", "exp": 1729780408, "iat": 1729780348, "tenant": "diku" }
This indicates that the UI refreshes the token as soon as it expires, while the data import process continues successfully using the old token, even though it has not been refreshed.
PTF OKAPI Environment: With the configuration {"name": "TOKEN_EXPIRATION_SECONDS", "value": "accessToken:3600"}, the following imports completed successfully:
100k MARC BIB Update - PTF - Updates Success 6: 1 hour 2 minutes.
500k MARC BIB Create - PTF - Create 2: 3 hours 27 minutes.
500k MARC BIB Update - PTF - Updates Success 6: 5 hours 31 minutes.
These observations suggest that, in OKAPI-based environments, the data import process remains resilient across varying token expiration settings, completing even when the access token's TTL is exceeded.
Eureka Environment:
The issue was reproducible in the Eureka environment.
Testing on qecp1 with an Access Token TTL of 3 minutes showed that record imports failed with a 401 Unauthorized status once this interval had elapsed. This indicates that the import process fails when the access token expires without being refreshed. Also reproducible on evr2.
Possible solutions
Refreshing the client token from the backend modules that make the requests
Set “Refresh Token Max Reuse” to N > 0.
Right after the start of the import, a new AT+RT pair needs to be obtained to avoid situations where the RT is invalidated by the user logging out or obtaining a new token (see the sketch below). Additionally, if the value of Refresh Token Max Reuse is strictly kept in line with the number of modules, this might work. However, if an unauthorized party obtains the key and requests a new token pair, one of the modules could fail with an exception.
Drawbacks: A module must store a token pair for a specific jobExecutionId for the duration of the operation. After a reboot, the module receives the token pair from a Kafka message. However, if that token pair has already been used to request new tokens and the Refresh Token Max Reuse limit has been reached, the module cannot obtain a new token pair. This creates a potential failure point: the module won't be able to refresh tokens after a reboot, impacting its ability to continue the job.
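A sketch of the job-scoped token pinning described above follows. Only /authn/refresh and jobExecutionId come from this document; the gateway host, the folioAccessToken/folioRefreshToken cookie names, the simplified cookie parsing, and the in-memory map are assumptions.

```typescript
interface TokenPair { accessToken: string; refreshToken: string }

// Job-private pairs keyed by jobExecutionId (illustrative in-memory storage).
const jobTokens = new Map<string, TokenPair>();

// Simplified parsing of the Set-Cookie headers of the refresh response;
// the cookie names are an assumption.
function parseTokenCookies(res: Response): TokenPair {
  const raw = res.headers.get('set-cookie') ?? '';
  const accessToken = /folioAccessToken=([^;,\s]+)/.exec(raw)?.[1];
  const refreshToken = /folioRefreshToken=([^;,\s]+)/.exec(raw)?.[1];
  if (!accessToken || !refreshToken) throw new Error('Token cookies not found');
  return { accessToken, refreshToken };
}

// Right after the import starts, exchange the user's current RT for a pair
// dedicated to the job, so a later logout or browser-side rotation cannot
// invalidate the tokens the module depends on.
async function pinTokensForJob(jobExecutionId: string, currentRt: string): Promise<void> {
  const res = await fetch('https://gateway.example.org/authn/refresh', {
    method: 'POST',
    headers: { Cookie: `folioRefreshToken=${currentRt}` },
  });
  if (!res.ok) throw new Error(`Token refresh failed: HTTP ${res.status}`);
  jobTokens.set(jobExecutionId, parseTokenCookies(res));
}
```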
Obtaining a #SYS Token that does not expire: (Refer to comment).
The ticket linked to this comment is currently in the OPEN state, so it's uncertain how well this approach works.
Drawbacks: The data import process relies on user permissions, which is why a client token is required.
Separate token storage (Redis or a database schema):
This solution allows tokens to be stored by userId and retrieved as needed.
Drawbacks: Security concerns (access to the storage must be restricted). There's a risk of desynchronization between the storage and Keycloak. Additionally, maintaining the storage introduces extra overhead, and performance issues may arise.
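A minimal sketch of this storage option using Redis via the ioredis client is below. The key naming, TTL handling, and TokenPair shape are illustrative choices, not an agreed design.

```typescript
import Redis from 'ioredis';

interface TokenPair { accessToken: string; refreshToken: string }

const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

// Store the latest pair by userId; expire the entry with the RT's TTL so
// stale pairs cannot linger after they become unusable.
async function storeTokens(userId: string, pair: TokenPair, ttlSeconds: number): Promise<void> {
  await redis.set(`tokens:${userId}`, JSON.stringify(pair), 'EX', ttlSeconds);
}

// Retrieve the pair as needed; null means no valid pair is stored.
async function loadTokens(userId: string): Promise<TokenPair | null> {
  const raw = await redis.get(`tokens:${userId}`);
  return raw ? (JSON.parse(raw) as TokenPair) : null;
}
```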
Broadcast token distribution (Kafka, RabbitMQ, ActiveMQ):
Tokens are sent through messages, allowing synchronization across modules.
Drawbacks: It requires complex configuration and maintenance. Message delays or losses are also a risk, leading to authorization failures.
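For the Kafka variant, a sketch using the kafkajs client is below. The topic name, group id, and message shape are illustrative assumptions.

```typescript
import { Kafka } from 'kafkajs';

const kafka = new Kafka({ clientId: 'token-broadcaster', brokers: ['localhost:9092'] });

// Producer side: publish every rotated pair, keyed by userId so all updates
// for one user land in the same partition and stay ordered.
async function broadcastTokens(userId: string, accessToken: string, refreshToken: string): Promise<void> {
  const producer = kafka.producer();
  await producer.connect();
  await producer.send({
    topic: 'auth.token-rotation',
    messages: [{ key: userId, value: JSON.stringify({ accessToken, refreshToken }) }],
  });
  await producer.disconnect();
}

// Consumer side: each module instance keeps a local copy of the latest pair.
// A distinct groupId per module is needed so every module sees every update.
async function followTokenUpdates(
  groupId: string,
  onUpdate: (userId: string, pair: { accessToken: string; refreshToken: string }) => void,
): Promise<void> {
  const consumer = kafka.consumer({ groupId });
  await consumer.connect();
  await consumer.subscribe({ topic: 'auth.token-rotation', fromBeginning: false });
  await consumer.run({
    eachMessage: async ({ message }) => {
      if (message.key && message.value) {
        onUpdate(message.key.toString(), JSON.parse(message.value.toString()));
      }
    },
  });
}
```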