This Page is a Work In Progress
Overview
It seems that the current auth flow consumes more time than we had originally hoped/expected. This investigation is to better understand where the bottlenecks are and to evaluate potential optimizations.
Baseline Measurements
Before evaluating possible performance improvements, I wanted to get a better understanding of where the pain points are. These measurements will also serve as a baseline for comparison purposes as various enhancements are trialed.
Test Environment
- AWS ECS
- Private - nobody else using this except me
- NGINX reverse proxy running in each container
- Single instance of each module
- Goldenrod platform-complete
- Standard sample/reference data loaded
Methodology
- Make a check-out-by-barcode request - add a query argument to help easily identify the request in logs
- Make a check-in-by-barcode request (same loan that was just created) - also add a query arg here to help identify the request in logs
- Analyze logs
- Find the checkout/in request using the testId query argument, from here we will get the requestId and find all related/internal requests
- From this log entry we obtain the total request time from okapi's perspective, including all required internal calls to auth, other modules, etc.
- In order to isolate the amount of time spent on network vs time spent in mod-authtoken, we dig into the mod-authtoken reverse proxy log.
- Correlate each auth request with a log entry in the mod-authtoken nginx transaction log.
- requestId isn't explicitly logged, but the okapi token is, which contains the requestId
- The transaction time logged here constitutes the time spent in the module
- Derive network time for the auth call from the total request time and the auth module time.
- Aggregate numbers and report a summary for each check-in/out call.
- Find the checkout/in request using the testId query argument, from here we will get the requestId and find all related/internal requests
Results
test01
- 10 interleaved calls to check-out-by-barcode and check-in-by-barcode
check-out-by-barcode:
POST circulation/check-out-by-barcode?testId=co_test02_1 # of auth calls: 47 Total request time: 1055.05 ms Total time on calls to mod-authtoken: 975.63 ms (92.47%) Total auth module time: 843.00 ms (86.41%) Total autn network time: 132.63 ms (13.59%) POST circulation/check-out-by-barcode?testId=co_test02_2 # of auth calls: 33 Total request time: 504.08 ms Total time on calls to mod-authtoken: 241.96 ms (48.00%) Total auth module time: 191.98 ms (79.34%) Total autn network time: 49.98 ms (20.66%) POST circulation/check-out-by-barcode?testId=co_test02_3 # of auth calls: 33 Total request time: 471.79 ms Total time on calls to mod-authtoken: 264.62 ms (56.09%) Total auth module time: 227.00 ms (85.78%) Total autn network time: 37.62 ms (14.22%) POST circulation/check-out-by-barcode?testId=co_test02_4 # of auth calls: 34 Total request time: 482.83 ms Total time on calls to mod-authtoken: 231.83 ms (48.01%) Total auth module time: 198.00 ms (85.41%) Total autn network time: 33.83 ms (14.59%) POST circulation/check-out-by-barcode?testId=co_test02_5 # of auth calls: 34 Total request time: 415.46 ms Total time on calls to mod-authtoken: 211.03 ms (50.79%) Total auth module time: 163.00 ms (77.24%) Total autn network time: 48.03 ms (22.76%) POST circulation/check-out-by-barcode?testId=co_test02_6 # of auth calls: 33 Total request time: 445.91 ms Total time on calls to mod-authtoken: 207.60 ms (46.56%) Total auth module time: 171.00 ms (82.37%) Total autn network time: 36.60 ms (17.63%) POST circulation/check-out-by-barcode?testId=co_test02_7 # of auth calls: 33 Total request time: 384.93 ms Total time on calls to mod-authtoken: 198.59 ms (51.59%) Total auth module time: 159.00 ms (80.06%) Total autn network time: 39.59 ms (19.94%) POST circulation/check-out-by-barcode?testId=co_test02_8 # of auth calls: 34 Total request time: 421.00 ms Total time on calls to mod-authtoken: 219.96 ms (52.25%) Total auth module time: 182.84 ms (83.12%) Total autn network time: 37.12 ms (16.88%) POST circulation/check-out-by-barcode?testId=co_test02_9 # of auth calls: 33 Total request time: 391.64 ms Total time on calls to mod-authtoken: 199.07 ms (50.83%) Total auth module time: 160.00 ms (80.37%) Total autn network time: 39.07 ms (19.63%) POST circulation/check-out-by-barcode?testId=co_test02_10 # of auth calls: 33 Total request time: 419.94 ms Total time on calls to mod-authtoken: 203.11 ms (48.37%) Total auth module time: 164.00 ms (80.74%) Total autn network time: 39.11 ms (19.26%)
check-in-by-barcode
POST circulation/check-in-by-barcode?testId=ci_test02_1 # of auth calls: 22 Total request time: 423.86 ms Total time on calls to mod-authtoken: 198.28 ms (46.78%) Total auth module time: 172.00 ms (86.75%) Total autn network time: 26.28 ms (13.25%) POST circulation/check-in-by-barcode?testId=ci_test02_2 # of auth calls: 21 Total request time: 310.25 ms Total time on calls to mod-authtoken: 137.93 ms (44.46%) Total auth module time: 114.00 ms (82.65%) Total autn network time: 23.93 ms (17.35%) POST circulation/check-in-by-barcode?testId=ci_test02_3 # of auth calls: 21 Total request time: 330.54 ms Total time on calls to mod-authtoken: 138.82 ms (42.00%) Total auth module time: 115.00 ms (82.84%) Total autn network time: 23.82 ms (17.16%) POST circulation/check-in-by-barcode?testId=ci_test02_4 # of auth calls: 21 Total request time: 301.42 ms Total time on calls to mod-authtoken: 133.59 ms (44.32%) Total auth module time: 111.00 ms (83.09%) Total autn network time: 22.59 ms (16.91%) POST circulation/check-in-by-barcode?testId=ci_test02_5 # of auth calls: 21 Total request time: 309.43 ms Total time on calls to mod-authtoken: 126.37 ms (40.84%) Total auth module time: 103.00 ms (81.51%) Total autn network time: 23.37 ms (18.49%) POST circulation/check-in-by-barcode?testId=ci_test02_6 # of auth calls: 21 Total request time: 317.40 ms Total time on calls to mod-authtoken: 127.88 ms (40.29%) Total auth module time: 100.00 ms (78.20%) Total autn network time: 27.88 ms (21.80%) POST circulation/check-in-by-barcode?testId=ci_test02_7 # of auth calls: 21 Total request time: 365.12 ms Total time on calls to mod-authtoken: 136.78 ms (37.46%) Total auth module time: 110.00 ms (80.42%) Total autn network time: 26.78 ms (19.58%) POST circulation/check-in-by-barcode?testId=ci_test02_8 # of auth calls: 21 Total request time: 305.61 ms Total time on calls to mod-authtoken: 130.92 ms (42.84%) Total auth module time: 104.00 ms (79.44%) Total autn network time: 26.92 ms (20.56%) POST circulation/check-in-by-barcode?testId=ci_test02_9 # of auth calls: 21 Total request time: 323.88 ms Total time on calls to mod-authtoken: 131.24 ms (40.52%) Total auth module time: 113.00 ms (86.10%) Total autn network time: 18.24 ms (13.90%) POST circulation/check-in-by-barcode?testId=ci_test02_10 # of auth calls: 21 Total request time: 291.32 ms Total time on calls to mod-authtoken: 117.92 ms (40.48%) Total auth module time: 96.00 ms (81.41%) Total autn network time: 21.92 ms (18.59%)
Observations
- You can see that some caching is being done in the check-out-by-barcode flow. This is evident by the number of auth calls. The first has 47, then subsequent are much lower 33/34
- I'm not yet sure why it's not always 33 or 34.
- A similar pattern can be seen in the check-in-by-barcode flow, though much less so, presumably because the requests are interleaved. The checkout call happens first, values are cached, so when the check-in call is made, the cached values are used.
- Network traffic consistently takes up a fair amount of the time spent on auth (~15-20% on average)
- A large portion of the request time is spent on auth (~40-50% on average)
- TBD
Experiments
Description of various experiments and PoCs conducted.
Caching module-to-module tokens
In theory these don't really change very frequently, so caching them makes sense.
Considerations
- Currently the requestId is included in these tokens, so if we want to pursue this optimization, it will be at the cost of that. I think it's only there for convenience - it's unclear if anyone is actually using this information or not.
- The cache should be shared / synchronized between nodes in the OKAPI cluster
- Is it necessary to provide a mechanism for manually purging and/or pruning the cache?
- Must these tokens be tenant-specific, or can the same token be used by multiple tenants?
Challenges
- Figuring out where to put this in OKAPI is a little tricky, but clearly doing it in OKAPI would give us the maximum benefit as it would mean skipping the call to mod-authtoken altogether, saving network time as well.
- For my PoC I settled on checking for a cached token when determining the set of API calls needed, and populating the cache when handling the auth response. ##TODO: provide reference to PoC branch
Theoretical Benefits
- Performance gains would not be seen immediately, as the cache would need to warm up before we could benefit from saving on calls to mod-authtoken
- Once the cache is warmed, the time savings here would potentially be enormous.
- Example:
Current (from baseline measurements):
POST circulation/check-out-by-barcode?testId=co_test02_7 # of auth calls: 33 Total request time: 384.93 ms Total time on calls to mod-authtoken: 198.59 ms (51.59%) Total auth module time: 159.00 ms (80.06%) Total autn network time: 39.59 ms (19.94%) # Stats for initial auth call (used to calculate theoretical below) Auth Request Time: 5.92 ms Auth Module Time: 5.00 ms (84.46%) Auth Network Time: 0.92 ms (15.54%)
Theoretical
POST circulation/check-out-by-barcode?testId=co_test02_7 # of auth calls: 1 Total request time: 192.26 ms Total time on calls to mod-authtoken: 5.92 ms (3.08%) Total auth module time: 5.00 ms (84.46%) Total autn network time: 0.92 ms (15.54%)
The 1 auth call here is the original users call for check-out-by-barcode. The rest of the auth calls are module-to-module. Those tokens could be obtained from cache, preventing the need to call mod-authtoken at all.
Actual Benefits
A PoC for this is still a work in progress. No actual results have been obtained yet.