Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Note
titleThis Page is a Work In Progress


Overview

It seems that the current auth flow consumes more time than we had originally hoped/expected.  This investigation is to better understand where the bottlenecks are and to evaluate potential optimizations.

Baseline Measurements

Before evaluating possible performance improvements, I wanted to get a better understanding of where the pain points are.  These measurements will also serve as a baseline for comparison purposes as various enhancements are trialed.

Test Environment

  • AWS ECS
  • Private - nobody else using this except me
  • NGINX reverse proxy running in each container
  • Single instance of each module
  • Goldenrod platform-complete
  • Standard sample/reference data loaded

Methodology

  • Make a check-out-by-barcode request - add a query argument to help easily identify the request in logs
  • Make a check-in-by-barcode request (same loan that was just created) - also add a query arg here to help identify the request in logs
  • Analyze logs
    • Find the checkout/in request using the testId query argument, from here we will get the requestId and find all related/internal requests
      • From this log entry we obtain the total request time from okapi's perspective, including all required internal calls to auth, other modules, etc. 
      • In order to isolate the amount of time spent on network vs time spent in mod-authtoken, we dig into the mod-authtoken reverse proxy log.
      • Correlate each auth request with a log entry in the mod-authtoken nginx transaction log. 
        • requestId isn't explicitly logged, but the okapi token is, which contains the requestId
        • The transaction time logged here constitutes the time spent in the module
        • Derive network time for the auth call from the total request time and the auth module time.
    • Aggregate numbers and report a summary for each check-in/out call.

Results

test01

  • 10 interleaved calls to check-out-by-barcode and check-in-by-barcode 

check-out-by-barcode:

Code Block
POST circulation/check-out-by-barcode?testId=co_test02_1

# of auth calls:                      47
Total request time:                   1055.05 ms
Total time on calls to mod-authtoken: 975.63 ms (92.47%)
Total auth module time:               843.00 ms (86.41%)
Total autn network time:              132.63 ms (13.59%)

POST circulation/check-out-by-barcode?testId=co_test02_2

# of auth calls:                      33
Total request time:                   504.08 ms
Total time on calls to mod-authtoken: 241.96 ms (48.00%)
Total auth module time:               191.98 ms (79.34%)
Total autn network time:              49.98 ms (20.66%)

POST circulation/check-out-by-barcode?testId=co_test02_3

# of auth calls:                      33
Total request time:                   471.79 ms
Total time on calls to mod-authtoken: 264.62 ms (56.09%)
Total auth module time:               227.00 ms (85.78%)
Total autn network time:              37.62 ms (14.22%)

POST circulation/check-out-by-barcode?testId=co_test02_4

# of auth calls:                      34
Total request time:                   482.83 ms
Total time on calls to mod-authtoken: 231.83 ms (48.01%)
Total auth module time:               198.00 ms (85.41%)
Total autn network time:              33.83 ms (14.59%)

POST circulation/check-out-by-barcode?testId=co_test02_5

# of auth calls:                      34
Total request time:                   415.46 ms
Total time on calls to mod-authtoken: 211.03 ms (50.79%)
Total auth module time:               163.00 ms (77.24%)
Total autn network time:              48.03 ms (22.76%)

POST circulation/check-out-by-barcode?testId=co_test02_6

# of auth calls:                      33
Total request time:                   445.91 ms
Total time on calls to mod-authtoken: 207.60 ms (46.56%)
Total auth module time:               171.00 ms (82.37%)
Total autn network time:              36.60 ms (17.63%)

POST circulation/check-out-by-barcode?testId=co_test02_7

# of auth calls:                      33
Total request time:                   384.93 ms
Total time on calls to mod-authtoken: 198.59 ms (51.59%)
Total auth module time:               159.00 ms (80.06%)
Total autn network time:              39.59 ms (19.94%)

POST circulation/check-out-by-barcode?testId=co_test02_8

# of auth calls:                      34
Total request time:                   421.00 ms
Total time on calls to mod-authtoken: 219.96 ms (52.25%)
Total auth module time:               182.84 ms (83.12%)
Total autn network time:              37.12 ms (16.88%)

POST circulation/check-out-by-barcode?testId=co_test02_9

# of auth calls:                      33
Total request time:                   391.64 ms
Total time on calls to mod-authtoken: 199.07 ms (50.83%)
Total auth module time:               160.00 ms (80.37%)
Total autn network time:              39.07 ms (19.63%)

POST circulation/check-out-by-barcode?testId=co_test02_10

# of auth calls:                      33
Total request time:                   419.94 ms
Total time on calls to mod-authtoken: 203.11 ms (48.37%)
Total auth module time:               164.00 ms (80.74%)
Total autn network time:              39.11 ms (19.26%)

check-in-by-barcode

Code Block
POST circulation/check-in-by-barcode?testId=ci_test02_1

# of auth calls:                      22
Total request time:                   423.86 ms
Total time on calls to mod-authtoken: 198.28 ms (46.78%)
Total auth module time:               172.00 ms (86.75%)
Total autn network time:              26.28 ms (13.25%)

POST circulation/check-in-by-barcode?testId=ci_test02_2

# of auth calls:                      21
Total request time:                   310.25 ms
Total time on calls to mod-authtoken: 137.93 ms (44.46%)
Total auth module time:               114.00 ms (82.65%)
Total autn network time:              23.93 ms (17.35%)

POST circulation/check-in-by-barcode?testId=ci_test02_3

# of auth calls:                      21
Total request time:                   330.54 ms
Total time on calls to mod-authtoken: 138.82 ms (42.00%)
Total auth module time:               115.00 ms (82.84%)
Total autn network time:              23.82 ms (17.16%)

POST circulation/check-in-by-barcode?testId=ci_test02_4

# of auth calls:                      21
Total request time:                   301.42 ms
Total time on calls to mod-authtoken: 133.59 ms (44.32%)
Total auth module time:               111.00 ms (83.09%)
Total autn network time:              22.59 ms (16.91%)

POST circulation/check-in-by-barcode?testId=ci_test02_5

# of auth calls:                      21
Total request time:                   309.43 ms
Total time on calls to mod-authtoken: 126.37 ms (40.84%)
Total auth module time:               103.00 ms (81.51%)
Total autn network time:              23.37 ms (18.49%)

POST circulation/check-in-by-barcode?testId=ci_test02_6

# of auth calls:                      21
Total request time:                   317.40 ms
Total time on calls to mod-authtoken: 127.88 ms (40.29%)
Total auth module time:               100.00 ms (78.20%)
Total autn network time:              27.88 ms (21.80%)

POST circulation/check-in-by-barcode?testId=ci_test02_7

# of auth calls:                      21
Total request time:                   365.12 ms
Total time on calls to mod-authtoken: 136.78 ms (37.46%)
Total auth module time:               110.00 ms (80.42%)
Total autn network time:              26.78 ms (19.58%)

POST circulation/check-in-by-barcode?testId=ci_test02_8

# of auth calls:                      21
Total request time:                   305.61 ms
Total time on calls to mod-authtoken: 130.92 ms (42.84%)
Total auth module time:               104.00 ms (79.44%)
Total autn network time:              26.92 ms (20.56%)

POST circulation/check-in-by-barcode?testId=ci_test02_9

# of auth calls:                      21
Total request time:                   323.88 ms
Total time on calls to mod-authtoken: 131.24 ms (40.52%)
Total auth module time:               113.00 ms (86.10%)
Total autn network time:              18.24 ms (13.90%)

POST circulation/check-in-by-barcode?testId=ci_test02_10

# of auth calls:                      21
Total request time:                   291.32 ms
Total time on calls to mod-authtoken: 117.92 ms (40.48%)
Total auth module time:               96.00 ms (81.41%)
Total autn network time:              21.92 ms (18.59%)

Observations

  • You can see that some caching is being done in the check-out-by-barcode flow.  This is evident by the number of auth calls.  The first has 47, then subsequent are much lower 33/34
    • I'm not yet sure why it's not always 33 or 34.  
    • A similar pattern can be seen in the check-in-by-barcode flow, though much less so, presumably because the requests are interleaved.  The checkout call happens first, values are cached, so when the check-in call is made, the cached values are used.
  • Network traffic consistently takes up a fair amount of the time spent on auth (~15-20% on average)
  • A large portion of the request time is spent on auth (~40-50% on average)
  • TBD

Experiments

Description of various experiments and PoCs conducted.

Caching module-to-module tokens

In theory these don't really change very frequently, so caching them makes sense. 

Considerations

  • Currently the requestId is included in these tokens, so if we want to pursue this optimization, it will be at the cost of that.  I think it's only there for convenience - it's unclear if anyone is actually using this information or not.
  • The cache should be shared / synchronized between nodes in the OKAPI cluster
  • Is it necessary to provide a mechanism for manually purging and/or pruning the cache?
  • Must these tokens be tenant-specific, or can the same token be used by multiple tenants?

Challenges

  • Figuring out where to put this in OKAPI is a little tricky, but clearly doing it in OKAPI would give us the maximum benefit as it would mean skipping the call to mod-authtoken altogether, saving network time as well.
    • For my PoC I settled on checking for a cached token when determining the set of API calls needed, and populating the cache when handling the auth response.  ##TODO:  provide reference to PoC branch

Theoretical Benefits

  • Performance gains would not be seen immediately, as the cache would need to warm up before we could benefit from saving on calls to mod-authtoken
  • Once the cache is warmed, the time savings here would potentially be enormous.  
  • Example: 
    • Current (from baseline measurements):

      Code Block
      POST circulation/check-out-by-barcode?testId=co_test02_7
      
      # of auth calls:                      33
      Total request time:                   384.93 ms
      Total time on calls to mod-authtoken: 198.59 ms (51.59%)
      Total auth module time:               159.00 ms (80.06%)
      Total autn network time:              39.59 ms (19.94%)
      
      # Stats for initial auth call (used to calculate theoretical below)
      Auth Request Time: 5.92 ms
      Auth Module Time:  5.00 ms (84.46%)
      Auth Network Time: 0.92 ms (15.54%)


    • Theoretical

      Code Block
      POST circulation/check-out-by-barcode?testId=co_test02_7
      
      # of auth calls:                      1
      Total request time:                   192.26 ms
      Total time on calls to mod-authtoken: 5.92 ms (3.08%)
      Total auth module time:               5.00 ms (84.46%)
      Total autn network time:              0.92 ms (15.54%)

      The 1 auth call here is the original users call for check-out-by-barcode.  The rest of the auth calls are module-to-module.  Those tokens could be obtained from cache, preventing the need to call mod-authtoken at all. 

Actual Benefits

A PoC for this is still a work in progress.  No actual results have been obtained yet.

...