Improve checkout performance by caching data

Description

Overview:
Cornell reports that check-ins and check-outs take from 1 to 5 seconds (and sometimes up to 11 seconds). Missouri State reports similar times. We need to improve the processing time for check-ins and check-outs.

A proposal has been created at https://folio-org.atlassian.net/wiki/x/DgJU. After reviewing the proposal, the Capacity Planning Team determined that we should proceed with the caching approach. Marc is in the process of creating a document outlining the technical aspects for the devs.

Steps:

  1. Ask the assigned team to come up with an approach within 2 weeks.

  2. Try out this caching approach on one record type, choosing the one with the biggest impact.

  3. After we are satisfied with the process, implement caching for as many other record types as we are able to during the release. Prioritize the remaining record types so that we address the heavy hitters first.

  4. Have the PTF team analyze the results of caching work completed.

  5. Discuss the impact of caching with the Resource Access SIG (e.g. if a cached record is more than X minutes old, refresh it). We are waiting until we know the impact of caching on response time so that we can present the process with as much information as possible.

  6. Determine next steps based on new PTF team analysis.

The "is defined by" stories for this feature should be worked on in the order given below.

Recommended approach:

  • cache expiration of 5 seconds for all record types

  • maximum cache size of 1000 records (this is pure speculation, as we don't know what impact the caching will have on memory usage)

Priority

Fix versions

None

Development Team

None

Assignee

Solution Architect

Parent Field Value

None

Parent Status

None

Attachments

1


Activity


Holly Mistlebauer, November 9, 2021 at 5:57 PM

It was decided that caching data would not give us the level of performance improvement we need.

Julian Ladisch, October 5, 2021 at 6:17 PM

If optimistic locking is enabled for a table, getting an outdated record from the cache and using it for a PUT will result in an optimistic locking failure that persists when reloading the record, until the cache expiration time has been reached.

This can be avoided by invalidating the cache for that record when doing a PUT.
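The invalidate-on-PUT idea can be sketched with a minimal TTL cache. This is an illustrative, language-agnostic sketch in Python (FOLIO's backend modules are Java); the names `RecordCache`, `put_record`, and `loader` are hypothetical, not FOLIO APIs. The PUT writes through to storage and evicts the cached copy, so the next read fetches the record with its new version instead of repeatedly failing optimistic-locking checks until the TTL expires.

```python
import time


class RecordCache:
    """Minimal TTL cache illustrating invalidate-on-PUT (hypothetical sketch)."""

    def __init__(self, ttl_seconds=5.0):
        self.ttl = ttl_seconds
        self._entries = {}  # record id -> (record, cached_at)

    def get(self, record_id, loader):
        entry = self._entries.get(record_id)
        if entry is not None and time.monotonic() - entry[1] < self.ttl:
            return entry[0]  # fresh enough: serve from cache
        record = loader(record_id)  # miss or expired: reload from storage
        self._entries[record_id] = (record, time.monotonic())
        return record

    def invalidate(self, record_id):
        self._entries.pop(record_id, None)


def put_record(storage, cache, record_id, updated):
    # Update storage, then evict the now-stale cached copy so the next
    # read sees the new version rather than a persistent 409 until the
    # TTL elapses.
    storage[record_id] = updated
    cache.invalidate(record_id)
```

Without the `invalidate` call, any reader holding the cached copy would keep sending the old version number on PUT and failing until the entry expired.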

Holly Mistlebauer, September 29, 2021 at 5:47 PM
Edited

Hi! I have created the stories for this feature. Should I assign this to Vega? Thanks...

Marc Johnson, September 29, 2021 at 11:30 AM

I think I've answered the current questions asked and provided my thoughts on implementation. Please let me know if you need anything else at the moment.

I am assuming that we can only cache the "Fetch" "Intents", so "Update item status" is out.

Yes, state changes cannot be avoided (at least not with the current process design).

Two of the "Fetches" have had an improvement made already ("Fetch automated blocks" and "Fetch item barcode"), and one has a separate issue for the Vega team to address ("Fetch manual patron blocks"). Marc Johnson: Should we include any of these 4 in the group of 5 we start with?

I think we should, for the moment, exclude any of the operations to which we have decided to dedicate separate work, so that we can understand the impact of those improvements separately from the caching changes (ish, depending upon the frequency of performance testing).

My Proposed Ordering

Given that we aren't going to work with the RA SIG or other stakeholders to understand the tolerances they might accept, and that the response times for most of the record fetches are of a similar magnitude, I think it makes sense to start with the ones that (I think) are less likely to change and/or where the impact of changes will likely be low.

The policies are in a slightly strange place in this list: I've put them a little higher than their potential negative impact might suggest, because we might want to get some of that feedback sooner rather than later.

Here is my proposed ordering (my reasoning in brackets):

  • tenant locale (singular, is common to all check outs, should change very rarely)

  • loan type (probably small set, likely common to some check outs, low impact if inconsistent)

  • patron group (probably small set, likely common to some check outs, low impact if inconsistent)

  • material type (unsure of set size, likely common to some check outs, low impact if inconsistent)

  • location (unsure of set size, likely common to some check outs, low impact if inconsistent)

  • service point (unsure of set size, likely common to some check outs, low impact if inconsistent)

  • loan policy (small set, likely common to many check outs, possible high impact if inconsistent)

  • lost item policy (small set, likely common to many check outs, possible high impact if inconsistent)

  • overdue fine policy (small set, likely common to many check outs, possible high impact if inconsistent)

  • institution (unsure of set size, likely common to some check outs, low impact if inconsistent)

  • campus (unsure of set size, likely common to some check outs, low impact if inconsistent)

  • library (unsure of set size, likely common to some check outs, low impact if inconsistent)

  • circulation rules (already cached, we may want to replace this with a similar cache to what we implement in other places)

  • instance (large set, unlikely to be common to many check outs)

  • holdings record (large set, unlikely to be common to many check outs)

  • user (large set, unlikely to be common to many check outs)

  • item (large set, not common to any successful check outs within the time frame)

Cache Policies

I suggest we start with:

  • a cache expiration of 5 seconds for all record types

  • a maximum cache size of 1000 records (this is pure speculation, as we don't know what impact the caching will have on memory usage)

Both of these should be runtime configurable so the PTF team (and other operational folks) can tweak them.
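As an illustration only, the proposed starting policies (5-second expiration, a 1000-record cap, both adjustable at runtime) could look like the following sketch. This is Python for brevity; `BoundedTtlCache` and its parameter names are hypothetical and not part of any FOLIO module.

```python
import time
from collections import OrderedDict


class BoundedTtlCache:
    """Illustrative cache with the proposed starting policies:
    a short expiration and a record cap, both adjustable at runtime."""

    def __init__(self, ttl_seconds=5.0, max_size=1000):
        self.ttl = ttl_seconds        # runtime-configurable expiration
        self.max_size = max_size      # runtime-configurable size cap
        self._entries = OrderedDict()  # key -> (value, cached_at)

    def get(self, key, loader):
        entry = self._entries.get(key)
        if entry is not None:
            value, cached_at = entry
            if time.monotonic() - cached_at < self.ttl:
                self._entries.move_to_end(key)  # mark as recently used
                return value
            del self._entries[key]  # expired
        value = loader(key)  # load on miss, one record at a time
        self._entries[key] = (value, time.monotonic())
        if len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict least recently used
        return value
```

In the Java-based FOLIO modules, a caching library such as Caffeine expresses the same policies declaratively (roughly `expireAfterWrite(5, TimeUnit.SECONDS)` and `maximumSize(1000)`), which would make them straightforward to expose as runtime configuration.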

Marc Johnson, September 28, 2021 at 4:24 PM
Edited

Which of the items listed are run for each and every checkout?

Most of them will be fetched for every request; these are the ones I'm confident of:

  • item

  • holdings record

  • instance

  • location

  • library

  • campus

  • institution

  • material type

  • loans

  • requests

  • user

  • user group

  • patron blocks (both manual and automatic)

  • service point

  • loan policy

  • circulation rules (although there is already caching here)

  • tenant locale

Loans, requests, and items are poor candidates for caching (though not for other forms of derived data; this is why my preference was for persistent derived data), as I imagine check-outs for the same item in a short time frame are rare. I don't know what the impact of title-level requests will be on this area.

I cannot answer that authoritatively without much more analysis of all of the code paths in the system.

Of those, which are relatively small tables that can be loaded into memory easily? Circ rules can be reused with each and every scan, don't change often, and could be a good candidate. Same for fine policies.

Can you help me understand why you are asking this question?

The approach that we've chosen (on-demand, partial caching) means that we likely won't be loading the entire set of records for any record type into memory, and if we do, it will be one record at a time.

Ex: Barcode file is probably too large and unique for every scan.

I believe the unique barcode changes have been aborted for 2021 R2 and maybe 2021 R3 due to some organisations not being ready for this change.

Won't Do

Details

Reporter

PO Rank

0

Front End Estimate

Out of scope

Front-End Confidence factor

Low

Back End Estimate

Jumbo: > 45 days


Created September 21, 2021 at 2:58 PM
Updated January 4, 2022 at 9:50 AM
Resolved November 9, 2021 at 5:57 PM