Introduction

This document aims to summarise the outcome of the recent performance testing conducted by the PTF team and provide some suggestions as to how we might improve the performance of checking out an item under load.

Context

History

Development of modules within the circulation domain started early in FOLIO's overall development, meaning they are some of the oldest modules and integrate heavily with other older modules.

...

A checkout must complete within 1 second (from the documented requirements at /wiki/spaces/DQA/pages/2658550). It is stated that this includes the time for a person to scan the barcode (presumably of the item).

For the purposes of this analysis I shall assume the following (neither of which is likely true in practice):

...

The performance requirements do not provide any guidance on the conditions (load parameters or resource configuration) under which this expectation should hold.

Thus for the purposes of this analysis, the expectation is that:

the check out API must respond within 1 second under load from 8 concurrent requests (with no tolerance for outliers that exceed this limit)
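A minimal sketch of how this expectation could be checked against a set of measured response times. The function name and the sample values are hypothetical and are not results from the PTF testing.

```python
def meets_expectation(response_times_ms, limit_ms=1000):
    """Return True only if every response is within the limit (no tolerance for outliers)."""
    return all(t <= limit_ms for t in response_times_ms)


# Hypothetical response times (ms) for 8 concurrent check out requests
sample = [640, 710, 805, 930, 955, 990, 1000, 1040]

print(meets_expectation(sample))  # False, because one request took 1040 ms
```

Because there is no tolerance for outliers, the check is on the slowest response rather than on an average or percentile.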

Solution Constraints

Beyond the general constraints on architectural decisions, the following apply:

  • No changes to the client interface of the circulation APIs
  • Only existing infrastructure can be used (I'm including Kafka in this, even though it isn't official yet)

...

Whilst we have overall performance data for (an approximation of) the whole check out process, we only have a single detailed sample of the downstream requests from the check out API. That sample is not representative of the range of response times likely present in a whole performance test run. Also, the response times of the constituent parts do not tell the whole story and cannot be summed to give the overall response time.

Thus, this analysis has to assume that the sample is representative whilst also interpreting it skeptically.

We also do not know:

  • why the response times of the constituent parts do not equate to the overall response time
  • what amount of time Okapi takes to process requests / responses
  • what amount of time mod-circulation takes to use this information to make decisions e.g. to apply the circulation rules

These factors make it challenging to draw reliable and specific conclusions about the requests involved, so most of the analysis will be broad and general.

What takes up the time?

Step | Time taken
Generating a downstream token (assumed to be once per incoming request) | 133 ms (99 + 6 + 16 + 12)
Checking request token (for each downstream request) | 12 ms (average)
Downstream request | 50 ms (average)

There are 27 downstream requests triggered by mod-circulation during the sample check out.

Once we deduct the initial overhead (133 ms) from the 1 second expectation, that leaves us with an approximate budget of 32 ms per request (867 ms / 27).

At the moment, the average request takes 62 ms (including proxying overhead). This is nearly double the budget we have available.

Whilst there are some outliers that push up this number, I think this indicates the degree of challenge we have with the current approach.
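The arithmetic behind these figures can be laid out as a short worked calculation. The numbers come from the timing table and the request count above; the variable names are only illustrative.

```python
TARGET_MS = 1000                         # the 1 second expectation
TOKEN_GENERATION_MS = 99 + 6 + 16 + 12   # downstream token, assumed once per incoming request
TOKEN_CHECK_MS = 12                      # average, for each downstream request
DOWNSTREAM_REQUEST_MS = 50               # average, for each downstream request
REQUEST_COUNT = 27                       # downstream requests in the sample check out

remaining_ms = TARGET_MS - TOKEN_GENERATION_MS                    # 867
budget_per_request_ms = remaining_ms / REQUEST_COUNT              # about 32.1
actual_per_request_ms = TOKEN_CHECK_MS + DOWNSTREAM_REQUEST_MS    # 62
ratio = actual_per_request_ms / budget_per_request_ms             # about 1.93

print(remaining_ms, round(budget_per_request_ms, 1), actual_per_request_ms, round(ratio, 2))
# 867 32.1 62 1.93
```

This treats the requests as if they were made one after another and ignores any time spent in mod-circulation itself, so it is only a rough guide to the size of the gap.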

...

Improve the performance of individual downstream requests

Analyse and try to improve the performance of each downstream request

...

Characteristics

  • Scope for improvement is limited as many of these requests are individually relatively fast
  • Improvements are brittle and can be easily undone by changes to downstream modules (and it may take a while to become aware of degradation)
  • Limited by the constraints of the downstream modules (e.g. the data is currently stored as JSONB)
  • May involve changes in multiple modules
  • Retains the same number of downstream requests
  • Retains the same overhead from Okapi proxying

Make downstream requests concurrently

Make some of the downstream requests in mod-circulation 

...

For example, once the item is received, the locations, loan types and material types can be fetched concurrently.
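A minimal sketch of the idea, using Python's asyncio rather than the actual mod-circulation code. The fetch function is a stand-in that sleeps instead of making a request via Okapi, and the property names on the item are hypothetical.

```python
import asyncio
import time


async def fetch(record_type, record_id, delay_s=0.05):
    """Stand-in for a downstream request; sleeps instead of doing real I/O."""
    await asyncio.sleep(delay_s)
    return {"type": record_type, "id": record_id}


async def fetch_item_references(item):
    # These three lookups depend only on the item, so they can be in flight together
    return await asyncio.gather(
        fetch("location", item["locationId"]),
        fetch("loan-type", item["loanTypeId"]),
        fetch("material-type", item["materialTypeId"]),
    )


# Hypothetical item record, as if it had already been fetched
item = {"locationId": "loc-1", "loanTypeId": "lt-1", "materialTypeId": "mt-1"}

started = time.perf_counter()
location, loan_type, material_type = asyncio.run(fetch_item_references(item))
elapsed_ms = (time.perf_counter() - started) * 1000

print(f"{elapsed_ms:.0f} ms")  # close to one request's duration rather than three
```

The elapsed time approaches that of the slowest single request, but the downstream modules still receive three requests, only closer together in time.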

Characteristics

  • Only involves changes to mod-circulation
  • Increases the complexity of the code in mod-circulation
  • Not all requests can be made concurrently (some are based upon prior requests or decisions that cannot be made up front)
  • Is likely limited by how well other modules / the database can handle concurrent requests
  • Retains the same overall load on the system as before (although it may be compressed in time)
  • Retains the same number of downstream requests
  • Retains the same overhead from Okapi proxying

...

Introduces context-specific APIs that are intended for specific use. At most, this can only be applied to the requests made to the same module.

It may not make sense to combine all of the record types from a single module. For example, does it make sense to have an API that fetches existing open loans and loan policies together?

We are already introducing a new API in mod-inventory-storage in this manner to improve the pre-checks made by the check out UI.

Characteristics

  • Reduces the number of individual downstream requests (and hence the Okapi proxying overhead)
  • Requires at least one downstream request per destination module
  • Requires at least one database query per downstream module
  • Might reduce the response time of the downstream request (compared to the combination of the individual requests it replaces)
  • Might reduce the load on downstream modules (depending upon how the combined request is handled, it is possible the load increases)
  • Reduction in downstream requests is limited by the number of record types within a single module
  • Increases the number of APIs to maintain (what I call the surface area of the module)
  • Increases the coupling between modules (by introducing the client's context into the other module)
  • Increases the coupling between the record types involved (e.g. it's harder to move record types to other modules when they are included in APIs together, changes to them ripple across APIs)
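To illustrate the limit on how far combining can reduce the number of requests, the following sketch groups a list of downstream fetches by destination module. The list is illustrative only and is not the set of requests recorded in the sample check out.

```python
from collections import defaultdict

# Illustrative (destination module, record type) pairs; not taken from the sample
downstream_fetches = [
    ("mod-inventory-storage", "item"),
    ("mod-inventory-storage", "location"),
    ("mod-inventory-storage", "loan type"),
    ("mod-inventory-storage", "material type"),
    ("mod-circulation-storage", "open loans"),
    ("mod-circulation-storage", "loan policy"),
    ("mod-users", "patron"),
]


def group_by_module(fetches):
    """Group record types by the module that owns them."""
    grouped = defaultdict(list)
    for module, record_type in fetches:
        grouped[module].append(record_type)
    return dict(grouped)


combined = group_by_module(downstream_fetches)

# Best case: one combined request per destination module
print(len(downstream_fetches), "->", len(combined))  # 7 -> 3
```

The best case is one request per destination module. In practice the reduction is likely smaller, because not every pair of record types within a module makes sense to fetch together.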

...

The biggest challenge with this option is the community's tolerance for using potentially stale data when making decisions.

This suggests processing the messages and using a database from mod-circulation rather than mod-circulation-storage to avoid the overhead of needing to request the copied data from a downstream module.

Characteristics

  • Requires no downstream requests for fetching data during check out process
  • Increases the potential for stale data to be used for decisions
  • Is contrary to constraints that may still be present in FOLIO
  • Introduces complexity of processing messages and persistent storage into mod-circulation
  • Introduces a dependency on a database from mod-circulation
  • Introduces a dependency on messages produced by other modules
  • State changes still require a downstream request (and the requisite proxying overhead)
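A rough sketch of what keeping a copy of another module's data might look like, and how stale data can arise. No real Kafka client or database is used here; the class, the message shape and the version property are all assumptions made for illustration.

```python
class LocalItemCopy:
    """Hypothetical local copy of item records, kept up to date from messages."""

    def __init__(self):
        self._items = {}

    def on_message(self, message):
        # Only apply a message if it is newer than the copy already held
        held = self._items.get(message["id"])
        if held is None or message["version"] > held["version"]:
            self._items[message["id"]] = message

    def get(self, item_id):
        # Served locally, with no downstream request
        return self._items.get(item_id)


local_items = LocalItemCopy()
local_items.on_message({"id": "item-1", "version": 1, "status": "Available"})

# If the owning module has already changed the item but the message has not
# arrived yet, a decision made at this point would use stale data
print(local_items.get("item-1")["status"])  # Available

local_items.on_message({"id": "item-1", "version": 2, "status": "Checked out"})
local_items.on_message({"id": "item-1", "version": 1, "status": "Available"})  # late message, ignored
print(local_items.get("item-1")["status"])  # Checked out
```

Reads become local lookups, but the window between a change in the owning module and the arrival of its message is where stale decisions can occur.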

Variations

Store the copied data in mod-circulation-storage

...

Combine the business logic and storage modules together

Characteristics

  • Removes all downstream requests for record types within the circulation domain e.g. loans, requests, loan policies etc. (including state changes e.g. creating a loan, fulfilling a request)
  • Removes the distinction between business logic and storage representations of those record types
  • Allows for state changes within the circulation domain to be done within a database transaction
  • Is contrary to constraints that may still be present in FOLIO
  • Storage modules have been used to work around cyclic dependency constraints in Okapi; removing them might involve changing other modules to avoid this in other ways
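As an illustration of state changes being made within a single database transaction, the sketch below uses an in-memory SQLite database as a stand-in for the module's real database. The table layout, identifiers and status values are assumptions for the example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (id TEXT PRIMARY KEY, item_id TEXT NOT NULL, status TEXT NOT NULL)")
conn.execute("CREATE TABLE request (id TEXT PRIMARY KEY, item_id TEXT NOT NULL, status TEXT NOT NULL)")
conn.execute("INSERT INTO request VALUES ('req-1', 'item-1', 'Open - Awaiting pickup')")
conn.execute("INSERT INTO request VALUES ('req-2', 'item-2', 'Open - Awaiting pickup')")
conn.commit()


def check_out(conn, loan_id, item_id):
    # 'with conn' commits if the block succeeds and rolls back if it raises
    with conn:
        conn.execute("UPDATE request SET status = 'Closed - Filled' WHERE item_id = ?", (item_id,))
        conn.execute("INSERT INTO loan (id, item_id, status) VALUES (?, ?, 'Open')", (loan_id, item_id))


check_out(conn, "loan-1", "item-1")

try:
    # Reusing the loan id makes the insert fail, so the request update is rolled back too
    check_out(conn, "loan-1", "item-2")
except sqlite3.IntegrityError:
    pass

statuses = dict(conn.execute("SELECT id, status FROM request ORDER BY id").fetchall())
loan_count = conn.execute("SELECT COUNT(*) FROM loan").fetchone()[0]
print(statuses, loan_count)
```

With separate business logic and storage modules, the loan creation and the request update are separate downstream requests, so a failure between them can leave the two records inconsistent.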

Appendices

Definitions

Phrase | Definition
Downstream request | A request made by a module (via Okapi) in order to fulfil the original incoming request e.g. mod-circulation makes a request to mod-users to fetch patron information
Response time | The time taken from the client making the request to receiving a response

Requests made during a typical check out

...