Check-out lock feature

Summary of the issue

When a self-check machine performs multiple check-outs for the same patron in a very short period of time, item limits are not enforced properly. Such check-outs are processed by multiple mod-circulation instances and by multiple processes within each instance. Some of these processes are unaware that others are creating loans at the same moment, so a race condition can cause the item limit to be exceeded.

Proposed solution

Implement a "Check-out lock" feature that will synchronize creation of loans during check-outs to the same patron by creating a temporary patron-level lock. Each process performing a check-out in mod-circulation will need to acquire a lock before proceeding with loan creation. If a lock can't be acquired, the process can wait and retry or fail. Retry mechanism is part of the feature. Behavior of the feature (enabled/disabled, lock lifetime, retry parameters etc.) will be controlled from Settings.

Configuration, UI changes

Configuration is stored in environment variables of mod-circulation and read by the application through org.folio.Environment#getVariable.

Example environment variables:

CHECKOUT_LOCK_FEATURE_ENABLED=true
LOCK_TTL_MS=3000
RETRY_INTERVAL_MS="500|500|1000"

These particular settings mean: 
- The feature is enabled.
- A lock is released automatically after 3 seconds.
- If lock acquisition fails, it is retried after a 500ms wait, then after another 500ms wait, and one last time after a further 1000ms wait.
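
The pipe-separated RETRY_INTERVAL_MS value has to be turned into a list of waits before it can drive a retry loop. A minimal sketch of one way to do that follows; the class and method names (RetryIntervals, parse) are hypothetical and not part of mod-circulation, and stripping the surrounding double quotes is an assumption based on how the example value is written above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an existing mod-circulation class.
class RetryIntervals {

  // Parses a value such as 500|500|1000 (optionally wrapped in double quotes)
  // into a list of wait times in milliseconds, in retry order.
  static List<Long> parse(String raw) {
    String value = raw.trim();
    // Assumption: the quotes in the example may reach the application verbatim.
    if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
      value = value.substring(1, value.length() - 1);
    }
    List<Long> intervals = new ArrayList<>();
    if (value.isEmpty()) {
      return intervals; // no retries configured
    }
    for (String part : value.split("\\|")) {
      intervals.add(Long.parseLong(part.trim()));
    }
    return intervals;
  }

  // Upper bound on the time spent waiting between attempts.
  static long totalWaitMs(List<Long> intervals) {
    long total = 0;
    for (long interval : intervals) {
      total += interval;
    }
    return total;
  }
}
```

With the example settings, parse returns three intervals and totalWaitMs gives 2000, so a check-out waits at most 2 seconds between attempts before giving up.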

The front-end team will create a page to control these settings from the UI.

Changes in mod-circulation-storage

Database

A new table will be created:

CREATE TABLE schema_name.check_out_lock (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  user_id UUID UNIQUE NOT NULL,
  creation_date TIMESTAMP WITHOUT TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

A lock for patron {userId} is created by an INSERT query:

INSERT INTO schema_name.check_out_lock (user_id) VALUES ('{userId}');

user_id is a unique key, so it's impossible to create two locks for the same patron.

A lock is released by deleting its row by id.

The creation_date timestamp is needed to track the lifetime of a lock.

API

New endpoints will be added to mod-circulation-storage to get (mainly for debugging purposes), acquire and release locks.

GET /check-out-lock-storage

Get a list of locks.

Query parameters:

  • userId - UUID - for filtering by user
  • offset - int
  • limit - int

Response body: array of lock objects (refer to "GET by ID" response body)

200 - A list of locks in the response body (can be empty)
422 - Invalid parameters

GET /check-out-lock-storage/{lockId}

Get a lock by ID.

Response JSON schema:

{
  "type": "object",
  "description": "Check-out lock",
  "properties": {
    "id": {
      "description": "ID of the lock",
      "$ref": "raml-util/schemas/uuid.schema"
    },
    "userId": {
      "description": "ID of the patron the lock is created for",
      "$ref": "raml-util/schemas/uuid.schema"
    },
    "creationDate": {
      "description": "Date and time when lock was created",
      "type": "string",
      "format": "date-time"
    }
  }
}

Example of a successful 200 response:

{
  "id": "0bab56e5-1ab6-4ac2-afdf-8b2df0434379",
  "userId": "77477611-ab44-4082-a0d8-42f7acdfde11",
  "creationDate": "2018-01-31T21:21:02Z"
}

Response statuses:
200 - Lock {lockId} is found, lock object is returned in the response body
404 - Lock {lockId} is not found

POST /check-out-lock-storage

Body:

{
  "type": "object",
  "description": "Check-out lock",
  "properties": {
    "userId": {
      "description": "ID of the patron the lock is created for",
      "$ref": "raml-util/schemas/uuid.schema"
    },
    "ttlMs": {
      "description": "Time to live for lock object",
      "type": "integer"
    }
  }
}

Example of a body:

{
  "userId": "77477611-ab44-4082-a0d8-42f7acdfde11"
}

Response schema and example: refer to "GET by ID" response body.

Behavior:

  1. Delete the existing lock for userId if it's outdated. The maximum lifetime of a lock is passed as a parameter in the body (ttlMs).
  2. Try to create a lock by inserting a ({userId}, CURRENT_TIMESTAMP) row.
    1. If successful, respond with 201 and a lock object JSON in the response body.
    2. If failed, respond with 503 and an appropriate error message.

Response statuses:
201 - Lock is created for patron {userId}
503 - Failed to create lock for patron {userId}
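
The two POST steps can be illustrated with an in-memory stand-in for the check_out_lock table. This is only a sketch under stated assumptions: the class InMemoryCheckOutLockStore and its members are hypothetical, the map key plays the role of the UNIQUE constraint on user_id, and a lock is treated as outdated once its age is strictly greater than ttlMs (the document does not say whether the boundary is inclusive). The current time is passed in explicitly so the behavior is easy to follow.

```java
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory model of the lock table, for illustration only.
class InMemoryCheckOutLockStore {

  static final class Lock {
    final UUID id;
    final UUID userId;
    final long createdAtMs;

    Lock(UUID id, UUID userId, long createdAtMs) {
      this.id = id;
      this.userId = userId;
      this.createdAtMs = createdAtMs;
    }
  }

  // userId -> lock; at most one entry per patron, like the UNIQUE user_id column.
  private final ConcurrentMap<UUID, Lock> locksByUser = new ConcurrentHashMap<>();

  // Returns the new lock (the 201 case) or empty (the 503 case).
  Optional<Lock> acquire(UUID userId, long ttlMs, long nowMs) {
    // Step 1: drop the patron's existing lock only if it is outdated.
    // Returning null from the remapping function removes the entry.
    locksByUser.computeIfPresent(userId,
        (key, existing) -> nowMs - existing.createdAtMs > ttlMs ? null : existing);

    // Step 2: try to insert; putIfAbsent returns null when no lock was present.
    Lock candidate = new Lock(UUID.randomUUID(), userId, nowMs);
    Lock previous = locksByUser.putIfAbsent(userId, candidate);
    return previous == null ? Optional.of(candidate) : Optional.empty();
  }
}
```

With ttlMs = 3000, a second acquire for the same patron one second after the first is refused, while an attempt made after the first lock has outlived its TTL succeeds because step 1 clears the stale row.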

DELETE /check-out-lock-storage/{lockId}

Behavior:

  1. Delete the lock {lockId} if it's outdated. The maximum lifetime of a lock is stored in mod-configuration. The existence of outdated locks shouldn't affect this endpoint's response status: for a client, an outdated lock should be equivalent to an absent one. This means that if the lock {lockId} existed but was outdated, the response status should be 404 (as if it didn't exist).
  2. Delete a lock by {lockId}.
    1. If the lock {lockId} was actually deleted, respond with 204.
    2. If the lock {lockId} didn't exist, respond with 404 with an appropriate error message.

Response statuses:
204 - Lock {lockId} is deleted
404 - Lock {lockId} is not found
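
The status rules for DELETE can be sketched as follows. The class LockReleaseSketch is hypothetical, the map stands in for the check_out_lock table, and "outdated" is assumed to mean an age strictly greater than the TTL. The point of the sketch is that an outdated lock is still removed but is reported to the client as if it had never existed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical model of the DELETE endpoint's behavior, for illustration only.
class LockReleaseSketch {

  // lockId -> creation time in milliseconds.
  private final Map<UUID, Long> createdAtByLockId = new HashMap<>();

  synchronized void put(UUID lockId, long createdAtMs) {
    createdAtByLockId.put(lockId, createdAtMs);
  }

  // Returns the HTTP status the endpoint would respond with.
  synchronized int release(UUID lockId, long ttlMs, long nowMs) {
    Long createdAt = createdAtByLockId.remove(lockId);
    if (createdAt == null) {
      return 404; // the lock never existed or was already removed
    }
    // The row is gone either way; an outdated lock is reported as absent.
    boolean outdated = nowMs - createdAt > ttlMs;
    return outdated ? 404 : 204;
  }
}
```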

Changes in mod-circulation

Data caching for checking loan limits

When the following step is executed in CheckOutByBarcodeResource:

.thenComposeAsync(validators::refuseWhenItemLimitIsReached)

the retrieved loan policy must be cached in the context of the request. The assumption is that the loan policy will not change while the current check-out request is being processed.

Feature is disabled

When the "Check-out lock" feature is not enabled, there should be no changes in how mod-circulation processes check-outs (except an additional call to mod-configuration to check if it's enabled). 

Feature is enabled

During a check-out, before making any irreversible changes, mod-circulation needs to acquire a lock for the patron by calling POST /check-out-lock-storage. This should be done in the CheckOutByBarcodeResource class before the line that updates the request queue:

.thenComposeAsync(r -> r.after(requestQueueUpdate::onCheckOut))

A) If this attempt fails, the retry mechanism should be started (parameters for the retry mechanism are stored in mod-configuration). If the lock still can't be acquired after all retries, a 422 error should be returned with an appropriate message. No permanent changes should be made and no signals suggesting a successful check-out should be sent to other modules, which means:

  • A loan shouldn't be created
  • An item shouldn't be updated with the new status
  • Request queue shouldn't be updated
  • Check-out session record shouldn't be created
  • No patron notices should be scheduled or sent
  • No events about a successfully made check-out should be sent to pub-sub

B) If a lock was acquired, the Loan policy should be rechecked using the previously cached data about loan policy and newly requested data on the current number of loans. A failed check means that a new loan cannot be created (same steps as in point A above). A successful check means that check-out should proceed with loan creation as usual.

Note that an acquired lock must always be released after processing by calling DELETE /check-out-lock-storage/{lockId}, regardless of whether the loan limits recheck was successful.

Regardless of the response status of the DELETE call, the check-out process should continue as usual.
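
The overall flow (acquire with retries, recheck the limit, create the loan, always release) can be sketched synchronously as below. This is not the actual CheckOutByBarcodeResource code, which is built on asynchronous composition; the class LockedCheckOutSketch, the Outcome enum and every parameter are hypothetical stand-ins for the HTTP calls and validators described above. Waiting is injected as a callback so the sketch does not depend on real time.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;
import java.util.function.LongConsumer;
import java.util.function.Supplier;

// Hypothetical synchronous model of the locked check-out flow.
class LockedCheckOutSketch {

  enum Outcome { LOAN_CREATED, LOCK_NOT_ACQUIRED, ITEM_LIMIT_REACHED }

  static Outcome checkOut(
      Supplier<Optional<String>> tryAcquire, // POST; returns the lock id if acquired
      Consumer<String> release,              // DELETE by lock id
      BooleanSupplier itemLimitReached,      // recheck with fresh loan counts
      Runnable createLoan,                   // the irreversible part of the check-out
      List<Long> retryIntervalsMs,
      LongConsumer sleep) {

    // One initial attempt, then one more attempt after each configured wait.
    Optional<String> lockId = tryAcquire.get();
    for (long waitMs : retryIntervalsMs) {
      if (lockId.isPresent()) {
        break;
      }
      sleep.accept(waitMs);
      lockId = tryAcquire.get();
    }
    if (!lockId.isPresent()) {
      return Outcome.LOCK_NOT_ACQUIRED; // case A: 422, nothing was changed
    }

    try {
      if (itemLimitReached.getAsBoolean()) {
        return Outcome.ITEM_LIMIT_REACHED; // case B, failed recheck
      }
      createLoan.run();
      return Outcome.LOAN_CREATED;
    } finally {
      try {
        release.accept(lockId.get()); // always release the lock
      } catch (RuntimeException e) {
        // A failed DELETE must not change the check-out outcome.
      }
    }
  }
}
```

Releasing in a finally block is one way to guarantee the lock is freed on both the success and the failed-recheck paths; in the asynchronous code the same effect would need an equivalent "always run" stage.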

Performance considerations

The critical block of operations where race conditions must be avoided includes loan limit validation, the item update and loan creation; other operations, e.g. notifications, can be done without locking. The performance impact on the overall process should be negligible: the overhead introduced by the lock mechanism consists of only two database operations (one insert and one delete), and the critical block completes in under 500ms according to performance test results for check-out.