Platform, DevOps and Release Management (UXPROD-1814)

[UXPROD-3058] Optimistic Locking Created: 04/May/21  Updated: 14/Feb/23

Status: In Progress
Project: UX Product
Components: None
Affects versions: None
Fix versions: None
Parent: Platform, DevOps and Release Management

Type: Umbrella Priority: TBD
Reporter: Holly Mistlebauer Assignee: Jakub Skoczen
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Defines
is defined by UXPROD-2994 Optimistic Locking: coordinate rollou... Open
is defined by UXPROD-3161 Orders - Implementing Optimistic Locking Open
is defined by UXPROD-3163 Part 2 - Finance - Implementing Optim... Open
is defined by UXPROD-3165 Organizations - Implementing Optimist... Open
is defined by UXPROD-3089 Inventory. Implementing Optimistic Lo... Closed
is defined by UXPROD-1752 Prevent update conflicts (via optimis... Closed
is defined by UXPROD-2796 Prevent update conflicts when doing m... Closed
is defined by UXPROD-2797 Prevent update conflicts (1 user and ... Closed
is defined by UXPROD-2798 Prevent update conflicts (two automat... Closed
is defined by UXPROD-3164 Invoices - Implementing Optimistic Lo... Draft
is defined by UXPROD-3534 Users - Implementing Optimistic Locking Draft
Gantt End to Start
has to be done before UXPROD-3700 UI-controlled pessimistic locking Open
Relates
relates to MODINVOICE-297 Spike: Optimistic Locking for Acquisi... Open
relates to MODINVUP-10 Update with necessary Optimistic Lock... Closed
relates to UIIN-1245 Implement optimistic locking in Inven... Closed
relates to UXPROD-3666 Improve support for parallel processing In Refinement
relates to UXPROD-3173 Folijet support work for Inventory Op... Closed
Epic Link: Platform, DevOps and Release Management
Development Team: Core: Platform
Kiwi Planning Points (DO NOT CHANGE): 149
PO Rank: 0

Description

Bringing together all optimistic locking features and combining their points...

Total points = 149

NOTE: Optimistic locking is the solution described in this feature. Libraries ranked this based on the idea of preventing update conflicts, not necessarily based on the specific solution of optimistic locking.

Problem statement

In FOLIO, most storage modules follow the "last writer wins" strategy for handling record updates. From the UI perspective this may lead to a situation where a stale record (an older version of a given record) previously loaded into the UI overrides a more recent version on the server. Relevant updates may therefore get lost in the process, and the user is not made aware of what has happened.
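The lost update described above can be reproduced with a minimal sketch. The store class, record fields and values are hypothetical and exist only for illustration; they are not a FOLIO storage API.

```python
import copy


class LastWriterWinsStore:
    """Hypothetical in-memory store that overwrites records unconditionally."""

    def __init__(self):
        self._records = {}

    def put(self, record_id, record):
        # No check that the caller has seen the latest state of the record.
        self._records[record_id] = copy.deepcopy(record)

    def get(self, record_id):
        return copy.deepcopy(self._records[record_id])


def demo_lost_update():
    store = LastWriterWinsStore()
    store.put("item-1", {"status": "Available", "note": ""})

    # User A and User B both load the same version of the record.
    copy_a = store.get("item-1")
    copy_b = store.get("item-1")

    # User B saves first.
    copy_b["status"] = "Checked out"
    store.put("item-1", copy_b)

    # User A saves a stale copy; B's status change is silently overwritten.
    copy_a["note"] = "Damaged cover"
    store.put("item-1", copy_a)

    return store.get("item-1")
```

After `demo_lost_update()` the stored record has User A's note but the old status, and neither user was told that anything went wrong.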

Scope: The scope of this issue is to create platform support for optimistic locking that modules can use on a case-by-case (opt-in) basis. The focus of this feature is on simple "detection" and "prevention" (identifying when a collision has occurred and preventing it). Additional tools and mechanisms for handling collisions when they occur (e.g. diffs, merges, etc.) are out of scope. There are 3 phases, two of which are in scope for this feature:

  1. Detect collisions but do not prevent them. Just log in the system log that a mid-air collision has occurred. There is no instant benefit from this behaviour — the platform remains susceptible to collisions. But, once detection is deployed, we can review the logs to evaluate how often collisions occur and which APIs are at risk.
  2. Prevent an update when a collision gets detected. This builds on detection and additionally prevents the update from taking place. This is a “breaking” change from the API point of view: clients (end-users and batch processes alike) will start seeing an error returned (409 Conflict) when their update collides with another update. The immediate benefit is that we “protect” the system from collisions, but we also create a fairly terrible user experience and probably “break” a lot of batch processes that right now happily update records because FOLIO is so forgiving. This will be implemented as an opt-in feature so functional apps can adopt it when ready.
  3. (Out of Scope): Tools. Build tools for handling the “409 Conflict” errors. It could be a simple “resubmit my changes anyway” button in the UI that lets the user force their changes (the risk of messing up is with the user) and a way to “retry” for batch processes. It could also be something fancy where end-users can review the conflict and choose which changes to keep and which to drop, etc.
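A minimal sketch of how phases 1 and 2 could differ in behaviour, assuming an integer version per record. The `Mode` switch, `VersionedStore` and `ConflictError` are hypothetical names for illustration, not the platform's actual configuration or API.

```python
import logging
from enum import Enum

logger = logging.getLogger("optimistic_locking_sketch")


class Mode(Enum):
    OFF = "off"          # current behaviour: last writer wins
    DETECT = "detect"    # phase 1: log the collision, still write
    PREVENT = "prevent"  # phase 2: log the collision and reject the write


class ConflictError(Exception):
    """Stands in for an HTTP 409 Conflict response."""
    status_code = 409


class VersionedStore:
    def __init__(self, mode):
        self.mode = mode
        self._rows = {}  # record_id -> (version, data)

    def create(self, record_id, data):
        self._rows[record_id] = (1, dict(data))
        return 1

    def read(self, record_id):
        version, data = self._rows[record_id]
        return version, dict(data)

    def update(self, record_id, data, seen_version):
        current_version, _ = self._rows[record_id]
        collided = self.mode is not Mode.OFF and seen_version != current_version
        if collided:
            logger.warning(
                "mid-air collision on %s: client saw v%s, stored is v%s",
                record_id, seen_version, current_version,
            )
            if self.mode is Mode.PREVENT:
                raise ConflictError(f"{record_id} was modified by another update")
        # Phase 1 (and OFF) fall through and overwrite the record.
        new_version = current_version + 1
        self._rows[record_id] = (new_version, dict(data))
        return new_version
```

In `DETECT` mode a stale update still succeeds and only leaves a log entry, which is what makes it possible to measure collision frequency before switching a module to `PREVENT`.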

Proposed solution

Handling of updates in FOLIO should rely on more explicit semantics, both in the storage (backend) APIs and the way it is communicated to the user through the UI.

From the storage and API perspective, optimistic locking is the proposed strategy to handle conflicts:

  • optimistic locking – each record state is marked with a "version number" (or a timestamp, hash, etc.) which is returned to the client along with the record. The client includes the version number in the update request, and the server checks that the version hasn't changed before it writes the record back. If the record is dirty (the version doesn't match), the update is aborted. In practice, for a REST API (the typical FOLIO use case), this means using ETag in combination with the If-Match conditional request header and the 412 (Precondition Failed) and 409 (Conflict) error codes.

In general, optimistic locking is used when the risk of collisions (updates to the same record) is low and when the lock granularity is high (i.e. the duration of any given update is short).
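The ETag / If-Match exchange can be sketched as two plain handler functions. This is an illustration under stated assumptions: the handler names, the dict-based storage and the 204 success status are hypothetical, the ETag is simply the quoted version number, and only a single exact-match If-Match value is handled (no `*` or lists). Here a failed precondition is reported as 412; a service that carries the version in the request body instead could report the same condition as 409.

```python
def etag_for(version):
    # Assumption: the ETag is the record's version number in quotes.
    return f'"{version}"'


def handle_get(rows, record_id):
    """Return (status, headers, body) and expose the version as an ETag."""
    version, data = rows[record_id]
    return 200, {"ETag": etag_for(version)}, dict(data)


def handle_put(rows, record_id, headers, body):
    """Write the record only if the client's If-Match equals the current ETag."""
    current_version, _ = rows[record_id]
    if_match = headers.get("If-Match")
    # Opt-in: a request without If-Match is treated as an unconditional write.
    if if_match is not None and if_match != etag_for(current_version):
        return (
            412,
            {"ETag": etag_for(current_version)},
            {"error": "record has changed since it was read"},
        )
    new_version = current_version + 1
    rows[record_id] = (new_version, dict(body))
    return 204, {"ETag": etag_for(new_version)}, None
```

A client reads the record, keeps the `ETag` value, and sends it back as `If-Match` with its update; a second client holding the same ETag is rejected once the first update has bumped the version.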

Use cases collected from the community (add others that seem likely)

  • Not frequent: 2 users editing the same record at the same time
    • User A and User B editing the same record at the same time (not frequent) – users, orders, instances, holdings, items, requests
    • User A editing an item and User B creating a request for that item
    • User A editing an item and User B putting that item on course reserve at the same time
    • User A editing an invoice and User B trying to approve the same invoice at the same time
    • User A editing an item and User B deleting the item before User A's edits are saved (see UIIN-730, Blocked)
    • User A editing a request and User B cancelling the request before User A's edits are saved (see UIREQ-344, Closed)
    • When attempting to update holdings and their items concurrently, the holdings updates will every so often interfere with the item updates, effectively nullifying the latter (see MODINVSTOR-516, Closed). This particular item is being addressed via RMB-388 (Closed).
    • User A and User B generating a new number at the same time using the number generator for call number or accession number. The number generator runs separate queries for selecting and incrementing the number (GBV); this is not relevant if FOLIO combines select and increment into one query. Currently not a challenge in FOLIO because the functionality does not exist.
  • More frequent: 1 user and system trying to act on the same record, either individual records or batch
    • User A editing a user and system batch process is updating lots of users
    • User A editing an instance/holdings/item and data import updating the same record (consider the DI redesign that is taking place now)
    • User A editing an item and checkout trying to update the item status
    • User A editing an item and bulk renewal trying to update the item
    • User A editing a budget and system applying a transaction to that budget at the same time
    • User A editing an instance/holdings/item after data import ran in Preview mode but before the data import changes were committed
    • User A editing a request while the request is being expired (request expiration date or hold shelf expiration date) - rare
  • Two automated processes acting on the same record
    • Checkout happening and updating status on an item record at the same time as import updating the item
    • Data import happening at 2 libraries within the same tenant, affecting the same record (e.g. 5 Colleges processing new cataloging records)

User impact

This approach will not prevent collisions, but it would notify the user when they happen and offer them a choice. Something like: "Sorry, AgentB has already updated the record and your working copy might not be up to date. Would you like to: (a) Update anyway (b) Reload?"

  • OL means that in certain situations the update operation will fail, which needs to be communicated to the user. The UI should then allow the user to choose the next step, e.g. by refreshing the state of the record in the browser and re-applying the original changes.

This situation can happen when multiple data imports are happening at the same time (or data import and a user acting on the same record at the same time) and can affect many records at the same time. Cleanup can then be very time-consuming and confusing.
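The "refresh and re-apply" step, which is also the natural "retry" path for a batch process, could look roughly like the sketch below. `TinyStore`, `ConflictError` and `save_with_reload` are hypothetical names; the sketch assumes the caller's changes are a set of field values that can safely be re-applied on top of a freshly loaded record, which is not true for every kind of edit.

```python
class ConflictError(Exception):
    """Stands in for a 409 Conflict returned by the storage API."""


class TinyStore:
    """Hypothetical single-record store with a version check on update."""

    def __init__(self, data):
        self.version = 1
        self.data = dict(data)

    def read(self):
        return self.version, dict(self.data)

    def update(self, data, seen_version):
        if seen_version != self.version:
            raise ConflictError("record was modified by another update")
        self.version += 1
        self.data = dict(data)
        return self.version


def save_with_reload(store, changes, max_attempts=3):
    """Reload the record, re-apply the caller's changes, and retry on conflict."""
    for attempt in range(1, max_attempts + 1):
        version, record = store.read()  # refresh the working copy
        record.update(changes)          # re-apply the original changes
        try:
            return store.update(record, version)
        except ConflictError:
            if attempt == max_attempts:
                raise  # give up and surface the conflict to the user
```

A UI would more likely ask the user before re-applying, as in the "Update anyway / Reload" prompt above; an unattended batch job might use a bounded loop like this one and report records that still fail.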

 


Generated at Fri Feb 09 00:29:00 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.