Platform, DevOps and Release Management (UXPROD-1814)

[UXPROD-1752] Prevent update conflicts (via optimistic locking): platform support for detection Created: 27/May/19  Updated: 03/Jan/24  Resolved: 20/Apr/21

Status: Closed
Project: UX Product
Components: None
Affects versions: None
Fix versions: R1 2021
Parent: Platform, DevOps and Release Management

Type: New Feature Priority: P3
Reporter: Jakub Skoczen Assignee: Jakub Skoczen
Resolution: Done Votes: 0
Labels: Showstopper-Cornell, Support, platform-backlog, round_iv
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Attachments: PNG File screenshot-1.png     PNG File screenshot-2.png    
Issue links:
Blocks
blocks UXPROD-2796 Prevent update conflicts when doing m... Closed
blocks UXPROD-2797 Prevent update conflicts (1 user and ... Closed
blocks UXPROD-2798 Prevent update conflicts (two automat... Closed
is blocked by RMB-719 SPIKE: design protocol and implementa... Closed
Cloners
clones FOLIO-2028 SPIKE: how to handle update conflicts? Closed
Defines
defines UXPROD-3058 Optimistic Locking In Progress
is defined by MODINVSTOR-713 Enable support for optimistic locking... Closed
is defined by RMB-719 SPIKE: design protocol and implementa... Closed
is defined by RMB-727 Implement support for optimistic locking Closed
is defined by UIIN-1245 Implement optimistic locking in Inven... Closed
Relates
relates to RMB-688 PATCH to update only some JSONB prope... Open
relates to UXPROD-3048 Check that generated sequences cannot... Open
relates to DEBT-1 No optimistic locking/update conflict... Closed
relates to RMB-388 PostgresClient.getById with transacti... Closed
relates to UIREQ-344 Deleting already-deleted request caus... Closed
relates to UIIN-730 Error message when item has been dele... Blocked
relates to UXPROD-2994 Optimistic Locking: coordinate rollou... Open
relates to FOLIO-2027 Data Problems from Front-End Record C... Open
relates to MODINVSTOR-516 Cannot safely update holdings and ite... Closed
relates to MODINVSTOR-656 enable "detection-only" OL for instan... Closed
relates to MODINVOICE-233 Integration test check-invoice-and-in... Closed
Epic Link: Platform, DevOps and Release Management
Back End Estimate: XXL < 30 days
Development Team: Core: Platform
Rank: Chalmers (Impl Aut 2019): R1
Rank: Chicago (MVP Sum 2020): R1
Rank: Cornell (Full Sum 2021): R1
Rank: Duke (Full Sum 2021): R1
Rank: 5Colleges (Full Jul 2021): R1
Rank: FLO (MVP Sum 2020): R1
Rank: GBV (MVP Sum 2020): R1
Rank: hbz (TBD): R1
Rank: Lehigh (MVP Summer 2020): R1
Rank: Leipzig (Full TBD): R1
Rank: Leipzig (ERM Aut 2019): R1
Rank: MO State (MVP June 2020): R1
Rank: TAMU (MVP Jan 2021): R1
Rank: U of AL (MVP Oct 2020): R1

 Description   

NOTE: Optimistic locking is the solution described in this feature. Libraries ranked this based on the idea of preventing update conflicts, not necessarily based on the specific solution of optimistic locking.

Problem statement

In FOLIO, most storage modules follow the "last writer wins" strategy for handling record updates. From the UI perspective this may lead to a situation where a stale record (an older version of a given record) previously loaded into the UI overwrites a more recent version on the server. Relevant updates may then be lost without the user being made aware of what has happened.
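To make the lost-update problem concrete, here is a minimal sketch in Python. The store class and record fields are hypothetical (this is not FOLIO code); the only assumption is that every save replaces the stored record, as under "last writer wins":

```python
import copy


class LastWriterWinsStore:
    """Hypothetical in-memory store: every save replaces the stored record."""

    def __init__(self):
        self.records = {}

    def get(self, record_id):
        # Return a copy, like the working copy a UI holds while editing.
        return copy.deepcopy(self.records[record_id])

    def put(self, record_id, record):
        # No check of which version the client started from.
        self.records[record_id] = copy.deepcopy(record)


store = LastWriterWinsStore()
store.put("item-1", {"barcode": "123", "status": "Available", "note": ""})

copy_a = store.get("item-1")  # User A loads the record
copy_b = store.get("item-1")  # User B loads the same version

copy_a["status"] = "Checked out"  # A changes the status and saves first
store.put("item-1", copy_a)

copy_b["note"] = "Spine damaged"  # B edits an unrelated field on a stale copy
store.put("item-1", copy_b)

final = store.get("item-1")
print(final)
# {'barcode': '123', 'status': 'Available', 'note': 'Spine damaged'}
```

Neither save fails, yet A's status change is gone and nobody is told.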

Scope: The scope of this issue is to create platform support for optimistic locking which modules can make use of on a case-by-case basis (opt-in). The focus of this feature is on simple "detection" and "prevention" (identifying when a collision has occurred and preventing it). Additional tools and mechanisms for handling collisions when they occur (e.g. diffs, merges etc.) are out of scope. There are 3 phases, two of which are in scope for this feature:

  1. Detect collisions but do not prevent them. Just log in the system log that a mid-air collision has occurred. There is no instant benefit from this behaviour — the platform remains susceptible to collisions. But, once detection is deployed, we can review the logs to evaluate how often collisions occur and which APIs are at risk.
  2. Prevent an update when a collision gets detected. This builds on detection and additionally prevents the update from taking place. This is a “breaking” change from the API point of view: clients (end-users or batch processes alike) will start seeing an error returned (409 Conflict) when their update collides with another update. The immediate benefit is that we “protect” the system from collisions but we also create a fairly terrible user-experience and probably “break” a lot of batch processes that right now happily update records because FOLIO is so forgiving. This will be implemented as an opt-in feature so functional apps can implement when ready.
  3. (Out of Scope): Tools. Build tools for handling the “409 Conflict” errors. It could be a simple “resubmit my changes anyway” button in the UI that lets the user force their changes (the risk of messing up lies with the user) and a way to “retry” for batch processes. It could also be something more sophisticated where end-users can review the conflict and choose which changes to keep and which to drop, etc.
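Phases 1 and 2 can be sketched with a single hypothetical store that checks a version number on every update and either logs the collision (detection only) or rejects the write (prevention). The mode names logOnConflict and failOnConflict are borrowed from the RMB comments on this issue for readability; the class is an illustration under those assumptions, not the RMB implementation or its configuration API:

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("ol-sketch")


class ConflictError(Exception):
    """Raised in prevention mode; an HTTP API would map this to 409 Conflict."""


class VersionedStore:
    """Hypothetical store illustrating phase 1 and phase 2 (not the RMB API).

    mode "logOnConflict":  detect a collision, log it, apply the write anyway.
    mode "failOnConflict": detect a collision and reject the write.
    """

    MODES = ("logOnConflict", "failOnConflict")

    def __init__(self, mode):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.records = {}  # record_id -> (version, data)

    def create(self, record_id, data):
        self.records[record_id] = (1, dict(data))

    def get(self, record_id):
        version, data = self.records[record_id]
        return version, dict(data)

    def update(self, record_id, expected_version, data):
        current_version, _ = self.records[record_id]
        if expected_version != current_version:
            if self.mode == "failOnConflict":
                raise ConflictError(
                    f"{record_id}: client has version {expected_version}, "
                    f"server has {current_version}")
            log.warning("mid-air collision on %s: client has version %d, server has %d",
                        record_id, expected_version, current_version)
        new_version = current_version + 1
        self.records[record_id] = (new_version, dict(data))
        return new_version


def collide(store):
    """Two clients read version 1; the second one saves a stale copy."""
    store.create("item-1", {"status": "Available"})
    version, _ = store.get("item-1")
    store.update("item-1", version, {"status": "Checked out"})
    return store.update("item-1", version, {"status": "Missing"})


detect_only = VersionedStore("logOnConflict")
collide(detect_only)                 # logs a warning, the stale write still lands
print(detect_only.get("item-1"))     # (3, {'status': 'Missing'})

prevent = VersionedStore("failOnConflict")
try:
    collide(prevent)
except ConflictError as err:
    print("rejected:", err)
print(prevent.get("item-1"))         # (2, {'status': 'Checked out'})
```

In detection-only mode the log shows where collisions happen but the first write is still lost; in prevention mode the stale write is refused and the client has to deal with the error.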

Proposed solution

Handling of updates in FOLIO should rely on more explicit semantics, both in the storage (backend) APIs and the way it is communicated to the user through the UI.

From the storage and API perspective, optimistic locking is the proposed strategy to handle conflicts:

  • optimistic locking – each record state is marked with a "version number" (or a timestamp, hash, etc.) which is returned to the client along with the record. The client includes the version number in the update and the server checks that the version hasn't changed before it writes the record back. If the record is dirty (the version doesn't match) the update is aborted. In practice, for a REST API (the typical FOLIO use case), this means using an ETag in combination with an If-Match conditional request and the 412 (Precondition Failed) and 409 (Conflict) error codes.
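The ETag/If-Match exchange can be sketched without a web framework. In this minimal, hypothetical example the ETag is a truncated SHA-256 hash of the record's JSON (one of the options listed above alongside a version number), a stale If-Match yields 412 (Precondition Failed), and a request without If-Match is still accepted; that last behaviour is an assumption chosen to mirror the opt-in approach:

```python
import hashlib
import json


def etag_for(record):
    # Assumption: derive the ETag from a hash of the record's canonical JSON.
    # A per-record version counter would serve the same purpose.
    body = json.dumps(record, sort_keys=True).encode("utf-8")
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


class Resource:
    """Hypothetical single-record REST resource; returns (status, headers, body)."""

    def __init__(self, record):
        self.record = dict(record)

    def handle_get(self):
        # The ETag is returned to the client along with the record.
        return 200, {"ETag": etag_for(self.record)}, dict(self.record)

    def handle_put(self, headers, new_record):
        if_match = headers.get("If-Match")
        if if_match is not None and if_match != etag_for(self.record):
            # The client edited a stale copy: abort instead of overwriting.
            return 412, {}, {"error": "record changed since it was read"}
        # Assumption: a PUT without If-Match is still accepted (last writer
        # wins), so clients that have not opted in keep working unchanged.
        self.record = dict(new_record)
        return 200, {"ETag": etag_for(self.record)}, dict(self.record)


res = Resource({"title": "Harry Potter and the Goblet of Fire"})
_, bob_headers, bob_copy = res.handle_get()
_, alice_headers, alice_copy = res.handle_get()

bob_status, _, _ = res.handle_put(
    {"If-Match": bob_headers["ETag"]}, {**bob_copy, "edition": "1st"})
alice_status, _, _ = res.handle_put(
    {"If-Match": alice_headers["ETag"]}, {**alice_copy, "language": "eng"})
print(bob_status, alice_status)  # 200 412
```

Because both clients read the same ETag, only the first conditional PUT succeeds; the second client has to re-read the record before it can save.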

In general, optimistic locking is used when the risk of collisions (updates to the same record) is low and when the lock granularity is high (i.e. the duration of any given update is short).

Use cases collected from the community (add others that seem likely)

  • Not frequent: 2 users editing the same record at the same time
    • User A and User B editing the same record at the same time (not frequent) – users, orders, instances, holdings, items, requests
    • User A editing an item and User B creating a request for that item
    • User A editing an item and User B putting that item on course reserve at the same time
    • User A editing an invoice and User B trying to approve the same invoice at the same time
    • User A editing an item and User B deleting the item before User A's edits are saved (see UIIN-730 Blocked )
    • User A editing a request and User B cancelling the request before User A's edits are saved (see UIREQ-344 Closed )
    • When attempting to update holdings and their items concurrently, the holdings updates will every so often interfere with the item updates, effectively nullifying the latter (see MODINVSTOR-516 Closed ). This particular item is being addressed via RMB-388 Closed .
    • User A and User B generating a new number using the number generator for call number or accession number. In GBV's current system the number generator runs separate queries for selecting and incrementing the number; this is not relevant if FOLIO combines select and increment into one query (currently not a challenge in FOLIO because the functionality does not exist)
  • More frequent: 1 user and system trying to act on the same record, either individual records or batch
    • User A editing a user and system batch process is updating lots of users
    • User A editing an instance/holding/item and data import updating the same record (consider the DI redesign that is taking place now)
    • User A editing an item and checkout trying to update the item status
    • User A editing an item and bulk renewal trying to update the item
    • User A editing a budget and system applying a transaction to that budget at the same time
    • User A editing an instance/holdings/item after data import ran in Preview mode but before the data import changes were committed
    • User A editing a request while the request is being expired (request expiration date or hold shelf expiration date) - rare
  • Two automated processes acting on the same record
    • Checkout happening and updating status on an item record at the same time as import updating the item
    • Data import happening at 2 libraries within the same tenant, affecting the same record (e.g. 5 Colleges processing new cataloging records)
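At the storage level, the version check and the write must happen atomically; otherwise two automated processes (such as checkout and an import updating the same item) can still both pass the check. A common way to get that in SQL is a conditional UPDATE that only matches the expected version. The sketch below uses Python's built-in sqlite3 module and a hypothetical item table; FOLIO storage modules may store records differently. The same idea, doing the read and the change in one statement, is what "combines select and increment into one query" refers to for the number generator:

```python
import sqlite3

# Hypothetical table: a version column next to the data being protected.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id TEXT PRIMARY KEY, version INTEGER NOT NULL, status TEXT)")
conn.execute("INSERT INTO item (id, version, status) VALUES ('item-1', 1, 'Available')")
conn.commit()


def update_status(conn, item_id, expected_version, new_status):
    """Compare-and-set: the version check and the write are one UPDATE statement."""
    cur = conn.execute(
        "UPDATE item SET status = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_status, item_id, expected_version))
    conn.commit()
    # rowcount is 0 when the stored version no longer matches: a collision.
    return cur.rowcount == 1


# Checkout and a data import both read the item at version 1.
checkout_ok = update_status(conn, "item-1", 1, "Checked out")
import_ok = update_status(conn, "item-1", 1, "In process")
print(checkout_ok, import_ok)  # True False
print(conn.execute("SELECT version, status FROM item WHERE id = 'item-1'").fetchone())
# (2, 'Checked out')
```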

User impact

This approach will not prevent collisions, but it would notify the user when they happen and offer them a choice. Something like: "Sorry, AgentB has already updated the record and your working copy might not be up to date. Would you like to:" (a) Update anyway (b) Reload.

  • OL means that in certain situations the update operation will fail, which needs to be communicated to the user. The UI should then allow the user to choose the next step, e.g. by refreshing the state of the record in the browser and re-applying the original changes.
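One way a client could implement "refresh and re-apply" is sketched below. It assumes the client can compute which fields the user changed (the store, function names and field-level merge rule are all hypothetical), and it only re-applies automatically when the other writer touched different fields; anything else is handed back to the user. Tooling of this kind belongs to the out-of-scope phase 3:

```python
class Conflict(Exception):
    """Stands in for a 409/412 response from the server."""


class Store:
    """Hypothetical single-record versioned store, for this sketch only."""

    def __init__(self, data):
        self.version, self.data = 1, dict(data)

    def get(self):
        return self.version, dict(self.data)

    def put(self, expected_version, data):
        if expected_version != self.version:
            raise Conflict()
        self.version += 1
        self.data = dict(data)


def changed_fields(original, edited):
    """Fields whose value differs between two copies (removed fields map to None)."""
    keys = set(original) | set(edited)
    return {k: edited.get(k) for k in keys if original.get(k) != edited.get(k)}


def save_with_reapply(store, loaded_version, original, edited):
    """Save; on a conflict, reload and re-apply only this user's changes.

    Returns ("saved", []) or ("needs_review", fields) when another writer
    changed a field this user also changed; the UI would then offer the
    "update anyway" / "reload" choice for those fields.
    """
    try:
        store.put(loaded_version, edited)
        return "saved", []
    except Conflict:
        fresh_version, fresh = store.get()
        mine = changed_fields(original, edited)
        theirs = changed_fields(original, fresh)
        overlap = sorted(set(mine) & set(theirs))
        if overlap:
            return "needs_review", overlap
        fresh.update(mine)
        # This retry could itself conflict; a real client might loop or give up.
        store.put(fresh_version, fresh)
        return "saved", []


store = Store({"title": "Goblet of Fire", "edition": "", "language": ""})
version, original = store.get()  # Bob, Alice and Carol all load version 1

print(save_with_reapply(store, version, original, dict(original, edition="1st")))
# ('saved', [])  Bob saves first
print(save_with_reapply(store, version, original, dict(original, language="eng")))
# ('saved', [])  Alice conflicts, but her field was untouched, so it is re-applied
print(save_with_reapply(store, version, original, dict(original, edition="2nd")))
# ('needs_review', ['edition'])  Carol changed the same field as Bob
```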

This situation can happen when multiple data imports are happening at the same time (or data import and a user acting on the same record at the same time) and can affect many records at the same time. Cleanup can then be very time-consuming and confusing.



 Comments   
Comment by Cate Boerema (Inactive) [ 29/Jul/19 ]

Jakub Skoczen I am assigning this to you as PO since it's Core Platform. Can you add the PO rank?

Comment by Theodor Tolstoy (One-Group.se) [ 15/Jan/20 ]

Should we add the NFR tag to this?

Comment by Ann-Marie Breaux (Inactive) [ 09/Apr/20 ]

Cate Boerema Jakub Skoczen Theodor Tolstoy (One-Group.se)

Also added a note in the description that this may happen in cases of data import or batch editing, where large numbers of records may be affected at once. Based on convo at MM SIG today.

cc: lew235 Christie Thomas Charlotte Whitt

Comment by Erin Nettifee [ 11/May/20 ]

Asking the same question as Theodor Tolstoy (One-Group.se) - is this NFR? It's shown up on the Round IV list, but this seems like tech debt and I'm not sure how we would rank this other than go live.

Comment by Holly Mistlebauer [ 17/Jun/20 ]

Cornell comment from Round IV Outliers spreadsheet: This outlier is a showstopper for Cornell

Comment by Holly Mistlebauer [ 04/Sep/20 ]

Cate Boerema: As we are re-planning what was formerly Q4 2020 (and is now simply Iris), it seems that we should revisit this feature. This is a go-live feature for every institution and is currently causing issues (see Support issue MODINVSTOR-516 Closed ).

Comment by Anya [ 10/Sep/20 ]

Cate Boerema is there an update on this ticket - I am being asked about this on many fronts

Comment by Cate Boerema (Inactive) [ 11/Sep/20 ]

Hi Anya. I think Jakub Skoczen would be better positioned to provide an update. I know this is being discussed regularly in TC and I think Core Platform is working on a design. Last I discussed with Jakub, we needed to (1) design and implement a platform solution (Core Platform team) and then (2) roll it out incrementally in FOLIO apps starting with Inventory (Core Functional etc). I have already created a draft story for Inventory here: UIIN-1245 Closed

Comment by Holly Mistlebauer [ 14/Sep/20 ]

Discussed at Capacity Planning Team meeting this morning...

Jakub reports that the Spike for optimistic locking ( RMB-719 Closed ) is almost done. It should be available for the devs to start integrating in 2 weeks or so. We will start by integrating it into one app and see what happens. It won't be fully implemented everywhere in the Honeysuckle release but should be for Iris.

Comment by Anya [ 16/Sep/20 ]

There is a worry being expressed to me that the solution being proposed is to have better error messages... Even though better error messages are good, this does not solve the issue of preventing or incorporating changes that are made when more than one user is editing a record.

Please help me understand the proposed solution(s) for this issue.

Comment by Marc Johnson [ 17/Sep/20 ]

Anya

the solution being proposed is to have better error messages

I'm not sure where folks have got their understanding of this from. In the current system, most of the time when two potentially conflicting changes are made, there wouldn't be an error at all.

Please help me understand the proposed solution(s) for this issue.

There are ongoing conversations about the specifics of the solution.

I'll try to share my understanding of the general approach being taken.

Let's say we have two users, Bob and Alice, editing the same instance record, Harry Potter and the Goblet of Fire.

They both started around the same time (and their starting points were the same).

They both make unrelated changes to the record. Bob saves the record first and then Alice.

In the current system, Alice's changes are remembered, and Bob's are likely lost.

In the proposed solution, Bob's changes would be remembered, and Alice would receive an error that the record had changed since she started editing it.

Does that make sense?

Jakub Skoczen Does that match your understanding of the general approach?

Comment by Jakub Skoczen [ 17/Sep/20 ]

Anya Yes, it's correct that the changes proposed here would result in the prevention of mid-air collisions by rejecting updates. It does not address specific functionality for automatically incorporating changes to a single record made by multiple users.

Comment by Charlotte Whitt [ 17/Sep/20 ]

Marc Johnson and Jakub Skoczen - re:

In the proposed solution, Bob's changes would be remembered, and Alice would receive an error that the record had changed since she started editing it.

Then my question is: What happens to Alice's changes, e.g. if she did a lot of updates in quickMARC?
Can Alice return to her work in quickMARC, and all edits are preserved?

The preferred behavior will of course be that Alice's edits are not lost.

Another question, what if Alice is not a person, but e.g. the Data Import module, doing an update at the same time as a person (Bob)? What will happen in Data Import?

Comment by Marc Johnson [ 17/Sep/20 ]

Charlotte Whitt

Then my question is: What happens to Alice's changes, e.g. if she did a lot of updates in quickMARC?
Can Alice return to her work in quickMARC, and all edits are preserved?

That depends upon what the client, in your example the quickMARC UI, does. It could attempt to reapply Alice's changes on top of Bob's. However this assumes that it knows specifically what changes Alice made. I don't think the current UI tracks them sufficiently to do this easily. My interpretation of Jakub Skoczen's comment above:

It does not address specific functionality for automatically incorporating changes to a single record made by multiple users.

is that kind of recovery is out of scope of this feature.

Another question, what if Alice is not a person, but e.g. the Data Import module, doing an update at the same time as a person (Bob)? What will happen in Data Import?

Indeed, I was trying to keep the explanation simple by only considering people interacting with the system via the UI.

In your example, data import is merely a specialised client; it has to decide how it wants to react. It could attempt to reapply its changes on top of the new update, or it could abort that record and present an error to the person who imported the file.
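The second option (abort the record and report it) can be sketched as a batch loop that compares the version the import read with the version currently stored. The storage layout, function name and report format are hypothetical, and a real implementation would rely on the storage API to perform the check atomically rather than checking in the client:

```python
from dataclasses import dataclass, field


@dataclass
class ImportReport:
    updated: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (record_id, reason) pairs


def run_import(storage, updates):
    """Apply a batch of updates, rejecting (not overwriting) records that changed.

    storage: dict of record_id -> {"version": int, "data": dict}  (hypothetical)
    updates: list of (record_id, version_read_by_import, new_data)
    """
    report = ImportReport()
    for record_id, read_version, new_data in updates:
        current_version = storage[record_id]["version"]
        if current_version != read_version:
            report.rejected.append(
                (record_id, f"changed by another user or process since it was read "
                            f"(read version {read_version}, now {current_version})"))
            continue
        # In real storage this check and write must be one atomic operation.
        storage[record_id] = {"version": read_version + 1, "data": new_data}
        report.updated.append(record_id)
    return report


storage = {
    "inst-1": {"version": 3, "data": {"title": "Old title 1"}},
    "inst-2": {"version": 3, "data": {"title": "Old title 2"}},
}
# The import read both records at version 3 ...
batch = [("inst-1", 3, {"title": "New title 1"}),
         ("inst-2", 3, {"title": "New title 2"})]
# ... but a cataloger saved inst-2 (now version 4) before the import wrote it.
storage["inst-2"] = {"version": 4, "data": {"title": "Cataloger's title"}}

report = run_import(storage, batch)
print(report.updated)  # ['inst-1']
for record_id, reason in report.rejected:
    print(record_id, reason)
# inst-2 changed by another user or process since it was read (read version 3, now 4)
```

The cataloger's edit survives, and the person running the import gets a per-record reason instead of a silent overwrite.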

Comment by Björn Muschall [ 17/Sep/20 ]

Could Alice be given a UI choice between canceling and applying her changes? Then she can talk to Bob on the phone and sort things out. Or is that unrealistic in practice? The technical implementation sounds feasible, I think, but I am not a developer. It would possibly extend the function with simple methods.

EDIT: I see it's already in the description, sorry.

Comment by Erin Nettifee [ 17/Sep/20 ]

Can Alice be warned when she opens the record that Bob is already viewing/editing it? Or is this scenario intended to understand what happens if they open it at the exact same time?

Comment by Julian Ladisch [ 17/Sep/20 ]

A warning does not prevent that they both try to save their changes.
Back-end processes and batch processes can also be one or both parties of the conflict.
Showing a warning if some other user has opened the record for edit (some kind of pessimistic locking) is useful but has other difficulties. It is out of scope for optimistic locking, please create a separate issue for this warning feature.

Comment by Ann-Marie Breaux (Inactive) [ 17/Sep/20 ]

Good point, Julian Ladisch. Data import may be trying to update an SRS MARC, Instance, Holdings, or Item at the same time that a user is manually updating it. If there's a person watching the import, they won't actually be on the same record as User A, so backend would need to provide some sort of message to the log that the record could not be updated because it was in use by another user or process.

Comment by Jenn Colt [ 17/Sep/20 ]

From my data import user perspective, it is not my expectation that this issue provide warnings about things being in use, I am happy with the initial scope described as a place to start. For new records in data import, this isn't an issue.

For updating records, I would expect data import to open the record at a given version (3) and then write the updated record. If someone had updated the record to version 4 between data import's read and data import's write, then I would expect my data import change to be rejected with a reason in the log. I wonder if this model will hold in the bulk APIs?

If at the same time a user had the record open and I saved version 4 before they did, then I would expect the user save to fail. I'm not expecting warnings about things being in use from this.

We might want to give data import users the ability to override this behavior, but only with a lot of care.

I think data export will need to add version # to exports if we want to be able to use that as part of an ETL chain (for marc records perhaps the 005 is enough though, if DI allows that comparison).

Comment by Jakub Skoczen [ 18/Sep/20 ]

On the FOLIO API level there's little distinction between different types of clients: some may be connected to interactive sessions in the FOLIO UI (a user editing a record), while others may be connected to batch processes running in the background (and potentially using more specialised batch APIs).

I think a solution where the client is allowed to "override" collision detection is certainly possible but may need to be used with extreme care. It could quickly become a source of the problem we are trying to solve.

Comment by Holly Mistlebauer [ 21/Sep/20 ]

Update from Capacity Planning Team meeting...

  • Optimistic locking has been narrowed down to two possible approaches. Still dealing with...
    • Backward compatibility
    • Handling batch (long-running) processes
  • Jakub will present options to Product Owners for review and discussion.
  • After that will have community meeting/forum.
Comment by Jacquie Samples [ 23/Sep/20 ]

Why is this in the work-around list? This is fundamental to operations (locking records when in-process) and there is no work-around listed.

Comment by Julian Ladisch [ 23/Sep/20 ]

Jacquie Samples Which work-around list do you refer to, can you post a link to it?

Comment by Jacquie Samples [ 23/Sep/20 ]

Hi Julian Ladisch
it is on the "Round 4 and Potential workarounds" list linked here https://folio-org.atlassian.net/browse/UXPROD-492?filter=12622

Comment by Julian Ladisch [ 23/Sep/20 ]

Thanks.

The "Potential Workaround" field of this issue contains this text:

CPT: Only comes up during dedicated testing. Two people must perform changes to the same record at the same time.

From A-M: Except this is not true. It will come up more as libraries do more and more data export/import and batch updates.

GBV: GBV libraries have update conflicts when doing regular work. Locking was added to our current system (OCLC LBS) to fix this.

I agree with Jacquie that this doesn't mention any workaround. CPT says that they are not affected in production, the others say that they are affected in production.
I deleted the text from the "Potential Workaround" field so that this issue no longer incorrectly shows up on the "Round 4 and Potential workarounds" list mentioned above. A copy of the text is in this comment and in the history.

Comment by Ann-Marie Breaux (Inactive) [ 01/Oct/20 ]

Meeting today: Jakub Skoczen, Julian Ladisch, Craig McNally, Ann-Marie Breaux, Charlotte Whitt, Dennis Bridges

Key points: (please add any I missed in the comments)

  • Intro PPT: https://docs.google.com/presentation/d/1X-2Yynn207dT4Shre0DjZLs19NMJB8aNIDUwD-Vszdw/edit?usp=sharing
  • Libraries who ranked this probably focused on the update conflicts portion of the Jira, and less on the solution (optimistic locking)
  • Probably less likely that 2 users are acting on the same record at the same time, more likely that a user and a system process are acting on the same record at the same time
  • Maybe implement as opt-in, so that if this breaks things, it doesn't break all apps at once. And apps can opt in when they are ready to deal with any additional errors or fallout.
  • Added some use cases to the description. A-M will ask the POs to add other use cases and to link any related, existing Jiras in their apps to this feature. Data import Preview and Rollback (not yet implemented) will increase chances of collisions.
  • A-M will talk with Data Import devs about what happens now, and with SMEs about what happens in their current systems
  • Meet again same time next week to discuss next steps
Comment by Ann-Marie Breaux (Inactive) [ 08/Oct/20 ]

Comments from the Data Import Subgroup meeting 7 Oct 2020:

  • Locking records when another user is editing them (https://folio-org.atlassian.net/browse/UXPROD-1752)
    • The key is data integrity
    • Catalogers are used to having this in their system, so it’s definitely needed
    • When records are exported and then re-imported (as in authority processing), are we checking to see whether the instance/cat record has been updated since the export? A-M to check with the developers
    • Also two processes acting on the same record (e.g. circ and data import updating a record at the same time)
    • When new things are received, instance, holdings, items all may be worked on a lot in that early life of the resource in the library
    • Thin thread: even if not handled gracefully by the UI at the start, needs to be handled well in the backend at least
    • How do their systems currently handle it?
      • 005 date is set at export; when importing, check and see if the date in the existing record is newer than the 005 date
      • The versioning approach that is being discussed would be similar
      • When a record is exported, do we need to send out the 005 or some other indication of the version?
      • Chicago has an applet that knows about exports; pre-import they submit the list to the applet, and if any records have been updated since they were exported, they pull those records and reject them
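The 005 comparison described above can be sketched as follows. It assumes both the exported record and the stored record carry a MARC 005 value in the standard yyyymmddhhmmss.f form; the function names are hypothetical and this is not how FOLIO Data Import is implemented:

```python
from datetime import datetime


def parse_005(value):
    """Parse a MARC 005 value (yyyymmddhhmmss.f, e.g. '20201007143015.0')."""
    stamp = datetime.strptime(value[:14], "%Y%m%d%H%M%S")
    # The digit after the period is tenths of a second.
    tenths = int(value[15]) if len(value) >= 16 else 0
    return stamp.replace(microsecond=tenths * 100_000)


def should_reject(exported_005, current_005):
    """Reject the incoming record if the stored copy changed after the export."""
    return parse_005(current_005) > parse_005(exported_005)


exported = "20201001120000.0"  # 005 carried in the exported (and now re-imported) file
print(should_reject(exported, "20201001120000.0"))  # False: unchanged since export
print(should_reject(exported, "20201005093000.0"))  # True: edited after export
```

A version number exported alongside the record would play the same role as the 005 timestamp.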
Comment by Holly Mistlebauer [ 12/Nov/20 ]

HOLLY IS ADDING INFO FROM JAKUB THAT APPEARED IN SLACK...

Comment by Holly Mistlebauer [ 12/Nov/20 ]

I've requested (on behalf of Cornell) that a meeting be set up for interested library parties. It would be good for all of us to get on the same page. It appears that this JIRA issue ( UXPROD-1752 Closed ) is now for 'Part 1' only and new JIRAs will be created for 'Part 2' and 'Part 3'. 'Part 1' and opt-in for 'Part 2' are both being targeted for R1 2021.

Comment by Charlotte Whitt [ 13/Nov/20 ]

Hi Holly Mistlebauer - will you also invite to this meeting the most directly involved POs and SIG conveners?

CC: Jakub Skoczen lew235 Kristin Martin Martina.Schildt

Comment by Kristin Martin [ 16/Nov/20 ]

Thanks - Holly Mistlebauer are there other Jira numbers that now need to be ranked?

Comment by Anya [ 23/Nov/20 ]

Has this meeting been set up? This has come up in some support cases as well...

Comment by Ann-Marie Breaux (Inactive) [ 23/Nov/20 ]

Not that I know of, Anya

Comment by Holly Mistlebauer [ 30/Nov/20 ]

'Watchers': I will ask about the meeting and the other JIRAs at tomorrow's Cap Planning meeting.

Comment by Holly Mistlebauer [ 30/Nov/20 ]

Charlotte Whitt: The meeting will be open to anyone.

Comment by Holly Mistlebauer [ 01/Dec/20 ]

Hi Watchers! Jakub is scheduled to discuss optimistic locking at the FOLIO Implementation Group meeting on December 15 at 11:00 AM US Eastern Time. The Zoom URL is
https://zoom.us/j/244921097. We will record the meeting...
Thanks,
Holly

Comment by Holly Mistlebauer [ 09/Dec/20 ]

Jakub's attendance at the FOLIO Implementation Group meeting has been rescheduled for January 5 at 11:00 AM US Eastern Time. The Zoom URL is still https://zoom.us/j/244921097.

Comment by Anya [ 22/Mar/21 ]

Jakub Skoczen is there an update on this?

 

Comment by Julian Ladisch [ 23/Mar/21 ]

Since version 32.0.0, raml-module-builder (RMB) ships with optimistic locking support ( RMB-727 Closed ).

Documentation: https://github.com/folio-org/raml-module-builder#optimistic-locking

RMB based modules need to explicitly enable optimistic locking to use it.

Since version 20.0.0 ( MODINVSTOR-656 Closed ), mod-inventory-storage uses optimistic locking

  • for instance, holding, item only
  • using logOnConflict: a conflict is logged but not rejected, so the first write is still lost because it gets overwritten by the second write

Iris ships with a mod-inventory-storage version >= 20.0.0.

See also UXPROD-2994 "OL: coordinate rollout of "failOnConflict" to select modules and APIs".

Comment by Jakub Skoczen [ 20/Apr/21 ]

Closing this as the RMB support for OL has been shipped in RMB 32. However, this functionality on its own does not provide any conflict resolution facilities for individual APIs. The follow-up feature relevant to implementers is UXPROD-2994 Open .

Generated at Fri Feb 09 00:18:14 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.