Technical approach for Link handling when bib record is updated via data import

Jira links

ARCH-22

Overview

As noted in the related document (Technical approach for update MARC Bib fields controlled by related Authority records - Technical Designs and Decisions - FOLIO Wiki), entities are updated in two phases: first the entity itself is updated, then its links.

This document describes the update from the bib side: the bib record is the main entity, and authorities are treated as the linked entities (the links).

Solution

Flyover diagram

Flow Diagram

Implementation details

The new logic to be implemented in mod-srs (mod-source-record-storage) and mod-entities-links is outlined below, by scenario.

mod-source-record-storage implied changes

  • Fetch from mod-entities-links into mod-srs the current links for the instance_id being updated, together with the linked authorities (payload with tags and mapped subfields).
  • Match the retrieved controlled subfields against the incoming ones (from DI) to form a change-list of which values to update and which to keep pristine. Example: if $0 is unchanged and $a, $b and $c are to be updated, where $a and $b are controlled and $c is uncontrolled, check the linkage to ensure that $a and $b are kept as they are (controlled) and only $c is changed (uncontrolled).
  • Update controlled and uncontrolled fields according to the result of the previous matching step, depending on whether the $0 value changed.
  • If the $0 value differs, send a request to mod-entities-links to remove the link (skip this step if $0 is the same).
  • Proceed with the regular update flow for bib records, notifying mod-inventory accordingly.
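
The matching step above can be sketched as follows. This is a minimal illustration, not the mod-srs implementation: the `Link` shape and the `plan_field_update` function are hypothetical names, and the rule that a changed $0 means "accept the incoming field and unlink" is an assumption drawn from the steps listed above.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Link:
    """Hypothetical shape of one link as returned by mod-entities-links."""
    tag: str                    # bib field tag, e.g. "100"
    zero: str                   # $0 value recorded for the link
    controlled: FrozenSet[str]  # subfield codes controlled by the authority


def plan_field_update(
    existing: Dict[str, str],
    incoming: Dict[str, str],
    link: Optional[Link],
) -> Tuple[Dict[str, str], bool]:
    """Return (subfields to store, whether the link must be removed).

    existing / incoming map subfield code -> value for a single bib field.
    """
    if link is None:
        # Field is not linked: take the incoming values as they are.
        return dict(incoming), False
    if incoming.get("0") != link.zero:
        # $0 changed (assumption: this invalidates the link):
        # accept the incoming field and request unlinking.
        return dict(incoming), True
    # $0 unchanged: keep controlled subfields pristine, update the rest.
    merged = dict(incoming)
    for code in link.controlled:
        if code in existing:
            merged[code] = existing[code]
        else:
            merged.pop(code, None)
    return merged, False
```

With $a and $b controlled and $c uncontrolled, an incoming field that keeps the same $0 ends up with the existing $a and $b and the incoming $c, which matches the example in the list above.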

mod-entities-links implied changes

Try to reuse the already implemented PUT method for the unlink operation on a particular instance, since it is meant to update records in mod-entities-links by instance_id.

According to the mod-entities-links DB schema, authority id and instance id are mapped via bib_record_field_id and linked_sub_fields.

The unlink operation requires an SQL DELETE query by instance_id, authority_id and the value of the $0 subfield. In case of a mismatch:

  1. Collect all differences of $0 for each field
  2. Send one aggregated REST API call to unlink (remove) the instance and authorities, using the $0 values of the existing fields
  3. Continue with regular updates
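
A rough sketch of steps 1 and 2 is given below, using an in-memory SQLite database purely as a stand-in for the module's real storage. The table name `instance_link` and the column `zero_subfield` are assumptions made for illustration; only instance_id, authority_id, bib_record_field_id and linked_sub_fields are named in the schema description above. Treating a field that is absent from the incoming record as a mismatch is also an assumption.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical table layout; zero_subfield is an assumed column for the $0 value.
SCHEMA = """
CREATE TABLE instance_link (
    instance_id         TEXT NOT NULL,
    authority_id        TEXT NOT NULL,
    bib_record_field_id TEXT NOT NULL,
    linked_sub_fields   TEXT NOT NULL,
    zero_subfield       TEXT NOT NULL
)
"""


def collect_mismatches(
    conn: sqlite3.Connection,
    instance_id: str,
    incoming_zero_by_field: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Step 1: find links whose stored $0 differs from the incoming $0.

    Returns (authority_id, stored $0) pairs. A field missing from the
    incoming record counts as a mismatch (assumption).
    """
    rows = conn.execute(
        "SELECT authority_id, bib_record_field_id, zero_subfield "
        "FROM instance_link WHERE instance_id = ?",
        (instance_id,),
    ).fetchall()
    return [
        (authority_id, zero)
        for authority_id, field_id, zero in rows
        if incoming_zero_by_field.get(field_id) != zero
    ]


def unlink(
    conn: sqlite3.Connection,
    instance_id: str,
    mismatches: List[Tuple[str, str]],
) -> None:
    """Step 2: remove all mismatched links in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "DELETE FROM instance_link "
            "WHERE instance_id = ? AND authority_id = ? AND zero_subfield = ?",
            [(instance_id, authority_id, zero) for authority_id, zero in mismatches],
        )
```

Collecting the mismatches first and deleting them together keeps the unlink to one aggregated operation per instance, in line with step 2.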

Peculiarities of implementation

Since the update process is implemented within the mod-di-process-core library, an optimal approach is needed for embedding the update of controlled and uncontrolled subfields into the codebase.

Impact on dependent libraries:

  1. Embed unlinking in a new version of mod-di-process-core (implies more operational effort: publishing a new version of the library and updating dependencies in the codebase)
  2. Extend the update logic with unlinking in the mod-srs codebase (could also imply changes in mod-di-process-core if a new operation has to be processed on top of the existing DI process)

Concerns

When a large number of controlled bib records are updated and instances need to be unlinked from (or re-linked to) authorities, an excessive number of I/O operations (REST API calls) between mod-srs and mod-entities-links is expected. This should be optimized by handling requests in batches (e.g. send an array of 1000 instance and authority ids to unlink; get linked entities in bulk).
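
The batching idea could look roughly like the sketch below. The payload shape (`links`, `instanceId`, `authorityId`) and the function names are hypothetical and do not describe an existing endpoint contract; the batch size of 1000 is taken from the example above.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

Pair = Tuple[str, str]  # (instance_id, authority_id)


def batches(pairs: Sequence[Pair], size: int = 1000) -> Iterator[Sequence[Pair]]:
    """Yield consecutive slices of at most `size` pairs."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(pairs), size):
        yield pairs[start:start + size]


def build_unlink_payloads(pairs: Sequence[Pair], size: int = 1000) -> List[Dict]:
    """Build one hypothetical request body per batch instead of one per pair."""
    return [
        {"links": [{"instanceId": i, "authorityId": a} for i, a in batch]}
        for batch in batches(pairs, size)
    ]
```

For 2500 pairs this produces three request bodies instead of 2500 individual calls.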

Delivery Plan

Given the concerns above about a potential bottleneck and the room for performance optimization, performance tests are needed to measure the actual processing capabilities and to indicate whether further performance improvements are relevant.

  1. Implement the solution with handling of controlled fields logic
  2. Write performance tests for the scenarios of updating bibs with the same $0 value and with a different one
  3. Measure throughput and, if needed, plan performance optimizations using the batch processing approach

Scope and LOE

Two modules affected:

  • mod-source-record-storage
    • implement updating controlled fields logic (2 sprints)
  • mod-entities-links
    • implement unlink logic (1 sprint)
  • Not yet defined scope (up to 2 sprints)

Rationale

Since current performance may be sufficient, it is better to proceed with steps 1 and 2, which imply a simple implementation of the solution plus performance test coverage. This way the solution will be delivered reasonably fast, and performance metrics will be available for deciding on any further optimization work (which may not be a priority, given its extra cost, if the current approach performs adequately).

