Technical approach for Link handling when bib record is updated via data import
Jira links
- ARCH-22
Overview
As noted in the related document (Technical approach for update MARC Bib fields controlled by related Authority records - Technical Designs and Decisions - FOLIO Wiki), entities are updated in two phases: first the entity itself is updated, then its links.
This document describes the update from the bib side: the bib record is the main entity, and the authorities are treated as the linked entities (the so-called links).
Solution
Flyover diagram
Flow Diagram
Implementation details
The new logic in mod-srs and mod-entities-links is to be outlined in detail per scenario.
mod-source-record-storage implied changes
- Retrieve from mod-entities-links into mod-srs the current links for the instance_id being updated, together with the linked authorities (payload with tags and mapped subfields).
- Match the retrieved controlled subfields against the incoming ones (from DI) to build a change list of which values to update and which to keep unchanged. Example: if $0 is the same and $a, $b and $c are to be updated, where $a and $b are controlled and $c is uncontrolled, check the link to ensure that $a and $b keep their current values (they are controlled) while $c is changed (it is uncontrolled).
- Update controlled and uncontrolled fields according to the result of the previous matching step, depending on whether the $0 value has changed.
- If the $0 value differs, send a request to mod-entities-links to remove the link (skip this step if $0 is the same).
- Proceed with the regular update flow for bib records by notifying mod-inventory accordingly.
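The matching step above can be sketched roughly as follows. This is a minimal illustration, not the mod-srs implementation: the `Link` class, the `merge_linked_field` function and the dict-based field representation are hypothetical, and repeatable subfields are ignored for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical shape of one link returned by mod-entities-links."""
    tag: str               # bib field tag, e.g. "100"
    authority_zero: str    # $0 value the field is currently linked with
    controlled: frozenset  # subfield codes controlled by the authority


def merge_linked_field(existing: dict, incoming: dict, link: Link):
    """Return (merged_subfields, unlink_needed) for one linked bib field.

    `existing` and `incoming` map subfield codes to values,
    e.g. {"0": "...", "a": "..."} (assumed simplification).
    """
    if incoming.get("0") != link.authority_zero:
        # $0 changed or was removed: the incoming values win
        # and the link has to be removed afterwards.
        return dict(incoming), True

    # $0 unchanged: accept incoming uncontrolled subfields only ...
    merged = {code: value for code, value in incoming.items()
              if code not in link.controlled}
    # ... and keep the stored values of the controlled subfields.
    for code in link.controlled:
        if code in existing:
            merged[code] = existing[code]
    return merged, False
```

With the example from the list ($0 unchanged, $a and $b controlled, $c uncontrolled), the merged field keeps the stored $a and $b, takes the incoming $c, and reports that no unlink is needed.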
mod-entities-links implied changes
Try to reuse the already implemented PUT method for the unlinking operation on a particular instance, since it is intended to update records within mod-entities-links by instance_id.
According to the DB schema of mod-entities-links, authority id and instance id are mapped to each other via bib_record_field_id and linked_sub_fields.
The unlink operation requires an SQL DELETE query by instance_id, authority_id and the value of the $0 subfield. In case of a mismatch:
- Collect all differences of $0 for each field
- Send a single aggregated REST API call to unlink (remove) the instance and authorities with the $0 values of the existing fields
- Continue with regular updates
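The mismatch handling above could look roughly like the sketch below. Everything beyond instance_id, authority_id and bib_record_field_id is an assumption made for illustration: the table name `instance_link`, the column `zero_value`, the payload keys and both function names are hypothetical, and an in-memory SQLite database stands in for the module's real database only so that the example runs.

```python
import sqlite3


def collect_unlinks(instance_id, existing_links, incoming_zero_by_field):
    """Collect links whose stored $0 no longer matches the incoming field.

    existing_links: dicts with keys "field_id", "authority_id", "zero".
    incoming_zero_by_field: field_id -> incoming $0 (missing if removed).
    """
    stale = [
        {"authorityId": link["authority_id"], "zero": link["zero"]}
        for link in existing_links
        if incoming_zero_by_field.get(link["field_id"]) != link["zero"]
    ]
    # One aggregated body instead of one REST call per field.
    return {"instanceId": instance_id, "links": stale}


def delete_links(conn, payload):
    """Delete the collected links and return the number of removed rows."""
    rows = [(payload["instanceId"], item["authorityId"], item["zero"])
            for item in payload["links"]]
    before = conn.total_changes
    conn.executemany(
        "DELETE FROM instance_link "
        "WHERE instance_id = ? AND authority_id = ? AND zero_value = ?",
        rows,
    )
    conn.commit()
    return conn.total_changes - before
```

If no $0 value differs, the collected list is empty and the delete step removes nothing, which corresponds to skipping the unlink request.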
Peculiarities of implementation
Since the update process is implemented within the mod-di-process-core library, the optimal approach for embedding the update of controlled and uncontrolled subfields into the codebase needs to be considered.
Impact on dependent libraries:
- Embed unlinking in a new version of mod-di-process-core (implies more operational effort: publishing a new version of the library and updating dependencies in the codebase)
- Extend the update logic with unlinking in the mod-srs codebase (could also imply changes within mod-di-process-core if a new operation has to be processed on top of the existing DI process)
Concerns
When a large number of controlled bib records are updated and instances have to be unlinked from (or relinked to) authorities, an excessive number of I/O operations (REST API calls) between mod-srs and mod-entities-links is expected. This should be optimized by handling requests in batches (e.g. send an array of 1000 instance and authority ids to unlink; get linked entities in bulk).
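A minimal sketch of the batching idea, assuming the unlink requests are available as a list of (instance id, authority id) pairs. The names `batches` and `unlink_in_batches` and the `send` callback are hypothetical; `send` stands in for whatever client performs the actual REST call.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], size: int = 1000) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def unlink_in_batches(pairs: Sequence[T],
                      send: Callable[[Sequence[T]], None],
                      size: int = 1000) -> int:
    """Call `send` once per batch and return the number of calls made."""
    calls = 0
    for chunk in batches(pairs, size):
        send(chunk)
        calls += 1
    return calls
```

With 2500 pairs and the batch size of 1000 mentioned above, this makes three calls instead of 2500.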
Delivery Plan
Given the foregoing concerns about a potential bottleneck and the room for performance optimization, performance tests need to be written to measure the actual processing capabilities and to indicate whether further performance improvements are relevant.
- Implement the solution with the logic for handling controlled fields
- Write performance tests for the scenarios of updating bibs with the same $0 value and with a different one
- Measure throughput and, if needed, plan performance optimizations using the batch processing approach
Scope and LOE
Two modules affected:
- mod-source-record-storage
  - implement updating controlled fields logic (2 sprints)
- mod-entities-links
  - implement unlink logic (1 sprint)
- Not yet defined scope (up to 2 sprints)
Rationale
Since the current performance may be sufficient, it is better to proceed with steps 1 and 2, which imply a simple implementation of the solution and performance test coverage. This way the solution will be delivered reasonably fast, and performance metrics will be available for deciding on any further optimization work (which may not be a priority, given its extra cost, if the current approach shows adequate performance results).