[UXPROD-3885] Allow external data sources to be authoritative about works in the Agreements Local KB Created: 31/Oct/22  Updated: 08/Feb/24  Resolved: 09/Oct/23

Status: Closed
Project: UX Product
Components: None
Affects versions: None
Fix versions: Poppy (R2 2023)

Type: New Feature Priority: TBD
Reporter: Owen Stephens Assignee: Owen Stephens
Resolution: Done Votes: 0
Labels: erm, local_kb
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Defines
is defined by ERM-3018 Implement GOKb Title UUID as primary ... Closed
Relates
relates to UXPROD-3886 Support 'push' based mechanism for po... Closed
Release: Poppy (R2 2023)
Development Team: Bienenvolk
PO Rank: 0

 Description   

Current situation or problem:

The current synchronisation of the Agreements Local KB with an external KB makes the assumptions that:

  • There may be multiple sources for data coming into the Local KB (e.g. GOKb + File upload)
  • Each data source is authoritative for the package and 'title in package' level information it provides, but not for title and 'title on platform' information, as multiple external sources can describe the same title / title on platform
  • The avoidance of duplication of titles is a priority for the local KB

This leads to a number of issues for users of the local KB including:

  • local KB data does not completely match the data in any single external data source
  • it is difficult to correct problems that have occurred at the title/title on platform level

Over time a number of different approaches have been taken to resolving these issues, but none have changed the fundamental assumptions given above, and they have tended to add complexity to the system (more complex matching rules for incoming data, additional tools for managing title information in the local KB).

In a situation where the tenant is exclusively, or almost exclusively, dependent on a single source of data for its information, the assumptions listed above are no longer valid, and a simpler approach can be taken which:

  • Uses work level identifiers from the external data source to allow the data source to be authoritative about those works
  • Lowers the priority of avoiding title duplicates

In scope

  • Support for work source IDs to allow an external data source to be authoritative for titles from the data source
  • Ensuring that the new method is able to work with any existing data where title instances may have been merged / separated due to the current ingest method
  • Ability to switch from current data sync situation to one that supports authoritative work source IDs

Out of scope

  • Other changes to harvesting logic

Proposed solution/stories

Approach will be to introduce a new "work source ID" which can be used to match incoming work. The use of this new method will be optional, switched at compile time.

This will mean that, where the new method is in use, all incoming works will be required to include a source ID. This will affect import from any external data source, including JSON and KBART imports.
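As a rough illustration of matching on a work source ID, the Python sketch below looks up an incoming work by its source ID, lets the external source overwrite the local data on a match, and creates a new work otherwise. All names here (`Work`, `LocalKb`, `upsert_work`, and the `source` / `source_id` fields) are hypothetical and are not taken from the Agreements code; this is a sketch under those assumptions, not the actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Work:
    source: str     # hypothetical: name of the external data source, e.g. "GOKb"
    source_id: str  # hypothetical: the "work source ID" supplied by that source
    title: str


@dataclass
class LocalKb:
    # Works indexed by (source, source_id); assumes IDs are unique within a source.
    works: dict = field(default_factory=dict)

    def upsert_work(self, source: str, source_id: str, title: str) -> Work:
        if not source_id:
            # With the new method in use, every incoming work needs a source ID.
            raise ValueError("incoming work has no source ID")
        key = (source, source_id)
        existing = self.works.get(key)
        if existing is not None:
            # The external source is treated as authoritative: overwrite local data.
            existing.title = title
            return existing
        work = Work(source=source, source_id=source_id, title=title)
        self.works[key] = work
        return work
```

In this sketch two works from the same source with different source IDs are never merged, even if their titles are identical, which reflects the lowered priority given to avoiding title duplicates.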



 Comments   
Comment by Owen Stephens [ 14/Apr/23 ]

Just to capture some feedback from Bernd Oberknapp in Slack (https://folio-project.slack.com/archives/C9ER2HCRY/p1681485815955459?thread_ts=1681217983.319299&cid=C9ER2HCRY):

What I have in mind is to use the GOKb UUID as an additional identifier for the matching, same for the ZDB-ID and maybe other identifiers. This would require a priority for the identifiers for the matching and a rule for handling matches from different data sources - the latter obviously is the tricky part. One option could be a priority for the data sources, probably with the highest priority for the GOKb, so that a match with an existing title without GOKb UUID would result in replacing that title with the GOKb reference title (which would be much easier than trying to merge the titles).
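The suggestion quoted above could be sketched roughly as follows in Python: identifier types are tried in priority order to find a match, and a matched title is replaced only when the incoming title comes from a higher-priority data source. The priority lists, the `issn` and `file_upload` entries, the dictionary layout, and the function names are all assumptions made for illustration; none of them come from the ticket or from the Agreements code.

```python
from typing import Optional

# Hypothetical priorities: earlier in the list / lower number wins.
IDENTIFIER_PRIORITY = ["gokb_uuid", "zdb", "issn"]
SOURCE_PRIORITY = {"GOKb": 0, "file_upload": 1}


def source_rank(title: dict) -> int:
    # Unknown sources rank below every known source.
    return SOURCE_PRIORITY.get(title["source"], len(SOURCE_PRIORITY))


def find_match(incoming: dict, titles: list) -> Optional[dict]:
    """Return the first existing title sharing an identifier with the incoming
    title, trying identifier types in priority order."""
    for id_type in IDENTIFIER_PRIORITY:
        value = incoming["identifiers"].get(id_type)
        if value is None:
            continue
        for title in titles:
            if title["identifiers"].get(id_type) == value:
                return title
    return None


def ingest(incoming: dict, titles: list) -> list:
    """Return the title list after applying the incoming title."""
    match = find_match(incoming, titles)
    if match is None:
        return titles + [incoming]
    if source_rank(incoming) < source_rank(match):
        # Replace the existing title with the higher-priority reference title
        # instead of attempting to merge the two.
        return [incoming if t is match else t for t in titles]
    # The existing title comes from an equal or higher-priority source: keep it.
    return titles
```

Replacing rather than merging keeps the rule simple, but a real implementation would also need to decide what happens to anything already attached to the replaced title, which this sketch does not address.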

Generated at Fri Feb 09 00:35:34 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.