MARC Authority - Phase 1 Features (UXPROD-2944)

[UXPROD-4533] Long term solution for creates/updates of authority records (entire dataset) Created: 30/Oct/23  Updated: 14/Dec/23

Status: Open
Project: UX Product
Components: None
Affects versions: None
Fix versions: Trillium (R1 2025)
Parent: MARC Authority - Phase 1 Features

Type: New Feature Priority: P2
Reporter: Khalilah Gambrell Assignee: Khalilah Gambrell
Resolution: Unresolved Votes: 0
Labels: NFR, SolutionArchitecture, arlef-di, authority, back-end, cataloging, data-import, di-swat, loc, marc-authority, metadatamanagement
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Cloners
clones UXPROD-4082 Long term solution for applying mappi... In Progress
Requires
requires MODELINKS-84 POC: Measure performance of mapping a... Closed
requires MODELINKS-92 POC: Measure performance of mapping a... Closed
requires MODINVSTOR-1057 POC: Measure performance of reading a... Closed
requires MODINVSTOR-1068 POC: Measure performance of reading a... Closed
Release: Poppy (R2 2023)
Epic Link: MARC Authority - Phase 1 Features
Front End Estimate: Out of scope
Front End Estimator: Khalilah Gambrell
Back End Estimate: XXXL: 30-45 days
Back End Estimator: Khalilah Gambrell
Back-End Confidence factor: 100%
Development Team: Spitfire
PO Rank: 0

 Description   

Problem: It is extremely difficult to a.) migrate authority records for libraries that are moving to FOLIO from another system, or b.) update all authority records to support FOLIO authority control. With the Nolana release, a short-term solution was applied (https://folio-org.atlassian.net/wiki/display/DD/ARCH-36+Provide+a+way+to+update+MARC+authority+records+when+mapping+rules+have+changed), but it has proven unreliable and not scalable. In addition, people migrating authorities from one system to another who use POST /authority-storage/authorities report that posting records one by one is very slow: about 40 seconds per 50 records.

Until this issue is addressed, many libraries will not be able to use the MARC authority app and authority control effectively. Note that the Library of Congress has over 10 million authority records.
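
To put the reported rate in perspective, here is a rough back-of-the-envelope estimate. It assumes the reported rate (about 40 seconds per 50 records) stays constant and that records are posted strictly one by one; real throughput will vary, so treat the numbers as an illustration only. The function name is hypothetical.

```python
# Rough estimate only: assumes the reported rate holds for the whole load.
REPORTED_SECONDS = 40   # reported: about 40 seconds ...
REPORTED_RECORDS = 50   # ... per 50 records


def estimated_hours(record_count: int) -> float:
    """Hours needed to post record_count records one by one at the reported rate."""
    seconds_per_record = REPORTED_SECONDS / REPORTED_RECORDS
    return record_count * seconds_per_record / 3600


for count in (500_000, 10_000_000):
    hours = estimated_hours(count)
    print(f"{count:>10,} records: {hours:,.0f} hours (~{hours / 24:,.1f} days)")
```

Under these assumptions, 500,000 records take roughly 111 hours (about 4.6 days), and 10 million records take roughly 2,222 hours (about 92.6 days).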

This feature covers: 

  • Create new authority records (500,000+ records)
  • Update authority records (500,000+ records)

 

The proposed solution should

  • consider a solution outside of data import because the problem we are trying to solve is really with migrating a very large set of records (in many cases the entire data set) and data import is not meant to migrate millions of records.
  • be discussed with Folijet or developed in collaboration with this team because ideally it should be the solution (serve as the pattern) for bib/holdings/item records of any format. 
  • consider changing the concurrency level to minimize performance and/or reliability degradation
  • support the following environment setups
    • Self-hosted?
    • Single tenant + single cluster  
    • Enhanced consortia support 
    • Multi-tenant + single cluster > if one or more tenants are processing a very large dataset or the entire data set, then other tenants should continue to operate without significant latency or delays. [Since stand-alone solution > shared database. Performance degradation will occur, but the impact is unknown because it is a database update. No use of FOLIO modules in this operation. Will need to measure via PTF testing.]
  • not significantly degrade a.) Check-in/Check-out (CICO), b.) data import (for example, can a library still run data import jobs for orders, bibs, holdings, or items?), c.) access to Inventory and MARC authority workflows
  • Stats
      • Action: Create authority records
        • 500,000 records are created []
        • 1 million records are created []
        • 5 million records are created []
        • 10 million records are created []
        • 20 million records are created []
      • Action: Update authority records
        • 500,000 records are updated []
        • 1 million records are updated []
        • 5 million records are updated []
        • 10 million records are updated []
        • 20 million records are updated []
  • Ability to have these migrations executed in the background with minimal impact to FOLIO usage. 
  • Implement
    • slicing solution similar to data import implementation UXPROD-4337 Closed
    • Report on progress and status 
    • Users should have a simple solution to get/receive files that contain records with errors (The records that were not processed successfully).
    • Log and report errors > Provide a response that includes details on records that failed to create or update in SRS and Inventory.
    • Possible: Batch POST/PUT Authority API to support creating and updating 500,000+ records
    • Support UTF-8 encoding.
    • Support optimistic locking mechanism.
    • Enforce data validation rules during the creation of MARC authority records to prevent duplicates.
    • Apply the same data validation rules enforced during the updating of MARC authority records.
    • Users should be able to load a file and have the system process it until loading is complete. The process must be highly reliable.
  • Applies to MARC 21 format 
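
As an illustration of the slicing, progress reporting, and error collection requirements listed above, here is a minimal sketch. It is not the FOLIO implementation: `load_in_slices`, `LoadReport`, the `save` callback, and the in-memory store are all hypothetical names, and the duplicate check stands in for whatever validation rules the real solution enforces.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional


@dataclass
class LoadReport:
    """Running totals plus the records that failed, so they can be returned to the user."""
    processed: int = 0
    failed: list = field(default_factory=list)  # (record, error message) pairs


def slices(records: list, size: int) -> Iterator[list]:
    """Yield consecutive chunks of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def load_in_slices(
    records: list,
    save: Callable[[dict], None],
    slice_size: int = 1000,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> LoadReport:
    """Process records slice by slice; a bad record is logged and does not stop the load."""
    report = LoadReport()
    for chunk in slices(records, slice_size):
        for record in chunk:
            try:
                save(record)
            except ValueError as exc:  # hypothetical validation failure
                report.failed.append((record, str(exc)))
            report.processed += 1
        if on_progress is not None:
            on_progress(report.processed, len(records))
    return report


# Hypothetical in-memory store that rejects duplicate ids.
store: dict = {}


def save_to_memory(record: dict) -> None:
    if record["id"] in store:
        raise ValueError(f"duplicate id {record['id']}")
    store[record["id"]] = record


# Five unique records plus one duplicate of the first.
records = [{"id": f"a{i}"} for i in range(5)] + [{"id": "a0"}]
report = load_in_slices(
    records,
    save_to_memory,
    slice_size=2,
    on_progress=lambda done, total: print(f"{done}/{total} processed"),
)
print(f"{len(report.failed)} failed:", [rec["id"] for rec, _ in report.failed])
```

Collecting failures per record rather than aborting the whole slice is one possible design choice; it keeps the load running and leaves a list that could be written to an error file for the user.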

View migration numbers for several libraries : https://docs.google.com/spreadsheets/d/10GiFrfZee8aY8PcE0JJxf-lWtMkddFWnOYo_tiKYXrs/edit#gid=0

 

Definition of done

  • Load testing by development team 
    • Scenario: New customer: Migration from one system to FOLIO
    • Scenario: Existing customer never used MARC authority app before  
    • Scenario: Existing customer that already uses MARC authority app: Updating entire record set to support FOLIO authority control 
  • Verify handling of errors 
  • PTF testing
    • Scenario: New customer: Migration from one system to FOLIO
    • Scenario: Existing customer never used MARC authority app before 
    • Scenario: Existing customer that already uses MARC authority app: Updating entire record set to support FOLIO authority control 
  • Production-like testing
    • Scenario: New customer: Migration from one system to FOLIO
    • Scenario: Existing customer never used MARC authority app before  
    • Scenario: Existing customer that already uses MARC authority app: Updating entire record set to support FOLIO authority control 
  • Release notes and documentation 

 



 Comments   
Comment by Ben Taylor [ 31/Oct/23 ]

Good to know that this is being thought about.  Presumably, once done, the solution will be applied to libraries with existing authority records but fewer than 500,000.

Comment by Khalilah Gambrell [ 31/Oct/23 ]

Hey Ben Taylor  - We are hoping to release a data import update that allows libraries to more reliably import less than 500,000 records in Poppy or "Q". How many authority records does Trinity have? 

Comment by Ben Taylor [ 31/Oct/23 ]

I've surprised myself by discovering that we actually have just over 500,000 authority records.  I've never considered that the number of authority records would be far more than any other sort of record!

Generated at Fri Feb 09 00:40:39 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.