MARC Authority - Phase 1 Features (UXPROD-2944)

[UXPROD-4082] Long term solution for applying mapping rules change to authority records that exist in database. Created: 20/Feb/23  Updated: 05/Feb/24

Status: In Progress
Project: UX Product
Components: None
Affects versions: None
Fix versions: Quesnelia (R1 2024)
Parent: MARC Authority - Phase 1 Features

Type: New Feature Priority: P1
Reporter: Khalilah Gambrell Assignee: Khalilah Gambrell
Resolution: Unresolved Votes: 0
Labels: LC-priority2, NFR, SolutionArchitecture, arlef-di, authority, back-end, cataloging, data-import, di-swat, loc, marc-authority, metadatamanagement
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Cloners
is cloned by UXPROD-4533 Long term solution for creates/update... Open
Defines
is defined by MODMARCMIG-4 Create module repository Closed
is defined by MODMARCMIG-11 Draft: Marc migrations dev testing Draft
Gantt End to Start
has to be done before UXPROD-3969 Improve solution that will refresh In... Open
Relates
relates to MODMARCMIG-9 Implement POST Endpoint for Data-Savi... Open
relates to MODMARCMIG-5 POST endpoint: Register new marc-migr... Closed
relates to MODMARCMIG-6 Implement Async Records Mapping Mecha... Closed
relates to MODELINKS-173 Implement Endpoint for Bulk Authoriti... In Code Review
relates to MODMARCMIG-7 Implement Async Records Mapping Mecha... In Progress
relates to MODMARCMIG-8 Create Endpoint for Retrieving MARC M... Closed
Requires
requires MODELINKS-84 POC: Measure performance of mapping a... Closed
requires MODELINKS-92 POC: Measure performance of mapping a... Closed
requires MODINVSTOR-1057 POC: Measure performance of reading a... Closed
requires MODINVSTOR-1068 POC: Measure performance of reading a... Closed
Release: Poppy (R2 2023)
Epic Link: MARC Authority - Phase 1 Features
Front End Estimate: Out of scope
Front End Estimator: Khalilah Gambrell
Back End Estimate: XXXL: 30-45 days
Back End Estimator: Khalilah Gambrell
Back-End Confidence factor: 100%
Development Team: Spitfire
PO Rank: 0

 Description   

Problem: It is extremely difficult to update a large set of MARC authority records after mapping rules change (per release). With the Nolana release, a short-term solution was applied (https://folio-org.atlassian.net/wiki/display/DD/ARCH-36+Provide+a+way+to+update+MARC+authority+records+when+mapping+rules+have+changed), but it has proven unreliable and not scalable.

Until this issue is addressed, many libraries will not be able to use the MARC authority app and authority control effectively. Note that the Library of Congress has over 10 million authority records.

 

The proposed solution should

  • be discussed with Folijet or developed in collaboration with this team because ideally it should be the solution (serve as the pattern) for bib/holdings/item records of any format. 
  • consider changes to the concurrency level to minimize performance and/or reliability degradation
  • support the following environment setups
    • Self-hosted?
    • Single tenant + single cluster  
    • Enhanced consortia support 
    • Multi-tenant + single cluster > if one or more tenants are processing a very large dataset or their entire dataset, other tenants should continue to operate without significant latency or delays. [Because this is a stand-alone solution working against a shared database, performance degradation will occur, but the impact is unknown because the operation is a database update. No FOLIO modules are used in this operation. The impact will need to be measured via PTF testing.]
  • not significantly degrade a.) check-in/check-out (CICO), b.) data import (for example, can a library still run data import jobs for orders, bibs, holdings, or items?), c.) access to Inventory and MARC authority workflows
  • Stats 
    •  Action: Mapping rules change
        • 500,000 records are updated []
        • 1 million records are updated []
        • 5 million records are updated []
        • 10 million records are updated []
        • 20 million records are updated []
  • Ability to have these migrations executed in the background with minimal impact to FOLIO usage. 
  • Implement
    • slicing solution similar to data import implementation UXPROD-4337 Closed
    • Report on progress and status 
    • Users should have a simple solution to get/receive files that contain records with errors (The records that were not processed successfully).
    • Log and report errors > Provide a response that includes details on records that failed to create or update in SRS and Inventory.
    • Support UTF-8 encoding.
    • Support optimistic locking mechanism.
  • Very little manual intervention. Should be painless for hosting. 
  • Applies to MARC 21 format 
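
As an illustration only, the slicing, progress reporting, error collection, and optimistic locking requirements above could fit together roughly as in the sketch below. This is not the ARCH-46 design or any FOLIO module API: `AuthorityRecord`, `MigrationReport`, `VersionConflict`, `run_migration`, and the `remap`/`save` callbacks are all hypothetical names, and a real implementation would read from and write to the database rather than work in memory.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


@dataclass
class AuthorityRecord:
    """Hypothetical stand-in for a stored MARC authority record."""
    record_id: str
    version: int  # version seen at read time, used for optimistic locking
    marc: dict


@dataclass
class MigrationReport:
    """Running progress and error summary for one migration."""
    total: int = 0
    updated: int = 0
    failed: List[Tuple[str, str]] = field(default_factory=list)  # (id, reason)


class VersionConflict(Exception):
    """Raised by `save` when the stored version differs from the one read."""


def chunks(records: Iterable[AuthorityRecord], size: int) -> Iterator[List[AuthorityRecord]]:
    """Slice the record stream into batches of at most `size` records."""
    batch: List[AuthorityRecord] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_migration(
    records: Iterable[AuthorityRecord],
    remap: Callable[[AuthorityRecord], dict],
    save: Callable[[str, int, dict], None],
    chunk_size: int = 1000,
    on_progress: Optional[Callable[[MigrationReport], None]] = None,
) -> MigrationReport:
    """Re-apply mapping rules chunk by chunk, collecting failures instead of aborting."""
    report = MigrationReport()
    for batch in chunks(records, chunk_size):
        for record in batch:
            report.total += 1
            try:
                mapped = remap(record)
                # `save` is expected to raise VersionConflict on a stale version.
                save(record.record_id, record.version, mapped)
                report.updated += 1
            except Exception as exc:  # broad on purpose: one bad record must not stop the run
                report.failed.append((record.record_id, str(exc)))
        if on_progress is not None:
            on_progress(report)  # called once per completed chunk
    return report
```

The `failed` list is what would back a downloadable file of records that were not processed successfully, and the per-chunk `on_progress` callback is one place where status reporting could hook in.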

View migration numbers for several libraries : https://docs.google.com/spreadsheets/d/10GiFrfZee8aY8PcE0JJxf-lWtMkddFWnOYo_tiKYXrs/edit#gid=0

 

Definition of done

  • Load testing by development team: Scenario: Existing customer that already uses MARC authority app: Mapping rules change
  • Verify handling of errors
  • PTF testing: Scenario: Existing customer that already uses MARC authority app: Mapping rules change
  • Production-like testing: Scenario: Existing customer that already uses MARC authority app: Mapping rules change
  • Release notes and documentation

 Comments   
Comment by Khalilah Gambrell [ 28/Mar/23 ]

Hey Taras Spashchenko and Pavlo Smahin 

Here is a note from University of Missouri related to this feature

Has anyone been working with authority control vendors, in terms of sending/reloading records, and have you used scripts to handle it in Data Import? The University of Missouri system has been trying to figure out how to navigate this based on DI functionality in Nolana and an efficient way to reload ~4-4.5M bib records isn't really clear to us. We're less concerned with the potential of quarterly updates but figuring out how to do the initial sync is causing us some headaches.

Comment by Lynne Fors [ 28/Mar/23 ]

This is also something that Wellesley College is interested in. We are sending our bibliographic records to our authority control vendor in late May/early June for updating and creating our new base file with FOLIO identifiers. This will be the first time we have done authority control post-migration to FOLIO in June 2022. We need to be able to import and update our bibliographic records with the updates when they are returned to us. We are expecting over 674,000 bibliographic records to be returned. We will also get the authority files that go with those records. We are concerned about how to get these records back into FOLIO in a timely and accurate manner considering the limitations of Data Import.

Comment by Lloyd Chittenden [ 28/Mar/23 ]

Regular quarterly updates from an authority vendor can also be very large. Last year Library of Congress changed the URLs in subfield $0 in authorized fields. This required all bib records in everyone's databases to be replaced.

Comment by Lloyd Chittenden [ 28/Mar/23 ]

Authority vendors also sometimes send files of authority records that need to be deleted. There needs to be a mechanism to load a file of MARC authority records, and delete them in FOLIO.

Comment by Lloyd Chittenden [ 15/May/23 ]

To be clear, this would need a lot of the features of DI. It would need field protection for example. In the case of getting new bib records from an authority vendor, I would want to replace the fields that the vendor updates without replacing other fields that might have local changes I don't want to overlay. Different vendors may update different fields and each vendor has many options for what fields they change and what they don't. It would really have to be very flexible. It's not just a total record replacement.

Comment by Ann-Marie Breaux (Inactive) [ 17/May/23 ]

Hi Lloyd Chittenden, for the locally changed fields that need to be protected, how would those be identified? Would they have a certain subfield present, e.g. $4localcode? Would it be particular MARC fields that are protected? It would be really helpful to have a sample MARC Authority file with some of those locally changed fields. Would it be possible to attach that to this Jira? Thank you!

Comment by Taras Spashchenko [ 31/May/23 ]

Hello Khalilah Gambrell,

I am reassigning this feature back to you. The design https://folio-org.atlassian.net/browse/ARCH-46 has been completed.

Generated at Fri Feb 09 00:37:07 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.