Batch Importer (Bib/Acq) (UXPROD-47)

[UXPROD-4081] Refine and standardize handling of 035 OCLC numbers in MARC records (including potential duplicates) Created: 03/Mar/21  Updated: 05/Feb/24

Status: Open
Project: UX Product
Components: None
Affects versions: None
Fix versions: Ramsons (R2 2024)
Parent: Batch Importer (Bib/Acq)

Type: New Feature Priority: P2
Reporter: Ann-Marie Breaux (Inactive) Assignee: Ryan Taylor
Resolution: Unresolved Votes: 0
Labels: LC-priority2, data-import, di-swat, epam-folijet, loc, mapping-profiles, possible-ramsons, quesnelia-stretch
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Attachments: PNG File UXPROD-4081.png    
Issue links:
Defines
defines UXPROD-47 Batch Importer (Bib/Acq) Analysis Complete
is defined by MODINV-949 Update Data Import logic to normalize... Open
is defined by MODDICORE-339 Single record overlay creates duplica... Blocked
Relates
relates to MODINVSTOR-1155 Create script to standardize OCLC ide... Open
relates to MODSOURCE-734 Create script to standardize 035 valu... Open
relates to UXPROD-2742 MARC-MARC matching enhancements: Narr... In Progress
relates to MODINV-849 Move 001+003->035 logic to inventory ... Closed
Release: Ramsons (R2 2024)
Epic Link: Batch Importer (Bib/Acq)
Analysis Estimate: Small < 3 days
Analysis Estimator: Ann-Marie Breaux (Inactive)
Front End Estimate: Out of scope
Front End Estimator: Mariia Aloshyna
Front-End Confidence factor: 90%
Back End Estimate: Large < 10 days
Back End Estimator: Kateryna Senchenko
Back-End Confidence factor: 90%
Development Team: Folijet
PO Rank: 105
Rank: Cornell (Full Sum 2021): R5
Solution Architect: Olamide Kolawole

 Description   

Current situation or problem: When OCLC 035s are created from incoming 001/003 data via Inventory Single Record Import or regular Data Import, the record often ends up with multiple OCLC 035s that have the same number but different prefixes. In addition, some OCLC 035s contain leading zeroes and some don't. All of this makes it difficult to match on OCLC numbers and to present OCLC numbers consistently in Instances.

In scope

  • Determine desired format for OCLC numbers in 035 $a and 035 $z
    • Include (OCoLC)?
    • Include ocm/ocn or not?
    • Include leading zeroes or not?
  • Create functionality that normalizes existing OCLC numbers in incoming records, as well as OCLC numbers that are being created via 001/003 manipulation

Out of scope

  • Only do this for MARC Bibs. Do not do this for MARC Authorities as part of this feature.
  • Create cleanup script or process that allows for cleanup of existing inconsistent OCLC numbers in SRS MARC Bib records

Use case(s):

Proposed solution/stories:

Story 1:

  • Data Import logic should be updated to identify OCLC numbers in 035 fields by the presence of the (OCoLC) prefix. These values should then be normalized as follows:
    • Retain (OCoLC)
    • If an 'ocm' or 'ocn' prefix appears, remove it
    • If leading zeroes appear, remove them
    • EXAMPLE: An incoming 035 (OCoLC)ocm123456 should result in 035 (OCoLC)123456
  • If duplicates exist after normalization, they should be de-duplicated so that only one of the normalized 035 values remains.
  • If there are multiple 035s with the same $a value after normalization and one has additional subfields (e.g. $z), retain only the one with the additional subfields.
  • If there are any prefix values that are NOT 'ocm' or 'ocn', the original prefix should remain.
    • Example: 035 ‡a (OCoLC)tfe501056183 keeps its 'tfe' prefix, so it is not merged with 035 ‡a (OCoLC)501056183
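A minimal Python sketch of the Story 1 rules, for illustration only; it is not the Data Import or mod-inventory implementation. It assumes a hypothetical representation of each 035 as an ordered list of (subfield code, value) pairs, treats a value as an OCLC number only when it starts with (OCoLC), and normalizes both $a and $z (an assumption: Story 1 does not say which subfields are covered). The names normalize_oclc and normalize_035s are hypothetical. Duplicate groups keep the position of their first occurrence, and the copy with more subfields wins.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical shape: one 035 field is an ordered list of (subfield code, value) pairs.
Field = List[Tuple[str, str]]

# Story 1 rule: "(OCoLC)", an optional 'ocm'/'ocn' prefix, leading zeroes, then digits.
# Any other prefix (e.g. 'tfe') does not match, so the value is left unchanged.
_OCLC = re.compile(r"^\(OCoLC\)(?:ocm|ocn)?0*(\d+)$")


def normalize_oclc(value: str) -> str:
    """Drop an 'ocm'/'ocn' prefix and leading zeroes; return other values as-is."""
    match = _OCLC.match(value.strip())
    return "(OCoLC)" + match.group(1) if match else value


def normalize_035s(fields: List[Field]) -> List[Field]:
    """Normalize OCLC numbers in $a/$z, then collapse 035s sharing an OCLC $a."""
    result: List[Field] = []
    kept: Dict[str, int] = {}  # normalized OCLC $a -> index of the kept field
    for field in fields:
        normalized = [
            (code, normalize_oclc(val) if code in ("a", "z") else val)
            for code, val in field
        ]
        key = next((val for code, val in normalized if code == "a"), None)
        if key is None or not key.startswith("(OCoLC)"):
            result.append(normalized)  # not an OCLC 035: pass through
        elif key not in kept:
            kept[key] = len(result)
            result.append(normalized)
        elif len(normalized) > len(result[kept[key]]):
            # Duplicate $a, but this copy has more subfields (e.g. a $z): keep it instead.
            result[kept[key]] = normalized
    return result
```

For example, 035 $a (OCoLC)ocm123456 and 035 $a (OCoLC)00123456 $z (OCoLC)ocn777 collapse into a single 035 $a (OCoLC)123456 $z (OCoLC)777, while (OCoLC)tfe501056183 and (OCoLC)501056183 both survive unchanged.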

Story 2:

  • Do we need a separate story to adjust the automatic 001/003/035 handling to account for the (OCoLC) normalization changes?

Story 3 (Move to a follow-up feature):

  • Create script to identify and normalize existing (OCoLC) 035s already in place in FOLIO SRS MARC Bibs
    • Should also account for duplicates
    • Should be applied to MARC Bibs and associated Instances

Links to additional info:

Examples:

  • 001 ocm123456 and 003 OCoLC create 035 (OCoLC)ocm123456, but there may already be an 035 in the record like this: 035 (OCoLC)123456
  • NOTE: If there are any prefix values that are NOT 'ocm' or 'ocn', the original prefix should remain.
    Example: 035 ‡a (OCoLC)tfe501056183 keeps its 'tfe' prefix, so it is not merged with 035 ‡a (OCoLC)501056183
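The 001/003 example can be sketched the same way. Assuming, hypothetically, that the generated 035 $a is simply the 003 value in parentheses followed by the 001 value (as in 001 ocm123456 + 003 OCoLC giving (OCoLC)ocm123456), running the generated value through the Story 1 normalization before comparing it with the record's existing 035 $a values is enough to avoid the duplicate. derived_035a and merge_035a are illustrative names, not FOLIO functions; this sketch ignores the subfield rule and drops any exact repeat, OCLC or not.

```python
import re
from typing import List, Optional

# Same Story 1 normalization rule as in the description (sketch, not FOLIO code).
_OCLC = re.compile(r"^\(OCoLC\)(?:ocm|ocn)?0*(\d+)$")


def normalize_oclc(value: str) -> str:
    match = _OCLC.match(value.strip())
    return "(OCoLC)" + match.group(1) if match else value


def derived_035a(f001: str, f003: Optional[str]) -> Optional[str]:
    """Hypothetical 001/003 -> 035 $a step: 003 in parentheses, then 001."""
    if not f003:
        return None  # assumption for this sketch: no 003 means no generated 035
    return f"({f003.strip()}){f001.strip()}"


def merge_035a(existing: List[str], f001: str, f003: Optional[str]) -> List[str]:
    """Add the generated value, normalize all values, and drop exact repeats."""
    candidates = list(existing)
    generated = derived_035a(f001, f003)
    if generated is not None:
        candidates.append(generated)
    merged: List[str] = []
    for value in candidates:
        normalized = normalize_oclc(value)
        if normalized not in merged:  # keeps the first occurrence's position
            merged.append(normalized)
    return merged
```

With the example above, merge_035a(["(OCoLC)123456"], "ocm123456", "OCoLC") returns ["(OCoLC)123456"], while a 001 of tfe501056183 is added as (OCoLC)tfe501056183 alongside an existing (OCoLC)501056183.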


 Comments   
Comment by Ann-Marie Breaux (Inactive) [ 05/Jan/22 ]

Moving from Lotus to Morning Glory

Comment by Ann-Marie Breaux (Inactive) [ 04/May/22 ]

Reducing scope for Morning Glory; this issue moved from MG feature to Nolana feature

Comment by Ann-Marie Breaux (Inactive) [ 12/Jul/22 ]

Kateryna Senchenko Before I spend a lot of time creating the requirements, could you give me a quick analysis?

If there are 3 035 fields in a MARC record (1 possibly created by combining 001 and 003):

  • 035 $a ocm123456
  • 035 $a (OCoLC)123456
  • 035 $a (OCoLC)ocm123456
  1. How hard would it be for DI to recognize these are duplicate numbers, delete 2 of them, and retain (OCoLC)123456?
  2. If we could do it, could we include it as a standard check when an SRS MARC Bib was being created or updated?
  3. If that is too complicated, would it maybe be possible to create a script to identify these in MARC Bibs and clean them up? Once a library cleans up their existing records, they could maybe plan to run the script against newly updated records since date x, and not have as many to clean up.
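A rough sketch of the detection half of question 3: flag records updated since date x whose 035 $a values would change or collide under normalization. It assumes a hypothetical BibRecord holding only an id, a last-updated timestamp and the 035 $a values; a real script would read from and write back to mod-source-record-storage, which is not shown. It applies the Story 1 rule that OCLC numbers are identified by (OCoLC), so the bare 035 $a ocm123456 above would not be normalized or flagged on its own; treating bare ocm/ocn values as OCLC numbers would be a separate decision.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

# Same Story 1 rule: only "(OCoLC)" values with an optional ocm/ocn prefix are normalized.
_OCLC = re.compile(r"^\(OCoLC\)(?:ocm|ocn)?0*(\d+)$")


def normalize_oclc(value: str) -> str:
    match = _OCLC.match(value.strip())
    return "(OCoLC)" + match.group(1) if match else value


@dataclass
class BibRecord:
    """Hypothetical minimal view of a MARC Bib: id, last update and 035 $a values."""

    record_id: str
    updated: datetime
    values_035a: List[str] = field(default_factory=list)


def needs_cleanup(record: BibRecord) -> bool:
    """True if normalizing would change any 035 $a or leave two identical values."""
    normalized = [normalize_oclc(v) for v in record.values_035a]
    changed = normalized != record.values_035a
    duplicated = len(set(normalized)) < len(normalized)
    return changed or duplicated


def cleanup_candidates(records: List[BibRecord], since: datetime) -> List[str]:
    """Ids of records updated at or after `since` that a cleanup pass would touch."""
    return [r.record_id for r in records if r.updated >= since and needs_cleanup(r)]
```

For a record holding the three 035s above, needs_cleanup returns True because (OCoLC)ocm123456 normalizes to (OCoLC)123456, which duplicates the existing value.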
Comment by Kateryna Senchenko [ 14/Jul/22 ]

Hi Ann-Marie Breaux

  1. We can write such a function; it shouldn't be hard. Are those the only options (could it be something like (ocm) or ocm(OCoLC), or any other words)?
  2. Yes, it is possible
  3. If we add this verification for new imports, should we create a script to clean up existing records in SRS? Should existing Instances also be updated? We can create such scripts, separate ones for SRS and mod-inventory, but they would take a long time to execute on large data sets; we'll need to test those scripts to provide guidelines.

Thanks

Comment by Ryan Taylor [ 31/Oct/23 ]

Kateryna Senchenko - In the Description of this feature, can you please review "Story 1" and, based on your understanding, do you think "Story 2" would be necessary to account for any changes in 001/003/035 handling? Thanks!

Comment by Kateryna Senchenko [ 22/Jan/24 ]

Hi Ryan Taylor,

I reviewed MODINV-949 (and moved it to MODINV since it is BE work). All looks good; no additional ticket for Story 2 is needed. However, we need separate tickets for changing existing records: Instances (mod-inventory-storage) and MARC Bibs (mod-source-record-storage). Those migration scripts would account for most of the effort on this feature, so I think we need to review the estimate. Thank you

Comment by Ryan Taylor [ 22/Jan/24 ]

Moved the clean-up script plans out of scope; they will be tackled in a follow-up feature (to be created).

Comment by Kateryna Senchenko [ 23/Jan/24 ]

Hi Ryan Taylor

I created stories for the clean-up scripts, MODINVSTOR-1155 and MODSOURCE-734, linked them as related to this feature, and set them as Not Scheduled. Please link them to the follow-up feature once you create it. Thank you!

Otherwise, it looks like this feature can be opened, provided the estimate is adequate.

Generated at Fri Feb 09 00:37:07 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.