2023-09-20 Data Import Subgroup meeting

Recordings are posted Here (2022+) and Here (pre-2022)
Slack channel for Q&A, discussion between meetings

Requirements details Here
Additional discussion topics in Subgroup parking lot


Attendees: Jennifer Eustis, Ann-Marie Breaux, Ryan Taylor, Robert Pleshar, Christie Thomas, Corrie Hutchinson, Jenn Colt, Lloyd Chittenden, Lynne Fors, Raegan Wiechert, Taylor Smith


Links:

Agenda: 

  • Announcements:
    • Introduction of Ryan Taylor
    • Jennifer and A-M had lots of good UAT for the multiples work in lab a few weeks ago and will use it for some test use cases and documentation. In Poppy we're confident that multiple creation works well, but updates have not been tested as thoroughly, so any additional use cases or feedback on updates will be helpful.
  • Agenda topics:
    • Review results of Quesnelia Feature & Bug Prioritization Exercise (see slide deck detailing results HERE)
    • Take a closer look and discuss UXPROD-2742: MARC-MARC matching enhancements
      • OCLC number primary match and instance status submatch - there are 3 scenarios
        • No OCLC number match, so stop
        • Single record match, so don't need to narrow the initial match, but need to do a submatch to decide whether to act on the record or not (e.g. only act if the record matches the OCLC number and has status of uncataloged)
        • Multiple record match, and then narrow to single record with an instance match
        • Christie can provide sample use case
      • Match across MARC and Inventory is not always clear - make sure it's documented well
      • Other thoughts
        • Confirm that the MARC matches are working properly (see example from Jennifer where matches to system number didn't seem to match in 41 cases in a recent file)
        • Confirm what can be used as primary vs submatches (currently, default matches can only be used as submatches)
        • Multiple record matches, and then narrowing with submatch to determine correct record to update
    • Discuss 035 OCLC numbers issue
      • Would be great for development if we can come up with a default format, instead of having to support tenant-level preferences
      • Maybe query the larger community
      • Varying prefixes (OCoLC), (OCoLC)ocm, (OCoLC)ocn cause problems
      • Leading zeroes cause problems
      • Jenn: 001/003 manipulation - needs additional review
        • Make sure that the 001 HRID is not being put into an 035 when a record is re-imported
        • Could there be a setting or a MARC modification that allows removing the 001/003 from incoming records, or that lets the tenant set up 001/003 handling as they want?
    • UXPROD-4080
      • In the title of the feature, change "specific" fields to "individual" or "targeted"
      • One possible use case: incoming record with LDR, 245$a, 856 - want to replace only the 856 in the existing fully-cataloged SRS MARC record
      • Have more detailed conversations about requirements in future meetings
      • Likely that most MARC updates will be on repeatable fields, not non-repeatable fields
  • Looking at the high priority bugs
    • MARC modifications - get more info from Jennifer re: putting the action at beginning or end of the job
    • Have additional discussions about MARC modifications and overriding MARC modifications
    • Piecemeal fixes may break previously-fixed things
    • If we automate the tests for some of these bugfixes, then we should be able to catch accidental regressions immediately
    • Cleanup after a job is cancelled is painful and time-consuming, per Jennifer and Lynne. Kate S created a Jira for the cleanup: MODSOURCE-631
    • Per Lynne, sometimes a record is Source = MARC, but is disconnected from the underlying MARC; the only way to clean these up is to edit via db change to make it Source = FOLIO, then overlay. There is a Jira to address this (MODINV-847), which will be done in Poppy or Quesnelia
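
Illustration of the UXPROD-2742 and 035 discussion above: a minimal Python sketch of the three OCLC-match scenarios combined with one possible 035 normalization. This is not how Data Import is implemented. The function names, the record fields (`oclc_numbers`, `status`, `id`) and the target format "(OCoLC)" plus digits with no ocm/ocn/on prefix and no leading zeroes are all assumptions made for the example; the group has not agreed on a default format.

```python
import re
from typing import Optional

# Assumed target format: "(OCoLC)<digits>", no ocm/ocn/on prefix, no leading zeroes.
_OCLC_RE = re.compile(r"^\(OCoLC\)\s*(?:ocm|ocn|on)?\s*0*(\d+)\s*$", re.IGNORECASE)


def normalize_oclc(value: str) -> Optional[str]:
    """Return the assumed default form of an 035 $a value, or None if it is not an OCLC number."""
    m = _OCLC_RE.match(value.strip())
    return f"(OCoLC){m.group(1)}" if m else None


def choose_instance(incoming_035: str, instances: list, required_status: str = "Uncataloged") -> Optional[dict]:
    """Primary match on OCLC number, then submatch on instance status.

    Returns the single instance to act on, or None when the job should not act.
    """
    key = normalize_oclc(incoming_035)
    if key is None:
        return None
    # Primary match: compare normalized forms so prefix and zero variants still match.
    candidates = [
        inst for inst in instances
        if key in {normalize_oclc(v) for v in inst["oclc_numbers"]}
    ]
    if not candidates:
        return None  # Scenario 1: no OCLC number match, so stop.
    # Scenario 2 (single match): the submatch decides whether to act at all.
    # Scenario 3 (multiple matches): the submatch narrows to one record.
    narrowed = [inst for inst in candidates if inst["status"] == required_status]
    return narrowed[0] if len(narrowed) == 1 else None


# Hypothetical data: two instances share an OCLC number in different forms.
instances = [
    {"id": "a", "oclc_numbers": ["(OCoLC)ocm00012345"], "status": "Uncataloged"},
    {"id": "b", "oclc_numbers": ["(OCoLC)12345"], "status": "Cataloged"},
]
match = choose_instance("(OCoLC)ocn00012345", instances)
```

In this sketch an HRID-style value that ends up in an 035 is simply ignored by the matcher, because `normalize_oclc` returns None for anything without the (OCoLC) prefix. If the submatch still leaves more than one candidate, the function declines to act rather than guessing.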

Upcoming meetings/agenda topics:

  • Quesnelia feature requirement details
  • Demo/discussion of the new large file "slicing" feature in Poppy


Chat

Ann-Marie  to  Everyone 1:09 PM
And are there topics that are not in the "in scope" section that need to be WRT matching?

Christie Thomas (she/her)  to  Everyone 1:16 PM
Exactly!

Jennifer Eustis (UMass/5Colleges)  to  Everyone 1:29 PM
Also exported to 3rd party systems for DD/ILL or WorldCat

Ann-Marie  to  Everyone 1:29 PM
We like polls!  definitely good idea, and something to run by the larger community

Christie Thomas (she/her)  to  Everyone 1:29 PM
+1 lloyd

Jennifer Eustis (UMass/5Colleges)  to  Everyone 1:31 PM
For single record imports, sometimes the 035 is generated multiple times

Christie Thomas (she/her)  to  Everyone 1:31 PM
It is! We see that a lot.
And even when it is in the same form.

Lynne Fors  to  Everyone 1:31 PM
And overlays can add HRIDs as 035s when they shouldn't.

Jenn Colt  to  Everyone 1:32 PM
The OCLC docs read to me that once the OCLC number is out of the 001 it should not have prefixes https://help.oclc.org/Metadata_Services/Fundamentals/WorldShare_Collection_Manager_Fundamentals/Data_sync_collections_(Fundamentals)/Prepare_your_data_(Fundamentals)/30035_field_and_OCLC_control_numbers

Ann-Marie to  Everyone 1:34 PM
Also a good point, Lloyd - data migration for the existing data needs to be taken into account as well, to be sure previously loaded and newly-loaded data are being treated the same way

Jennifer Eustis (UMass/5Colleges)  to  Everyone 1:35 PM
This happened to us in the 5C. We cleaned up our OCLC numbers so that we removed the prefix to have only (OCoLC)number. Now with the 001/003 generation, we now have variations in the formats

Ann-Marie to  Everyone 1:35 PM
Really good point about ISBNs, Christie - I'm continually frustrated by how hard ISBNs are to use as matchpoints, due to having a non-number (X) in ISBN-10s, not currently normalizing between 10s and 13s, and having extraneous data like binding sometimes in the MARC $a and always in the Instance ISBN

Corrie Hutchinson  to  Everyone 1:36 PM
I’m not advocating one option over another.  Just curious about all the ways to address this issue.

Christie Thomas (she/her)  to  Everyone 1:37 PM
Absolutely, Corrie! I think it is definitely worth asking the question and understanding the impact.

Jennifer Eustis (UMass/5Colleges)  to  Everyone 1:37 PM
We sometimes end up with 3 OCLC numbers strangely enough

Christie Thomas (she/her)  to  Everyone 1:37 PM
That happens to all of us, too!

Jennifer Eustis (UMass/5Colleges)  to  Everyone 1:37 PM
Right now we can't do marc modifications on LDR, 001, 003 or fixed fields

Corrie Hutchinson  to  Everyone 1:38 PM
+1 Jenn

Jenn Colt  to  Everyone 1:38 PM
Only for very old ones
Less than 8 digits

Ann-Marie to  Everyone 1:39 PM
Those leading zeroes are such a dang pain - and I didn't even realize they were there until I was playing with ISRI a bunch one day

Christie Thomas (she/her)  to  Everyone 1:40 PM
They are a pain! We have so many staff that rely on reporting on Excel and having those leading zeroes makes Excel reporting a problem.

Jenn Colt  to  Everyone 1:41 PM
Any of those!

Jenn Colt  to  Everyone 1:52 PM
Yeah I wondered if these are kind of the same issue in some way

Ann-Marie to  Everyone 1:53 PM
One thing that may help is automating some of the tests related to these bugfixes - if it's working, and then a change breaks it, the automated test will catch it immediately.

Jennifer Eustis (UMass/5Colleges)  to  Everyone 1:59 PM
For the 970, it would be great to discuss how we can handle the cleanup left when a job is cancelled. Right now, we've found that this is very manual and sometimes needs reports to find lost records

Jennifer Eustis (UMass/5Colleges)  to  Everyone 2:01 PM
That happens to us all the time

Lynne Fors  to  Everyone 2:03 PM
If it came up in the last week or so, it was probably me