2023-09-20 Data Import Subgroup meeting
Recordings are posted Here (2022+) and Here (pre-2022). Slack channel for Q&A and discussion between meetings.
Requirements details Here. Additional discussion topics in Subgroup parking lot.
Attendees: Jennifer Eustis, Ann-Marie Breaux, Ryan Taylor, Robert Pleshar, Christie Thomas, Corrie Hutchinson, Jenn Colt, Lloyd Chittenden, Lynne Fors, Raegan Wiechert, Taylor Smith
Links:
- Poppy import planning dashboard
- Poppy timeline
- Folijet Current Development Board
- Folijet (Data import) Bug Dashboard
Agenda:
- Announcements:
- Introduction of Ryan Taylor
- Jennifer and A-M did lots of good UAT on the multiples work in the lab a few weeks ago; we will use that for some test use cases and documentation. In Poppy, we're confident that multiple creation works well, but updates have not been tested as thoroughly; any additional use cases or feedback on updates will be helpful.
- Agenda topics:
- Review results of Quesnelia Feature & Bug Prioritization Exercise (see slide deck detailing results HERE)
- Take a closer look and discuss UXPROD-2742: MARC-MARC matching enhancements
- OCLC number primary match and instance status submatch - there are 3 scenarios
- No OCLC number match, so stop
- Single record match, so don't need to narrow the initial match, but need to do a submatch to decide whether to act on the record or not (e.g. only act if the record matches the OCLC number and has status of uncataloged)
- Multiple record match, and then narrow to single record with an instance match
- Christie can provide sample use case
- Matching across MARC and Inventory is not always clear - make sure it's documented well
- Other thoughts
- Confirm that the MARC matches are working properly (see the example from Jennifer where matches on system number didn't seem to work in 41 cases in a recent file)
- Confirm what can be used as primary vs submatches (currently, default matches can only be used as submatches)
- Multiple record matches, and then narrowing with submatch to determine correct record to update
- Discuss 035 OCLC numbers issue
- Would be great for development if we could agree on a default format, instead of having to support tenant-level preferences
- Maybe query the larger community
- Varying prefixes (OCoLC), (OCoLC)ocm, (OCoLC)ocn cause problems
- Leading zeroes cause problems
- Jenn: 001/003 manipulation - needs additional review
- Make sure that the 001 HRID is not being put into an 035 when a record is re-imported
- Could there be a setting or a MARC modification that allows removing the 001/003 from incoming records, or that lets the tenant set up 001/003 handling the way they want?
- UXPROD-4080
- In title of the feature, change specific fields to "individual" or "targeted"
- One possible use case: incoming record with LDR, 245$a, 856 - want to replace only the 856 in the existing fully-cataloged SRS MARC record
- Have more detailed conversations about requirements in future meetings
- Likely that most MARC updates will be on repeatable fields, not non-repeatable fields
- Looking at the high priority bugs
- MARC modifications - get more info from Jennifer re: putting the action at beginning or end of the job
- Have additional discussions about MARC modifications and overriding MARC modifications
- Piecemeal fixes may break previously-fixed things
- If we automate the tests for some of these bugfixes, then we should be able to catch accidental regressions immediately
- Cleanup after a job is cancelled is painful and time-consuming, per Jennifer and Lynne. Kate S created a Jira for the cleanup: MODSOURCE-631
- Per Lynne, sometimes a record has Source = MARC but is disconnected from the underlying MARC record; the only way to clean these up is a database change to set Source = FOLIO, then overlay. There is a Jira to address this (MODINV-847), to be done in Poppy or Quesnelia
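The three OCLC number / instance status scenarios discussed under UXPROD-2742 can be read as one decision procedure: a primary match on OCLC number, then a submatch on instance status that either confirms a single match or narrows several matches down to one. The sketch below is a hypothetical illustration of that logic only, not how FOLIO Data Import match profiles are implemented. The `Instance` class, the `Outcome` names, the "Uncataloged" status string, and the choice to not act when the submatch leaves zero or several records are all assumptions; it also assumes OCLC numbers have already been normalized.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


@dataclass
class Instance:
    # Hypothetical, simplified stand-in for an Inventory instance record.
    hrid: str
    oclc_numbers: set[str]
    status: str


class Outcome(Enum):
    NO_MATCH = "no OCLC number match: stop"
    SKIP = "matched, but submatch failed: do not act"
    ACT = "single record identified: act on it"
    AMBIGUOUS = "several records still match after submatch: do not act (assumption)"


def match_and_submatch(
    oclc: str, required_status: str, candidates: list[Instance]
) -> tuple[Outcome, Optional[Instance]]:
    # Primary match: every candidate carrying the incoming OCLC number.
    primary = [r for r in candidates if oclc in r.oclc_numbers]
    if not primary:
        # Scenario 1: no OCLC number match, so stop.
        return Outcome.NO_MATCH, None
    # Submatch: keep only primary matches with the required instance status.
    narrowed = [r for r in primary if r.status == required_status]
    if len(narrowed) == 1:
        # Scenario 2 (one primary match that passes the submatch) or
        # scenario 3 (several primary matches narrowed to one).
        return Outcome.ACT, narrowed[0]
    if not narrowed:
        return Outcome.SKIP, None
    return Outcome.AMBIGUOUS, None


# Example data (hypothetical HRIDs and OCLC numbers).
records = [
    Instance("in001", {"12345"}, "Uncataloged"),
    Instance("in002", {"12345"}, "Cataloged"),
    Instance("in003", {"67890"}, "Cataloged"),
]
outcome, target = match_and_submatch("12345", "Uncataloged", records)
```

Here "12345" matches two instances, and the status submatch narrows them to `in001`, which is scenario 3. Scenario 2 is the same code path with a single primary match, which is why one submatch step can serve both "should we act?" and "which record?".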
Upcoming meetings/agenda topics:
- Quesnelia feature requirement details
- Demo/discussion of the new large file "slicing" feature in Poppy
Chat
Ann-Marie to Everyone 1:09 PM
And are there topics that are not in the "in scope" section that need to be addressed WRT matching?
Christie Thomas (she/her) to Everyone 1:16 PM
Exactly!
Jennifer Eustis (UMass/5Colleges) to Everyone 1:29 PM
Also exported to 3rd party systems for DD/ILL or WorldCat
Ann-Marie to Everyone 1:29 PM
We like polls! Definitely a good idea, and something to run by the larger community
Christie Thomas (she/her) to Everyone 1:29 PM
+1 Lloyd
Jennifer Eustis (UMass/5Colleges) to Everyone 1:31 PM
For single record imports, sometimes the 035 is generated multiple times
Christie Thomas (she/her) to Everyone 1:31 PM
It is! We see that a lot.
And even when it is in the same form.
Lynne Fors to Everyone 1:31 PM
And overlays can add HRIDs as 035s when they shouldn't.
Jenn Colt to Everyone 1:32 PM
The OCLC docs read to me that once the OCLC number is out of the 001 it should not have prefixes https://help.oclc.org/Metadata_Services/Fundamentals/WorldShare_Collection_Manager_Fundamentals/Data_sync_collections_(Fundamentals)/Prepare_your_data_(Fundamentals)/30035_field_and_OCLC_control_numbers
Ann-Marie to Everyone 1:34 PM
Also a good point, Lloyd - data migration for the existing data needs to be taken into account as well, to be sure previously loaded and newly-loaded data are being treated the same way
Jennifer Eustis (UMass/5Colleges) to Everyone 1:35 PM
This happened to us in the 5C. We cleaned up our OCLC numbers to remove the ocm/ocn prefix, leaving only (OCoLC)number. Now, with the 001/003 generation, we have variations in the formats again
Ann-Marie to Everyone 1:35 PM
Really good point about ISBNs, Christie - I'm continually frustrated by how hard ISBNs are to use as matchpoints, due to having a non-number (X) in ISBN-10s, not currently normalizing between 10s and 13s, and having extraneous data like binding sometimes in the MARC $a and always in the Instance ISBN
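Ann-Marie's ISBN points (an X check digit in ISBN-10s, no normalization between 10s and 13s, qualifiers like binding in the value) suggest a normalization step before ISBNs are used as matchpoints. A minimal sketch, assuming the ISBN is the first token of the field and that ISBN-13 is the comparison form; `normalize_isbn` is a hypothetical helper, not a FOLIO function. It uses the standard check-digit rules: ISBN-10 weights 10 down to 1 with the sum divisible by 11 (X = 10), and ISBN-13 alternating weights 1 and 3 with the sum divisible by 10.

```python
import re
from typing import Optional

# Leading ISBN-like token: digits and hyphens, ending in a digit or X.
# Anything after it, such as "(pbk.)", is ignored.
_ISBN_TOKEN = re.compile(r"\s*([0-9][0-9-]*[0-9Xx])")


def _valid_isbn10(s: str) -> bool:
    if not (s[:9].isdigit() and (s[9].isdigit() or s[9] == "X")):
        return False
    values = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
    # Weighted sum with weights 10..1 must be divisible by 11.
    return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0


def _isbn13_check(first12: str) -> str:
    # Alternating weights 1, 3, 1, 3, ... over the first 12 digits.
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(first12))
    return str((10 - total % 10) % 10)


def normalize_isbn(raw: str) -> Optional[str]:
    """Return a bare ISBN-13 for an ISBN-10 or ISBN-13 value, or None if invalid."""
    m = _ISBN_TOKEN.match(raw)
    if m is None:
        return None
    digits = m.group(1).replace("-", "").upper()
    if len(digits) == 10 and _valid_isbn10(digits):
        # Drop the ISBN-10 check digit, prefix 978, recompute the check digit.
        body = "978" + digits[:9]
        return body + _isbn13_check(body)
    if len(digits) == 13 and digits.isdigit() and _isbn13_check(digits[:12]) == digits[12]:
        return digits
    return None
```

For example, `normalize_isbn("0306406152 (pbk.)")` and `normalize_isbn("978-0-306-40615-7")` both yield `9780306406157`, so the two forms would match. ISBN-13 is used as the common form because 979-prefixed ISBN-13s have no ISBN-10 equivalent. Rejecting values with a bad check digit is a design choice; a library that wants to match on malformed ISBNs would need a looser rule.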
Corrie Hutchinson to Everyone 1:36 PM
I’m not advocating one option over another. Just curious about all the ways to address this issue.
Christie Thomas (she/her) to Everyone 1:37 PM
Absolutely, Corrie! I think it is definitely worth asking the question and understanding the impact.
Jennifer Eustis (UMass/5Colleges) to Everyone 1:37 PM
We sometimes end up with 3 OCLC numbers strangely enough
Christie Thomas (she/her) to Everyone 1:37 PM
That happens to all of us, too!
Jennifer Eustis (UMass/5Colleges) to Everyone 1:37 PM
Right now we can't do marc modifications on LDR, 001, 003 or fixed fields
Corrie Hutchinson to Everyone 1:38 PM
+1 Jenn
Jenn Colt to Everyone 1:38 PM
Only for very old ones
Less than 8 digits
Ann-Marie to Everyone 1:39 PM
Those leading zeroes are such a dang pain - and I didn't even realize they were there until I was playing with ISRI a bunch one day
Christie Thomas (she/her) to Everyone 1:40 PM
They are a pain! We have so many staff that rely on reporting in Excel, and having those leading zeroes makes Excel reporting a problem.
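The 035 OCLC number discussion (varying (OCoLC), (OCoLC)ocm and (OCoLC)ocn prefixes, leading zeroes on older numbers, and the same number showing up in several 035s) comes down to comparing numbers in one canonical form. A minimal sketch, assuming a target form of "(OCoLC)" followed by the number with no ocm/ocn/on prefix and no leading zeroes, i.e. the form the 5C cleaned up to; the group has not agreed on a default, so treat that form as an assumption. It also accepts the "on" prefix OCLC uses for ten-digit numbers. `normalize_oclc` and `dedupe_oclc` are hypothetical helpers, not FOLIO APIs.

```python
import re
from typing import Iterable, Optional

# "(OCoLC)" with or without an ocm/ocn/on prefix, or a bare ocm/ocn/on prefix,
# followed by the number; leading zeroes are captured outside the group.
_OCLC = re.compile(
    r"^\s*(?:\(OCoLC\)\s*(?:ocm|ocn|on)?|ocm|ocn|on)\s*0*([0-9]+)\s*$",
    re.IGNORECASE,
)

CANONICAL_PREFIX = "(OCoLC)"  # assumed target form, not a FOLIO or OCLC setting


def normalize_oclc(value: str) -> Optional[str]:
    m = _OCLC.match(value)
    if m is None:
        # Not recognizably an OCLC number (e.g. an HRID or another agency's number).
        return None
    return CANONICAL_PREFIX + m.group(1)


def dedupe_oclc(values: Iterable[str]) -> list[str]:
    # Keep the first occurrence of each normalized OCLC number, in input order.
    seen: list[str] = []
    for v in values:
        n = normalize_oclc(v)
        if n is not None and n not in seen:
            seen.append(n)
    return seen
```

With this, "(OCoLC)ocm00012345", "(OCoLC)00012345" and "(OCoLC)12345" all collapse to "(OCoLC)12345". Bare numbers without any OCLC prefix are deliberately rejected so that an HRID or other identifier that lands in an 035 is not mistaken for an OCLC number.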
Jenn Colt to Everyone 1:41 PM
Any of those!
Jenn Colt to Everyone 1:52 PM
Yeah I wondered if these are kind of the same issue in some way
Ann-Marie to Everyone 1:53 PM
One thing that may help is automating some of the tests related to these bugfixes - if it's working, and then a change breaks it, the automated test will catch it immediately.
Jennifer Eustis (UMass/5Colleges) to Everyone 1:59 PM
For the 970, it would be great to discuss how we can handle the cleanup left when a job is cancelled. Right now, we've found that this is very manual and sometimes needs reports to find lost records
Jennifer Eustis (UMass/5Colleges) to Everyone 2:01 PM
That happens to us all the time
Lynne Fors to Everyone 2:03 PM
If it came up in the last week or so, it was probably me