2025-07-09 Data Import SIG Meeting Notes

Recordings are posted Here (2022+) and Here (pre-2022)

Slack channel for Q&A, discussion between meetings

Requirements details Here

Additional discussion topics in Subgroup parking lot

 

Attendees: @Jennifer Eustis @Christie Thomas @Vivian Gould @Robert Pleshar @Mary Aycock @Katie Rahman @Sheila Torres-Blank @Autumn Faulkner @Lynne Fors @Whitney Christopher @Ryan Taylor @Yael Hod @Ryan Mendenhall
Notetaker: @Yael Hod

Links:

Agenda: 

Topic

Who

Meeting Notes

Related Jira

Decisions/Actions

Announcements

ALL

no announcements

 

 

Possible regression with creation of 035

Jennifer

Curious whether anyone else has noticed strange behavior with 035s on import. We have observed faulty normalization, and also additional $a subfields being added to existing 035(OCoLC) fields when MARC bibs are round-tripped. These added subfields are not always (OCoLC) numbers. Additionally, subfield $a is not repeatable, so if one tries to edit a record afterwards, the save fails until the extra subfield $a is deleted. Here's what the guide has to say about normalization: OCLC 035 Normalization for Instances | Additional Information

We're seeing HRIDs plopped into an 035 \\$a(OCoLC) on first import, which is not the behavior described in the guide. If you try to export and re-import (say, because you're sending records to your authority vendor), it comes back in with $a(OCoLC)in.... prepended to the original 035 from OCLC. The re-import will also grab an 035 without "(OCoLC)", add "(OCoLC)", and plop that into the original 035 from OCLC. Is this also expected, or perhaps a bug? (We're on Ramsons SP-2.)

Just tried https://bugfest-ramsons.int.aws.folio.org/ and what I'm seeing there is creation of an 035 for 001 [HRID], either with or without (OCoLC) depending on the presence of an 003. Bugfest is only combining multiple 035s when each includes "$a(OCoLC)".

Sheila: Two problems. Creating a field with illegal multiple subfields is a big issue. Jennifer recreated it in Ramsons bugfest with a simple job. In her test, it did not create an additional 035 with the instance HRID if there was one present. If the incoming file has more than one OCLC number and they differ, DI will create one MARC 035 with multiple subfield $a, which is invalid.
Christie: when you say OCLC # do you mean any 035 or just ones with OCLC prefix?
Jennifer: only with prefix for OCLC.
This happens in a job that updates MARC, not on initial import.

In the import that Jennifer showed in bugfest with two OCLC #s it created two subfield a. What Sheila saw was an 003 on the record that was creating an 035 from the HRID and it did not create extra 035s when there was no OCLC prefix.
Ryan: multiple 035s that have been normalized are being bunched together with multiple $a instead of creating multiple 035s
Christie: Is this related to the problem we have in Q, where duplicate OCLC #s appear on single record import? Maybe the fix for that is operating on the mess that is in our current records. The fix is supposed to eliminate multiple 035s when there is an 003 with the prefix that is creating a new 035. The records that are being exported and re-imported still have those. Is DI trying to clean them up?
Jennifer: the problem is when the numeric value is not the same.
Ryan: when the numeric value is the same the deduping should work without looking at the prefix.
Mary: we have a million records with a different prefix that is being identified as an OCLC number.
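
To make the expected outcome of this discussion concrete, here is a minimal Python sketch (not FOLIO code) of how deduplication could produce one 035 per distinct OCLC number instead of one 035 with several $a. The function names and the simplified normalization rule (dropping an optional ocm/ocn/on prefix and leading zeros) are assumptions for illustration; the guide linked above remains the reference for actual normalization behavior.

```python
import re

OCLC_PREFIX = "(OCoLC)"
# Assumed simplification: optional ocm/ocn/on letters and leading zeros are dropped.
_OCLC_RE = re.compile(r"^\(OCoLC\)\s*(?:ocm|ocn|on)?0*(\d+)$", re.IGNORECASE)


def normalize_oclc(value):
    """Return '(OCoLC)<digits>' for an OCLC-prefixed value, else None."""
    match = _OCLC_RE.match(value.strip())
    return f"{OCLC_PREFIX}{match.group(1)}" if match else None


def rebuild_035s(values):
    """Take the $a values of all incoming 035s; return one [$a] list per output 035.

    Values that normalize to the same OCLC number collapse into one field.
    Distinct numbers, and values with other prefixes, each keep their own 035.
    """
    seen = set()
    fields = []
    for value in values:
        key = normalize_oclc(value) or value  # non-OCLC values pass through unchanged
        if key in seen:
            continue
        seen.add(key)
        fields.append([key])  # exactly one $a per 035
    return fields


incoming = ["(OCoLC)ocm00012345", "(OCoLC)12345", "(OCoLC)99999", "(CStRLIN)ABC123"]
for field in rebuild_035s(incoming):
    print("035 $a", field[0])
```

In this sketch two different numeric values yield two separate 035 fields (Jennifer's case), identical numeric values are deduplicated (Ryan's case), and a value with a non-OCLC prefix is never treated as an OCLC number (Mary's case).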

Old ticket that Christie is referring to:
MODSOURCE-767: Single record overlay creates duplicate OCLC#/035 (Closed)

Can other people please test in their Ramsons environment and in bugfest with older records. Jennifer will test in their environment.

Autumn: export a record with OCoLC #, import using a MARC match point.

Ryan: make sure the record has more than one 035 that starts with an OCLC prefix.

Sheila: export one with two OCLC 035s and see what happens when you import.

Jennifer: try also with two different numeric values in the OCLC #. See if it's happening both in bugfest and in your local environment.
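
When comparing results across bugfest and local environments, a small check like the following could flag the invalid outcome described above. This is only a sketch: the record representation (a list of `(tag, [(code, value), ...])` tuples) and the function name are hypothetical, and a real test would first parse the exported MARC file with whatever tool is at hand.

```python
def fields_with_repeated_a(record):
    """Return the 035 fields in `record` that carry more than one subfield $a."""
    flagged = []
    for tag, subfields in record:
        if tag != "035":
            continue
        a_count = sum(1 for code, _ in subfields if code == "a")
        if a_count > 1:
            flagged.append((tag, subfields))
    return flagged


# Hypothetical re-imported record: the first 035 has two $a and should be flagged.
record = [
    ("001", [("", "in00000012345")]),
    ("035", [("a", "(OCoLC)12345"), ("a", "(OCoLC)99999")]),
    ("035", [("a", "(OCoLC)55555")]),
]
for tag, subfields in fields_with_repeated_a(record):
    print(tag, subfields)
```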

MODSOURMAN-1013: Do not process incoming record if it is a duplicate in the file (Draft)

Jennifer

Do not process incoming record if it is a duplicate in the file

Waiting for confirmation from SMEs.
The developers would like to know what the desired behavior is. How is the system determining whether a record is a duplicate? Originally created in MODSOURCE-530 in Poppy.
Christie: we need to determine what is or is not a duplicate. Using the match point only is not enough. It could be variable depending on the job. We may have very similar records in a batch with different instance HRIDs.
Ryan M: I like to use the MarcEdit version of deduping that gives you control over which copy to use. Different institutions may define duplicates differently. Duplicates get into the file for various reasons. Currently the system keeps the first copy but they often prefer the later one.
Ryan T: In MarcEdit, can you define which field you dedupe on? Can you choose which copy to keep, beyond just first or last?
Ryan M: I’ll look into the options in MarcEdit
Jennifer: How is it currently de-duping?

Ryan T: do not recall. Will double check what the current procedure is.

Christie: the ability to override is needed
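
The options raised in this discussion (an institution-defined duplicate key, and a choice of which copy to keep) can be sketched roughly as below. This is an illustrative Python sketch, not how Data Import or MarcEdit currently works; `dedupe_records`, its parameters, and the dictionary-shaped records are all hypothetical.

```python
def dedupe_records(records, key, keep="first"):
    """Drop duplicates from `records`, preserving the order of the kept copies.

    key:  function returning the value that defines "duplicate" for this job
          (for example a match-point field, or a combination of fields).
    keep: "first" keeps the earliest copy, "last" keeps the latest one.
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    chosen = {}  # duplicate key -> index of the copy to keep
    for index, record in enumerate(records):
        k = key(record)
        if keep == "last" or k not in chosen:
            chosen[k] = index
    return [records[i] for i in sorted(chosen.values())]


# Hypothetical file in which the same 001 value appears twice.
batch = [
    {"001": "a1", "note": "older copy"},
    {"001": "b2", "note": "only copy"},
    {"001": "a1", "note": "newer copy"},
]
print(dedupe_records(batch, key=lambda r: r["001"], keep="last"))
```

Passing the key as a function leaves the definition of a duplicate to each job, which matches the point that a single match point may not be enough for every institution.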

 

Both Ryans will collect more information and we can re-visit at a later meeting.

 

MODDICONV-394: Disallow usage of Update MARC Bib next to Update Instance actions (Open)

Jennifer

Disallow usage of Update MARC Bib next to Update Instance actions. In the implementers topic section there is a request to be able to do this.

Ryan: this came about because it is almost as if you are doing the same thing twice. If you update the instance it will update the MARC bib, and if you update the MARC bib you update the instance. We wanted to keep them separate.
Christie: we need to keep both the update SRS and the update instance profiles; otherwise we need to run more jobs. We need to be able to do both in the same job. While it is not ideal that they are updated twice, there are use cases where we have to update both the SRS and the instance.

Jennifer: There is a mistaken understanding that the fields have a 1:1 relationship between the SRS and the instance, which is not the case. Is there a way to extend MARC SRS updating so you can interact with the administrative fields even though they are not currently mapped?

 

Ryan will take this to the architects and the development team to ask how best to support these use cases, and whether the way MARC updating works can be extended to include instance fields. This could be more of an architectural shift.

 

If anyone can, please add more use cases to the ticket. Jennifer will clean the ticket up.

Support batch import for multiple holdings and items

MODINV-805: Performance Enhancement to Support batch requests for multiple holdings creation (Open)

MODINV-806: Performance Enhancement to Support batch requests for multiple items creation (Open)

Jennifer

Ticket was created in 2023. Ticket needs more information.

Ryan T: The language is unclear. DI can already support creating multiples. These tickets are about all of the holdings or all of the items being updated in one request rather than in individual requests, as is done now.

 

Ryan and Jennifer will update the language so that the tickets are clearer.

Upcoming meetings/agenda topics: --

 

Chat: