Batch Importer (Bib/Acq) (UXPROD-47)

[UXPROD-2742] MARC-MARC matching enhancements: Narrowing multiple matches & Bug work Created: 13/May/20  Updated: 07/Feb/24

Status: In Progress
Project: UX Product
Components: None
Affects versions: None
Fix versions: Quesnelia (R1 2024)
Parent: Batch Importer (Bib/Acq)

Type: New Feature Priority: P2
Reporter: Ann-Marie Breaux (Inactive) Assignee: Ryan Taylor
Resolution: Unresolved Votes: 1
Labels: LC-priority2, arlef-di, data-import, discuss-with-subgroup, loc, match-details
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Attachments: PNG File UXPROD-2742.png     Zip Archive container code_035_9matchMarcToMarc.zip    
Issue links:
Defines
defines UXPROD-47 Batch Importer (Bib/Acq) Analysis Complete
is defined by MODINV-936 Adjust Match event handlers to take i... Open
is defined by MODSOURCE-730 Remove matching event handling logic Open
is defined by MODINV-935 Add MarcMatch event handlers to inven... In Progress
is defined by MODINV-876 [RRT] Import profile with Instance ma... Closed
is defined by MODSOURCE-729 Implement new endpoint to be used for... Closed
is defined by MODSOURMAN-1046 Cannot view error during multiple ins... Closed
is defined by MODINV-882 [RRT] 5C match bug Draft
is defined by MODDICORE-362 [RRT] Failing overlay in Data Import Blocked
is defined by MODDICORE-386 Match on 035$a with a qualifier fails Blocked
is defined by UIDATIMP-1521 Not able to use system generated Matc... Blocked
is defined by MODSOURMAN-848 Data Import log displays disconcertin... In QA
Relates
relates to UXPROD-4081 Refine and standardize handling of 03... Open
relates to MODSOURCE-254 SPIKE: Review MARC Query work Open
relates to MODDICORE-251 1 SPIKE: Investigate support of multi... Closed
relates to UXPROD-4590 Data import matching by normalized ISBN Open
Release: Quesnelia (R1 2024)
Epic Link: Batch Importer (Bib/Acq)
Front End Estimate: Small < 3 days
Front End Estimator: Olamide Kolawole
Front-End Confidence factor: 60%
Back End Estimate: XXL < 30 days
Back End Estimator: Olamide Kolawole
Back-End Confidence factor: 80%
Development Team: Folijet
Kiwi Planning Points (DO NOT CHANGE): 62
PO Rank: 124
Rank: Chalmers (Impl Aut 2019): R3
Rank: Chicago (MVP Sum 2020): R2
Rank: Cornell (Full Sum 2021): R1
Rank: Duke (Full Sum 2021): R1
Rank: 5Colleges (Full Jul 2021): R1
Rank: FLO (MVP Sum 2020): R2
Rank: GBV (MVP Sum 2020): R4
Rank: Grand Valley (Full Sum 2021): R2
Rank: Lehigh (MVP Summer 2020): R1
Rank: MO State (MVP June 2020): R1
Rank: TAMU (MVP Jan 2021): R2
Rank: U of AL (MVP Oct 2020): R1
Score: 15
Showstopper for Summer 2021 Implementers?: No
Showstopper December 11 Meeting Summary: Not a 'showstopper', so not discussed at meeting.

 Description   

Current situation or problem:
MARC-MARC matches and MARC-Inventory matches have differing use cases. Pairing a MARC-MARC match with a more specific MARC-Instance or MARC–Holdings or MARC–Item match allows for identifying a specific record to be updated, or confirms that a new record is needed.

We want to ensure that MARC-MARC matching works properly for repeatable and non-repeatable fields, especially 0XX/9XX fields, and that they can pair well with Inventory submatches.

In scope:

  • MARC-MARC matches that result in multiple possible hits can be narrowed to single records with MARC-Inventory or static value submatches
  • Review any existing matching bugs and plan to resolve as part of this feature
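
A minimal sketch of the narrowing idea in the first bullet, assuming a MARC-MARC match has already returned several candidate records. The `Candidate` class, its fields, and the `narrow` helper are hypothetical names used only for illustration; they are not FOLIO APIs.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    """Hypothetical stand-in for an SRS record linked to an Inventory instance."""
    srs_id: str
    oclc_number: str
    instance_status: str


def narrow(candidates: list[Candidate],
           submatch: Callable[[Candidate], bool]) -> Optional[Candidate]:
    """Return the one candidate that passes the submatch.

    Returns None when zero or several candidates remain, because in
    either case no single record has been identified.
    """
    remaining = [c for c in candidates if submatch(c)]
    return remaining[0] if len(remaining) == 1 else None


# Illustrative data only: two SRS hits share the same OCLC number.
hits = [
    Candidate("srs-1", "12345", "Cataloged"),
    Candidate("srs-2", "12345", "Temporary"),
]

# Submatch on an Inventory value (here, Instance status).
match = narrow(hits, lambda c: c.instance_status == "Cataloged")
```

Returning None for both "no hits" and "still multiple hits" is a design choice of this sketch; a real job profile may need to tell the two cases apart.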

Out of scope:

  • After a MARC-MARC or MARC-Instance match, a user can include both Instance and MARC Bib actions afterwards (need examples from users)
  • Confirm that MARC matches are working properly using indicator wildcards (asterisks) versus blanks
  • Currently, ISBN matching does not translate between 10-digit and 13-digit ISBNs so that they can be matched against each other. Include in this feature, or handle as a separate feature in the future? See MODSOURMAN-269 Closed
  • Should we add a bug for not being able to have an override action for field protections under a MARC-Instance match?
  • What else?
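
For the ISBN bullet above, a minimal sketch of how 10-digit and 13-digit ISBNs could be brought to a common form before comparison. The conversion uses the standard 978 prefix and ISBN-13 check-digit rule; the function names are hypothetical, and this is not a description of how FOLIO handles ISBNs.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to its ISBN-13 form using the 978 prefix."""
    digits = isbn10.replace("-", "").replace(" ", "")
    if len(digits) != 10:
        raise ValueError("expected a 10-character ISBN")
    body = "978" + digits[:9]  # drop the ISBN-10 check character
    # ISBN-13 check digit: weights alternate 1, 3 across the first 12 digits.
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(body))
    check = (10 - total % 10) % 10
    return body + str(check)


def normalize_isbn(isbn: str) -> str:
    """Strip hyphens and spaces, and express 10-digit ISBNs as 13-digit."""
    digits = isbn.replace("-", "").replace(" ", "").upper()
    return isbn10_to_isbn13(digits) if len(digits) == 10 else digits


# Both forms of the same ISBN normalize to one comparable value.
same = normalize_isbn("0-306-40615-2") == normalize_isbn("978-0-306-40615-7")
```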

Use case(s):

  • SMEs: Please add examples
  • MARC-MARC match on OCLC number and then submatch by Instance status
  • MARC-MARC match on 001 and then submatch for holdings by permanent location
  • Need a use case that results in multiple SRS hits that then need to be narrowed down by Inventory match

Proposed solution/stories:

 

Links to additional info:

 

Questions:

  • Confirm most important MARC-MARC matching fields, e.g. identifier fields (010, 019, 020, 022, 024, 028, 035, 074) and 9xx fields


 Comments   
Comment by Ann-Marie Breaux (Inactive) [ 20/Nov/20 ]

Convo with Mark Veksler, Hkaplanian, Magda Zacharska, VBar, Taras Spashchenko about Cornell SRS queries ( UXPROD-2791 Closed )

Comment by Ann-Marie Breaux (Inactive) [ 24/Nov/20 ]

Hi Taras Spashchenko Please create the rest of the stories for the endpoints (and maybe paging, to handle large data sets) by the end of this week, so that Folijet can include in sprint 103. Thank you!

Comment by Ann-Marie Breaux (Inactive) [ 02/Dec/20 ]

Hi Taras Spashchenko Just checking in on this. Concorde has follow-on work that happens after this work. Do you think you will be able to finalize the Folijet stories this week?

Comment by Taras Spashchenko [ 03/Dec/20 ]

Hello Ann-Marie Breaux, the stories will be ready today.

Comment by Ann-Marie Breaux (Inactive) [ 03/Dec/20 ]

Sounds good - thanks, Taras Spashchenko

Comment by Taras Spashchenko [ 03/Dec/20 ]

I added tech stories for MARC search functionality

https://folio-org.atlassian.net/browse/MODSOURCE-221
https://folio-org.atlassian.net/browse/MODSOURCE-222
https://folio-org.atlassian.net/browse/MODSOURCE-223

Comment by Ann-Marie Breaux (Inactive) [ 08/Dec/20 ]

Discussed with Magda Zacharska, and we moved the prep stories from this feature to UXPROD-2791 Closed , since Concorde will be working on all of them. The prep stories are all linked under the MODSOURCE-215 Closed umbrella.

cc: Jenn Colt Taras Spashchenko Oleksii Kuzminov

Comment by Lisa McColl [ 04/Feb/21 ]

The workaround because this is not available is very time consuming. We've been doing this for five months at this point. I think without this feature a larger institution than ours could not sustain the workflow we've had to do outside of FOLIO in order to update our records. I see this is Blocked awaiting a dependency. Is there any estimate on which version we can expect this in? I would love to see this go up to a P1, to be honest. It would be interesting to get community feedback on it.

Comment by Jenn Colt [ 04/Feb/21 ]

Hi Lisa- The UXPROD for the dependency that is blocking this is https://folio-org.atlassian.net/browse/UXPROD-2791 . We are trying to get 2791 done for Iris (although it is at risk at this point), but after that it would still take time for Ann-Marie's developers to take advantage of it. It would be good to uprank 2791 if it is important to you; right now it is viewed as just a Cornell issue 🙂

That said, I am currently doing our 035$a matches with MARC to instance and then using a qualifier, and that seems like it will work for many of mine (doesn't help with the $z I realize.)

Comment by Lisa McColl [ 04/Feb/21 ]

Thank you Jenn! I know you mentioned this in Slack to me too, so thank you for your time in both places. I added myself as a watcher to UXPROD-2791 Closed .

An 035$a to OCLC identifier match would leave us with a lot of duplicate records in FOLIO. Just being able to query the SRS with the LDP would save a little time, and make the results better when we bring in new records. I'm pretty eager for all the above and attached SRS functionality to get into place. Right now when I get a new file from WorldShare, for example, I query by URL, OCLC number, and ISBN, to get any possible match out of FOLIO. I perform the matches between what I find in FOLIO and the new file in OpenRefine. That leads to two files to load: "new to folio" and "merge with existing folio". For the "merge" file I just match on the instance hrid at that point. It's very time consuming and hard to spread the workload around since it's so weirdly specialized.

Comment by Ann-Marie Breaux (Inactive) [ 18/May/21 ]

Discussed with Jenn Colt and reviewed the stories on MODSOURCE-215 Closed . The unfinished EDGSSRS stories are for outsiders trying to search into FOLIO. The key issues for the MARC-MARC matching are MODSOURCE-222 Closed , MODSOURCE-223 Closed , and MODSOURCE-228 Closed . Plus there's one more story MODSOURCE-224 Draft that would need to be completed.

Comment by Jenn Colt [ 23/Mar/23 ]

ISBN case - I have a set of records with multiple ISBNs. The matches are not working if there is more than one ISBN on the incoming record that qualifies for matching. I can add a 978 qualifier, which helps, but these records have multiple 978 ISBNs in some cases and those matches do not work. My expectation was that if any incoming ISBN matches an existing ISBN, the match would be positive.
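
The expectation described above can be sketched as a set-overlap test: the match is positive when at least one qualifying incoming ISBN equals any existing ISBN. The function name, the `begins_with` parameter (standing in for the 978 qualifier), and the sample ISBN strings are all illustrative assumptions.

```python
def any_isbn_matches(incoming: list[str], existing: list[str],
                     begins_with: str = "") -> bool:
    """True when any qualifying incoming ISBN equals any existing ISBN."""
    def clean(value: str) -> str:
        return value.replace("-", "").strip()

    # Keep only incoming ISBNs that satisfy the qualifier (if one is given).
    wanted = {clean(i) for i in incoming if clean(i).startswith(begins_with)}
    return not wanted.isdisjoint(clean(e) for e in existing)


# Made-up values: the incoming record carries two 978 ISBNs, and only
# the second one is present on the existing record.
incoming_isbns = ["9781111111111", "9782222222222"]
existing_isbns = ["9782222222222", "9783333333333"]
positive = any_isbn_matches(incoming_isbns, existing_isbns, begins_with="978")
```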

Comment by Jenn Colt [ 12/Apr/23 ]

Ann-Marie Breaux I notice this is not scheduled. Is there any way to consider it for Poppy? We are doing so much more MARC to MARC now because of the field protection change that this is becoming more of a problem.

Comment by Jenn Colt [ 24/Apr/23 ]

In Slack it was asked why this was a problem. I answered:

 
The problem mostly came from the field protections applying to instances. Before that I could do all the matching I needed to at the instance level. Now in order to override field protections I believe I can only use MARC to MARC matching. I'm encountering two main problems:

  • The repeatable fields problem. For instance, matching on an ISBN at the instance level was never a problem, but matching on repeating fields like this in the MARC seems like it is.
  • Combining matches. With Instance matching I can match on both an OCLC number and a custom identifier in the instance. In MARC I can't nest a second MARC match for the collection identifier. I also can't confirm the location of the instance, because I can no longer match on the holdings: I can't combine a MARC match and a holdings match and then update MARC.

If I am incorrect about needing to match on MARC to update MARC that would be amazing to know, but as far as I know this is the situation that we have created. We have made it so that we have to update the entity that does not have adequate matching capability when we need to override field protections. And we have to do that all the time when loading vendor electronic resource records.

Comment by Jennifer Eustis [ 05/Jan/24 ]

Hi Ryan Taylor ,

Here is one of our more frequent use cases for matching. This example can be done from an incoming marc field to an existing instance system control number field or to an existing srs marc field 035 with 1st indicator 9 and second indicator blank.

Use Case description: 5C has a shared bib environment. We also buy eResources, which can come in eResource packages. An eResource can be acquired by one or more of the schools in the 5C consortium. eResources that we need to track or match on (for example, to identify everything in the Safari package) carry what we call a container code in marc field 035 9\, an alphanumeric and unique code that we create. We create these by using the school's prefix (AC, HC, MH, SC, UM), or FC or 5C if it is for more than one of the schools, and then either the OCLC number, the document identifier, a vendor number, or any unique value from the record. For example, in the marc file in the zip folder, you'll see ACYBP (Amherst College Yankee Book Peddler), FCDUGT (consortium-loaded package for De Gruyter for 4 schools), and UMDEGT (the De Gruyter package for UMass). For the YBP stuff, this used to be 035 9\$aumypb or 035 9\$aacybpOCLCNumber. From the file:

=035  9\$a(ACYBP)1373347480
=035  9\$a(FCDUGT)9781501769214
=035  9\$a(UMDEGT)9781501769214

We need to make an exact match on the specific container code. In this use case, the incoming file is for UMass. We need to check with a match from incoming marc 035 9\ to EITHER instance system control number OR existing srs 035 9\ for the exact match.
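
A minimal sketch of the exact match described above, using the three 035 lines quoted from the file. It assumes the mnemonic layout shown there (tag, two spaces, two indicators, then subfields); the helper names are hypothetical, and a real implementation would read the records with a MARC library rather than slicing strings.

```python
import re

# The 035 9\ fields quoted above, as they appear in the mnemonic file.
EXISTING_035 = [
    r"=035  9\$a(ACYBP)1373347480",
    r"=035  9\$a(FCDUGT)9781501769214",
    r"=035  9\$a(UMDEGT)9781501769214",
]


def subfield_a_values(lines, tag="035", indicators="9\\"):
    """Collect $a from every line with the given tag and indicators."""
    values = []
    for line in lines:
        # Assumed layout: '=' + 3-char tag + 2 spaces + 2 indicators.
        if line[1:4] != tag or line[6:8] != indicators:
            continue
        found = re.search(r"\$a([^$]*)", line[8:])
        if found:
            values.append(found.group(1))
    return values


def has_exact_match(incoming_value, existing_lines):
    """Exact match against any repeat of the field, whatever its position."""
    return incoming_value in subfield_a_values(existing_lines)


# The UMass container code is the third 035 9\ here, and still matches.
um_match = has_exact_match("(UMDEGT)9781501769214", EXISTING_035)
```

Comparing the whole `$a` value keeps (UMDEGT)9781501769214 and (FCDUGT)9781501769214 distinct even though they share the same trailing number.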

If there is a match, no records are created.

If there is no match, an instance (plus srs), holdings, and items are created.

 

Requirements: The container code must be unique and in the marc field 035 9\ which is mapped to the instance system control number.

Location of container code in marc file: This can be the 1st or 2nd of the 035 9\'s. There can be many different container codes. My existing marc srs file has this where I included several examples of the match having a different ordinality.

Job Profile with Marc to Marc match:

Exact Match incoming 035 9\$a to existing srs 035 9\$a.

No Match: take no further action

Match: create instance (and srs), create holdings, create item, modify incoming marc to remove 856, 876, 852, 877. These fields are used to create holdings and items: 856$u holdings electronic access uri, 856$y link text, 856$z public note, 876$a barcode, 852$l permanent loan type (item), 877 item material type, 852$h holdings call number, 852$t holdings call number type, 852$l holdings permanent location code. Holdings electronic access relationship is set to resource in the mapping profile and item status is set to available in the mapping profile.

 

Please note that 5C creates item records with fake barcodes for eResources. Be wary when reloading these records - will need to change barcode or remove barcode from testing.

 

In the folder:

incoming file is for UM De Gruyter eResources with the container code in the 035 9. This should be the 1st of the 035's but not necessarily. This file also has the ISBN in another 035 9.

Existing file is from our 5C prod and has our instance HRIDs and uuids in the file. To load them remember to remove the 001 and 999.

 

Expected behavior:

The marc to marc AND marc to instance system control number match work with repeating fields where the field being matched might be 1st or not in ordinality.

 

container code_035_9matchMarcToMarc.zip

Comment by Jennifer Eustis [ 19/Jan/24 ]

Ryan Taylor I realized that I got a little over rambunctious with repeatable Marc fields. The incoming file should have only 1 match point or 1 035 9\$a field. I remember speaking to Ann-Marie about this. If I remember correctly, there is logic that only the 1st of the repeating fields is considered. I'm not sure if this is the case or if you can verify this. Otherwise, I'm fine with specifying that there be only 1 match point in the incoming file. Though this still means we have to consider that this one match point will match to one of many marc srs fields in the existing srs.
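
The revised requirement can be sketched as follows, assuming the incoming record must carry exactly one 035 9\$a match point while the existing SRS record may carry several. The function names are hypothetical, and whether the current logic really considers only the first repeating field is not confirmed in this ticket.

```python
def single_match_point(incoming_values):
    """Return the lone incoming match point, or raise if there is not exactly one."""
    if len(incoming_values) != 1:
        raise ValueError(
            f"expected exactly 1 match point, found {len(incoming_values)}"
        )
    return incoming_values[0]


def matches_existing(incoming_values, existing_values):
    """Compare the single incoming match point against every existing value."""
    return single_match_point(incoming_values) in existing_values


# Existing SRS record with several 035 9\$a values (from the example file).
existing_values = [
    "(ACYBP)1373347480",
    "(FCDUGT)9781501769214",
    "(UMDEGT)9781501769214",
]
result = matches_existing(["(UMDEGT)9781501769214"], existing_values)
```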

Generated at Fri Feb 09 00:26:28 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.