2022-04-13 Data Import Subgroup meeting

Recordings are posted Here (2022+) and Here (pre-2022)

Slack channel for Q&A, discussion between meetings

Requirements details Here

Additional discussion topics in Subgroup parking lot


Attendees: Ann-Marie Breaux, Timothy Watters, Jennifer Eustis, Lynne Fors, Jenn Colt, leeda.adkins@duke.edu, Lloyd Chittenden, Monica Arnold, Raegan Wiechert, Taylor Smith

Lotus

  • Lotus Folijet planning: dashboard where you can see the current scope and status of Data Import work for Lotus - DONE!!
  • Current Data Import feature development dashboard and bugfix support
  • Current work:
    • Admin note bug just found today: hope to fix this week, either in Data Import or Inventory - TBD; also need to figure out if Holdings and Items have the same problem
    • Some TestRails deferred to Morning Glory
      • A-M: Check the multiple 856s in the holdings record especially - is it the Bugfest env or an actual bug? In Kiwi, the second 856 had to be put into a different field, but multiple 856s were working properly in Lotus.
      • A-M: Check the MARC modification jobs that were getting stuck
      • Jennifer E's E2E test worked fine except for the multiple 856s, so may be OK to pass it
    • Updating Lotus release notes

Morning Glory

  • Morning Glory Folijet and Spitfire planning: dashboard where you can see the current scope and status of Data Import work for Morning Glory
  • Current work:
    • Adding Admin notes to the 3 Inventory field mapping screens
    • Log refinements: UI for deleting logs (no backend yet)
    • Starting
      • Field protection refinement (distinguish handling for repeatable and non-repeatable fields)
        • Per Jenn/Christie, also make sure that the field protections do not prevent modifications on incoming records if the field is protected (e.g. 856 fields)
      • Flow control (to help Single Record Imports and MARC Authority UI deletes not get stuck)
      • Prep for Importing order data in MARC format (will be Nolana work)

Agenda topics:

  • Raegan mentioned a bug that prevents old records from being updated via import unless a change is first made to the record in quickMARC. Per Christie, it was related to the preceding/succeeding title and how that changed the Instance schema
    • A-M: Check for the old records bug (different from the preceding/succeeding titles tags)
  • MARC-MARC matching
    • Lotus: Allows for any field in a MARC record except
    • Are these needed in Morning Glory?
      • Matching for 100-899 fields? Probably not many use cases for these; should work, but not heavily tested. Maybe a constructed title match, e.g. 3,3 which Innovative has as a secondary/confirmation match
      • Repeatable fields (e.g. 024, 035, 9xx)
        • Incoming record: Only first version of the field is considered (NOTE: It's the first field that has the indicator(s) and/or subfield specified in the match profile)
        • Ind 1, Ind 2, Subfield taken into account (in addition to the data)
        • Qualifier also is taken into account
        • Does FOLIO need to check all incoming 035s against all 035s in the existing SRS records? Or just the first? Answer: ALL
        • Wildcards for Ind 1, Ind 2 (repeatable or non-repeatable fields)
          • Needed? Still TBD
      • Jenn: Tested and was also able to use qualifier as a way to determine which incoming version of the repeatable field was used for matching, regardless of its position in the list of repeated fields
      • Christie: Matching on only the 1st 024 is not sufficient for Chicago, and is a blocker, since they need a MARC/MARC match to update the MARC (protect fields)
        • A-M: Add bug: can't add a cat date to the Instance and Update the SRS (protect) in the same job - is it because of the match or having Update Instance and Update SRS MARC actions in the same profile?
        • Christie: add a TestRail describing the Chicago shelfready use case
        • Jennifer E already has set up a similar one in TestRail: https://foliotest.testrail.io/index.php?/tests/view/987431
      • OCLC number formats
        • (OCoLC)12345
        • (OCoLC)ocm12345
        • (OCoLC)ocn12345
        • ocm12345
        • ocn12345
        • Can we harmonize/remove duplicates in the 035? Group agreed on the first format, (OCoLC)12345.
        • Also would need an on-demand script to clean up in SRS
      • Christie: 035s with other prefixes, e.g. ASP12345 (Alexander Street), but can't guarantee that this is the first 035 in the incoming record
        • Use case: two 035s in an incoming record
          • (OCoLC)12345
          • ASP12345
        • Current system: matches all incoming 035s against all existing 035s in a record
          • If OCLC 035 matches 3 records and ASP 035 matches 1 record, system stops since it can't narrow to a single record
      • Jennifer: uses various indicators for 035 (even though not valid MARC) to distinguish OCLC 035s vs vendor 035s; works well for them, as long as FOLIO does not validate the indicators. A-M confirmed that FOLIO does not validate indicators
    • Additional info from A-M/Igor:
      • Let's pretend that these fields are in an incoming record: (Field Ind1 Ind2 Subfield)

        • 024 _ _ $a 12345
        • 024 1 1 $a 45678
        • 024 1 _ $x 67890
        • 024 2 2 $x 67890
      • And the fields in the existing SRS record are

        • 024 2 2 $x 67890
        • 024 _ _ $a 12345
        • 024 1 _ $x 13579
        • 024 1 1 $a 45678
        • 024 1 _ $x 67890

      • I understand that for repeatable fields, FOLIO Lotus only pays attention to the first incoming field, not the rest, but compares to any matching fields in the existing record.

      • Now - setting up different match profiles, I want to be sure I understand the logic that is in place now:

      • If the match profile is 024 _ _ $a:
        • Matches, because the first incoming 024 with blank indicators and $a (which is the first 024 in the incoming file) looks for an existing 024 with blank indicators and $a and the same value (which is the second 024 in the existing record)

      • If the match profile is 024 1 1 $a:
        • Matches, because the first incoming 024 with indicators 11 and $a (which is the second 024 in the incoming file) looks for an existing 024 with indicators 11 and $a and the same value (which is the fourth 024 in the existing record)

      • If the match profile is 024 1 _ $x:
        • Matches, because the first incoming 024 with indicators 1_ and $x (which is the third 024 in the incoming file) looks for an existing 024 with indicators 1_ and $x and the same value (which is the fifth 024 in the existing record)

      • If the match profile is 024 2 2 $x:
        • Matches, because the first incoming 024 with indicators 22 and $x (which is the fourth 024 in the incoming file) looks for an existing 024 with indicators 22 and $x and the same value (which is the first 024 in the existing record)
      • However! Let’s pretend the incoming record looks like this:
        • 024 1 1 $a 12345
        • 024 1 1 $a 45678
      • And the existing SRS record is
        • 024 1 1 $a 45678
      • If the match profile is 024 1 1 $a, SRS does not match, even though “024 1 1 $a 45678” is present in both the incoming and existing records.
        As usual, SRS scans the incoming record from the beginning for the field specified in the match profile and takes the first occurrence of <024 1 1 $a>, which is “024 1 1 $a 12345”. SRS then looks for “024 1 1 $a 12345” in the existing record and can’t find it.
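
To make the repeatable-field logic described above easier to check, here is a minimal Python sketch of it. This is not FOLIO/SRS code: the names (Field, parse_field, parse_profile, lotus_match, match_all) are hypothetical, "_" stands for a blank indicator as in the notes, and qualifiers are ignored. lotus_match models the Lotus behavior as described by A-M/Igor (only the first incoming field that fits the match profile is compared), while match_all models the "ALL" behavior the group asked for.

```python
from typing import List, NamedTuple, Optional, Tuple


class Field(NamedTuple):
    tag: str
    ind1: str      # "_" means a blank indicator
    ind2: str
    subfield: str  # subfield code without the "$"
    value: str


Profile = Tuple[str, str, str, str]  # tag, ind1, ind2, subfield


def parse_field(line: str) -> Field:
    """Parse a line such as '024 1 _ $x 67890'."""
    tag, ind1, ind2, subfield, value = line.split()
    return Field(tag, ind1, ind2, subfield.lstrip("$"), value)


def parse_profile(line: str) -> Profile:
    """Parse a match profile such as '024 1 1 $a'."""
    tag, ind1, ind2, subfield = line.split()
    return (tag, ind1, ind2, subfield.lstrip("$"))


def first_incoming(incoming: List[Field], profile: Profile) -> Optional[Field]:
    """Return the first incoming field with the profile's tag, indicators and subfield."""
    for field in incoming:
        if field[:4] == profile:
            return field
    return None


def lotus_match(incoming: List[Field], existing: List[Field], profile: Profile) -> bool:
    """Behavior described in the notes: only the first fitting incoming field is used."""
    candidate = first_incoming(incoming, profile)
    return candidate is not None and candidate in existing


def match_all(incoming: List[Field], existing: List[Field], profile: Profile) -> bool:
    """Assumed alternative: compare every fitting incoming field to every existing field."""
    wanted = {field for field in incoming if field[:4] == profile}
    return any(field in wanted for field in existing)


# First example from the notes: all four match profiles find a match.
incoming_a = [parse_field(s) for s in (
    "024 _ _ $a 12345", "024 1 1 $a 45678", "024 1 _ $x 67890", "024 2 2 $x 67890")]
existing_a = [parse_field(s) for s in (
    "024 2 2 $x 67890", "024 _ _ $a 12345", "024 1 _ $x 13579",
    "024 1 1 $a 45678", "024 1 _ $x 67890")]

# Second ("However!") example: the first fitting incoming field hides the real match.
incoming_b = [parse_field(s) for s in ("024 1 1 $a 12345", "024 1 1 $a 45678")]
existing_b = [parse_field("024 1 1 $a 45678")]
profile_b = parse_profile("024 1 1 $a")

print(lotus_match(incoming_b, existing_b, profile_b))  # False: 12345 is not in the existing record
print(match_all(incoming_b, existing_b, profile_b))    # True: 45678 is present in both records
```

The only difference between the two functions is whether the incoming side is reduced to a single field before comparing, which is the point of the Chicago blocker above.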


    • Future topics:
      • Log refinement - finalize a few more questions
      • Results of Feb 2022 survey 
      • See if we can make the 005 modifiable, so that the existing one can be copied into a 9xx field via a MARC modification rule so that we can keep the date of last OCLC edit to a record (Lloyd)
      • Updated import stats for Lotus
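
For the OCLC number formats listed under MARC-MARC matching, the harmonization the group agreed on (everything becomes the first format, (OCoLC)12345) could be sketched roughly as below. This is only an illustration, not an existing FOLIO script: normalize_035 and dedupe_035s are hypothetical names, only the five formats from the notes are recognized, matching is case-sensitive by assumption, and any other 035 value (e.g. ASP12345) is passed through unchanged.

```python
import re
from typing import List

# Recognizes only the five formats listed in the notes:
# (OCoLC)12345, (OCoLC)ocm12345, (OCoLC)ocn12345, ocm12345, ocn12345
_OCLC_035 = re.compile(r"^(?:\(OCoLC\)(?:ocm|ocn)?|ocm|ocn)(\d+)$")


def normalize_035(value: str) -> str:
    """Rewrite a recognized OCLC 035 value as (OCoLC)<digits>; leave others as-is."""
    match = _OCLC_035.match(value.strip())
    if match is None:
        return value
    return "(OCoLC)" + match.group(1)


def dedupe_035s(values: List[str]) -> List[str]:
    """Normalize each 035 value and drop duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for value in values:
        normalized = normalize_035(value)
        if normalized not in seen:
            seen.add(normalized)
            result.append(normalized)
    return result


example_035s = [
    "(OCoLC)12345",
    "(OCoLC)ocm12345",
    "(OCoLC)ocn12345",
    "ocm12345",
    "ocn12345",
    "ASP12345",
]
print(dedupe_035s(example_035s))  # ['(OCoLC)12345', 'ASP12345']
```

A bare number with no prefix is deliberately not treated as an OCLC number here, since the notes do not list that format.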


Chat: 

From Raegan Wiechert to Everyone 01:09 PM
brb
back

From Christie Thomas (she/her) to Everyone 01:13 PM
not in lotus.
I can try in lotus if you think that is important. I still have not retested the two lotus tests assigned to me again, though. So I need to test those as well.

From Christie Thomas (she/her) to Everyone 01:23 PM
https://folio-org.atlassian.net/browse/MODDICORE-248

From Jenn Colt to Everyone 01:31 PM
does this only affect linked ones or not linked ones too?
there's no jira for this one?
were there errors in the log or was it silent?

From Raegan Wiechert to Everyone 01:33 PM
In the log, I was getting the error: java.lang.NumberFormatException: null

From Jenn Colt to Everyone 01:48 PM
fwiw i do think it is because of the match

From Christie Thomas (she/her) to Everyone 01:53 PM
now that I am thinking about it you are probably right.
I thought it was something else, but my thinking is evolving.
that is the version that we would want. and if it is the first one, then the match should be okay.

From Jennifer Eustis (she/her) to Everyone 01:53 PM
We want the 1st one.
at the 5C

From Lynne Fors to Everyone 01:54 PM
We would want the 1st one as well

From Jenn Colt to Everyone 01:55 PM
we clean out everything except the first one
i have to leave early for another meeting. thank you!

From Christie Thomas (she/her) to Everyone 02:04 PM
that is a good idea.

From Raegan Wiechert to Everyone 02:04 PM
In Sierra, it would just create a third record. Not horrible (didn't happen real often), but it did mean cleanup after the fact. This was in a consortial environment.

From Jennifer Eustis (she/her) to Everyone 02:04 PM
https://foliotest.testrail.io/index.php?/tests/view/987431

From Christie Thomas (she/her) to Everyone 02:05 PM
that is good to know!
this is a lot to process!

From Taylor Smith to Everyone 02:06 PM
Thanks!