2023-05-24 Data Import Subgroup meeting

Recordings are posted Here (2022+) and Here (pre-2022)
Slack channel for Q&A, discussion between meetings

Requirements details Here
Additional discussion topics in Subgroup parking lot


Attendees: Ann-Marie Breaux, Jennifer Eustis, Lisa Smith, Taylor Smith, Monica Arnold, Kim Wiljanen, Christie Thomas, Colin Van Alstine, Heather MacFarlane, Jenn Colt, Jeanette Kalchik, leeda.adkins@duke.edu

Agenda: 

  • Announcements
    • PC meeting about Data Import on 1 June
    • WOLFcon: any ideas for an import session? Need to submit a proposal by June 5
      • Make sure there is a virtual option
      • Coordinate with any MM submissions
      • Homework: Think about it this week, and decide next week
    • ECS mockups for Inventory UI updates, not sure if any DI UI updates; would it be useful to review in DI Subgroup? Or maybe MM SIG?
      • A-M and Christine propose to MM SIG
    • Performance
      • EBSCO Dev Task Force kicked off yesterday; working on 1) chunking large files so that they do not overwhelm DI/SRS/Inventory and 2) ensuring that small import jobs do not get stuck behind large import jobs
      • Single Record Import
        • Per Jennifer, sometimes they go really slowly, and sometimes they just tank - since Kiwi; lots of "no record" imports; maybe an optimistic-locking-related problem since Nolana?
        • Per Jenn, in Orchid dry run, the Kafka priority topics for ISRI don't seem to be working; they get stuck behind big jobs (e.g. 7 mins for 1 record)
        • Per Christie, all sorts of weird error messages for ISRI; will add to the error wiki page
    • Error messages
      • Need Jira tickets created - A-M will create tickets when there's time
  • Multiple Holdings and Items
    • Review Jennifer/5C EEBO example
      • Tested for a single holdings and item in Orchid BF. See links and details on the wiki page
      • FOLIO cannot do cascading modifications (prep the URL and data for AC, then create AC holdings and item, then prep the URL and data for HC, then create HC holdings and item, etc.). FOLIO does all the instance work at once, then all the holdings work, then all the item work.
      • Data for a holdings record and its associated item needs to be in the same field, in different subfields, rather than in 852 for holdings data and 876 for item data.
      • For multiple URLs in the same holdings record, 3 possible options tested
        • Goal: create 1 holdings with 6 URLs and 1 associated item with a barcode
        • Option 1: explicit, repeated MARC field with holdings and item data for each URL
          • probably best if different e-access notes need to be associated with each URL
          • if only 1 item being created for the holdings, make sure the item data is the same in all copies of the MARC field
          • creates the largest MARC records, since all data is included in the record, in separate fields
        • Option 2: 1 MARC field per holdings, with repeated $u for each URL to be included in the holdings record
          • most efficient WRT the size of the MARC record
          • difficult to keep varying e-access notes associated with proper URLs
        • Option 3: explicit, repeated MARC field with holdings data for each URL, but item data only in the first MARC field
          • tested in case the repeated barcode in Option 1 caused item errors (which it didn't)
          • least intuitive, so least recommended
      • A-M to add modification to job profile to remove 952s at the end
    • Review Lynne/Wellesley example
    • Mostly consistent data, but might vary if enumeration/chronology for one, but not for another
      • Christie supplied an example; A-M will write up for next week
    • Jennifer: number of pieces would vary a lot; notes for circulating/special collection/staff only notes
      • Jennifer supplied an example; A-M will write up for next week
    • Autumn: use 948 for holdings and 949 for items; could they add an indicator for which holdings in a subfield of the 949; Autumn will send an example
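
The three options tested for the EEBO example can be compared with a small sketch. This is illustrative only and is not FOLIO code: it models a MARC field as a plain Python tuple, assumes the 952 tag mentioned in the notes is the local field carrying the combined holdings and item data, and uses hypothetical subfield codes ($h for the holdings location, $b for the item barcode). Only $u for the URL comes from the notes. The URLs, location, and barcode are made-up placeholders.

```python
from typing import List, Tuple

# A MARC field is modelled as (tag, [(subfield_code, value), ...]).
Field = Tuple[str, List[Tuple[str, str]]]

TAG = "952"        # assumption: local field holding both holdings and item data
HOLDINGS_SF = "h"  # hypothetical subfield code for the holdings location
BARCODE_SF = "b"   # hypothetical subfield code for the item barcode
URL_SF = "u"       # $u for the URL, as in the notes


def option_1(location: str, urls: List[str], barcode: str) -> List[Field]:
    """One field per URL; holdings and item data repeated in every field."""
    return [
        (TAG, [(HOLDINGS_SF, location), (URL_SF, url), (BARCODE_SF, barcode)])
        for url in urls
    ]


def option_2(location: str, urls: List[str], barcode: str) -> List[Field]:
    """One field per holdings, with a repeated $u for each URL."""
    subfields = [(HOLDINGS_SF, location)]
    subfields += [(URL_SF, url) for url in urls]
    subfields.append((BARCODE_SF, barcode))
    return [(TAG, subfields)]


def option_3(location: str, urls: List[str], barcode: str) -> List[Field]:
    """One field per URL; item data only in the first field."""
    fields: List[Field] = []
    for index, url in enumerate(urls):
        subfields = [(HOLDINGS_SF, location), (URL_SF, url)]
        if index == 0:
            subfields.append((BARCODE_SF, barcode))
        fields.append((TAG, subfields))
    return fields


def subfield_count(fields: List[Field]) -> int:
    """Rough proxy for how much the option adds to the MARC record."""
    return sum(len(subfields) for _, subfields in fields)


if __name__ == "__main__":
    urls = [f"https://example.org/eebo/{n}" for n in range(1, 7)]  # placeholder URLs
    for build in (option_1, option_2, option_3):
        fields = build("AC", urls, "BARCODE-0001")
        print(build.__name__, len(fields), "fields,", subfield_count(fields), "subfields")
```

With six URLs the sketch yields 6 fields for Options 1 and 3 and a single field for Option 2, which matches the notes' observation that Option 1 produces the largest records and Option 2 the smallest.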

Upcoming meetings/agenda topics:

  • 31 May:
    • WOLFcon program proposals?
    • Look at additional multiples examples
  • No meeting 7 June (EBSCO UG)
  • Misc
    • POL/VRN matching
      • For invoices, we only consider open POs
      • For Instance, Holdings, Items, we currently (Orchid and before) only consider open POs. Should we change the Inventory matching logic to allow matching on closed POs as well?
      • MODDATAIMP-769
    • Deleting outdated versions of SRS records
      • Can we define a cutoff date? 90 days ago? 1 year ago?
        • Different for records that are used during import and then not consulted again? (e.g. EDIFACT invoices, MARC bibs that only create/update orders, holdings, items)
      • Effects on the import log
    • Data import and consortia (cross-tenant importing)
    • OCLC number cleanup
      • Confirm 035 structure, aim for it to be consistent across all FOLIO tenants
    • Downloading log info
      • Lots of interest, especially for errors
      • Including identifiers for everything
      • What would UI look like?
      • What would output look like? 
    • Variation between PTF and production library performance results - why?
    • Revisit use cases for Updating individual MARC fields - what are the most common use cases?
    • Discuss/review mockups for MARC updates refinements
    • Do we need to refine checkbox field mappings on instance, holdings, item field mapping profiles like we have on orders?
      • Would it be UI only or BE also? How much effort? A-M will make a list of the checkbox fields and get add'l background from the devs; then discuss
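
For the OCLC number cleanup topic, a minimal sketch of what normalizing 035 values to one consistent form could look like. The target form used here, "(OCoLC)" followed by the digits with no ocm/ocn/on prefix and no leading zeros, is an assumption for illustration; the subgroup has not yet confirmed the 035 structure. The function name `normalize_oclc` is hypothetical and is not part of FOLIO.

```python
import re
from typing import Optional

# Accepts values that carry an "(OCoLC)" prefix and/or an ocm/ocn/on prefix.
# Bare digit strings are deliberately rejected, because an 035 can hold
# identifiers from other systems.
_OCLC_RE = re.compile(
    r"^\s*(?:\(OCoLC\)\s*(?:ocm|ocn|on)?|(?:ocm|ocn|on))\s*0*(\d+)\s*$",
    re.IGNORECASE,
)


def normalize_oclc(value: str) -> Optional[str]:
    """Return the value in the assumed target form, or None if it does not
    look like an OCLC number."""
    match = _OCLC_RE.match(value)
    if match is None:
        return None
    return f"(OCoLC){match.group(1)}"


if __name__ == "__main__":
    for raw in ["(OCoLC)ocm00012345", "ocn123456789", "(OCoLC)987654", "(DLC)12345"]:
        print(repr(raw), "->", normalize_oclc(raw))
```

Returning None for non-matching values leaves the decision about non-OCLC 035s (keep, skip, or report) to the caller.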


Chat

Colin V. (he/him)  to  Everyone 1:01 PM
hello!
2023-05-24 Data Import Subgroup meeting

Jenn Colt  to  Everyone 1:06 PM
Governance meetings are always open if you are interested in governance

Christie Thomas (she/her)  to  Everyone 1:13 PM
We could have a lab workshop session and ask for examples ahead of time to review in the session.

Christie Thomas (she/her)  to  Everyone 1:14 PM
Or maybe just sharing our data import workflows. I always learn something when hearing about how others manage data import activities locally.

Jennifer Eustis  to  Everyone 1:14 PM
I was thinking of also going over jira issues and looking at their priority

Christie Thomas (she/her)  to  Everyone 1:21 PM
Jira review is a good idea. It will help us identify gaps.

Jenn Colt  to  Everyone 1:26 PM
I would be up for JIRA review

Christie Thomas (she/her)  to  Everyone 1:32 PM
We have not seen that recently, but I am looking at our logs now.

Christie Thomas (she/her)  to  Everyone 1:33 PM
The no record eventually shows up for us and we end up with duplicate records when it is new because staff are confused and then load the record again.

wiljanen  to  Everyone 1:33 PM
I find it with both records and updated holdings

Lisa Smith, Mich State  to  Everyone 1:34 PM
The SRI slowness, getting 2 or more records, has happened to us occasionally.

Christie Thomas (she/her)  to  Everyone 1:40 PM
At Wolfcon, maybe a session on how to plan for the future given that it sounds like there is no capacity for new functionality and understanding what is considered new functionality vs a bug. I know we have a number of workflows that we still have not restarted and I just do not know how to plan for being able to implement those.

wiljanen  to  Everyone 2:06 PM
Thank You

Taylor Smith  to  Everyone 2:06 PM
thanks!