2022-01-05 Data Import Subgroup meeting

Recordings are posted here

Slack channel for Q&A and discussion between meetings

Additional discussion topics in Subgroup parking lot

Attendees: Ann-Marie Breaux, Jennifer Eustis, Timothy Watters, Lisa McColl, leeda.adkins@duke.edu, Lloyd Chittenden, Monica Arnold, Nick Cappadona, Jenn Colt, Christie Thomas

Special guest: Kimie Kester, FOLIO Designer

Lotus

Agenda topics: 

  • Data Import (DI) is broken on the hosted reference environments; some major changes were merged, and the hope is to have it sorted out by tomorrow. For now, Kiwi Bugfest is the best alternative.
  • Kiwi bug: Identifier matching should match on both Identifier and Identifier type
  • Bug: https://folio-org.atlassian.net/browse/UIDATIMP-1067:
    • If matches can be nested below actions, would there ever be a need to use the same match with different submatches/actions more than once in the same job profile? Not sure.
    • Will the proposed scenario work if matches can be nested under actions? Probably yes.
    • Are there any job profiles where you'll need the same match with different matches/actions underneath? Not sure.
    • Decision: Test once it is available on the hosted reference environments, then decide whether a Kiwi hotfix (HF) is desired and/or whether job profiles need to support the same match with different matches/actions underneath it
  • Data Import Log Enhancements
    • Will try to surface messages for:
      • Further processing stopped because there was a match/no-match to 1 hit and no further action was specified in the job profile
      • Further processing stopped because the match resulted in multiple hits, so no further action was taken
      • A-M to talk with devs and add an issue; probably for Morning Glory
    • Discussion topic: Is there value in purging logs after X days?
      • Could we start with a pre-defined value for X that is the same for all FOLIO libraries? If so, what is that value?
      • Christie: Purge logs for certain types of jobs, but not for all jobs. After a large import, weird data may not be found until months later; the logs show what the process was, who ran it, other possibly affected records, and what external tools were used to process the records. A system person reviews the logs regularly and manually deletes them based on date and job type criteria.
      • Leeda: +1 to Christie; can see the log detail for up to 4 months of jobs; beyond that, you can see the summary (equivalent of the 1-line summary) but not the details for each record; would be helpful
      • Jennifer E: purge every 90 days
      • Lisa M: +1 to Christie
      • Possible purge filters
        • Single-record imports vs. regular imports
        • Date
        • Job profile
        • Scheduler
        • Tags (a certain job profile?)
      • If logs are purged, would something else be needed, either at the point of import or at the point of purge?
      • Nick: If the landing page and the filters on the View all page become more performant, wait and see how much that helps before making any changes
        • We would have an opportunity to test pre-purge in Lotus Bugfest and possibly the PTF environment (not Rancher or the hosted reference environments)
        • A-M to double-check with developers: When the landing page becomes more performant, will the View all page also?
        • A-M to ask devs: Is there a Jira issue for populating fake import jobs, so that an environment can be populated with lots of imports quickly?
      • Jenn: In Cornell's live environment, EBSCO purged all logs up to 1 Nov, and then up to 1 Dec; this saved 20-30 seconds, but loading still takes several minutes
      • Is there a need to export log info from FOLIO?
        • If the export contains just what is seen line by line, probably not
        • If more could be added, then yes
          • If the log included the summary, HRIDs, and errors, then it could be useful
      • Having the summary in FOLIO is more important than being able to export it
        • Number created vs Number updated (HRIDs/UUIDs)
        • Matchpoint, if there was one, especially if it was discarded
        • What were the errors (and on which records), especially for large jobs
        • What was discarded and why (e.g., too many matches, or if duplicated, which record it duplicated)
        • Being able to filter
      • JSON details are still important, especially for errors
      • Next step: A-M to meet with Kimie to create fresh summary log screens for the group
    • Priorities
      • Some libraries using external tools for MARC orders and MARC invoices, but not all
      • Getting logs in better shape is higher priority than Orders/Invoices
    • Sample logs: 
    • Christie:
      • Number of bib records read:       000000708
      • Number of ADM records created:    000000708
      • Number of HOL records created:    000000708
      • Number of item records created:   000000708
      • Wrote Z100 record with rec-key: 000864141
      • Load: /exlibris/aleph/u22_1/alephe/tab/tab100
      • Load: /exlibris/aleph/u22_1/fcl01/tab/tab100
      • Load: /tmp/utf_files/exlibris/aleph/u22_1/fcl01/tab/pc_tab_exp_field.eng
      • Load: /tmp/utf_files/exlibris/aleph/u22_1/fcl01/tab/pc_tab_exp_field_extended.eng
      • Info: Empty table /tmp/utf_files/exlibris/aleph/u22_1/fcl01/tab/tab_alert
    • Jenn:
      • BIBLIOGRAPHIC, MFHD, ITEM or AUTHORITY
      • Records Processed:     8
        • Added:         8
        • Discarded:     0
        • Rejected:      0
        • Errored:       0
        • Replaced:      0
        • Merged:        0
        • Deleted:       0
        • Mfhds created: 8
        • Items created: 0
      • Purchase Order Number: 282773
        • Number of PO Created:   1
        • Number of PO Approved:  0
        • Number of Line Item Processed:  8
        • Number of Line Item Rejected:   0
        • Number of Line Item Inserted:   8
      • Mon Dec 14 08:50:56 2020
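The possible purge filters listed in the discussion (date, single-record vs. regular imports, job profile) could be combined roughly as follows. This is a hypothetical sketch only: `ImportJobLog`, its fields, `select_for_purge`, the profile names, and the 90-day default (borrowed from Jennifer E's practice) are illustrative assumptions, not FOLIO's data model or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Set


@dataclass
class ImportJobLog:
    # Hypothetical fields for illustration; not FOLIO's actual log schema.
    job_id: str
    completed: datetime
    single_record: bool
    job_profile: str


def select_for_purge(
    logs: Iterable[ImportJobLog],
    now: datetime,
    max_age_days: int = 90,
    single_record_only: bool = False,
    profiles: Optional[Set[str]] = None,
) -> List[ImportJobLog]:
    """Return logs older than max_age_days that also pass the optional filters."""
    cutoff = now - timedelta(days=max_age_days)
    selected = []
    for log in logs:
        if log.completed >= cutoff:
            continue  # too recent: keep
        if single_record_only and not log.single_record:
            continue  # regular (batch) import: keep, e.g. for later troubleshooting
        if profiles is not None and log.job_profile not in profiles:
            continue  # job profile not targeted for purging: keep
        selected.append(log)
    return selected


# Usage: purge only single-record imports older than 90 days.
now = datetime(2022, 1, 5)
logs = [
    ImportJobLog("a", datetime(2021, 9, 1), True, "Single record import"),
    ImportJobLog("b", datetime(2021, 9, 1), False, "Vendor batch load"),
    ImportJobLog("c", datetime(2021, 12, 20), True, "Single record import"),
]
print([log.job_id for log in select_for_purge(logs, now, single_record_only=True)])
# → ['a']
```

Every filter except the age cutoff is opt-in, which reflects Christie's point that some job types (such as large batch loads) may need to be kept longer; scheduler or tag filters could be added as further checks in the same way.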