2025-01-07 Better Sample Data Meeting notes

 Date

Jan 7, 2025

 Participants

  • @Yogesh Kumar (regrets), @Lee Braginsky, @Charlotte Whitt, @Kristin Martin, @Autumn Faulkner, @Tod Olson, @Shelley Doljack

 Goals

  • Follow up on the status of discussion topics and tasks

 Discussion topics

Time

Item

Notes

 

General:

  1. Letter to Risk Office at Michigan State University Libraries

  2. Eureka platform to be rolled out with the Sunflower release (3/31/2025)

FOLIO Snapshot:

  1. Environment is ready

  2. Stanford is working on sample authority data, to be ready soon for the snapshot environment.

Anonymization of data in Bugfest environments:

Golden Copy:

@Autumn Faulkner sent the finalized letter to the dean at MSU in early December.

Any updates? No news yet.

 

Document written up by the TC: Draft report on adoption and timeline of the Eureka platform

Does this cause any changes to what this working group is working on?

Tod mentioned that no official decision on when to roll out Eureka has been taken yet. The Tri-Council meets on 1/13/2025.

 

 

  • The Quesnelia environment https://folio-quesnelia.dev.folio.org/ is finally up and running. The wiki landing page has been updated.

    • Charlotte will work on Inventory data.

      • Update the existing 36 source FOLIO records.

      • Add 100 MARC records. The MM-SIG will review the 100 bibs. We want to ensure that these records have the right mix of types to cover the basics, including bound-with.

      • Instances should have holdings, or holdings and items. Also include examples of items with no barcode, items on order, and items with regular barcodes.

    • Shelley to add authority data. Try to overlap with inventory records of both source FOLIO (?) and source MARC.

    • Autumn has provided sample data for music records and serials records with multiple holdings and multiple items. Holdings statements to be added.

      These records are loaded to this group's shared drive.

    • Autumn: Write up Data Import job profiles, including one that can import ~100 bibs in MARC 21 and create instances, holdings, and items (corresponding to the locations we have set up).

    • Stanford has MFHD-formatted holdings data in the current export workflow. Alissa Hafele mentioned that Stanford could probably provide these data, maybe ~25-50 examples. In progress. Any update? Alissa: I have put data in the drive in a new folder: 30 instances with corresponding MARC Holdings records and FOLIO Holdings records (can be identified by lack of $s in 999). Records still include Stanford-specific locations. Also included is a link to a spreadsheet with reasons for picking the records. Just let me know if there are any questions or if we need to tweak the export format!

    • Later we can move on to Order/Order lines data. We will ask the Acquisition SIG (Kristin Martin will take the lead on this). Kristin will wait until the system has Data Import job profiles for creating orders.

    • ERM-SIG to help provide data and review the entered data. Waiting for the initial environment and data to be added.

  • Decided to create a wiki page documenting how the FOLIO Snapshot data is built, along with other relevant information for test users.

 

Does anonymizing matter for Inventory records?

  • E.g. notes: drop administrative notes and all notes with private information. This needs to be spec’ed out in our documentation.

  • Local tags, digital bookplate information, donor information (can be dropped)

    • This working group should provide the guidelines for this. Shelley asked where the tool needs to be flexible.

  • Donor or bookplate data in MARC tags? Or other inventory record fields?

    • Yes

  • Any donor info at all?
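
A minimal Python sketch of the dropping rules listed above, for illustration only. The record layout (an `administrativeNotes` list, `notes` entries with a `staffOnly` flag, and a hypothetical `marc_fields` list of tag/value pairs) and the tags in `DONOR_TAGS` are assumptions, not decisions of this group; the actual fields and tags are what the guidelines still need to specify. The tag list is a parameter, reflecting Shelley's question about where the tool needs to be flexible.

```python
# Hypothetical MARC tags assumed to carry donor or bookplate data.
# The real list must come from the working group's guidelines.
DONOR_TAGS = {"541", "561", "590"}


def scrub_inventory_record(record, donor_tags=DONOR_TAGS):
    """Return a copy of an inventory record with private notes and donor fields removed."""
    cleaned = dict(record)
    # Drop all administrative notes.
    cleaned["administrativeNotes"] = []
    # Keep only notes that are not flagged staff-only
    # (assumed here to be the marker for private notes).
    cleaned["notes"] = [
        note for note in record.get("notes", []) if not note.get("staffOnly", False)
    ]
    # Drop MARC fields whose tag is in the configurable donor/bookplate list.
    cleaned["marc_fields"] = [
        field for field in record.get("marc_fields", []) if field["tag"] not in donor_tags
    ]
    return cleaned


sample = {
    "title": "Example title",
    "administrativeNotes": ["Cataloged by staff member X"],
    "notes": [
        {"note": "Includes index.", "staffOnly": False},
        {"note": "Gift of a named donor.", "staffOnly": True},
    ],
    "marc_fields": [
        {"tag": "245", "value": "Example title"},
        {"tag": "541", "value": "Donor name"},
    ],
}
scrubbed = scrub_inventory_record(sample)
print(scrubbed["notes"])
print(scrubbed["marc_fields"])
```

Passing a different `donor_tags` set lets each contributing institution apply its own local tag list without changing the function.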

Will come back to the following topics at next meeting:

  • Are circulation rules supposed to be anonymized?

    • In FOLIO Snapshot the circ rules should be minimal.

  • FOLIO Snapshot is rebuilt every 24 hours, so the sample data must accommodate that. When talking about loan patterns, how do we ensure that open loans adhere to the circ rules for testing things like bills, notices, aging to lost, etc.?

 

Lee’s group will be tied up in Ramsons and Sunflower work.

Lee suggested focusing on one area, e.g. patron and user group data. A tool can be used to fake data (names, phone numbers, addresses: all PII data).

Shelley is working on this in Python. Shelley has put together a wiki page to gather requirements for anonymization - https://folio-org.atlassian.net/wiki/x/BQA4K . Shelley has pulled out all reference data. Tod mentioned that maybe Chicago could contribute a Python developer too. Will come back to this in the new year.

Shelley would need to have the technical requirements written up. Will start with the document provided by Lee and his project on the POC.
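
To make the idea of faking patron PII concrete, here is a minimal sketch in Python using only the standard library. It is not the tool Shelley or Lee's group is building. The record layout (`id`, `username`, `barcode`, and a `personal` block with `firstName`, `lastName`, `email`, `phone`, `addresses`) is an assumption modeled on a FOLIO-like user record, and the name lists, the `salt` parameter, and the `anonymize_user` function are hypothetical.

```python
import copy
import hashlib

# Hypothetical replacement values; a real tool would use a much larger pool.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Taylor", "Morgan"]
LAST_NAMES = ["Smith", "Jones", "Garcia", "Nguyen", "Olsen"]


def _digest(seed, salt):
    """Hash the seed with a salt so the same input always yields the same fake value."""
    return hashlib.sha256(f"{salt}:{seed}".encode("utf-8")).hexdigest()


def _pick(options, seed, salt):
    """Pick one option deterministically from the hashed seed."""
    return options[int(_digest(seed, salt)[:8], 16) % len(options)]


def anonymize_user(user, salt="demo-salt"):
    """Return a copy of a user record with PII replaced by fake values.

    Non-PII fields such as the patron group are left untouched.
    """
    out = copy.deepcopy(user)
    seed = str(user.get("id", ""))
    token = _digest(seed, salt)[:8]
    personal = out.setdefault("personal", {})
    personal["firstName"] = _pick(FIRST_NAMES, seed, salt + "-first")
    personal["lastName"] = _pick(LAST_NAMES, seed, salt + "-last")
    personal["email"] = f"user-{token}@example.org"
    personal["phone"] = f"555-01{int(token, 16) % 100:02d}"
    personal["addresses"] = []  # drop addresses entirely
    out["username"] = f"user-{token}"
    out["barcode"] = token
    return out


patron = {
    "id": "abc-123",
    "username": "jdoe",
    "barcode": "12345",
    "patronGroup": "group-1",
    "personal": {
        "firstName": "Jane",
        "lastName": "Doe",
        "email": "jane.doe@university.edu",
        "phone": "312-555-9999",
        "addresses": [{"addressLine1": "1 Main St"}],
    },
}
fake = anonymize_user(patron)
print(fake["username"], fake["personal"]["email"])
```

Deriving the fake values from a salted hash of the record `id` keeps the output stable across runs, so references between records stay consistent. The salt would need to be kept private for this to count as anonymization rather than a reversible mapping.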

 

Golden copy has ~8-9 million instances.

Contributing institutions will need to stand up a second test environment.

Shelley asked about the tenant and the tenant IDs: is the reference environment to be a multi-tenant environment (Chicago, Stanford, MSU)? The current Bugfest environment is a single-tenant environment.

Tod pointed out that data from all three institutions would cause inconsistency in the use of reference data. Tod is thinking specifically of locations, but also of the use of item material types.

Shelley asked whether a multi-tenant environment would mean that each institution has its own reference data.

Would a solution be to have multiple standalone environments (A, B, and C)? Yogesh confirmed. Merging data would be phase 2.

 

Review timeline document

  • Draft - timeline document - Now work can get started.

 

Other topics

Lee will be on vacation starting 1/9/2025 and back on 1/17/2025.

Autumn will be absent next Tuesday too.

Next meeting will be 1/21/2025 at noon (EST) / 6:00 pm CET.

 

 Action items

@Charlotte Whitt is following up with DevOps to get the Q environment built: https://folio-quesnelia.dev.folio.org/ (right now this link does not work). The ticket in Jira is: FOLIO-4071 Create Quesnelia reference environment. The latest comment is from 12/10/2024.
@Yogesh Kumar will update the Golden copy (Bugfest) track in the Timeline doc
@Charlotte Whitt will update the FOLIO Snapshot track in the Timeline doc
@Lee Braginsky will update the track for Scripts to anonymize data set.
@Autumn Faulkner- We need to finalize the Google Doc with the revised version of the letter to be brought to the
@Lee Braginsky will publicize developers' requirements in the #folio-implementers slack channel.
@Charlotte Whitt will add to the agenda for next time 1/7/2025 the talk on anonymization of circulation rules and loan data

 Decisions

Lee, Yogesh, and Shelley will keep the working group informed of discussions and progress on developing the anonymization tool. Lee, Yogesh, Shelley, and Noah meet every Monday.