
2024-11-12 Better Sample Data Meeting notes



🗓 Date

👥 Participants

🥅 Goals

  • Follow up on the status of discussion topics and tasks

🗣 Discussion topics

Time

Item

Notes

  1. Letter to Risk Office at Michigan State University Libraries

  1. Stanford is working on sample authority data to be ready soon for the snapshot environment.

Follow up on activities since last meeting:

  1. Update on the golden copy for the Bugfest environment.

  2. Update on work on a small data set for FOLIO Snapshot.

  3. Update on POC findings: writing robust data anonymization scripts.

Finalize the letter for the MSU dean.

Autumn Faulkner: We need to finalize the Google Doc with the revised version of the letter to be brought to the

Identified MARC-formatted holdings records; we still need an instance to go with them. We discussed the letter; Autumn has everything she needs and will forward it to the dean.

Alissa Hafele also contributed authority-controlled data. See the Slack conversation.

a. Golden Copy / Yogesh Kumar

Mike Gorrell asked Charlotte how many instances are intended for the Bugfest environment / golden copy.

Mike Gorrell  5:08 PM

Thanks. Do you know how big the Bugfest dataset will be in terms of # of Bib records?

  • Chicago: ~10 million instances?

  • Cornell: ~8.5 million instances?

  • Stanford: ~12 million instances?

  • Missouri State University: ~6-8 million instances?

I also told Mike G. that Yogesh Kumar is responsible for the golden copy work.

The environment was created by the Kitfox team and updated by QA. The Kitfox team is now working on fixing the issues listed here /wiki/spaces/DQA/pages/203685917

All work is done by now. Writing scripts to clean up duplicates is not easy, and teams are busy wrapping up Ramsons features. Writing such scripts may be possible in the Sunflower release. For Ramsons, we will have to clean it manually, if needed.

b. Snapshot environment. Update from Charlotte Whitt: Still waiting for the monthly build environment, https://folio-quesnelia.dev.folio.org/, to be built.

Update: Charlotte is still trying to get this to move. The work's status is still Open (To do). I’m really sorry that we are stuck.

The Jira ticket for creating https://folio-quesnelia.dev.folio.org/ is FOLIO-4071. The FOLIO DevOps group is just getting reorganized, and the expectation is they will get to do this work by mid-November.

Status:

Autumn has provided sample data for music records and serials records with multiple holdings and multiple items. Holdings statements are still to be added.

These records are loaded to this group's shared drive.

Write up Data Import Job profiles - Kristin Martin?

A Data Import job profile that can import ~100 bibs in MARC 21 and create instances, holdings, and items (corresponding to the locations we have set up). The MM-SIG should review the 100 bibs to confirm that these records have the right mix of types to cover the basics, including bound-with.

Stanford has MFHD-formatted holdings data in the current export workflow. Alissa Hafele mentioned that Stanford could probably provide this data, maybe ~25-50 examples. In progress.

Then later we can move on to Order/Order lines data. We will ask the Acquisition SIG (Kristin Martin will take the lead on this) and the ERM-SIG to review the entered data. Waiting for the initial environment and data to be added.

c. Anonymization script

Update: from Lee Braginsky

Right now, we have a POC. Lee is requesting that 2 developers from the community help write a Java/SQL and FOLIO schema tool to anonymize the data, for 2 sprints.

Tod suggests publicizing the developer requirements in the #folio-implementers Slack channel.

When Stanford is ready to deliver data, Yogesh and Lee can set up a meeting. Alissa will check with her colleagues.

Lee Braginsky: Can we make the anonymization POC slide deck available in the WG folders? Is the POC code on GitHub? Yogesh to follow up.

  1. Review timeline document

  • Draft - timeline document -

Nothing new to report

  1. Other topics

?

✅ Action items

  • Yogesh Kumar will update the Golden copy (Bugfest) track in the Timeline doc
  • Charlotte Whitt will update the FOLIO Snapshot track in the Timeline doc
  • Lee Braginsky will update the track for Scripts to anonymize data set.
  • Autumn Faulkner: We need to finalize the Google Doc with the revised version of the letter to be brought to the

⤴ Decisions
