
2024-10-01 Better Sample Data Meeting notes


🗓 Date

👥 Participants

🥅 Goals

  • Follow up on the status of discussion topics and tasks

🗣 Discussion topics

Time

Item

Notes

  1. Letter to Risk Office at Michigan State University Libraries

  1. Follow up on activities since last meeting:

    1. Update on - Golden copy for Bugfest environment.

    2. Update on - Work on a small data set for FOLIO Snapshot.

    3. Update on POC findings - write robust data anonymization scripts

Continue talk/input on Autumn Faulkner's letter to Risk Office at Michigan State University Libraries.

  • Dean would like letter to originate from FOLIO community/this group/Product Council – whatever makes the most sense. Is there FOLIO letterhead?

  • A couple other questions from my Dean re: technical processes.

Follow up on activities since last meeting:

WOLFcon 2024: Yogesh and Charlotte have talked with Stanford University, who are interested in providing data for the Golden copy and in joining our group.

Plan - who will take on what task:

a. Golden Copy / Yogesh Kumar

Environment creation is in progress, likely to complete this week (9/20)

The QA team will fix data according to the test data feedback wiki.

Link to wiki - /wiki/spaces/DQA/pages/203685917

Has received some feedback from POs, but not enough.

Jira tickets have been written up for the KitFox team - Sprint 199. Jira ticket: BF-752 (Epic).

This will then be the copy we use for Bugfest Ramsons. Expect to have this done by the end of September.

b. Snapshot environment / Charlotte Whitt

Charlotte has reached out to ID DevOps re. adding data to the monthly build environment: https://folio-quesnelia.dev.folio.org/ (the link is not up and running yet)

Index Data’s DevOps are working on this as we speak:

FOLIO-4071 Create Quesnelia reference environment

  • The environment will be in Quesnelia

  • Data will not be overwritten.

  • Charlotte will provide access to Autumn, Kristin and SMEs helping out.

Charlotte to fix the 36 instances with Instance source = FOLIO - and will do this as soon as the environment is ready.

Recruit members from MM-SIG as extra eyes to review the updated records with Instance source = FOLIO.

Autumn has a music cataloger background. Will find records in OCLC. Will aim for next week.

Also include Serial records, with multiple holdings each with multiple items. Add holdings statements.

Write up Data Import Job profiles

A Data Import Job Profile which can import ~ 100 bibs in MARC 21 and create instance, holdings, and item records (corresponding to the locations we have set up). MM-SIG eyes on the 100 bibs, to check that these records have the right mix of miscellaneous types to cover the basics, incl. bound-with.

/wiki/spaces/DQA/pages/203685917

Will ask whether any MM-SIG member has experience with MFHD.

Load MARC Authority data - this task we will come back to after WOLFcon.

Then later we can move on to Order/Order lines data. We will ask the Acquisition SIG (Kristin Martin will take the lead on this) and the ERM-SIG to review the entered data.

c. Data anonymization scripts / Lee Braginsky

Update from Lee Braginsky:

We have successfully completed the POC on dataset anonymization. I can present results at the next meeting. Next steps - TBD

Status on writing robust data anonymization scripts / Lee Braginsky. A university has agreed to provide data, which will be anonymized/scrambled:

  • Replace PII with randomly generated data

  • Scramble loan history

  • Scramble orders, invoice amounts, fund codes

  • Replace vendor names with randomized names

  • Strip out staff notes with initials, etc.

  • One set of data for the general environment, and perhaps a second sample set for the ECS environment

    • Get this from a consortium!
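The anonymization steps listed above could be sketched roughly as follows. This is only an illustration and not the POC script: the record shapes and field names (`firstName`, `staffNote`, `userId`, `fundCode`, `vendor`, `amount`) are hypothetical and are not taken from the FOLIO schemas, and the +/-50% amount perturbation is an arbitrary choice.

```python
import random
import string


def random_word(rng, length=8):
    """Return a random capitalized word, used as a stand-in name."""
    letters = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return letters.capitalize()


def anonymize_user(user, rng):
    """Replace PII fields with random values and drop staff notes."""
    clean = dict(user)
    clean["firstName"] = random_word(rng)
    clean["lastName"] = random_word(rng)
    clean["email"] = random_word(rng).lower() + "@example.org"
    clean.pop("staffNote", None)  # notes may contain initials or names
    return clean


def scramble_loans(loans, user_ids, rng):
    """Reassign every loan to a randomly chosen user ID."""
    return [dict(loan, userId=rng.choice(user_ids)) for loan in loans]


def scramble_invoice(invoice, vendor_aliases, rng):
    """Perturb the amount, randomize the fund code, and alias the vendor."""
    clean = dict(invoice)
    # Assumed perturbation: scale the amount by a random factor in [0.5, 1.5].
    clean["amount"] = round(invoice["amount"] * rng.uniform(0.5, 1.5), 2)
    clean["fundCode"] = "FUND-" + str(rng.randint(1000, 9999))
    # Reuse one alias per real vendor so records stay internally consistent.
    vendor = invoice["vendor"]
    if vendor not in vendor_aliases:
        vendor_aliases[vendor] = "Vendor " + random_word(rng)
    clean["vendor"] = vendor_aliases[vendor]
    return clean


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so a run can be repeated
    user = {"id": "u1", "firstName": "Ada", "lastName": "Smith",
            "email": "ada@uni.example", "staffNote": "checked by AS"}
    print(anonymize_user(user, rng))
```

Keeping a single alias per vendor is one possible design choice: it hides the real name while still letting orders and invoices for the same vendor line up in the sample data.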

  1. Review timeline document

  • Draft - timeline document -

We will split it up into the three tracks listed above.

Yogesh will update the Golden copy (Bugfest) section.

Charlotte will update the FOLIO Snapshot section

Lee will update the timeline doc for his work on anonymization scripts.

✅ Action items

  • Charlotte Whitt to talk with Index Data’s developers to get a Quesnelia reference environment similar to the one we have at https://folio-orchid.dev.folio.org/ - this environment should have all rebuilds paused until we have captured the data set
    • Quick update on the build of the environment: https://folio-quesnelia.dev.folio.org/ (right now this link does not work). Index Data’s DevOps will spin up this environment, hopefully in October. Charlotte is working on ensuring that the environment can stay persistent while we gather our data.
  • Yogesh Kumar will update the Golden copy (Bugfest) track in the Timeline doc
  • Charlotte Whitt will update the FOLIO Snapshot track in the Timeline doc
  • Lee Braginsky will update the track for Scripts to anonymize data set.

⤴ Decisions
