2024-08-27 Better Sample Data Meeting notes

 Date

Aug 27, 2024

 Participants

  • @Yogesh Kumar @Charlotte Whitt @Kristin Martin @Autumn Faulkner

 Goals

  • Follow up on the status of discussion topics and tasks

 Discussion topics

Time

Item

Notes

 

 

  1. Proposal - split the project in three:

    1. Golden copy for Bugfest environment.

    2. Work on a small data set for FOLIO Snapshot.

    3. Scripts to anonymize the data set.

 

The reason for splitting the work into three tracks is that we are currently waiting on the legal talks/agreements and the library dean's approval to provide a data set for the Bugfest Golden copy.

The proposal is therefore to run three parallel tracks of activities:
a) Golden copy for Bugfest environment. Yogesh Kumar is taking the lead on this.

The QA team is working on fixing the current Bugfest Data set (September).

b) FOLIO Snapshot. Work on updating/creating a small data set for FOLIO Snapshot. Charlotte Whitt is taking the lead on this. She will update data in the Quesnelia reference environment, which is not overwritten every 24 hours.

Start work on Inventory. Recruit people from MM-SIG. Write up Data Import profile, build instance, holdings, item. Also build MARC authority.

Charlotte can fix the 36 instances with Instance source = FOLIO

Kristin Martin will reach out to Christie Thoms about setting up a Data Import Job Profile that can import ~100 bibs in MARC 21 and create instance, holdings, and item records (corresponding to the locations we have set up). Charlotte will create a reasonably sized file with anonymized MARC records.

Then later we can move on to Order/Order lines data.

We will ask the MM-SIG, the Acquisition SIG, the ERM-SIG to review the entered data.

c) Lee Braginsky is working with developers to write robust data anonymization scripts.

  • Replace PII with randomly generated data

  • Scramble loan history

  • Scramble orders, invoice amounts, fund codes

  • Replace vendor names with randomized names

  • Strip out staff notes with initials, etc.

  • One set of data for the general environment, and perhaps a second sample set for the ECS environment

    • Get this from a consortium!
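
As a rough illustration of the anonymization steps listed above, here is a minimal Python sketch. It is not the script Lee and the developers are writing. The field names (`firstName`, `lastName`, `email`, `staffNotes`, `userId`, `name`) and the helper functions are hypothetical placeholders, and real FOLIO record schemas will differ.

```python
import random
import string

# Hypothetical PII field names, used only for illustration.
NAME_FIELDS = ("firstName", "lastName")


def random_token(rng, length=8):
    """Return a random lowercase string of the given length."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def anonymize_user(user, rng):
    """Return a copy of a user record with PII replaced and staff notes removed."""
    cleaned = dict(user)
    for field in NAME_FIELDS:
        if field in cleaned:
            cleaned[field] = random_token(rng)
    if "email" in cleaned:
        cleaned["email"] = random_token(rng) + "@example.org"
    # Staff notes may contain initials, so drop them entirely.
    cleaned.pop("staffNotes", None)
    return cleaned


def scramble_loans(loans, rng):
    """Shuffle which user each loan belongs to, keeping the loan count intact."""
    user_ids = [loan["userId"] for loan in loans]
    rng.shuffle(user_ids)
    return [dict(loan, userId=uid) for loan, uid in zip(loans, user_ids)]


def anonymize_vendor(vendor, rng):
    """Replace the vendor name with a randomized placeholder name."""
    return dict(vendor, name="Vendor-" + random_token(rng, 6))
```

Passing a seeded `random.Random` instance makes a run repeatable, which may help when the same sample set has to be regenerated for more than one environment.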

 

  2. Review timeline document

  • Draft - timeline document -

We will split it up into the three tracks listed above.

Yogesh will update the Golden copy (Bugfest) section.

Charlotte will update the FOLIO Snapshot section

Lee will update the timeline doc for his work on anonymization scripts.

 

  3. Getting source sample data sets

Autumn Faulkner: There is some hesitation at management level. MSU (Michigan State) has reached out to uChicago for a talk at dean level.

  1. Originally, uChicago only shared the bibliographic data for the BugFest environment.

  2. uChicago will want to anonymize data before sending it to EBSCO.

 Action items

@Charlotte Whitt to talk with Index Data’s developers about getting a Quesnelia reference environment similar to the one we have at https://folio-orchid.dev.folio.org/ - this environment should have all rebuilds paused until we have captured the data set
@Yogesh Kumar will update the Golden copy (Bugfest) track in the Timeline doc
@Charlotte Whitt will update the FOLIO Snapshot track in the Timeline doc
@Lee Braginsky will update the track for scripts to anonymize the data set in the Timeline doc

 Decisions