
2024-09-03 Better Sample Data Meeting notes


🗓 Date

👥 Participants

🥅 Goals

  • Follow up on the status of discussion topics and tasks

🗣 Discussion topics

Time

Item

Notes

  1. Split the project into three tracks:

    1. Golden copy for Bugfest environment.

    2. Work on a small data set for FOLIO Snapshot.

    3. Write robust data anonymization scripts.

Follow up on activities since last meeting:

Plan - who will take on what task:

a. Golden Copy / Yogesh Kumar

uChicago can provide inventory data, but not sensitive data such as circulation, users, acquisitions, etc.

Michigan State: some hesitation at management level.

The topic also came up in the ARLEF group, where a couple of large institutions said they might be able to provide data.

The QA team is working on fixing the current Bugfest Data set (September).

Focus work on b. and c. while waiting for approval of data for the golden copy.

b. Snapshot environment / Charlotte Whitt

Charlotte has reached out to ID DevOps about adding data to the monthly build environment: https://folio-quesnelia.dev.folio.org/ (the environment is not up and running yet)

If this takes too long, Charlotte will look into other options.

Recruit members from MM-SIG

Write up Data Import Job profiles

A Data Import job profile that can import ~100 bibs in MARC 21 and create instances, holdings, and items (corresponding to the locations we have set up).

Charlotte will create a reasonably sized file with anonymized MARC records.

Load MARC Authority data

Charlotte to fix the 36 instances with Instance source = FOLIO

Later we can move on to Order/Order lines data. We will ask the MM-SIG, the Acquisition SIG, and the ERM-SIG to review the entered data.

c. Status on writing robust data anonymization scripts / Lee Braginsky

  • Replace PII with randomly generated data

  • Scramble loan history

  • Scramble orders, invoice amounts, fund codes

  • Replace vendor names with randomized names

  • Strip out staff notes with initials, etc.

  • One set of data for the general environment, and perhaps a second sample set for the ECS environment

    • Get this from a consortium!
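
As an illustration only, the first few bullets above could be sketched in Python roughly as follows. This is not Lee's script: the record layout (`username`, `barcode`, `personal`, `notes`), the name pools, and all function names are assumptions made for the example. Vendor names are replaced with numbered placeholders rather than random names, so that the same vendor maps to the same placeholder everywhere.

```python
import copy
import random

# Hypothetical name pools; a real script would draw from a much larger list.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Robin", "Casey"]
LAST_NAMES = ["Doe", "Smith", "Lee", "Garcia", "Nowak"]


def anonymize_user(user, rng):
    """Return a copy of a user record with PII replaced by random values.

    The field names are assumptions about the record layout.
    """
    out = copy.deepcopy(user)
    # Random 8-digit token; uniqueness across users is not checked here.
    token = f"{rng.randrange(10**8):08d}"
    out["username"] = f"user{token}"
    out["barcode"] = token
    personal = out.setdefault("personal", {})
    personal["firstName"] = rng.choice(FIRST_NAMES)
    personal["lastName"] = rng.choice(LAST_NAMES)
    personal["email"] = f"user{token}@example.org"
    # Drop free-text staff notes, which may contain initials or names.
    out.pop("notes", None)
    return out


def scramble_amount(amount, rng, spread=0.5):
    """Replace an amount with a random value within +/- spread of the original."""
    factor = rng.uniform(1 - spread, 1 + spread)
    return round(amount * factor, 2)


def make_pseudonymizer(prefix):
    """Return a function that maps each real name to a stable placeholder."""
    mapping = {}

    def pseudonym(real_name):
        if real_name not in mapping:
            mapping[real_name] = f"{prefix} {len(mapping) + 1:04d}"
        return mapping[real_name]

    return pseudonym
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which may help when the same anonymized data set has to be regenerated.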

  1. Review timeline document

  • Draft - timeline document -

We will split it up into the three tracks listed above.

Yogesh will update the Golden copy (Bugfest) section.

Charlotte will update the FOLIO Snapshot section.

Lee will update the timeline doc for his work on anonymization scripts.

✅ Action items

  • Charlotte Whitt to talk with Index Data’s developers to get a Quesnelia reference environment similar to the one we have at https://folio-orchid.dev.folio.org/. This environment should have all rebuilds paused until we have captured the data set.
  • Yogesh Kumar will update the Golden copy (Bugfest) track in the Timeline doc
  • Charlotte Whitt will update the FOLIO Snapshot track in the Timeline doc
  • Lee Braginsky will update the track for scripts to anonymize the data set in the Timeline doc

⤴ Decisions
