2024-08-06 Better Sample Data Meeting notes

 Date

Aug 6, 2024

 Participants

  • @Yogesh Kumar @Charlotte Whitt @Lee Braginsky @Kristin Martin

 Goals

  • Follow up on the status of discussion topics and tasks

 Discussion topics

Time | Item | Presenter | Notes

 

  Revisiting action items from 7/30/2024

  1. Spreadsheet has been populated with all modules and their related SIGs

    1. This is what we will use to compile and deliver the final dataset to devs

  2. Form has been drafted to solicit input from SIGs, SMEs, POs, etc.

    1. Yogesh and Lee will review and let Autumn know about corrections

  3. Getting sample data sets

    1. Autumn and Tod will bring a write-up of Lee’s proposal to administrations and check on the feasibility of, and willingness to use, the Chicago and MSU data sets

  4. Sketch a tentative timeline for this work

 

Documents:

  • High Quality Data in Reference Environments (original)

  • Sample Data Needed for Snapshot

  • Tracking spreadsheet: https://docs.google.com/spreadsheets/d/11wdDG0JDiglWROQPChCYHkD6jrpm1HXJz2tCzH_ZsyI/edit?gid=0#gid=0

  • Draft - timeline document -

Recap of where we are:

  • Plan is to set up a blank environment

    • Set up with a generic data set, hopefully using a copy of a university’s production environment

  • Ask SMEs and users to upload sample data that they need for testing

  • Take a snapshot and use as a golden copy

  • Will need to ensure ongoing maintenance of this environment as features and apps are built out and require new sample data

Where to source sample data

  • Chicago’s data set uses a customized MARC mapping rather than the default; Chicago is also not using MARC authority data

  • We need a library using ERM, MARC authorities, and default MARC mapping

  • Robust anonymization will be required. Lee’s plan:

    • Replace PII with randomly generated data

    • Scramble loan history

    • Scramble orders, invoice amounts, fund codes

    • Replace vendor names with randomized names

    • Strip out staff notes with initials, etc.

  • One set of data for the general environment, and perhaps a second sample set for the ECS environment

    • Get this from a consortium!
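A minimal sketch of how the anonymization steps in Lee’s plan might look in code. This is only an illustration under stated assumptions: the record layout, field names (`name`, `email`, `user_id`, `amount`, `fund_code`, `staff_note`), and function names are hypothetical and do not reflect the actual FOLIO schema or any agreed tooling. Staff notes are dropped entirely here, which is one possible reading of "strip out staff notes".

```python
import random

# All field names below are hypothetical placeholders, not the real schema.

def anonymize_user(user, rng):
    """Replace PII with randomly generated values; keep the record id."""
    n = rng.randrange(10**6)
    return {**user, "name": f"Patron {n:06d}", "email": f"patron{n:06d}@example.org"}

def scramble_loans(loans, rng):
    """Shuffle which user each loan belongs to, so loan history no longer
    matches real borrowers while overall counts stay the same."""
    user_ids = [loan["user_id"] for loan in loans]
    rng.shuffle(user_ids)
    return [{**loan, "user_id": uid} for loan, uid in zip(loans, user_ids)]

def scramble_invoice(invoice, rng):
    """Perturb the amount by up to +/-50% and assign a random fund code."""
    amount = round(invoice["amount"] * rng.uniform(0.5, 1.5), 2)
    fund_code = f"FUND{rng.randrange(1000):03d}"
    return {**invoice, "amount": amount, "fund_code": fund_code}

def rename_vendors(vendor_names):
    """Map each distinct vendor name to a generic placeholder name."""
    distinct = sorted(set(vendor_names))
    return {name: f"Vendor {i:03d}" for i, name in enumerate(distinct, start=1)}

def strip_staff_notes(record):
    """Drop the staff note field, which may contain initials or other PII."""
    return {k: v for k, v in record.items() if k != "staff_note"}
```

Passing a seeded `random.Random` instance as `rng` keeps a run reproducible, which may help when rebuilding the golden copy. Real anonymization would also need to cover free-text fields and cross-record links that this sketch ignores.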

7/30 action items

  1. Spreadsheet has been populated with all modules and their related SIGs

    1. This is what we will use to compile and deliver the final dataset to devs

  2. Form has been drafted to solicit input from SIGs, SMEs, POs, etc.

    1. Yogesh and Lee will review and let Autumn know about corrections

  3. Getting sample data sets

    1. Autumn and Tod will bring a write-up of Lee’s proposal to administrations and check on the feasibility of, and willingness to use, the Chicago and MSU data sets

 

 

 Action items

 Decisions