2024-07-30 Better Sample Data Meeting notes

🗓 Date

30 Jul 2024

Time

Item

Presenter

Notes

Revisiting action items from 7/15/2024
- Spreadsheet of reference data
- Google Form for SIG responses to collect both sample record data and sample reference data types
- Check on the effort needed to get a fresh export from Chicago, and on documentation of the process (TO)
- Regular meeting times
Identify
1. Types of data used in each of the apps, as well as data at the tenant level (move the table to a spreadsheet)
2. Reference data by app
  1. Loan types, fund codes, call number types, etc.
3. Record types by app
  1. Orders, order lines, users, instance/holdings/items, etc.
4. App settings
  1. Data Import job profiles, Inventory export targets, etc.
5. Tenant-wide data & settings
  1. Libraries, location codes, service points
  2. Consortial partners & relationships
  3. Permission sets
6. Insert links to GitHub repositories (CW: is this still relevant?)
7. Also solicit data samples from individual libraries, e.g.:
  1. Order data as mentioned by Maccabee Levine (Lehigh)
  2. Bound-with (one item linking to multiple holdings) (GBV)

Documents:

Recap of where we are:

Plan is to set up a blank environment
- Set up with a set of generic reference data, hopefully using a copy of a university’s data
Ask SMEs and users to upload sample data that they need for testing
Take a snapshot and use as a golden copy
Will need to ensure ongoing maintenance of this environment as features and apps are built out and require new sample data

Where to source sample data

Chicago’s data set uses a customized MARC mapping rather than the default; Chicago is also not using MARC authority data
We need a library using ERM, MARC authorities, and default MARC mapping
Robust anonymization will be required. Lee’s plan:
- Replace PII with randomly generated data
- Scramble loan history
- Scramble orders, invoice amounts, fund codes
- Replace vendor names with randomized names
- Strip out staff notes with initials, etc.
One set of data for the general environment, and perhaps a second sample set for the ECS environment
- Get this from a consortium!
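
A minimal sketch of how the anonymization steps in Lee's plan might look in practice. This is illustrative only and was not discussed in the meeting: the record shapes and field names (`firstName`, `staffNotes`, `userId`, `vendor`, `amount`, `fundCode`) are hypothetical and do not reflect the actual export schema, and the helper names are invented for this example.

```python
import random
import string


def random_name(rng, length=8):
    """Return a random capitalized string to stand in for a real name."""
    letters = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return letters.capitalize()


def anonymize_user(user, rng):
    """Replace PII with random values and strip staff notes.

    Field names are assumptions, not the real user record schema.
    """
    clean = dict(user)  # copy so the source record is left untouched
    clean["firstName"] = random_name(rng)
    clean["lastName"] = random_name(rng)
    clean["email"] = f"{random_name(rng).lower()}@example.org"
    clean.pop("staffNotes", None)  # notes may contain initials, etc.
    return clean


def scramble_loans(loans, rng):
    """Scramble loan history by shuffling which user each loan points to."""
    user_ids = [loan["userId"] for loan in loans]
    rng.shuffle(user_ids)
    return [{**loan, "userId": uid} for loan, uid in zip(loans, user_ids)]


def anonymize_orders(orders, rng):
    """Randomize vendor names and amounts, and shuffle fund codes.

    Each real vendor maps to one pseudonym so that records from the
    same vendor still group together after anonymization.
    """
    pseudonyms = {}
    fund_codes = [order.get("fundCode") for order in orders]
    rng.shuffle(fund_codes)
    result = []
    for order, fund_code in zip(orders, fund_codes):
        vendor = order["vendor"]
        if vendor not in pseudonyms:
            pseudonyms[vendor] = "Vendor " + random_name(rng)
        result.append({
            **order,
            "vendor": pseudonyms[vendor],
            "amount": round(rng.uniform(1, 500), 2),
            "fundCode": fund_code,
        })
    return result
```

Passing a seeded `random.Random` instance to each helper would make an anonymization run repeatable, which may help when regenerating the golden copy. A real implementation would also need to cover every table and free-text field that can hold PII.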

Action items

Spreadsheet has been populated with all modules and their related SIGs
1. This is what we will use to compile and deliver the final dataset to devs
Form has been drafted to solicit input from SIGs, SMEs, POs, etc.

5 min

Future meeting times

Every Tuesday at 6pm CET