2026-02-24 Better Sample Data Meeting notes

 Date

Feb 24, 2026

 Participants

@Lee Braginsky, @Charlotte Whitt, @Autumn Faulkner, @Shelley Doljack, @Tod Olson, @Yogesh Kumar

 Goals

  • Brainstorm next steps

 Discussion topics

Time

Item

Notes

 

Status updates

 

  • Any updates/changes to the Quesnelia Snapshot environment? I’m having login issues; is anyone else?

  • Should it be moved to Sunflower?

    • General consensus = yes

    • Possibly use the FOLIO-FSE/folio_migration_tools to load?

    • Stanford also has scripts for extracting JSON reference data

      • Then could be loaded via API into Sunflower

  • In the past, rebuilds were not the norm for sample data environments, but they will be going forward

    • Can our sample environment data be preserved instead of overwritten?

  • Our reference data is slim, so moving to the newest release and from Okapi to Eureka should not present any issues (we hope!)

  • Final plan

    • Capture data as it stands now; hold off on data we’ve been preparing to load

    • Ensure the new Sunflower environment will not be overwritten

    • Ensure the captured data can be loaded into the new environment

      • Ideally, capture this in the Sample Data GitHub module folders

    • Then, begin adding additional data as planned

 

Status of anonymization scripts

  • MSU trying once more to get an answer from Risk Management; we need concrete details to bring to the RMI office

    • 2 flavors of anonymization scripts

    • One from Stanford, in progress but not finished

      • Some local decisions in the scripts (like not obscuring circ rules)

      • EBSCO leadership reviewing legal aspects of the proposal

    • EBSCO scripts

      • Fall 2024 proof of concept (2-3 days)

      • Scripts scramble all PII across the board

      • Did some rough testing, but not deployment-ready

      • Will need contributions from library developers to implement

      • Anonymized data set could be loaded to MSU Dry Run environment
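To make "scramble all PII across the board" concrete for the next discussion, here is a minimal sketch of one possible approach. It is not the Stanford or EBSCO script. The field list (`PII_PATHS`), the `anon-` prefix, and the salt handling are all assumptions for illustration; a real script would need a reviewed field inventory and sign-off from the legal/risk reviews mentioned above.

```python
import copy
import hashlib

# Hypothetical list of PII field paths; the real scripts may cover different fields.
PII_PATHS = [
    ("username",),
    ("barcode",),
    ("personal", "firstName"),
    ("personal", "lastName"),
    ("personal", "email"),
]


def scramble(value: str, salt: str) -> str:
    """Replace a value with a short, stable token derived from a salted hash."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "anon-" + digest[:12]


def anonymize_record(record: dict, salt: str) -> dict:
    """Return a copy of the record with every listed PII field scrambled."""
    result = copy.deepcopy(record)
    for path in PII_PATHS:
        node = result
        # Walk down to the parent object of the field, if it exists.
        for key in path[:-1]:
            node = node.get(key)
            if not isinstance(node, dict):
                node = None
                break
        if node is None:
            continue
        leaf = path[-1]
        if isinstance(node.get(leaf), str):
            node[leaf] = scramble(node[leaf], salt)
    return result
```

Because the token depends only on the salt and the original value, the same input always maps to the same output, so a barcode that appears in two record types would still match after scrambling. Fields not in the list (for example, circulation rules) pass through unchanged, which mirrors the kind of local decision noted for the Stanford scripts.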

 

Explore alternatives to populating Bugfest environments if dataset contributions cannot be acquired from Stanford or MSU

  • Create a contribution guide for community members (SIGs and others), including a spreadsheet for reference data like fund codes, organizations, user types, etc.

    • Would need SIG members to also bring information to and from their institutions; just SIG input would not cover every workflow

    • A broader call to the community perhaps?

  • WolfCon presentation (or earlier virtual session?) inviting contributions

    • Perhaps working sessions at WolfCon?

  • Could we build a tool into Bugfest using the API that would allow users to upload .csv data to populate the environment?

    • Next time → what would creating/maintaining involve?
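As input for the "what would creating/maintaining involve?" question, here is a rough sketch of the first half of such a tool: reading a contributed .csv and turning it into JSON bodies that could later be sent to an API. The column names (`code`, `name`) and the function names are hypothetical, and the sketch deliberately makes no API calls, since the target endpoints have not been decided.

```python
import csv
import io
import json

# Hypothetical required columns for a simple reference-data sheet (e.g. fund codes).
REQUIRED_COLUMNS = {"code", "name"}


def csv_to_records(csv_text: str) -> list:
    """Parse CSV text into cleaned dicts, skipping rows that have no code."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    records = []
    for row in reader:
        # Trim whitespace, treat absent cells as empty, drop overflow cells.
        cleaned = {k: (v or "").strip() for k, v in row.items() if k is not None}
        if not cleaned["code"]:
            continue
        records.append(cleaned)
    return records


def to_json_bodies(records: list) -> list:
    """Serialize each record as a JSON string, one request body per record."""
    return [json.dumps(record, sort_keys=True) for record in records]
```

Even this small version hints at the maintenance cost: each record type would need its own required columns and validation rules, and those would have to track schema changes between releases.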

 Action items

@Autumn Faulkner Snapshot Q environment → we need an overview of what data is currently in the environment, how it was loaded (SQL? via the UI?)
Determine the most important use cases to support in a Bugfest environment and come to agreement on the types of reference data and associated records to fit those use cases