2025-08-05 Better Sample Data Meeting notes

 Date

Aug 5, 2025

 Participants

@Lee Braginsky (regrets), @Charlotte Whitt , @Yogesh Kumar (regrets) @Kristin Martin (regrets), @Autumn Faulkner, @Shelley Doljack , @Tod Olson (regrets)

 Goals

  • Organization of the working group's work going forward


 Discussion topics

Time

Item

Notes

 

General:

 

Organizing our work going forward?

  • Charlotte must step down as co-convener of the group (too much work right now). She can continue to help get data into the reference environments (see: Reference environments)

  • Split the working group in two? FOLIO reference environments and Bugfest sample data/anonymization?

    • Shelley: the anonymization is currently on hold

    • Stanford (Shelley): the work to get agreement on sharing the anonymized data has taken more time than anticipated. Anonymizing the data has also been very time-consuming.

 

Sample data loads into the Quesnelia environment

  • Job profile for print order/instance/holdings/item working; sample records loaded

  • Will start loading data task by task, and loop in the respective POs and Jason Root on the data we have ready by now. This task-by-task approach might be a realistic way to move this work forward.

  • @Autumn Faulkner suggests that we work with the different SIGs in a more formalized way and maybe get some volunteers to help out, e.g. with order data and loan types in the item records. @Autumn Faulkner will reach out to the respective SIGs.

  • @Autumn Faulkner will compare the reference data in Quesnelia, FOLIO Snapshot, and a realistic environment to add new sample data.

  • Some questions from Autumn:

    •  

 

ERM data (Owen) - what do we want to do here?

Agreements link to many different documents. For cases that need more extensive testing:

  • Use Postman to interact with FOLIO APIs directly

  • Set up call to the API to create an Agreement, and group together into a collection

  • Collection runner automates the calls to the collection

  • Inputs can come from CSV files, e.g., run an API call for each line in a spreadsheet, so you can work through 100 agreements to get them loaded

  • Can run JavaScript programs for each call. Use that to store information in temporary state about what’s being loaded, so the records can be linked together

  • This is currently the mechanism for populating a module with data, but records can’t be linked at load time

  • This process generates a lot of random data for performance testing. Particular items are usually tested by hand.

  • Could we capture this as JSON? Could be challenging for linking, because UUIDs would be different each time you populate the system

  • We could load the JSON for the Agreements, and then search for UUIDs that are set up already. But you need to know what the UUID is going to be, or else look it up

  • Unique to Agreements: if you want to link a resource to an Agreement, you need to add the resources to the local knowledge base or synchronize with remote KB. That would need to happen first. You could shortcut it by loading direct files. Logic that controls the data is complex

  • Could we coordinate the running of these scripts for the populating of the system? DevOps runs these post-installation of the system. E.g., put it in Airflow to make it happen.
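
The workflow described above (one API call per CSV row, with state kept between calls so that records can be linked afterwards) could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not Owen's actual script: the CSV columns (`name`, `status`, `linked_to`), the payload shape, and the `post` callable are all hypothetical, and no real FOLIO endpoint is called.

```python
import csv
import io
import uuid
from typing import Callable, Dict, List


def load_agreements(csv_text: str, post: Callable[[dict], dict]) -> Dict[str, str]:
    """Create one record per CSV row and remember the id returned for each name.

    `post` is a hypothetical stand-in for the API call that creates a record
    and returns the stored record, including its server-assigned "id".
    """
    ids_by_name: Dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        payload = {"name": row["name"], "status": row["status"]}
        created = post(payload)
        # The id is only known after creation, so keep it for the linking pass.
        ids_by_name[row["name"]] = created["id"]
    return ids_by_name


def resolve_links(csv_text: str, ids_by_name: Dict[str, str]) -> List[dict]:
    """Second pass: turn the name-based links in the CSV into id-based links."""
    links = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        target = row.get("linked_to")
        if target:
            links.append({"from": ids_by_name[row["name"]], "to": ids_by_name[target]})
    return links


def fake_post(payload: dict) -> dict:
    """Test double that assigns a random UUID, as a fresh system would."""
    return {**payload, "id": str(uuid.uuid4())}


# Hypothetical sample input: the second agreement links to the first.
SAMPLE = "name,status,linked_to\nAgreement A,active,\nAgreement B,active,Agreement A\n"

ids = load_agreements(SAMPLE, fake_post)
links = resolve_links(SAMPLE, ids)
```

Linking by name in the input and resolving to ids only after creation sidesteps the problem that UUIDs differ each time the system is populated. In a real run, `post` would be replaced by an authenticated HTTP call.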

ERM desired data: 2025-04-30 ERM SIG meeting

 

  1. Stanford is working on sample authority data to be ready soon for the snapshot environment.

 

Anonymization of data in Bugfest environments:

 

 

Review timeline document

 

Other topics

Any items to discuss?

 Action items (updated 8/5/2025)

@Lee Braginsky will update the track for Scripts to anonymize data set.
@Autumn Faulkner will make a systematic comparison of reference data in the MSUL FOLIO environment and add missing components to the Quesnelia environment; will also develop a list of reference data points which need some input from SIGs (i.e., setting up budgets, fiscal years, assigning funds, user groups and rules, etc.)
@Lee Braginsky will publicize developers' requirements in the #folio-implementers slack channel.
@Yogesh Kumar will create a wiki page where we document how the FOLIO Snapshot data is built, and other relevant information for test users.
@Autumn Faulkner has begun documentation on reference data: Sample reference environments. @Shelley Doljack will add more information
@Charlotte Whitt will set up a work meeting with @Autumn Faulkner @Shelley Doljack @Kristin Martin @Charlotte Whitt - to get the Data Import profiles for loading MARC data, write up the two job profiles (one for open orders and one for pending status), and get order data added to the Quesnelia environment
@Charlotte Whitt will update Patron notices templates and basic functionality, and update Circ rules accordingly
@Kristin Martin - will reach out to Owen, and ask him to attend an upcoming meeting to present his script for loading of agreement data. Will review the data.
@Charlotte Whitt will update the Circ rules in the Quesnelia environment
@Charlotte Whitt - will look into adding bound-with data to the Quesnelia environment. Among the 100 MARC records there is one bound-with record. Charlotte will ask Lehigh if we can use 5-10 more sample records from their collection
@Charlotte Whitt and @Shelley Doljack - will work on getting the instance records updated in GitHub - https://github.com/folio-org/mod-inventory-storage/blob/master/sample-data/instances/aba.json (8/4/2025)

 Decisions

We are meeting every other week - uneven weeks - next meeting is: 8/19/2025

We will revisit the talk about how to organize our work going forward

We will be working on smaller tasks, and move forward on the work we have collected by now

Autumn will reach out to the SIGs and get feedback on reference data in e.g. the Orders app, and more