2025-09-02 Better Sample Data Meeting notes

 Date

Sep 2, 2025

 Participants

@Lee Braginsky, @Charlotte Whitt, @Yogesh Kumar, @Kristin Martin, @Autumn Faulkner (regrets), @Shelley Doljack, @Tod Olson

 Goals

  • Organization of the working groups going forward.

  • Update on the general status of all tracks

 Discussion topics

Time

Item

Notes

 

General:

 

Organizing our work going forward?

  • @Autumn Faulkner - It would be great if you can be co-convener. We can also look for a new time slot for this meeting that works for everyone.

  • Can continue to help get data into the reference environments (see: Reference environments)

  • We have a pool of sample data that needs to work together: orders fitting into inventory, authority data working with MARC, etc.

  • Is it possible to generate reference data (fund codes, vendor names, locations, loan rules, etc.) from .csv files with lists of values?

    10:35 - Existing files need to be cleaned up and extended with the data properties introduced by the latest FOLIO features.

    Stanford has scripts to generate this data
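
A minimal sketch of the CSV idea above, assuming a list of values with `code` and `name` columns. This is not Stanford's script: the column names, the namespace, and the record shape are hypothetical. It derives each record's id from its type and code, so the same list always produces the same UUIDs.

```python
import csv
import io
import json
import uuid

# Hypothetical namespace: any fixed UUID works, as long as it never changes.
SAMPLE_DATA_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "sample-data.example.org")


def rows_to_records(csv_text, record_type):
    """Turn CSV rows into JSON-ready dicts with deterministic ids."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Strip stray whitespace that tends to creep into hand-edited lists.
        record = {key.strip(): value.strip() for key, value in row.items()}
        # Same record type + code always gives the same UUID on every run.
        record["id"] = str(
            uuid.uuid5(SAMPLE_DATA_NAMESPACE, f"{record_type}:{record['code']}")
        )
        records.append(record)
    return records


# Illustrative values only, not real reference data.
LOCATIONS_CSV = """code,name
MAIN,Main Library
ANNEX,Annex
"""

locations = rows_to_records(LOCATIONS_CSV, "location")
print(json.dumps(locations, indent=2))
```

Each FOLIO module expects its own record schema, so the output would still need to be mapped to the real property names before loading.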

 

Sample data loads into the Quesnelia environment

  • @Charlotte Whitt suggested inviting Jason Root to the next meeting to explain what form the data needs to be in for deployment in our environment.

  • The job profile for print order/instance/holdings/item is working; sample records have been loaded.

  • We will start loading data task by task, and loop in the respective POs and Jason Root on the data we have ready so far. This task-by-task approach might be a realistic way to move the work forward.

  • @Autumn Faulkner suggests that we formalize the work with the different SIGs and maybe get some volunteers to help out, e.g. with Order data or loan types in the item records. @Autumn Faulkner will reach out to the respective SIGs.

  • @Autumn Faulkner will compare the reference data in Quesnelia, FOLIO Snapshot, and a realistic environment in order to add new sample data.

 

ERM data (Owen) - what do we want to do here?

Let’s postpone the ERM data until we've finalized the sample order and vendor data.

Postpone: Serials and Receiving data for now.


Agreements link to many different documents. For cases that need more extensive testing:

  • Use Postman to interact with FOLIO APIs directly

  • Set up calls to the API to create an Agreement, and group them together into a collection

  • Collection runner automates the calls to the collection

  • Inputs can come from CSV files, e.g., run an API call for each line in a spreadsheet, so you can work through 100 agreements to get them loaded

  • Can run JavaScript programs for each call. Use that to store information in temporary state about what’s being loaded, so records can be linked together

  • This is currently the mechanism for populating a module with data, but records can’t be linked at the time of loading

  • This process generates a lot of random data for performance testing. Particular items are usually tested by hand.

  • Could we capture this as JSON? Could be challenging for linking, because UUIDs would be different each time you populate the system

  • We could load the JSON for the Agreements, and then search for UUIDs that are already set up. But you need to know what the UUID is going to be, or else look it up

  • Unique to Agreements: if you want to link a resource to an Agreement, you need to add the resources to the local knowledge base or synchronize with remote KB. That would need to happen first. You could shortcut it by loading direct files. Logic that controls the data is complex

  • Could we coordinate the running of these scripts for the populating of the system? DevOps runs these post-installation of the system. E.g., put it in Airflow to make it happen.
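
The collection-runner pattern described above (one call per CSV line, with temporary state used for linking) can be sketched in Python roughly as follows. This is an illustration under assumptions, not Owen's script: `fake_create` stands in for the real API call, and the `related_to` column and `relatedAgreementId` field are hypothetical names.

```python
import csv
import io
import uuid


def fake_create(payload):
    """Stand-in for the real API call: echoes the payload with a new id."""
    return {**payload, "id": str(uuid.uuid4())}


def load_agreements(csv_text, create=fake_create):
    """Run one create call per CSV row and remember ids for later linking."""
    created_ids = {}  # temporary state: agreement name -> id from the response
    for row in csv.DictReader(io.StringIO(csv_text)):
        payload = {"name": row["name"]}
        # Link to an agreement created earlier in the same run, if one is named.
        parent = row.get("related_to")
        if parent:
            payload["relatedAgreementId"] = created_ids[parent]
        response = create(payload)
        created_ids[row["name"]] = response["id"]
    return created_ids


# Illustrative rows only; the second row links back to the first.
AGREEMENTS_CSV = """name,related_to
Journal package A,
Journal package A renewal,Journal package A
"""

ids = load_agreements(AGREEMENTS_CSV)
print(ids)
```

Because ids are only known after each call returns, a row can link only to rows that come before it in the file, which matches the UUID lookup problem noted above.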

ERM desired data: 2025-04-30 ERM SIG meeting

 

  1. Stanford is working on sample authority data to be ready soon for the snapshot environment.

 

Anonymization of data in Bugfest environments:

On hold due to other priorities.
We need to sign an NDA with EBSCO/Stanford to use a partially anonymized dataset. @Lee Braginsky will follow up on how it was done in the past.

 

Review timeline document

 

Other topics

Any items to discuss?

 Action items (updated 9/2/2025)

@Lee Braginsky will update the track for scripts to anonymize the data set.
@Autumn Faulkner will make a systematic comparison of reference data in the MSUL FOLIO environment and add missing components to the Quesnelia environment; will also develop a list of reference data points which need some input from SIGs (i.e., setting up budgets, fiscal years, assigning funds, user groups and rules, etc.)
@Lee Braginsky will publicize developers' requirements in the #folio-implementers slack channel.
@Yogesh Kumar @Charlotte Whitt - Create a wiki page where we document how the FOLIO Snapshot data is built, and other relevant information for test users.
@Autumn Faulkner has begun documentation on reference data: Sample reference environments. @Shelley Doljack will add more information.
@Charlotte Whitt will set up a working meeting with @Autumn Faulkner, @Shelley Doljack, and @Kristin Martin to get the Data Import profiles for loading MARC data, write up the two job profiles (one for open orders and one for pending status), and get order data added to the Quesnelia environment
@Charlotte Whitt will update Patron notices templates and basic functionality, and update Circ rules accordingly
@Kristin Martin - will reach out to Owen, and ask him to attend an upcoming meeting to present his script for loading of agreement data. Will review the data.
@Charlotte Whitt will update the Circ rules in the Quesnelia environment
@Charlotte Whitt - will look into adding bound-with data to the Quesnelia environment. Among the 100 MARC records there is one bound-with record. Charlotte will ask Lehigh if we can use 5-10 more sample records from their collection
@Charlotte Whitt and @Shelley Doljack - will work on getting the instance records updated in GitHub - https://github.com/folio-org/mod-inventory-storage/blob/master/sample-data/instances/aba.json (8/4/2025)

 Decisions

We will revisit the discussion of how to organize our work going forward

We will work on smaller tasks, and move forward with the work we have collected so far

Autumn will reach out to the SIGs and get feedback on reference data in e.g. the Orders app, and more