2025-05-20 Better Sample Data Meeting notes

 Date

May 20, 2025

 Participants

  • @Lee Braginsky, @Charlotte Whitt, @Kristin Martin, @Autumn Faulkner, @Shelley Doljack, @Owen Stephens, @Tod Olson

 Goals

  • Learn more about how Owen updates Agreement Data

  • Prepare for PC presentation on Thursday

 Discussion topics

Time

Item

Notes

 

General:

 

 

 

  1. Eureka platform to be rolled out with the Sunflower release (TBD)

Sunflower - New GA date TBD

 

ERM data (Owen)

Agreements link to many different documents. For cases that need more extensive testing:

  • Use Postman to interact with FOLIO APIs directly

  • Set up a call to the API to create an Agreement, and group such calls together into a collection

  • The Collection Runner automates running the calls in the collection

  • Inputs can come from CSV files, e.g., run an API call for each line in a spreadsheet, so you can work through 100 agreements to get them loaded

  • Can run JavaScript scripts for each call. Use them to keep temporary state about what is being loaded, so that records can be linked together

  • This is currently the mechanism for populating a module with data, but records can’t be linked at the time of loading

  • This process generates a lot of random data to test performance. Particular items are usually tested by hand.

  • Could we capture this as JSON? Could be challenging for linking, because UUIDs would be different each time you populate the system

  • We could load the JSON for the Agreements, and then search for UUIDs that are set up already. But you need to know what each UUID is going to be, or else look it up

  • Unique to Agreements: if you want to link a resource to an Agreement, you need to add the resources to the local knowledge base or synchronize with remote KB. That would need to happen first. You could shortcut it by loading direct files. Logic that controls the data is complex

  • Could we coordinate the running of these scripts for the populating of the system? DevOps runs these post-installation of the system. E.g., put it in Airflow to make it happen.
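The CSV-driven loading described above can be sketched outside Postman as well. The following is a minimal illustration, not Owen's actual collection: the endpoint path, the CSV column names, and the payload fields are assumptions and should be checked against the Agreements API documentation. The `post` argument is a hypothetical callable that sends one request and returns the parsed response; the `created` dictionary plays the role of the temporary state that lets later calls link to records created earlier.

```python
import csv
import io
from typing import Callable, Dict

# Assumed endpoint path for creating an Agreement; verify before use.
AGREEMENTS_PATH = "/erm/sas"


def build_agreement(row: Dict[str, str]) -> dict:
    """Turn one CSV row into a request body (field names are assumptions)."""
    return {
        "name": row["name"],
        "agreementStatus": row["status"],
        "periods": [{"startDate": row["start_date"]}],
    }


def load_agreements(csv_text: str, post: Callable[[str, dict], dict]) -> Dict[str, str]:
    """Create one Agreement per CSV row and remember each returned id by name."""
    created: Dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        response = post(AGREEMENTS_PATH, build_agreement(row))
        # The id is only known after the call, so it is stored for later linking.
        created[row["name"]] = response["id"]
    return created


if __name__ == "__main__":
    import uuid

    def fake_post(path: str, body: dict) -> dict:
        """Stand-in for a real HTTP call; returns a fresh random id each time."""
        return {"id": str(uuid.uuid4()), **body}

    sample = "name,status,start_date\nAgreement A,active,2025-01-01\n"
    print(load_agreements(sample, fake_post))
```

Because the ids come back from the server, they differ on every run, which is the linking difficulty noted in the discussion: any follow-up call has to read the id from the stored mapping rather than from a fixed file.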

ERM desired data: 2025-04-30 ERM SIG meeting

 

  1. Planning the presentation for the PC

Plan for 15 minutes:
1. Super short intro (background) - 2 min

  1. Lee to give a refresher of the anonymization work - reached out to Community members

  2. Shelley to present the work on anonymization

  3. Finally (if time permits) - talk about work for improving data in the FOLIO reference environments - like FOLIO snapshot

    1. ERM data (https://folio-org.atlassian.net/wiki/spaces/ERMSIG/pages/955711505/ERM+Sample+Data)

    2. While we want 3 good agreements, we may instead focus on 100 random agreements to test performance and how scale impacts use

  4. PC - a review and update process (going forward)

 

Charlotte can start a slide deck and save it in our shared google folder:

 

Updates for FOLIO Snapshot

Charlotte is making progress on update of Inventory instance records (in total 36 instance records with Source ID = FOLIO). As of 4/22 we now have 20 records that have been cataloged:

https://folio-quesnelia.dev.folio.org/inventory?filters=staffSuppress.false%2Csource.FOLIO&qindex=instanceAdministrativeNotes&query=better%20sample%20data&sort=title

The 20 cataloged titles have been backed up as JSON files.

Add 100 MARC records

Set up data import jobs.

  • Decided to create a wiki page where we document how the FOLIO Snapshot data is built, and other relevant information for test users. @Autumn Faulkner has started a Google doc to keep track of the changes we have made.

 

 

  1. Stanford is working on sample authority data to be ready soon for the snapshot environment.

  • @Autumn Faulkner - Waiting on MARC records from you, to load using DI and make sure they connect correctly with MARC authority (hopefully SRS will do the right thing).

  • @Shelley Doljack - Already uploaded the authority data in the new snapshot environment.

 

Anonymization of data in Bugfest environments:

Lee Braginsky: Good news on the Data Anonymization front: Stanford Univ is going to field a team of developers for 3 weeks. @Shelley Doljack is leading this effort. In the 1st week, data analysis was done. There is a spreadsheet with tables to preserve vs. anonymize.

For 2 weeks, the focus will be on PII, Users, user custom fields, vendor names, vendor contacts, interface credentials, etc.

Anonymization requirements:
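To make the preserve vs. anonymize split concrete, here is a minimal sketch of field-level pseudonymization. It is an illustration only and not the Stanford team's tool: the list of field paths, the `anon-` prefix, and the salt are hypothetical, and the real set of fields would come from the spreadsheet mentioned above.

```python
import hashlib
from copy import deepcopy

# Hypothetical dotted paths of fields to anonymize; everything else is preserved.
PII_PATHS = ["username", "personal.firstName", "personal.lastName", "personal.email"]


def pseudonym(value: str, salt: str) -> str:
    """Map a value to a stable, non-reversible placeholder."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "anon-" + digest[:12]


def anonymize(record: dict, paths=PII_PATHS, salt: str = "change-me") -> dict:
    """Return a copy of the record with the listed string fields replaced."""
    result = deepcopy(record)
    for path in paths:
        *parents, leaf = path.split(".")
        node = result
        for key in parents:
            node = node.get(key)
            if not isinstance(node, dict):
                node = None  # path does not exist in this record
                break
        if node is not None and isinstance(node.get(leaf), str):
            node[leaf] = pseudonym(node[leaf], salt)
    return result
```

Hashing with a salt keeps the mapping consistent, so the same input always yields the same placeholder and records that shared a value still match after anonymization. The placeholders are not format-preserving: an email address, for example, would no longer look like one.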

Shelley has put together a wiki page to gather requirements for anonymization - https://folio-org.atlassian.net/wiki/x/BQA4K .
@Lee Braginsky - to present the open question to this group before Stanford developers can start on this project.

 

Review timeline document

 

Other topics

Any items to discuss?

 Action items (updated 4/22/2025)

@Lee Braginsky will update the track for Scripts to anonymize data set.
@Lee Braginsky will publicize developers' requirements in the #folio-implementers slack channel.
@Yogesh Kumar will create a wiki page where we document how the FOLIO Snapshot data is built, and other relevant information for test users.
@Charlotte Whitt will set up a work meeting with @Autumn Faulkner, @Shelley Doljack, and @Kristin Martin to get the Data Import profiles for loading MARC data, write up the two job profiles (one for open orders and one for pending status), and get order data added to the Quesnelia environment
@Charlotte Whitt will update Patron notices templates and basic functionality, and update Circ rules accordingly
@Kristin Martin - will reach out to Owen, and ask him to attend an upcoming meeting to present his script for loading of agreement data. Will review the data.
@Charlotte Whitt will update the Circ rules in the Quesnelia environment
@Charlotte Whitt - will look into adding bound-with data to the Quesnelia environment. Among the 100 MARC records there is one bound-with record. Charlotte will ask Lehigh if we can use 5-10 more sample records from their collection

 Decisions

Lee, Yogesh, and Shelley will inform the working group about the talks and progress on developing the anonymization tool. Lee, Yogesh, Shelley, and Noah meet every Monday.

We are meeting every other week, in odd-numbered weeks.