2025-04-22 Better Sample Data Meeting notes

 Date

Apr 22, 2025

 Participants

  • @Yogesh Kumar, @Lee Braginsky, @Charlotte Whitt, @Kristin Martin, @Autumn Faulkner (regrets), @Tod Olson, @Shelley Doljack

 Goals

  • Follow up on the status of discussion topics and tasks

 Discussion topics

Time

Item

Notes

 

General:

  1. Letter to Risk Office at Michigan State University Libraries

@Autumn Faulkner sent the finalized letter to the MSU dean in early December.

The letter was sent on to the risk management group; there has been no response yet. Changes in the federal government have kept everyone busy.

 

  1. Eureka platform to be rolled out with the Sunflower release (TBD)

Sunflower will be delayed. Non-ECS Bugfest testing is still in progress. No new dates yet.

 

  1. Data update

Updates for FOLIO Snapshot

Charlotte is making progress on updating the Inventory instance records (36 instance records in total with Source ID = FOLIO). As of 4/22, 20 records have been cataloged:

https://folio-quesnelia.dev.folio.org/inventory?filters=staffSuppress.false%2Csource.FOLIO&qindex=instanceAdministrativeNotes&query=better%20sample%20data&sort=title

The 20 cataloged titles have been backed up as JSON files.

Add 100 MARC records: Autumn has a file ready. She would like to generate order records for these upon import, but needs help determining which fund and vendor codes to use.

  • @Autumn Faulkner - Could you please update the status?

  • Instances to have holdings, or holdings/item.

  • Also examples of items with no barcode, with On order status, and with regular item barcodes. @Autumn Faulkner - Could you please update the status?

Kristin explained the vendor data.

Charlotte can do this: ensure that these records have the right mix of miscellaneous types to cover the basics, including bound-with. We still need the 100 MARC records. Do these records include bound-withs? We may need 5-10 samples.

The Stanford data set includes one bound-with: a474301. Spreadsheet explaining records provided.

Autumn will do an analyzed record.

Discussion Topics:

  • Are circulation rules supposed to be anonymized?

    • In FOLIO Snapshot the circ rules should be minimal

  • FOLIO Snapshot is rebuilt every 24 hours, so the sample data must be compatible with that. When talking about loan patterns, how do we ensure that open loans adhere to the circ rules for testing things like bills, notices, aging to lost, etc.?

  • @Charlotte Whitt will update Patron notices templates and basic functionality, and update Circ rules accordingly.

    • We can review in our next meeting


@Charlotte Whitt @Autumn Faulkner and @Shelley Doljack will have a working session to discuss various records listed below.
Autumn has provided sample data for music records and serials records with multiple holds and multiple items. Holdings statements are still to be added. These records will be loaded into the environment shortly. They are part of the 100 MARC records.

  • Autumn:

  • Order/Order lines data. Kristin has provided screenshots in Edit mode to Autumn. These are single-line orders of monographs. Chicago is not doing data import of these records.

  • Autumn will create two job profiles, one for open orders and one for pending status. We can loop in Christie Thomas as needed.

Coming up next:

Agreement Data

  • ERM-SIG to help with providing data, and to review both the data already entered and the data to be added. Kristin to reach out. @Kristin Martin will update us on organization, agreement, and license data in the next meeting. Kristin is on the agenda for the 4/30 ERM SIG meeting to talk about the work this working group is doing.

    • From Owen: Generally I do testing in Snapshot. For most testing, manual creation of data is sufficient, but for cases where I need more extensive data I usually use Postman scripts / collection runner to populate. I have a Postman collection that allows me to create any number of randomly named agreements with randomly selected users as internal contacts, optionally with agreement lines, and with linked licenses (with or without amendments).

      That collection can create any number of generic agreements. If I need some specific aspects I can tweak it. Some things need a little more prep (for example, I don't have full automation for creating eHoldings-based agreement lines, but I have a Postman request that I can use with a CSV file containing a list of Agreement and eHoldings IDs to make agreement lines for a given list of eHoldings resources), so when I need to I can create things with a relatively low amount of effort.
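The CSV-driven approach Owen describes could be sketched roughly as follows in Python. This is only an illustration, not Owen's Postman collection: it groups CSV rows into one payload per agreement and makes no HTTP calls. The column names (`agreement_id`, `eholdings_id`) and the payload shape are hypothetical and are not taken from the FOLIO API.

```python
import csv
import io
from collections import defaultdict


def group_agreement_lines(csv_text):
    """Group eHoldings IDs by agreement ID from CSV text.

    The column names are hypothetical; adjust them to the real file.
    """
    grouped = defaultdict(list)
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        agreement_id = (row.get("agreement_id") or "").strip()
        eholdings_id = (row.get("eholdings_id") or "").strip()
        # Skip incomplete rows instead of creating half-filled lines.
        if agreement_id and eholdings_id:
            grouped[agreement_id].append(eholdings_id)
    return dict(grouped)


def build_payloads(grouped):
    """Build one illustrative payload per agreement (shape is an assumption)."""
    return [
        {
            "agreementId": agreement_id,
            "lines": [{"eholdingsId": e} for e in eholdings_ids],
        }
        for agreement_id, eholdings_ids in grouped.items()
    ]


SAMPLE_CSV = """agreement_id,eholdings_id
agr-1,eh-100
agr-1,eh-101
agr-2,eh-200
"""

if __name__ == "__main__":
    for payload in build_payloads(group_agreement_lines(SAMPLE_CSV)):
        print(payload)
```

Keeping the grouping separate from any request logic means the same CSV could feed a Postman runner, a script, or a manual review before anything is loaded.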

 

  • Decided to create a wiki page where we document how the FOLIO Snapshot data is built, and other relevant information for test users. @Autumn Faulkner has started a Google Doc to keep track of the changes we have made.

  • Will discuss setting up the wiki page at the next meeting.

Updates on upgrading Stanford's environment to the Q release.

 

  1. Stanford is working on sample authority data to be ready soon for the snapshot environment.

  • @Autumn Faulkner - Waiting on MARC records from you. They will be loaded using DI (Data Import), and we need to make sure they connect correctly to MARC authority (hopefully SRS will do the right thing).

  • @Shelley Doljack - Already uploaded the authority data in the new snapshot environment.

 

Anonymization of data in Bugfest environments:

Lee Braginsky: Good news on the data anonymization front: Stanford University is going to field a team of developers for 3 weeks. @Shelley Doljack is leading this effort. In the first week, data analysis was done. There is a spreadsheet listing which tables to preserve and which to anonymize.

The focus for 2 weeks will be on PII, users, user custom fields, vendor names, vendor contacts, interface credentials, etc.

Anonymization requirements:

Shelley has put together a wiki page to gather requirements for anonymization: https://folio-org.atlassian.net/wiki/x/BQA4K
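As a rough illustration of what field-level anonymization of such records might involve, here is a minimal Python sketch. It is not the Stanford team's tool. The field paths, the `anon-` prefix, and the salt are all assumptions made for the example; the real list of fields would come from the preserve-vs-anonymize spreadsheet.

```python
import copy
import hashlib

# Hypothetical field paths, chosen only for illustration.
PII_PATHS = [
    ("username",),
    ("barcode",),
    ("personal", "firstName"),
    ("personal", "lastName"),
    ("personal", "email"),
]


def pseudonym(value, salt):
    """Return a deterministic token derived from the value and a salt."""
    digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
    return "anon-" + digest[:12]


def anonymize(record, paths=PII_PATHS, salt="example-salt"):
    """Return a copy of record with the listed fields replaced by tokens."""
    result = copy.deepcopy(record)  # never modify the caller's record
    for path in paths:
        node = result
        # Walk down to the dict that holds the final key, if it exists.
        for key in path[:-1]:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict) and path[-1] in node:
            node[path[-1]] = pseudonym(node[path[-1]], salt)
    return result
```

Because the token is deterministic, the same input value maps to the same token across records, which keeps cross-references consistent. A salted hash alone may not be sufficient for low-entropy values, so this should be read as a starting point for discussion rather than a requirement.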
@Lee Braginsky - to present the open question to this group before Stanford developers can start on this project.

 

Review timeline document

 

Other topics

Any items to discuss?

 Action items (updated 4/22/2025)

@Lee Braginsky will update the track for Scripts to anonymize data set.
@Lee Braginsky will publicize developers' requirements in the #folio-implementers slack channel.
@Yogesh Kumar will create a wiki page where we document how the FOLIO Snapshot data is built, and other relevant information for test users.
@Charlotte Whitt will set up a working meeting with @Autumn Faulkner, @Shelley Doljack, and @Kristin Martin to get the Data Import profiles for loading MARC data, write up the two job profiles (one for open orders and one for pending status), and get order data added to the Quesnelia environment.
@Charlotte Whitt will update Patron notices templates and basic functionality, and update Circ rules accordingly
@Kristin Martin - will reach out to Owen, and ask him to attend an upcoming meeting to present his script for loading of agreement data. Will review the data.
@Charlotte Whitt will update the Circ rules in the Quesnelia environment
@Charlotte Whitt - will look into adding bound-with data to the Quesnelia environment. Among the 100 MARC records there is one bound-with record. Charlotte will ask Lehigh if we can use 5-10 more sample records from their collection.

 Decisions

Lee, Yogesh, and Shelley will inform the working group about the discussions and progress on developing the anonymization tool. Lee, Yogesh, Shelley, and Noah meet every Monday.

We are meeting every other week, on odd-numbered weeks.

 
