2024-10-01 Better Sample Data Meeting notes
Date
Oct 1, 2024
Participants
@Yogesh Kumar (sends regrets), @Lee Braginsky (sends regrets), @Charlotte Whitt, @Kristin Martin, @Autumn Faulkner
Goals
Follow up on the status of discussion topics and tasks
Discussion topics
Time | Item | Notes |
---|---|---|
|
| Continue discussion of and input on @Autumn Faulkner's letter to the Risk Office at Michigan State University Libraries.
Charlotte will check with Kirstin Kemner-Heek whether the OLF has a letterhead we can use. Otherwise we can use the letterhead of any of the PC member institutions, and write on behalf of the Product Council.
Michigan State University Libraries will aim to provide data from Inventory and anonymized data from other domains. Autumn Faulkner asks whether Yogesh and Lee can provide more details about the scrambling tools. Autumn will get a revised version of the letter.
Follow up on activities since last meeting:
WOLFcon 2024: Yogesh and Charlotte have talked with Stanford University, who are interested in providing data to the Golden copy and in joining our group.
Plan - who will take on what task:
a. Golden Copy / Yogesh Kumar
Environment creation is in progress, likely to complete this week (9/20).
The QA team will fix data according to the test data feedback wiki. Link to wiki - https://folio-org.atlassian.net/wiki/spaces/DQA/pages/203685917
Has gotten some feedback from POs, but not enough.
Jira tickets have been written up for the KitFox team - Sprint 199. Jira ticket: BF-752 (Epic).
This will then be the copy we use for Bugfest Ramsons. Expect to have this done by end of September.
b. Snapshot environment / Charlotte Whitt
Charlotte has reached out to ID DevOps re. adding data to the monthly build environment: https://folio-quesnelia.dev.folio.org/ (the link is not up and running yet).
Index Data’s DevOps team is working on this now: FOLIO-4071 Create Quesnelia reference environment
Charlotte to fix the 36 instances with Instance source = FOLIO, and will do this as soon as the environment is ready.
Recruit members from MM-SIG as extra eyes to review the records updated in FOLIO (Instance source = FOLIO).
Autumn has a music cataloger background. She will find records in OCLC, aiming for next week. These records are loaded to this group's shared drive.
Also include serial records with multiple holdings, each with multiple items. Add holdings statements.
Write up Data Import Job Profiles: a Data Import Job Profile which can import ~100 bibs in MARC 21 and create instance, holdings, and item records (corresponding to the locations we have set up).
MM-SIG eyes on the 100 bibs, to check that these records have the right mix of misc. types to cover the basics, incl. bound-with. https://folio-org.atlassian.net/wiki/spaces/DQA/pages/203685917/Testing+Data+Feedback
Will ask whether any MM-SIG member has experience with MFHD. Stanford has this. Charlotte will check in with Darsi and Alexis.
Load MARC Authority data: we will come back to this task after WOLFcon.
Later we can move on to Order/Order lines data. We will ask the Acquisition SIG (Kristin Martin will take the lead on this) and the ERM-SIG to review the entered data.
c. Update from @Lee Braginsky:
We have successfully completed the POC on dataset anonymization. I can present results at the next meeting. Next steps - TBD.
Status on writing robust data anonymization scripts / Lee Braginsky: a university has agreed to provide data, which will be anonymized/scrambled.
|
|
|
We will split it up into the three tracks as listed above. Yogesh will update the Golden copy (Bugfest) section. Charlotte will update the FOLIO Snapshot section. Lee will update the timeline doc for his work on anonymization scripts. |