2024-11-19 Better Sample Data Meeting notes
Date
Nov 12, 2024
Participants
@Yogesh Kumar (regrets), @Lee Braginsky, @Charlotte Whitt, @Kristin Martin (regrets), @Autumn Faulkner, @Alissa Hafele, @Tod Olson
Goals
Follow up on the status of discussion topics and tasks
Discussion topics
Follow up on activities since last meeting:
Finalize the letter for the MSU dean. @Autumn Faulkner: We need to finalize the Google Doc with the revised version of the letter to be brought to the …
Is the letter finalized?
Identified MARC-formatted holdings records; we still need instance records to go with them. Alissa is working with a colleague at Stanford to get the records exported. @Alissa Hafele also contributed authority-controlled data (see the Slack conversation); it has been added to the Google Drive. Any update?
a. Golden Copy / Yogesh Kumar
Mike Gorrell asked Charlotte how many instances are intended for the Bugfest environment / golden copy. Mike Gorrell (5:08 PM): "Thanks. Do you know how big the Bugfest dataset will be in terms of # of Bib records?"
The golden copy has ~8-9 million instances. For performance testing, the PTF uses a larger environment with 20 million instances. The environment was created by the Kitfox team and updated by QA. Yogesh's QA team removed a number of duplicates in the Ramsons environment.
b. Snapshot environment
Update from Charlotte Whitt: Still waiting for the monthly build environment https://folio-quesnelia.dev.folio.org/ to be built. Charlotte is still trying to move this forward; the status is still Open (To do). I’m really sorry that we are stuck. Any update? The Jira ticket for creating https://folio-quesnelia.dev.folio.org/ is FOLIO-4071. The FOLIO DevOps group is just getting reorganized, and the expectation is that they will get to this work by mid-November.
Status: Autumn has provided sample data for music records and serials records with multiple holdings and multiple items. Holdings statements are still to be added. These records have been loaded to this group's shared drive.
Write up Data Import job profiles: Kristin Martin? KEM: Once the reference environment is available, I will work on the DI profiles. We need a Data Import job profile that can import ~100 bibs in MARC 21 and create instance, holdings, and item records (corresponding to the locations we have set up). The MM-SIG should review the 100 bibs to make sure they have the right mix of types to cover the basics, including bound-withs.
Stanford has MFHD-formatted holdings data in its current export workflow. Alissa Hafele mentioned that Stanford could probably provide this data, maybe ~25-50 examples. In progress. Any update?
Later we can move on to orders/order lines data. We will ask the Acquisitions SIG (Kristin Martin will take the lead on this) and the ERM-SIG to review the entered data; this is waiting for the initial environment and data to be added.
c. Anonymization script
Update from @Lee Braginsky: Lee presented this to the Community Council, which resulted in an engaged discussion.
Lee has gotten a positive response from a student at the University of Colorado who would be willing to work on this. The need is 2 developers from the community with Java/SQL skills and knowledge of the FOLIO Schematool to anonymize the data over 2 sprints. Alternatively, this could be done in Python. Tod suggests publicizing the developer requirements in the #folio-implementers Slack channel. When Stanford is ready to deliver data, Yogesh and Lee can set up a meeting. Alissa will check with her colleagues. @Lee Braginsky: Can we make the anonymization POC slide deck available in the WG folders? Is the POC code in GitHub? Yogesh to follow up. https://folio-org.atlassian.net/wiki/spaces/DQA/pages/409108481/Data+Anonymization+Project
Nothing new to report