| Topic | Who | Notes |
|---|---|---|
| Welcome | | Who will take notes today? Wayne & Anne created a mailing list for this sub-group; everyone should have received an email. If not, go to this URL and request to be added to the data-migration group. Dale can convene the next meeting. |
| Is there an update on getting a product owner (for sys-ops)? | Tod | Proposal sent to Sharon Wiles-Young (PC Chair), circulated to PC, and on the agenda for the next meeting. |
| Continued from last week: does this conversation need to continue? Wayne provided an update on Friday. | | Wayne won't be able to attend the meeting today, but he provided this update: "The core DevOps team, in collaboration with EBSCO, is currently looking at creating a reference production-level FOLIO install, with a reference set of test data, to be shared. In addition, we will create a set of tests to benchmark performance using the reference installation and data. The idea is to have something reproducible that folks could clone and validate on their own infrastructure. It would be great if this group could help us understand the requirements of what constitutes 'real world data', and help us generate them." |
| | | Discussion notes, as recorded by @Chris Manly: |

- Mailing list: subscribe if you haven't.
- SysOps PO update: the proposal went to Sharon and is on the agenda for this week's PC meeting.
- Updates from Wayne (in absentia): he is looking for examples of what the real-world data might look like.
- Patty: EBSCO asked for exemplar records from Chicago, Cornell, and TAMU that would be "nasty" (i.e., very large serials holdings), and got 64 records from Cornell. They have a script that uses Catmandu (a data-loading toolkit) to force MARC data into FOLIO JSON, and are currently just trying to get the 64 big records to load successfully, aiming to finish in the sprint over the next couple of weeks.
- Have other folks attempted to load data? EBSCO is just trying to get the baseline done. Chicago has only loaded users into the EBSCO-hosted deployment; they went down the rabbit hole of getting the local deployment working, and now that it is up they will shift back to loading data.
- Can we put the collective set of difficult records from various institutions into a single place that everyone can use? Anne suggests they just go into the data migration folder in Google Drive. The records need to be anonymized and/or have price data removed from 9XX fields, but other than that they can be shared.
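The scrubbing step discussed above (removing 9XX price data before sharing records) can be sketched for records in the MARC-in-JSON serialization. This is a minimal illustration, not the group's actual tooling; the sample record and its 980 price field are hypothetical.

```python
import copy

def strip_9xx(record: dict) -> dict:
    """Return a copy of a MARC-in-JSON record with all 9XX fields removed.

    MARC-in-JSON stores each field as a one-key dict, e.g.
    {"980": {"ind1": " ", "ind2": " ", "subfields": [{"b": "123.45"}]}}.
    """
    out = copy.deepcopy(record)
    out["fields"] = [
        f for f in out.get("fields", [])
        if not next(iter(f)).startswith("9")  # the single key is the MARC tag
    ]
    return out

# Hypothetical record with a local 980 price field:
record = {
    "leader": "00000cam a2200000 a 4500",
    "fields": [
        {"001": "in00000000001"},
        {"245": {"ind1": "1", "ind2": "0",
                 "subfields": [{"a": "Example title"}]}},
        {"980": {"ind1": " ", "ind2": " ",
                 "subfields": [{"b": "123.45"}]}},  # price data to remove
    ],
}

clean = strip_9xx(record)
print([next(iter(f)) for f in clean["fields"]])  # → ['001', '245']
```

A real pipeline would also need to anonymize patron-identifying data, but the field-filtering pattern is the same.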
As a practical matter, there is already a lot of this information in the JSON schemas in the RAML modules. If the additional data is put in the module descriptions, a data dictionary could be auto-generated from them; that keeps the information closer to the developers and aligns with the existing development process.
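Auto-generating a data dictionary from those JSON schemas could look roughly like the sketch below. The schema fragment, field names, and descriptions are invented for illustration; real RAML-module schemas are larger and may use `$ref` and other constructs this sketch does not handle.

```python
import json

def data_dictionary(schema: dict, prefix: str = "") -> list:
    """Walk a JSON Schema's properties and emit (field, type, description) rows."""
    rows = []
    for name, spec in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        rows.append((path, spec.get("type", ""), spec.get("description", "")))
        if spec.get("type") == "object":
            rows.extend(data_dictionary(spec, prefix=path + "."))
        elif spec.get("type") == "array" and isinstance(spec.get("items"), dict):
            rows.extend(data_dictionary(spec["items"], prefix=path + "[]."))
    return rows

# Hypothetical fragment of an inventory instance schema:
schema = json.loads("""{
  "properties": {
    "title": {"type": "string", "description": "Primary title of the instance"},
    "identifiers": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "value": {"type": "string", "description": "Identifier value"}
        }
      }
    }
  }
}""")

for field, ftype, desc in data_dictionary(schema):
    print(f"{field}\t{ftype}\t{desc}")
```

Run over the modules' schema files, the same walk could emit a wiki table or CSV instead of tab-separated lines.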
Discussion of inventory storage: bib data will be stored as MARC in addition to the inventory JSON. The plan for MARC holdings is unclear, and institutions with a lot of MARC-based holdings would like to see them maintained as MARC. Is storing holdings in MARC a good idea? Call numbers are an example of a place where MARC is more capable than the current inventory definition. The concern is that MARC holdings will be stored but not be updatable via the MARC cataloging interface.
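The call-number point can be made concrete: the MARC 21 holdings 852 field keeps the call-number pieces in separate subfields, while a single flat string cannot be unambiguously split back apart. The subfield values below are made up, and the flat representation is a hypothetical stand-in for the current inventory definition.

```python
# MARC 21 holdings 852 keeps call-number pieces in separate subfields:
marc_852 = {
    "852": {"ind1": "0", "ind2": " ",
            "subfields": [
                {"k": "Oversize"},      # $k call number prefix
                {"h": "QA76.73.J38"},   # $h classification part
                {"i": "S63 2014"},      # $i item part
                {"m": "c.2"},           # $m call number suffix
            ]}
}

def flatten_call_number(field: dict) -> str:
    """Collapse 852 subfields into the single string a flat schema would hold."""
    parts = [v for sf in field["852"]["subfields"] for v in sf.values()]
    return " ".join(parts)

print(flatten_call_number(marc_852))  # → "Oversize QA76.73.J38 S63 2014 c.2"
```

Because "$i S63 2014" itself contains a space, the flattened string cannot be reliably parsed back into its parts, which is the kind of loss that worries institutions with rich MARC holdings.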