2019-01-07 - Data Migration Subgroup Agenda and Notes

Date

2019-01-07

Discussion items

Time (min) | Item | Who | Notes
5 | Welcome and update | Dale | Need a note taker.
45 | Proposal for data loader | various |

Over the break, a committee of the Data Migration subgroup prepared a short document on data loader requirements for submission to the Product Council. The document can be found here. We would like to discuss this document and get opinions and recommendations from the group.

Notes by Ingolf:

Mapping needed: JSON, MARC, MARC-XML

We have to write tickets for the loader. The epic is UXPROD-850. Patty can create user stories for the command-line interface.

Q (Ann-Marie): What will be in Q1? A: In Q1, we will just have to cram the data in as JSON.

Dale: A single developer can put together the requirements for the CLI across the different APIs and developer tools.

Wayne: A thin client could be written in Perl or Ruby. Alternatively, a data loader could be implemented as its own module with a storage module; the command-line interface would then simply be curl ("client URL").
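
As a rough illustration of the thin-client idea, the sketch below does in a few lines of Python what an equivalent curl call would do: it POSTs one prepared JSON record to an existing FOLIO API through Okapi. The gateway URL, tenant, token, and the /users path are placeholders, not agreed requirements.

    # Minimal thin-client sketch, assuming an Okapi gateway; values are illustrative.
    import json
    import sys
    import requests

    OKAPI_URL = "https://folio-okapi.example.org"   # hypothetical gateway URL
    HEADERS = {
        "X-Okapi-Tenant": "diku",                   # example tenant id
        "X-Okapi-Token": "<auth token>",            # obtained from a prior login call
        "Content-Type": "application/json",
    }

    def post_record(path, record):
        """POST one JSON record to an existing FOLIO API, e.g. /users."""
        resp = requests.post(OKAPI_URL + path, headers=HEADERS, data=json.dumps(record))
        resp.raise_for_status()
        return resp

    if __name__ == "__main__":
        # usage: python load_one.py /users user.json
        path, filename = sys.argv[1], sys.argv[2]
        with open(filename, encoding="utf-8") as f:
            post_record(path, json.load(f))
        print("loaded", filename, "into", path)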

Wayne/Dale: Handling lists of JSON data at the module level is a little more involved.

Wayne suggests explicitly separating out MARC conversion; it could be handled as a separate component. We can get a lot done without MARC conversion.

Backend modules must support APIs for bulk loading, and we need a bulk loader client. Mapping users, vendors, etc. into JSON, however, is an exercise for the experts at the institutions. Without MARC conversion the requirements are very simple: we need to be able to stream the data over the existing FOLIO APIs. Bulk loader APIs don't exist yet; that is work to be done in the RAML Module Builder, and modules would then update their dependency on it (Wayne).
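
A minimal sketch of what "streaming over existing APIs" could look like, assuming the prepared data is one JSON record per line (NDJSON) and each record is posted individually because bulk endpoints do not yet exist; the function name, paths, and reporting are illustrative only.

    # Illustrative streaming loader over single-record FOLIO APIs (assumed NDJSON input).
    import requests

    def stream_load(ndjson_path, api_path, okapi_url, headers):
        """Stream one JSON record per line to a single-record FOLIO endpoint."""
        session = requests.Session()   # reuse one HTTP connection for the whole run
        ok = failed = 0
        with open(ndjson_path, encoding="utf-8") as source:
            for line in source:
                if not line.strip():
                    continue
                resp = session.post(okapi_url + api_path, headers=headers, data=line)
                if resp.status_code in (200, 201):
                    ok += 1
                else:
                    failed += 1
                    print(f"rejected ({resp.status_code}): {resp.text[:200]}")
        print(f"{ok} records loaded, {failed} rejected")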

Linkages / referential integrity: A linkage between bibs, holdings, and items is necessary, but must the loader be able to establish it? There are different opinions. Initially, the answer is "yes", because otherwise maintaining data integrity puts a burden on the institutions. If you have legacy IDs, you need to maintain a mapping between legacy IDs and UUIDs.

Dale: That's an unnecessary complication (for the loader). For the initial loader, referential integrity is not a requirement. We (Chicago) put UUIDs into our legacy system, so they are there as a separate data field to be migrated. We do that for all tables that need to be migrated. I (Ingolf) support the opinion that the initial loaders need not take care of establishing referential integrity, because the institutions are responsible for the data they prepare (for a one-time, initial bulk load).
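
As a sketch of the legacy-ID-to-UUID mapping mentioned above, an institution could assign the UUIDs itself before loading and keep the mapping alongside the legacy system, much as Chicago stores them in a separate field. The field names (legacy_bib_id, instanceId) are illustrative, not an agreed schema.

    # Illustrative legacy-id/UUID mapping prepared by the institution before loading.
    import csv
    import uuid

    def build_mapping(legacy_ids):
        """Assign one freshly generated UUID to every legacy record id."""
        return {legacy_id: str(uuid.uuid4()) for legacy_id in legacy_ids}

    def link_holding(holding, bib_map):
        """Point a prepared holdings record at the UUID of its parent bib."""
        holding["instanceId"] = bib_map[holding["legacy_bib_id"]]
        return holding

    def save_mapping(bib_map, path):
        """Keep the mapping, e.g. to write the UUIDs back into the legacy system."""
        with open(path, "w", newline="", encoding="utf-8") as out:
            writer = csv.writer(out)
            writer.writerow(["legacy_id", "uuid"])
            writer.writerows(bib_map.items())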

Patty takes on a leading organizational role in getting the tickets created and prioritized.


Topics for next week
User stories - Patty - ca. 20 minutes

Link to Acquisitions Interface Fields
Link to FOLIO Record Data Elements

Action Items

  • Tod Olson: Pull the list of data elements and descriptions from the JSON schemas, focusing on the purchase-order and po_line schemas, and send it to Beltaine.
  • Tod Olson: Contact the User Management SIG (Maura Byrne) to find out more about affiliation data upload plans.
  • Dale Arnston will contact Khalilah and Ann-Marie about attending an upcoming Data Migration subgroup meeting to discuss questions about the User-Import endpoint tool and the Data Import tool.
  • Tod Olson and Patty Wanninger to draft a Data Loaders Requirements document for the Data Migration group to discuss in 2 weeks (Dec 10).
  • Patty will create a Slack channel for Voyager resource access; she will create #migration-voyager, #migration-aleph, and similar Slack channels.