2019-06-03 - Data Migration Subgroup Agenda and Notes

Date

at 11 EST

Link to meeting: https://zoom.us/j/204980147

Discussion items

Time | Item | Who | Notes
0 | New meeting link | Dale | Please note. We will be meeting using a new Zoom link going forward: https://zoom.us/j/204980147 The old one expired and is no longer available.
5 | Welcome | Dale | Welcome and request for someone to take notes. Dale will take the notes.
30 | Discussion | Ann-Marie and Anne Highsmith | Ann-Marie and Anne will review for us the current state of Inventory Data Import, both of the UI upload version, and of the command-line version. Discussion to follow.

Review of the data loader tool, using both the command line and the UI. Anne Highsmith discussed her experience with the command-line tool. Documentation for the CLI is here:


https://github.com/folio-org/mod-source-record-manager


The workflow involves creating a job execution to get an ID, then posting the raw records. The CLI works with binary MARC, MARCXML, and MARC-in-JSON. All reserved characters need to be escaped: the control characters as well as the special characters. The inventory module died trying to load a big file. Jason says the module needs at least 4 GB of memory and 2 cores. In Anne’s trial the loader stored the records in SRS, but it did not generate the inventory records.
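The escaping step can be illustrated with a minimal Python sketch. It assumes the raw records are sent as strings inside a JSON body; the payload field names (`records`, `record`) are hypothetical placeholders, not the documented request schema, which is described in the mod-source-record-manager repository. The only point shown is that a standard JSON serializer escapes the MARC control characters and reserved characters such as quotes and backslashes.

```python
import json

# MARC structural delimiters are ASCII control characters.
SUBFIELD_DELIMITER = "\x1f"
FIELD_TERMINATOR = "\x1e"
RECORD_TERMINATOR = "\x1d"


def build_raw_records_body(raw_records):
    """Serialize raw record strings into a JSON body.

    The field names "records" and "record" are hypothetical; check the
    module documentation for the real request schema.
    """
    body = {"records": [{"record": raw} for raw in raw_records]}
    # json.dumps escapes control characters, double quotes and backslashes.
    return json.dumps(body)


# A made-up fragment for illustration, not a complete MARC record.
sample = (
    'Title with "quotes"'
    + SUBFIELD_DELIMITER
    + "b back\\slash"
    + FIELD_TERMINATOR
    + RECORD_TERMINATOR
)
encoded = build_raw_records_body([sample])
```

Decoding `encoded` with `json.loads` gives back the original string, so the escaping is lossless.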


Ann-Marie says that they are still using the old mapping of MARC to inventory. They are working on a new mapping in the MM SIG. This is a fixed mapping. Eventually there will be a configurable mapping utility in the UI. The mod data loader had a rules JSON that you could submit with the load. Patty and Chris think we should have something similar for using the CLI with the new data loader. Ann-Marie says that FOLIO would ship with a MARC-to-instance mapping that you could adjust. She says that this should be set up as part of your go-live.
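To make the idea of an adjustable mapping concrete, here is a minimal Python sketch. The rule format (a MARC tag and subfield code mapped to an instance field name) and all field names are invented for illustration; they are not the format used by mod data loader or by the new data loader. It only shows how a shipped default mapping could be overridden by a site-specific rules JSON before records are converted.

```python
import json

# Hypothetical default rules: "tag$subfield" -> instance field name.
DEFAULT_RULES = {"245$a": "title", "260$b": "publisher"}


def merge_rules(default_rules, local_rules_json):
    """Return the default rules with site-specific overrides applied."""
    merged = dict(default_rules)
    merged.update(json.loads(local_rules_json))
    return merged


def apply_rules(marc_json, rules):
    """Map a MARC-in-JSON record to a flat, instance-like dict."""
    instance = {}
    for field in marc_json.get("fields", []):
        for tag, content in field.items():
            if not isinstance(content, dict):
                continue  # control fields (e.g. 001) carry a plain string
            for subfield in content.get("subfields", []):
                for code, value in subfield.items():
                    target = rules.get(f"{tag}${code}")
                    if target is not None:
                        instance[target] = value
    return instance


# Made-up sample record in MARC-in-JSON layout.
record = {
    "leader": "00000nam a2200000 a 4500",
    "fields": [
        {"001": "in00000001"},
        {"245": {"ind1": "1", "ind2": "0", "subfields": [{"a": "Sample title"}]}},
        {"264": {"ind1": " ", "ind2": "1", "subfields": [{"b": "Sample Press"}]}},
    ],
}

# A library that records publishers in 264$b adjusts the default mapping.
rules = merge_rules(DEFAULT_RULES, '{"264$b": "publisher"}')
instance = apply_rules(record, rules)
```

With only the default rules the publisher in 264$b is ignored; with the local override it is mapped.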


Ann-Marie showed how the data loader works through the UI. When loading data to the hosted environment, use folio-snapshot-load.aws.indexdata.com/data-import, so as not to kill performance on the regular snapshot. Right now the UI only supports binary MARC. To start, go to the data import page and drag an .mrc file onto the import screen. A "File" control with a caret appears that allows you to select ‘load bibliographic records’. That job then gets put into the running section of the landing page. The log appears after the job has completed. The log entry will be a link to the MARC-in-JSON that was created. Right now any error terminates the job. The load creates both the SRS record and the instance.


You can view the inventory records that are loaded in the usual way, but the button to allow you to see the original MARC record is not working. All instance fields that come from MARC will be read-only. Ann-Marie showed a slideshow of the data loading process. The UUIDs are assigned, and there is no uniqueness checking. The instance identifier is stored in a 999 field in the SRS MARC-in-JSON, as is the SRS record id. The loader creates the 999. If you are intending to use Marccat, you will need to wait for it to be available before you do your actual data load, because the load creates a copy for Marccat as well.
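As an illustration of how the 999 field might be read, the following Python sketch pulls the two identifiers out of a MARC-in-JSON record. The record layout (a `fields` list of single-key objects) follows the general MARC-in-JSON convention, but the subfield codes `i` (instance identifier) and `s` (SRS record id) and the sample values are assumptions made for this example; the notes do not say which subfields the loader uses.

```python
def get_999_identifiers(marc_json, instance_code="i", srs_code="s"):
    """Return (instance_id, srs_id) from the first 999 field.

    Returns (None, None) when the record has no 999 field. The default
    subfield codes are assumptions for illustration only.
    """
    for field in marc_json.get("fields", []):
        content = field.get("999")
        if not isinstance(content, dict):
            continue
        values = {}
        for subfield in content.get("subfields", []):
            values.update(subfield)
        return values.get(instance_code), values.get(srs_code)
    return None, None


# Made-up sample record; the identifiers are placeholders, not real UUIDs.
sample_record = {
    "leader": "00000nam a2200000 a 4500",
    "fields": [
        {"001": "in00000001"},
        {
            "999": {
                "ind1": "f",
                "ind2": "f",
                "subfields": [
                    {"i": "instance-uuid-placeholder"},
                    {"s": "srs-uuid-placeholder"},
                ],
            }
        },
    ],
}

instance_id, srs_id = get_999_identifiers(sample_record)
```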


The data loader is not intended to do data migrations. There is also a much more complex loader that is coming that will allow libraries to import actual data. This is a POC, essentially.


Patty asked, what is the relation of this loader to a data migration loader? Here's the UI for adjusting the standard MARC-Instance mapping: https://folio-org.atlassian.net/browse/UXPROD-1479. Eventually there will also be an event manager that will control how the data loader is, for example, integrated with the creation of orders. Here is the link to the event-driven Jira: https://folio-org.atlassian.net/browse/MODDATAIMP-104 .


Eventually, the mapping of MARC to other modules will be handled by sending a rules-json record and a data-json record to the relevant module to do the necessary processing. MARC holdings records will be handled separately from MARC records in the creation of instance and holdings records in inventory.


Ann-Marie says that she could volunteer the time of the Folijet developers to discuss the work that they have done on the APIs, and maybe what is needed to support data migration, but they don’t have the time to modify the APIs to provide what is needed for data migration.

Link to Acquisitions Interface Fields
Link to FOLIO Record Data Elements (contains links to specific spreadsheets, but most of them are not up to date.)

Action Items