2019-06-18 f2f data migration subgroup meeting notes
Date
2019-06-18
Attendees
- Wayne Schneider
- Jenn Colt
- jroot
- Theodor Tolstoy (One-Group.se)
- Anne L. Highsmith
- Kevin Walker
- Aaron Trehub
- Christie Thomas
- Ian Walls
- Jeremy Huff
- Charlotte Whitt
Goals
Discussion items
Item | Notes |
---|---|
Definitions | Data migration tooling is about getting data into a new FOLIO system (not ongoing record import maintenance, or ongoing patron loads). |
General Approach | Outlined at WOLFcon: https://docs.google.com/document/d/1jo2UMDtOKSjBxkXiG2KlqSK097j5U23HIPQSEpAX6fg Data migration is an ETL process; extract and transform are mostly in the hands of the libraries doing the implementation. The one exception is MARC records (mapping to instances). The approach is to provide or share tooling for the load step of the ETL process where it is generalizable (see the load-step sketch after this table). |
Requirements | Requirements are already articulated in a few places: https://folio-org.atlassian.net/browse/UXPROD-850 https://docs.google.com/document/d/1cMmKSJ2L8OqSSeMZ0aIxmuQn4N-q76oFax6S0Q2xp9k https://docs.google.com/document/d/1oXbEE48zd889lGD87dP7cF3GfuuKwllI_MDp2zGwTRg Work has not started on many of the issues under UXPROD-850. The group generated a list of things to ask for. Comment on bulk APIs: they would still be useful; you still need to touch every record, and using bulk APIs means you touch the records before you load them into FOLIO. |
Existing Tools | Discussion of direction for data migration tooling in the next few quarters. TAMU is working on using mod-camunda/workflows to do data import. Things that are generalizable would be good candidates for tooling; for example, everyone will need a way to persist things like open loan IDs during migration. RAML Module Builder has support for bulk import API endpoints, but few if any modules are implementing this feature so far; the group would like modules to support bulk data loading capabilities. Error reporting is another important component: some way to manage incomplete loads (see the bulk-load sketch after this table). A few possible paths forward: Bywater’s toolkit is designed to get data ready to go into Koha; using it would require development to (1) adapt it to FOLIO data structures and (2) do the actual data loading. Would it be good to have a module for data loading, or would a set of command-line tools be better? One benefit of a module is that it is very visible, which could help with adoption if people could see that a tool was available; on the other hand, many sysadmins are already comfortable with CLI tools, and if the command-line solution is easier and faster to implement, it could be the way to go for the early implementers, with a module developed later (perhaps based on some of the same logic). There is also a desire for an API endpoint where one can POST MARC and get back instance records. Workflow (mod-camunda) could also be used to develop import workflows and set up complex interactions; bulk update capabilities would also be desirable there. The idea of using workflows is fairly new and not yet documented; this work is happening at TAMU. Import concerns other than bulk support: de-duplication, making sure data types are properly linked, and how to mint UUIDs and make sure they line up (some of this overlaps with the transform step of the ETL pipeline; see the UUID sketch after this table), plus error reporting. |
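To make the load step of the ETL discussion concrete, here is a minimal sketch in Python of posting one transformed instance record to FOLIO's inventory storage through Okapi. The gateway URL, tenant, token, and record values are hypothetical placeholders; this illustrates the approach discussed above, not a tool the group has built.

```python
# Minimal sketch of the "load" step: POST one transformed instance
# record to mod-inventory-storage through Okapi. All values below are
# hypothetical placeholders for a real tenant.
import requests

OKAPI_URL = "https://okapi.example.edu"
HEADERS = {
    "X-Okapi-Tenant": "diku",                      # example tenant
    "X-Okapi-Token": "<token from /authn/login>",  # obtained beforehand
    "Content-Type": "application/json",
}

instance = {
    "id": "11111111-1111-1111-1111-111111111111",  # UUID minted in transform
    "title": "Example title mapped from MARC 245",
    "source": "MARC",
    "instanceTypeId": "<instance-type UUID from reference data>",
}

resp = requests.post(f"{OKAPI_URL}/instance-storage/instances",
                     json=instance, headers=HEADERS)
resp.raise_for_status()  # a real loader would log failures instead (see below)
```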
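The bulk-loading and error-reporting concerns could combine as in the following sketch: try a batch endpoint first, then fall back to record-by-record posts so each failure can be logged and the load resumed later. The `/instance-storage/batch/synchronous` path reflects the RMB bulk-endpoint discussion above; whether a given module version actually exposes it is an assumption here.

```python
# Sketch: bulk load with per-record fallback for error reporting.
# Assumes a requests.Session preloaded with the Okapi headers from the
# previous sketch, and assumes the batch endpoint is available.
import json

def load_batch(session, okapi_url, instances, error_log):
    """Try one batch POST; on failure, isolate bad records one at a time."""
    resp = session.post(f"{okapi_url}/instance-storage/batch/synchronous",
                        json={"instances": instances})
    if resp.status_code == 201:
        return len(instances)          # whole batch accepted
    loaded = 0
    for rec in instances:              # fall back to single posts
        r = session.post(f"{okapi_url}/instance-storage/instances", json=rec)
        if r.status_code == 201:
            loaded += 1
        else:                          # record the failure for a later re-run
            error_log.write(json.dumps({"id": rec.get("id"),
                                        "status": r.status_code,
                                        "error": r.text}) + "\n")
    return loaded
```

The error log doubles as the input for a re-run, which is one possible way to manage the incomplete loads mentioned above.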
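On minting UUIDs so they "line up": one common technique (an illustration, not something the group settled on) is deterministic UUIDs, hashing each legacy ID under a per-library namespace with `uuid5`. The same legacy ID then always yields the same FOLIO UUID, so records transformed in separate passes still link up, and mappings such as open loan IDs can be regenerated at any time.

```python
# Sketch: deterministic UUID minting for migration. The namespace URL is
# a hypothetical per-library value; any stable string works.
import uuid

NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL,
                       "https://library.example.edu/folio-migration")

def folio_uuid(record_type: str, legacy_id: str) -> str:
    """Same (record_type, legacy_id) pair always maps to the same UUID."""
    return str(uuid.uuid5(NAMESPACE, f"{record_type}:{legacy_id}"))

# A holdings record transformed in a later pass still points at its instance:
instance_id = folio_uuid("instance", "bib0000123")
holding = {"id": folio_uuid("holding", "mfhd0000456"),
           "instanceId": instance_id}
```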
Action items
- patty.wanninger: invite TAMU to show their work with workflows at a future meeting