2019-07-15 - Data Migration Subgroup Agenda and Notes

Date

15 Jul 2019 at 11 EST

Attendees

Link to meeting: https://zoom.us/j/204980147

Discussion items

Time	Item	Who	Notes
0	New meeting link	Dale	Please note: we will be meeting using a new Zoom link going forward: https://zoom.us/j/204980147 (the old one expired and is no longer available).
5	Welcome and assign note taker.	Dale	Welcome and request for someone to take notes, preferably someone who hasn't done it in a while.
5	Product Owner change	Dale	First, some news to share. Patty W. will be leaving her role as product owner for Data Migration and becoming product owner of the Users SIG. Thank you for all of the work you put into that position so cheerfully, Patty! Ian Walls will be taking over as product owner from Patty going forward. Welcome Ian!
40	Using work flows in data migration	Jeremy Huff	Jeremy will review some of the work he has been doing in data migration using work flows at TAMU. Discussion to follow.
			PowerPoint Jeremy reviewed
			Document
			Exploring the process of using a workflow engine for the migration process (Voyager to FOLIO). Proof of concept using "Organizations".
			Architecture:
			mod-data-extractor - persists a custom query, the starting point in the process; executes the query and streams the response.
			mod-workflow - provides an abstraction for the workflow engine. Can persist a workflow; dynamically generates the XML needed to execute(?).
			mod-camunda - service wrapper around the Camunda workflow engine. Listens for events to trigger workflows registered with it.
			mod-organization-storage - destination for all migrated data for this POC.
			Workflow tasks (e.g. ExtractorTask, ProcessorTask, CreateForEachTask, AccumulatorTask).
			Support for multiple scripting languages. They used JS.
			Used APIs.
			Results: slightly disappointing. Created 7,294 organizations and 1,577 contacts; this took 45 minutes. At 3,000-4,000 it would slow and stop, pause for 35 minutes, then resume and finish. Maybe a database indexing issue? They will be investigating.
			If indexing is the problem, a batch loading process may help. They may investigate the way the DB is set up via RAML Module Builder. Maybe turn off indexing during the load?
			The process was run on a dev machine (40GB RAM) pointing to FOLIO/Cloud. Getting this running in the same ecosystem will be a next step.
			Conclusions:
			Has potential; capable of orchestrating complex migration.
			Figure out the slowdown.
			Move dynamic workflow creation from mod-workflow into mod-camunda.
			Explore batch create endpoints.
			See further details in the PowerPoint and document.
			Chicago has experience investigating indexing issues and may be able to help.
			Q: What determines which scripting languages are supported? → JVM support of various scripting languages. Several are enabled at this time (JS, Ruby, Perl, Python... 5-6), and more could be added. Uses the javax.script package. Future plan: a UI to pick the scripting language.
			Q: How do you handle reference data? Lookup tables? → For this POC, they statically expressed them in the script, created manually ahead of time. Feasible for categories, which are a small set of data. Future: some sort of delegate that does the lookup. Chicago generates the IDs ahead of time in their migration process.
			Q: Advantages over a scripting language? → Using the workflow doesn't offer as much as it seems to; scripts can solve the same problem (e.g. they had a set of Perl scripts that accomplished some data migration). However:
			The workflow gave them a chance to explore using workflows against the FOLIO API.
			They've created a reusable framework for interacting with FOLIO, supporting different processes that will vary from institution to institution (extracting the data, processing of the data).
			This could also be used to migrate data from FOLIO to a new version of FOLIO.
			Link to the public repo is in the PowerPoint. Possible agenda item for next week: more questions about this POC.
10	Discussion tabled for lack of time	various	If any time is left, we will continue to discuss bulk API requirements for migration. Some Jiras of relevance: FOLIO-1932, FOLIO-2050, UXPROD-1826, MODINVSTOR-295, MODINVSTOR-296. And for some background and discussion from other sites (cited by Anatolii Starkov): https://evertpot.com/http/207-multi-status https://apihandyman.io/api-design-tips-and-tricks-getting-creating-updating-or-deleting-multiple-resources-in-one-api-call/ https://medium.com/paypal-engineering/batch-an-api-to-bundle-multiple-paypal-rest-operations-6af6006e002 https://developers.google.com/gmail/api/guides/batch This is a complex topic, and we will do well to get oriented in this session and discuss what is possible. We can continue with requirements and stories in future sessions.
	Future agenda items		Patty Wanninger also emphasized the importance of documenting requirements for bulk data APIs, so please update the issues referenced above. More questions about this POC. Are there things we want to do to support this work (Jeremy's work) getting pulled into the mainstream of FOLIO?
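
For anyone who missed the session, the extract, process, create-for-each flow and the static lookup table for reference data can be pictured roughly as below. This is only a sketch in plain Python, not the TAMU code and not Camunda: the function names, the field names (`name`, `code`, `type`, `categoryId`), and the IDs in `CATEGORY_IDS` are all made up for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional

# Hypothetical reference data, written out by hand ahead of time,
# in the spirit of "statically expressed in the script" from the notes.
CATEGORY_IDS: Dict[str, str] = {
    "vendor": "cat-0001",
    "donor": "cat-0002",
}


def extract(rows: Iterable[dict]) -> Iterator[dict]:
    """Stand-in for an extractor step: stream source rows one at a time."""
    for row in rows:
        yield row


def process(row: dict) -> dict:
    """Stand-in for a processor step: map a legacy row to a target record."""
    kind = row["type"].strip().lower()
    category_id: Optional[str] = CATEGORY_IDS.get(kind)
    return {
        "name": row["name"].strip(),
        "code": row["code"],
        # Unknown categories become None so they can be reviewed later.
        "categoryId": category_id,
    }


def create_for_each(records: Iterable[dict],
                    create: Callable[[dict], None]) -> Dict[str, int]:
    """Stand-in for a create-for-each step: call `create` once per record."""
    summary = {"created": 0, "failed": 0}
    for record in records:
        try:
            create(record)
            summary["created"] += 1
        except Exception:
            # Count the failure and keep going with the remaining records.
            summary["failed"] += 1
    return summary


def run_migration(rows: Iterable[dict],
                  create: Callable[[dict], None]) -> Dict[str, int]:
    """Chain the three steps; `create` is whatever writes to the target."""
    return create_for_each((process(r) for r in extract(rows)), create)


if __name__ == "__main__":
    legacy_rows = [
        {"name": " Acme Books ", "code": "ACME", "type": "Vendor"},
        {"name": "Gift Fund", "code": "GF", "type": "unknown"},
    ]
    target = []  # in-memory stand-in for the destination module
    print(run_migration(legacy_rows, target.append))
    print(target)
```

In a real run `create` would presumably wrap a call to the storage API, and the hand-written dictionary could be swapped for the "delegate that does the lookup" mentioned as a future idea.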

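As a starting point for the tabled bulk API discussion, here is a rough sketch of what batch loading with per-record results (in the multi-status style described in the links above) might look like from the client side. Everything here is an assumption for discussion: `post_batch` is a hypothetical callable that sends one batch and returns one HTTP-like status code per record, in order. It does not describe any existing FOLIO endpoint.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple


def chunked(items: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def load_in_batches(records: Iterable[dict],
                    post_batch: Callable[[List[dict]], List[int]],
                    size: int = 100) -> List[Tuple[int, int]]:
    """Send records in batches and collect (record index, status) failures.

    `post_batch` is hypothetical: it is assumed to return one status code
    per record in the batch, in the same order as the records were sent.
    """
    failures: List[Tuple[int, int]] = []
    offset = 0  # index of the first record of the current batch
    for batch in chunked(records, size):
        statuses = post_batch(batch)
        for position, status in enumerate(statuses):
            if status >= 400:
                failures.append((offset + position, status))
        offset += len(batch)
    return failures


if __name__ == "__main__":
    def fake_post_batch(batch: List[dict]) -> List[int]:
        # Pretend the server rejects records that have an empty name.
        return [201 if record["name"] else 422 for record in batch]

    sample = [{"name": "a"}, {"name": ""}, {"name": "c"}]
    print(load_in_batches(sample, fake_post_batch, size=2))
```

Reporting failures by record index means one bad record does not hide the outcome of the rest of its batch, which seems to be the main point of the multi-status approach.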
Link to Acquisitions Interface Fields
Link to FOLIO Record Data Elements (contains links to specific spreadsheets, but most of them are not up to date.)
