2019-07-15 - Data Migration Subgroup Agenda and Notes

Date

2019-07-15 at 11 EST

Link to meeting: https://zoom.us/j/204980147

Discussion items

Time | Item | Who | Notes
0 | New meeting link | Dale | Please note: we will be meeting using a new Zoom link going forward: https://zoom.us/j/204980147 The old one expired and is no longer available.
5 | Welcome and assign note taker | Dale

Welcome and request for someone to take notes, preferably someone who hasn't done it in a while.

5 | Product Owner change | Dale | First, some news to share. Patty W. will be leaving her role as product owner for Data Migration and becoming product owner of the Users SIG. Thank you for all of the work you put into that position so cheerfully, Patty! Ian Walls will be taking over as product owner from Patty going forward. Welcome, Ian!
40 | Using workflows in data migration | Jeremy Huff

Jeremy will review some of the work he has been doing in data migration using workflows at TAMU. Discussion to follow.

PowerPoint Jeremy reviewed

Document

Exploring process of using a workflow engine for the migration process (Voyager to FOLIO)

  • Proof of Concept using "Organizations"

  • Architecture
    • mod-data-extractor - persists a custom query and is the starting point in the process; executes the query and streams the response
    • mod-workflow - provides an abstraction for the workflow engine.  Can persist a workflow and dynamically generates the XML needed to execute it (?)
    • mod-camunda - service wrapper around the Camunda workflow engine.  Listens for events to trigger workflows registered with it
    • mod-organization-storage - destination for all migrated data for this POC

  • Workflow tasks (e.g. ExtractorTask, ProcessorTask, CreateForEachTask, AccumulatorTask)
    • Support for multiple scripting languages.  They used JS.
    • Used APIs

  • Results
    • slightly disappointing
    • created 7,294 organizations and 1,577 contacts; this took 45 minutes.  At around 3,000-4,000 it would slow and stop, pause for 35 minutes, then resume and finish.  Maybe a database indexing issue?
    • They will be investigating
    • If indexing is the problem...a batch loading process may help
    • They may investigate the way the DB is set up via RAML Module Builder.  Maybe turn off indexing during the load?
    • The process was run on a dev machine (40GB RAM) pointing to FOLIO/Cloud
    • Getting this running in the same ecosystem will be a next step

  • Conclusions
    • Has potential, capable of orchestrating complex migration....
    • Figure out slowdown
    • Move dynamic workflow creation from mod-workflow into mod-camunda
    • Explore batch create endpoints
    • ...see further details in the PowerPoint & document
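To make the task chain and the "batch create" idea above concrete, here is a minimal sketch in Python. It is not the POC's code (the POC used JS scripts inside Camunda). The function names are only loosely modelled on the task types listed above, the column names VENDOR_CODE and VENDOR_NAME are hypothetical, and create_batch stands in for whatever batch endpoint might eventually exist.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, str]


def extract(rows: Iterable[Record]) -> Iterator[Record]:
    """Extractor stand-in: stream source rows one at a time."""
    for row in rows:
        yield row


def process(rows: Iterable[Record]) -> Iterator[Record]:
    """Processor stand-in: map a source row to a target-shaped record.

    VENDOR_CODE and VENDOR_NAME are hypothetical source column names.
    """
    for row in rows:
        yield {
            "code": row["VENDOR_CODE"].strip(),
            "name": row["VENDOR_NAME"].strip(),
        }


def accumulate(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Accumulator stand-in: group records into lists of at most `size`."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def migrate(
    rows: Iterable[Record],
    create_batch: Callable[[List[Record]], None],
    batch_size: int = 100,
) -> int:
    """Run extract -> process -> accumulate, sending one call per batch."""
    created = 0
    for chunk in accumulate(process(extract(rows)), batch_size):
        create_batch(chunk)  # hypothetical batch-create call
        created += len(chunk)
    return created


# Usage: 250 fake source rows, collected in memory instead of posted anywhere.
source = [
    {"VENDOR_CODE": f" V{i} ", "VENDOR_NAME": f"Vendor {i}"} for i in range(250)
]
calls: List[List[Record]] = []
total = migrate(source, calls.append, batch_size=100)
```

With a batch size of 100, the 250 rows result in three create calls instead of 250, which is the kind of reduction a batch loading process would be aiming for. Whether that addresses the slowdown depends on what the investigation finds.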

Chicago has experience investigating indexing issues.  May be able to help.

Q: What determines which scripting languages are supported?  → JVM support for various scripting languages, via the javax.script package.  Several are enabled at this time (JS, Ruby, Perl, Python...5-6), and more could be added to that.  Future plan: a UI to pick the scripting language.

Q: How do you handle reference data?  Lookup tables?  → For this POC, they statically expressed them in the script, created manually ahead of time.  This is feasible for categories, a small set of data.  Future: some sort of delegate that does the lookup.  Chicago generates the IDs ahead of time in their migration process.
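One way to picture a static lookup table with IDs generated ahead of time is the sketch below. It is an illustration only: the namespace, the category codes, and the use of name-based (version 5) UUIDs are assumptions, not what TAMU or Chicago actually do.

```python
import uuid

# Hypothetical namespace for this migration; any fixed UUID would do.
MIGRATION_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "migration.example.org")


def reference_id(kind: str, legacy_code: str) -> str:
    """Derive a stable ID from a legacy code, so reruns yield the same ID."""
    return str(uuid.uuid5(MIGRATION_NAMESPACE, f"{kind}:{legacy_code}"))


# Hypothetical legacy category codes, expressed statically as in the POC.
LEGACY_CATEGORIES = ["general", "claims", "payments"]
CATEGORY_IDS = {code: reference_id("category", code) for code in LEGACY_CATEGORIES}


def lookup_category(legacy_code: str) -> str:
    """Stand-in for a future lookup delegate; fails loudly on unknown codes."""
    try:
        return CATEGORY_IDS[legacy_code]
    except KeyError:
        raise KeyError(f"no reference data for category {legacy_code!r}") from None
```

Because the IDs are derived from the legacy codes, the table can be rebuilt at any time without storing it, which only works while the set of reference values stays small and stable.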

Q: Advantages over a scripting language?  Using the workflow doesn't offer as much as it seems to...scripts can solve the same problem.  (e.g. they had a set of Perl scripts that accomplished some data migration).  However...

  • Workflow - gave them a chance to explore using workflows against FOLIO API
  • They've created a reusable framework for interacting with FOLIO...supporting different processes that will vary from institution to institution
    1. Extracting the data 
    2. Processing of the data
  • This could also be used to migrate data from FOLIO to a new version of FOLIO

A link to the public repo is in the PowerPoint.

Possible agenda item for next week - more questions about this POC.


10 | Discussion (tabled for lack of time) | various | If any time is left, we will continue to discuss bulk API requirements for migration.

Some Jiras of relevance: FOLIO-1932, FOLIO-2050, UXPROD-1826, MODINVSTOR-295, MODINVSTOR-296

And for some background and discussion from other sites (cited by Anatolii Starkov):

https://evertpot.com/http/207-multi-status
https://apihandyman.io/api-design-tips-and-tricks-getting-creating-updating-or-deleting-multiple-resources-in-one-api-call/
https://medium.com/paypal-engineering/batch-an-api-to-bundle-multiple-paypal-rest-operations-6af6006e002
https://developers.google.com/gmail/api/guides/batch

This is a complex topic, and we will do well to get oriented in this session and discuss what is possible. We can continue with requirements and stories in future sessions.
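For orientation, here is a rough Python sketch of the per-item result pattern that the 207 Multi-Status articles above describe. It is not a FOLIO API. The function name, the "name" validation rule, and the choice of 201/422/207 codes are all assumptions made for illustration.

```python
import uuid
from typing import Dict, List, Tuple


def batch_create(
    records: List[dict], store: Dict[str, dict]
) -> Tuple[int, List[dict]]:
    """Try to create each record; report one result per input item.

    Returns (overall_status, results). The overall status is 201 when every
    item was created and 207 when the per-item results are mixed.
    """
    results = []
    for index, record in enumerate(records):
        if "name" not in record:  # hypothetical validation rule
            results.append({"index": index, "status": 422, "error": "missing name"})
            continue
        record_id = str(uuid.uuid4())
        store[record_id] = record
        results.append({"index": index, "status": 201, "id": record_id})

    statuses = {result["status"] for result in results}
    overall = 207 if statuses - {201} else 201
    return overall, results


# Usage: the second record is invalid, so the batch as a whole reports 207.
store: Dict[str, dict] = {}
overall, results = batch_create([{"name": "Acme"}, {}, {"name": "Books Inc"}], store)
```

The design question for requirements is visible even in this small sketch: one bad record does not reject the whole batch, so the caller has to read the per-item results to know what was loaded.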


Future agenda items

patty.wanninger also emphasized the importance of documenting requirements for bulk data APIs, so please update the issues referenced above.

  • more questions about this POC
  • Are there things we want to do to support this work (Jeremy's work) getting pulled into the mainstream of FOLIO?

Link to Acquisitions Interface Fields
Link to FOLIO Record Data Elements (contains links to specific spreadsheets, but most of them are not up to date.)

Action Items