2019-07-22 - Data Migration Subgroup Agenda and Notes

Date

2019-07-22 at 11 EST

Link to meeting: https://zoom.us/j/204980147

Discussion items

Time | Item | Who | Notes
5 min | Welcome and assign note taker | Dale

Welcome and request for someone to take notes, preferably someone who hasn't done it in a while.

40 min | Using workflows in data migration | Jeremy Huff and others

More discussion of Jeremy's POC on using workflows for data migration.

Here are the documents from last week: PowerPoint and POC

Discussion:

  • Updates from Jeremy:
    • The slowdown has been diagnosed and overcome; it had been caused by some utility functions. They were refactored, and the whole vendors migration got through in 6.5 minutes.
    • More optimizations to explore. Regularly blowing up Okapi!
    • Scaling up beyond vendors still needs work. Would like to run Okapi in cluster mode with three Okapis in round robin; thinks this would speed it all up, but is having trouble getting it to run clustered. (Right now still all without batch requests.) Just getting some batch requests going will help a lot if we can get it; that plus clustering should give some good gains.
    • Now dealing with numbers and times that hold a lot more promise.
    • Jason: Vert.x issues with Okapi clustering. Spin up in cluster mode and after a while Vert.x blocks threads, even without a large load. Thinks it's a bug in there, because clustered runs worked without problems until recently.
    • Reviewed the code with Ian as PO. Working on creating a backlog for this that maps out future development, and on getting the code to a shareable place for other use cases.
    • TAMU sprint cycle at the end of this week to explore user data. Looking at a nightly job to bring users into the ILS; currently it is a cron task. More like an integration task, and one many will probably need. Todd: would like to do deltas on this kind of load, for safety.
    • Dale: how does supplementary code that is not SQL become part of the process? A: the code was written expecting different data sources and flavors of SQL, with an extraction interface that can go against different sources (CSV, LDAP, etc.). Currently, to build the JSON object, the SQL statement itself creates strings, a kind of bespoke encoding of the data. The extractor produces one JSON object but never goes through a processing phase; the shaping happens in how the data comes back from the SQL statement. So the processor scripts in mod-camunda have to be built to deal with that JSON. As for merging data sources, they will explore that this week, since they have a second vendor data source to add in. They imagine handling it on the workflow side rather than in the extractor, with multiple extract calls from the workflow: extractor responsibilities should stay small, with other responsibilities pushed to the workflow.
    • The hope is to end up with a set of extractors and robust workflows that work together.
  • Ian: having a CSV method would be a good thing to have after SQL. SQL is most of what they see, followed by XML and JSON. The ability to see into the mapping methods easily would be helpful.
  • Jeremy: with multiple sources, envisioning extracting and setting records aside repeatedly, then assembling them.
  • Right now the bottleneck is not the processing phase. They can spit records out much faster than Okapi can handle, so the efficiency focus isn't really on the extract/transform part; it would have to be a lot slower to become the bottleneck instead of Okapi.
  • As a test, tried writing directly to the backend, going around Okapi to the storage modules (have not written directly to the database yet). Didn't get it running just yet, but eager to see how much faster it goes without going through Okapi. One downside: going directly to a storage module means missing out on business logic and pushing that work back out to the workflow. For vendors that doesn't matter so much, but for users it would be nice to use the users business logic, because so much happens when you create a user. If something fast enough can be achieved without bypassing Okapi, that would be preferable.
  • Will the batch APIs have business logic, other than the SRS importer? Batch APIs are still being discussed, so now is the time to raise it.
  • FOLIO-2050
  • Decisions might be made about where to optimize. Adding business logic to modules so it can be part of the batch APIs could be a lot of work without enough payoff. No plans are known for implementing business logic there right now, but the workflow work shouldn't close doors; will see what can happen. If it is wanted, issues should be written.
  • Since they are not doing batch right now, there is no reason not to point at the business logic modules, which keeps the workflows simpler.
  • If you go through the business logic modules, do they force their own UUIDs? Right now they are letting the system generate UUIDs for their vendors, but in the future they will want to provide them. Some business logic modules let you provide IDs and some don't?
  • Jason is going to write up the Okapi issues; still trying to narrow down the clustering issue to write up better details, having just hit it on Friday. Will look at config files with others. 2.3.0 is the first version where this problem has appeared. It would be good to help the core team get a test set up for this condition.
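
The extraction approach described above (the SQL statement itself builds a JSON string per row, so the extractor skips a processing phase entirely) can be sketched roughly as below. This is an illustration only: the table, columns, and use of sqlite3 are stand-ins, not the actual TAMU source, schema, or extractor code.

```python
# Sketch of "bespoke encoding" extraction: the SELECT concatenates literal
# JSON fragments around column values, so each row comes back as a ready-made
# JSON object and the extractor does no further processing.
# sqlite3 and the vendors table are hypothetical stand-ins.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendors (code TEXT, name TEXT);
    INSERT INTO vendors VALUES ('AAA', 'Acme Books'), ('BBB', 'Best Serials');
""")

# The SQL statement, not the extractor, shapes the output record.
EXTRACT_SQL = """
    SELECT '{"code": "' || code || '", "name": "' || name || '"}'
    FROM vendors
    ORDER BY code
"""

records = [json.loads(row[0]) for row in conn.execute(EXTRACT_SQL)]
print(records)
```

Downstream, processor scripts (in mod-camunda, per the notes) would consume these JSON objects as-is; note that this string-concatenation style breaks if values contain quotes, which is part of why it is described as bespoke.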
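
The batch-request idea discussed above can be sketched minimally: instead of one request per record, records are grouped into fixed-size batches to cut round trips to Okapi. The chunk() helper and the batch size of 50 are hypothetical illustrations; how a batch would actually be posted is deliberately left out, since no batch API shape has been settled.

```python
# Minimal sketch of batching records before sending them to Okapi.
# chunk() is a generic helper; batch size 50 is arbitrary, and the actual
# posting of a batch is omitted (no confirmed batch API yet).
def chunk(records, size):
    """Yield successive fixed-size slices of a record list."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

# 120 pretend vendor records grouped into batches of 50:
batches = list(chunk(list(range(120)), 50))
print([len(b) for b in batches])  # [50, 50, 20]
```

The same chunking could sit in front of a round-robin dispatch across the three clustered Okapis mentioned above, which is where the combined gains were expected.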
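
The open UUID question above can be illustrated with a toy stand-in for a module that honors a client-supplied id and otherwise generates one. build_vendor() is hypothetical, not an actual FOLIO interface; which real business logic modules behave this way is exactly what the group was unsure about.

```python
# Toy model of the two id behaviors discussed: keep a migration-supplied id
# if present, otherwise let the system generate one. build_vendor() is a
# hypothetical stand-in, not a FOLIO module API.
import uuid

def build_vendor(payload):
    """Return a vendor record, keeping a supplied id or generating a UUID."""
    record = dict(payload)
    record.setdefault("id", str(uuid.uuid4()))
    return record

# System-generated id (what they do today for vendors):
generated = build_vendor({"name": "Acme Books"})
# Migration-supplied id (what they will want in the future):
supplied = build_vendor({"id": "3e8b2c2a-1111-4111-8111-000000000001",
                         "name": "Acme Books"})
print(supplied["id"])
```

A module that forces its own UUIDs would ignore the supplied id instead, which is why the answer matters for migrations that need stable, pre-assigned identifiers.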

Other topics

Link to Acquisitions Interface Fields
Link to FOLIO Record Data Elements (contains links to specific spreadsheets, but most of them are not up to date.)

Action Items