[FOLIO-1042] Look at https://airflow.apache.org/ Created: 30/Jan/18 Updated: 18/Jan/19 |
|
| Status: | Open |
| Project: | FOLIO |
| Components: | None |
| Affects versions: | None |
| Fix versions: | None |
| Type: | Task | Priority: | P3 |
| Reporter: | Heikki Levanto | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | madrid, sprint31, sprint32, sprint33 | ||
| Remaining Estimate: | 3 hours | ||
| Time Spent: | 4 hours, 30 minutes | ||
| Original estimate: | Not Specified | ||
| Issue links: |
|
||||||||
| Sprint: | |||||||||
| Development Team: | Core: Platform | ||||||||
| Description |
|
At our workflow discussion in Madrid, it was recommended that we look at Airbnb's Airflow to see whether it could be of use in FOLIO. |
| Comments |
| Comment by Heikki Levanto [ 08/Feb/18 ] |
|
Taking a quick read through the docs, writing down first impressions:
My first impression is not very positive. Airflow is big and complex, requires workflow authors to understand Python, and seems to be designed mostly for running recurring scripts with some dependencies between them, not for controlling manual operations. There are some underlying assumptions, for example that all operations must be idempotent (they can be rerun in case of problems, so a task cannot insert an item or increment a counter). |
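To make the idempotency assumption concrete, here is a minimal sketch in plain Python (not Airflow code). The in-memory `store` and the helper names `set_status`, `increment_counter`, and `run_with_retry` are hypothetical and only illustrate why a rerun is harmless for one kind of step and not for the other.

```python
# Hypothetical in-memory store standing in for a record store;
# this is an illustration only, not Airflow or FOLIO API.

def set_status(store, item_id, status):
    """Idempotent: rerunning leaves the same final state."""
    store[item_id] = status
    return store


def increment_counter(store, key):
    """Not idempotent: every rerun changes the result again."""
    store[key] = store.get(key, 0) + 1
    return store


def run_with_retry(step, store, *args, runs=2):
    """Simulate a scheduler rerunning a step after a suspected failure."""
    for _ in range(runs):
        step(store, *args)
    return store
```

Running `set_status` twice leaves the item with the same status, while running `increment_counter` twice counts the same event twice, which is the kind of operation the comment says such an engine would rule out.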
| Comment by shale99 [ 08/Feb/18 ] |
|
This is a Python-heavy workflow engine. Everything is in Python, so it's a requirement. Do we expect users to write their own tasks, or are we shipping a set out of the box, with adding a task being something of a rarity?
What we could do here is create a template, something like API = expected response, API = expected response, etc., as a pipeline with some notification options (email, etc.), and translate that into Python so that jobs can be declarative. Then, as long as it's not too complicated, the Python can be abstracted away from end users (just thinking out loud).
The tasks are all Python code, so whatever the programming language can do, you can probably do here as well: call APIs to update something, fail that API call if the record was already updated, send an email when something fails. That may require someone, like a librarian, to make a change in FOLIO that corrects the status, and then kick off another pipeline from that point on.
The DAGs (files) are found on disk. It is usually recommended to store them in S3 or something similar. We may be able to get creative and store them in the DB and move them to temporary local storage for running, but this is just a guess.
The UI is sort of tenant aware: you can have a user per tenant and the UI will only display that tenant's info. I am not sure how Airflow handles this at the DB layer (row-level security? just a trigger? I only see a userid column in the chart table; I can look deeper into it if needed), and I don't know if anything else is tenant aware. I guess if we work with APIs only, the tenant header will be the separator.
As for scheduling, it uses cron syntax (is it actually cron that runs it, though?). There are some shortcuts as well, such as daily and hourly.
I am personally undecided. It would be cool to get a bunch of workflows that give some idea of the needed functionality. I think that would help us understand whether this should be written in house or whether we should go with something like Airflow. |
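The "API = expected response" template idea can be sketched roughly as follows. This is plain Python under stated assumptions, not Airflow code: the step names, the API paths, and the `call` and `notify` callables are all hypothetical, and a real version would need an HTTP client and a tenant header.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical declarative pipeline: each step names an API call
# and the response code expected from it.
PIPELINE: List[Dict] = [
    {"name": "create-order", "api": "/orders", "expect": 201},
    {"name": "notify-vendor", "api": "/vendor/notify", "expect": 200},
]


def run_pipeline(
    steps: List[Dict],
    call: Callable[[str], int],
    notify: Callable[[str], None],
    start_at: int = 0,
) -> Optional[int]:
    """Run steps in order from start_at.

    Returns the index of the first failing step, or None if all steps pass.
    """
    for index in range(start_at, len(steps)):
        step = steps[index]
        actual = call(step["api"])
        if actual != step["expect"]:
            # Notification hook, e.g. an email to a librarian.
            notify(
                f"step {step['name']} failed: "
                f"expected {step['expect']}, got {actual}"
            )
            return index
    return None
```

The returned index is what would let someone correct the underlying problem and then restart the pipeline "from that point on" by passing it back as `start_at`.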
| Comment by Mike Taylor [ 08/Feb/18 ] |
|
Interesting stuff.
That is a huge issue. When I was thinking in detail about FOLIO workflow a while back, everything came down to the problem of integrating human actions into otherwise automated sequences. A solution that doesn't handle that is no solution at all – we really might just as well invoke curl from cron otherwise.
Inability to do things like adding records is also going to be a deal-breaker. The use of Python for writing steps doesn't bother me. It's probably the best programming language for non-programmers to use, and as shale99 implies, those steps will in any case mostly come out of the box that we provide. It sounds like if we did use Airflow, we'd probably need to either maintain a custom derivative or get a bunch of modifications accepted back into the mainstream. |
| Comment by shale99 [ 08/Feb/18 ] |
|
Just a caveat on the comments below: I have spent about half a day on Airflow, so I am not an expert and am giving my opinion based on what I have learned so far. |
| Comment by Mike Taylor [ 08/Feb/18 ] |
|
That sounds rather more encouraging than the impression I formed from your and Heikki's earlier comments. |
| Comment by shale99 [ 11/Feb/18 ] |
|
so after a bit more digging - |