2023-02-08 Meeting notes: Airflow for bibliographic workflows
Date
Feb 8, 2023
Housekeeping
Convener and notes: @Martina Schildt
Next meeting: Feb 13, 2023
Discussion items
Airflow for "bibliographic workflows" | @jpnelson at Stanford | until 12:30 PM ET
Airflow for "bibliographic workflow"
open source project
widely used - often in big data or machine learning
used Airflow for other workflows before adopting it for FOLIO
for FOLIO, used it for data migration - mainly to support the bibliographic workflow
about 10 million MARC records from the previous ILS
what "workflow" means in this context: Airflow constructs a workflow as discrete tasks
consists of DAGs | DAG: Directed Acyclic Graph (runs in one direction, i.e. non-looping code), each of which completes a single specific task
tasks are grouped into task groups
instances are posted before holdings and then items
when finished → a separate process runs → checks whether all records went into FOLIO
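The ordering described above (instances before holdings, then items, then a separate check) can be sketched in plain Python. This is only an illustration of the DAG idea using the standard library's `graphlib`, not Airflow code and not the code in the Stanford repository; the task names and the `missing_records` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; the notes only state the posting order
# (instances, then holdings, then items) and a final verification step.
# Each key maps to the set of tasks that must finish before it may run.
dependencies = {
    "post_instances": set(),
    "post_holdings": {"post_instances"},
    "post_items": {"post_holdings"},
    "verify_records": {"post_items"},
}


def run_order(deps):
    """Return a valid execution order; raises graphlib.CycleError on a loop."""
    return list(TopologicalSorter(deps).static_order())


def missing_records(sent_ids, loaded_ids):
    """Return the IDs that were sent but are not present in the target system."""
    return sorted(set(sent_ids) - set(loaded_ids))


order = run_order(dependencies)
print(order)
```

Because the graph is acyclic, a valid order always exists; adding a dependency that loops back (for example, instances waiting on items) would make `run_order` raise an error instead of running forever.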
Owen in chat: So have I understood this correctly: the “workflow” is a chain of “DAGs” and the DAGs made up of non-looping code which complete a single specific task?
Jeremy: kind of, a DAG is a separate workflow, different DAGs are connected for migration
Charlotte in chat: Jeremy, is this work flow only intended for migration, or are you planning to extend it to also be used in production, maybe as a Data Import work flow?
answer follows in the recording - approx. min. 22
Owen in chat: Could you see this being used to bridge application within Folio? (e.g. if X happens in module Y, do A in module B?)
Jeremy: yes
Charlotte in chat: Do you have a link to the GitHub documentation?
Martina S: what can we do with the gained knowledge?
Owen: some similarity between some of the tools
describing different scenarios and why the tools work seems important
e.g. why is a dataflow tool like Airflow useful for data loading?
what are the differences and what are the similarities
building structured data flows vs. using an external tool to integrate with FOLIO (request demo)
Maura: different tools for different kinds of processes
the presentations and the knowledge gained give a lot to take back to institutions
Owen: can we make the solutions more adoptable by the community?
do we need to work on making integration easier?
do we need SIGs for the different tools or a workflow SIG?
Maura: would be good for more libraries to get the benefit of the presentations
maybe have a Slack channel where people can share tools and experiences
we should avoid re-inventing the wheel again and again
Martina S.: should we ask Duke to present on workflow management with Trello?
yes, or other institutions using Trello
or MS Teams
to improve the overall experience, please answer the linked survey; it will only take a few minutes
the survey will be open until Feb 22nd
Chat
18:13:00 From Owen Stephens to Everyone:
So have I understood this correctly: the “workflow” is a chain of “DAGs” and the DAGs made up of non-looping code which complete a single specific task?
18:18:50 From Charlotte Whitt to Everyone:
Jeremy, is this work flow only intended for migration, or are you planning to extend it to also be used in production, maybe as a Data Import work flow?
18:19:40 From Owen Stephens to Everyone:
+1 to that question Charlotte
18:23:01 From Owen Stephens to Everyone:
Could you see this being used to bridge application within Folio? (e.g. if X happens in module Y, do A in module B?)
18:25:26 From Owen Stephens to Everyone:
Thanks Jeremy
18:25:40 From Maura Byrne to Everyone:
This is great.
18:25:53 From Charlotte Whitt to Everyone:
It looks really amazing
18:26:48 From Charlotte Whitt to Everyone:
Do you have a link to the GitHub documentation?
18:27:55 From Jeremy Nelson to Everyone:
https://github.com/sul-dlss/libsys-airflow
18:28:06 From Charlotte Whitt to Everyone:
Thanks a lot
18:28:46 From Owen Stephens to Everyone:
It seems not completely dissimilar to the demonstration from last time
18:29:14 From Owen Stephens to Everyone:
I’m just trying to remember what that was called :O
18:29:54 From Owen Stephens to Everyone:
Thanks Martina - yes it seems similar to Prefect
18:30:31 From Kristin Martin to Everyone:
https://folio-org.atlassian.net/wiki/display/REL/Team+vs+module+responsibility+matrix
18:45:10 From Owen Stephens to Everyone:
100% on avoiding re-inventing the wheel
18:47:28 From Kristin Martin to Everyone:
Or MS Teams
18:48:05 From Owen Stephens to Everyone:
The “learning apis” slack channel has questions that come up in similar way, but I like the idea of an “Automation approaches” channel
18:48:18 From Maura Byrne to Everyone:
+1
18:48:52 From Owen Stephens to Everyone:
In fact some of the questions that come up in Learning APIs are absolutely automation questions
18:49:28 From Owen Stephens to Everyone:
The discussion happening in there today is an example
Transcript
Future topics
Topic proposal by @Owen Stephens for October:
Use of shortcut keys and macros for more effective cross-app working - it would also be good to have UX and Stripes/dev knowledge for this discussion I think. I know @Laura (she/they) uses macros so might have insights into the potential for cross-app working
Potential for external 'workflow' solutions for cross-app interactions
I think 'workflow' is a dangerous term here - in this context it's more about automation than user workflows, although I think there is overlap
I was particularly struck by the solution in production at TAMU (Jeremy Huff and Sebastian Hammer presented, the recording is at https://prod-zoom-recordings-openlibraryfoundation-org.s3.amazonaws.com/50dc6c87-3912-43fa-8287-56ec73b12bbb%2Fshared_screen_with_speaker_view%28CC%29.mp4 starting at 3 hrs, 14 min) - I think getting someone from TAMU to talk about how this is used would be v interesting
There was also a presentation on the use of a tool called Airflow at Stanford for "bibliographic workflow" but I've not watched that yet so not 100% sure if it is completely applicable - I think the core use case there was systems migration but it may go beyond that
Jenn Colt on using Prefect
does not need to be workflow across apps
UX/UI and implementers topics
should be Wednesdays
Comprehensive look at where data is copied and stored as opposed to live data | how it is represented
Date filters and how they work in different apps
Attendees
| Present | Name | Home Organization |
|---|---|---|
| | Brooks Travis | EBSCO |
| x | Charlotte Whitt | Index Data |
| | Dennis Bridges | EBSCO |
| x | Dung-Lan Chen | Skidmore College |
| | Erin Nettifee | Duke |
| | Gill Osguthorpe | UX/UI Designer - K-Int |
| | Heather McMillan Thoele | TAMU |
| | Ian Ibbotson | Developer Lead - K-Int |
| | Jag Goraya | K-Int |
| x | Jana Freytag | VZG, Göttingen |
| | Jenn Colt | Cornell |
| | Khalilah Gambrell | EBSCO |
| | Kimberly Pamplin | TAMU |
| | Kirstin Kemner-Heek | VZG, Göttingen |
| | Kristin Martin | Chicago |
| | Laura Daniels | Cornell |
| | Lloyd Chittenden | Marmot Library Network |
| | Marc Johnson | K-Int |
| x | Martina Schildt | VZG, Göttingen |
| x | Martina Tumulla | hbz, Cologne |
| x | Maura Byrne | Chicago |
| | Mike Gorrell | Index Data |
| x | Owen Stephens | Product Owner - Owen Stephens Consulting |
| | Patty Wanninger | Product owner Users app |
| | Rachel A Sneed | TAMU |
| | Sara Colglazier | Five Colleges / Mount Holyoke College Library |
| x | Susanne Schuster | BSZ Konstanz |
| | John Coburn | EBSCO |
| | Zak Burke | EBSCO |
| | Daniel Huang | Lehigh |
| x | Maccabee Levine | Lehigh |
| | Robert Scheier | Holy Cross |
| x | Jeremy Nelson | Stanford |
| x | Ingolf Kuss | hbz |