2023-02-08 Meeting notes: Airflow for bibliographic workflows
Date
Housekeeping
- Convener and notes: Martina Schildt
- Next meeting:
Discussion items
- Airflow for "bibliographic workflows" | jpnelson at Stanford | until 12:30 PM ET
- AI SIG survey
Notes
Airflow for "bibliographic workflow"
- open source project
- widely used - often in big data or machine learning
- Stanford used Airflow for other workflows before adopting it for FOLIO
- for FOLIO, used it for data migration, mainly to support the bibliographic workflow
- about 10 million MARC records from previous ILS
- what does workflow mean in this context: Airflow constructs a workflow from discrete tasks
- consists of DAGs | DAG: Directed Acyclic Graph (runs in one direction, i.e. non-looping code), whose tasks each complete a single specific step
- tasks are grouped into task groups
- instances are posted before holdings and then items
- when finished → a separate process runs → checks whether all records went into FOLIO
- Owen in chat: So have I understood this correctly: the “workflow” is a chain of “DAGs” and the DAGs made up of non-looping code which complete a single specific task?
- Jeremy: kind of, a DAG is a separate workflow, different DAGs are connected for migration
- Charlotte in chat: Jeremy, is this work flow only intended for migration, or are you planning to extend it to also be used in production, maybe as a Data Import work flow?
- answer to follow (see recording, approx. minute 22)
- Owen in chat: Could you see this being used to bridge application within Folio? (e.g. if X happens in module Y, do A in module B?)
- Jeremy: yes
- Charlotte in chat: Do you have a link to the GitHub documentation?
- Martina S: what can we do with the gained knowledge?
- Owen: some similarity between some of the tools
- describing different scenarios and why the tools work seems important
- e.g. why is a dataflow tool like airflow useful for data loading
- what are the differences and what are the similarities
- building structured data flows vs. using an external tool to integrate with FOLIO (request demo)
- Maura: different tools for different kinds of processes
- presentations and knowledge give a lot to take back to institutions
- Owen: can we make the solutions more adoptable by the community?
- do we need to work on making integration easier?
- do we need SIGs for the different tools or a workflow SIG?
- Maura: would be good for more libraries to get the benefit of the presentations
- maybe have a Slack channel where people can share tools and experiences
- we should avoid re-inventing the wheel again and again
- Martina S.: should we ask Duke to present on workflow management with Trello?
- yes, or other institutions using Trello
- or MS Teams
- to improve the overall experience, please answer the linked survey, which will only take a few minutes
- the survey will be open until Feb 22nd
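As an illustration of the DAG idea from the Airflow notes above (discrete tasks, one direction, instances posted before holdings and then items, followed by a separate check), here is a minimal sketch in plain Python. It does not use Airflow or its API; it only relies on the standard-library graphlib module, and the task names are hypothetical, chosen to mirror the migration steps mentioned in the notes.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names (not taken from the Stanford code base).
# Each task maps to the set of tasks that must finish before it starts.
MIGRATION_DAG = {
    "post_instances": set(),
    "post_holdings": {"post_instances"},
    "post_items": {"post_holdings"},
    "verify_record_counts": {"post_items"},
}


def run_order(dag):
    """Return the task names in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


def is_acyclic(dag):
    """Return True if the graph has no loops, i.e. it really is a DAG."""
    try:
        TopologicalSorter(dag).prepare()
        return True
    except CycleError:
        return False


order = run_order(MIGRATION_DAG)
print(order)
```

Because each task here depends only on the one before it, there is exactly one valid order: instances, holdings, items, then the verification step. A graph in which two tasks depend on each other would be rejected by is_acyclic, which is the "non-looping" property mentioned in the notes.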
Chat
18:13:00 From Owen Stephens to Everyone:
So have I understood this correctly: the “workflow” is a chain of “DAGs” and the DAGs made up of non-looping code which complete a single specific task?
18:18:50 From Charlotte Whitt to Everyone:
Jeremy, is this work flow only intended for migration, or are you planning to extend it to also be used in production, maybe as a Data Import work flow?
18:19:40 From Owen Stephens to Everyone:
+1 to that question Charlotte
18:23:01 From Owen Stephens to Everyone:
Could you see this being used to bridge application within Folio? (e.g. if X happens in module Y, do A in module B?)
18:25:26 From Owen Stephens to Everyone:
Thanks Jeremy
18:25:40 From Maura Byrne to Everyone:
This is great.
18:25:53 From Charlotte Whitt to Everyone:
It looks really amazing
18:26:48 From Charlotte Whitt to Everyone:
Do you have a link to the GitHub documentation?
18:27:55 From Jeremy Nelson to Everyone:
https://github.com/sul-dlss/libsys-airflow
18:28:06 From Charlotte Whitt to Everyone:
Thanks a lot
18:28:46 From Owen Stephens to Everyone:
It seems not completely dissimilar to the demonstration from last time
18:29:14 From Owen Stephens to Everyone:
I’m just trying to remember what that was called :O
18:29:54 From Owen Stephens to Everyone:
Thanks Martina - yes it seems similar to Prefect
18:30:31 From Kristin Martin to Everyone:
https://folio-org.atlassian.net/wiki/display/REL/Team+vs+module+responsibility+matrix
18:45:10 From Owen Stephens to Everyone:
100% on avoiding re-inventing the wheel
18:47:28 From Kristin Martin to Everyone:
Or MS Teams
18:48:05 From Owen Stephens to Everyone:
The “learning apis” slack channel has questions that come up in similar way, but I like the idea of an “Automation approaches” channel
18:48:18 From Maura Byrne to Everyone:
+1
18:48:52 From Owen Stephens to Everyone:
In fact some of the questions that come up in Learning APIs are absolutely automation questions
18:49:28 From Owen Stephens to Everyone:
The discussion happening in there today is an example
Transcript
Future topics
- Topic proposal by Owen Stephens for October:
- Use of shortcut keys and macros for more effective cross-app working - it would also be good to have UX and Stripes/dev knowledge for this discussion I think. I know @Laura (she/they) uses macros so might have insights into the potential for cross-app working
- Potential for external 'workflow' solutions for cross-app interactions
- I think 'workflow' is a dangerous term here - in this context it's more about automation than user workflows, although I think there is overlap
- I was particularly struck by the solution in production at TAMU (Jeremy Huff and Sebastian Hammer presented, the recording is at https://prod-zoom-recordings-openlibraryfoundation-org.s3.amazonaws.com/50dc6c87-3912-43fa-8287-56ec73b12bbb%2Fshared_screen_with_speaker_view%28CC%29.mp4 starting at 3 hrs, 14 min) - I think getting someone from TAMU to talk about how this is used would be v interesting
- There was also a presentation on the use of a tool called Airflow at Stanford for "bibliographic workflow" but I've not watched that yet so not 100% sure if it is completely applicable - I think the core use case there was systems migration but it may go beyond that
- Jenn Colt on using Prefect
- does not need to be workflow across apps
- UX/UI and implementers topics
- should be Wednesdays
- Comprehensive look at where data is copied and stored as opposed to live data | how it is represented
- Date filters and how they work in different apps
Attendees
| Present | Name | Home Organization |
|---|---|---|
| | Brooks Travis | EBSCO |
| x | Charlotte Whitt | Index Data |
| | Dennis Bridges | EBSCO |
| x | Dung-Lan Chen | Skidmore College |
| | Erin Nettifee | Duke |
| | Gill Osguthorpe | UX/UI Designer - K-Int |
| | Heather McMillan Thoele | TAMU |
| | Ian Ibbotson | Developer Lead - K-Int |
| | Jag Goraya | K-Int |
| x | Jana Freytag | VZG, Göttingen |
| | Jenn Colt | Cornell |
| | Khalilah Gambrell | EBSCO |
| | Kimberly Pamplin | TAMU |
| | Kirstin Kemner-Heek | VZG, Göttingen |
| | Kristin Martin | Chicago |
| | Laura Daniels | Cornell |
| | Lloyd Chittenden | Marmot Library Network |
| | Marc Johnson | K-Int |
| x | Martina Schildt | VZG, Göttingen |
| x | Martina Tumulla | hbz, Cologne |
| x | Maura Byrne | Chicago |
| | Mike Gorrell | Index Data |
| x | Owen Stephens | Product Owner - Owen Stephens Consulting |
| | Patty Wanninger | Product owner Users app |
| | Rachel A Sneed | TAMU |
| | Sara Colglazier | Five Colleges / Mount Holyoke College Library |
| x | Susanne Schuster | BSZ Konstanz |
| | John Coburn | EBSCO |
| | Zak Burke | EBSCO |
| | Daniel Huang | Lehigh |
| x | Maccabee Levine | Lehigh |
| | Robert Scheier | Holy Cross |
| x | Jeremy Nelson | Stanford |
| x | Ingolf Kuss | hbz |