2023-02-08 Meeting notes: Airflow for bibliographic workflows


Date

Feb 8, 2023

Housekeeping

  • Convener and notes: @Martina Schildt

  • Next meeting: Feb 13, 2023

Discussion items

  1. Airflow for "bibliographic workflows" | @jpnelson at Stanford | until 12:30 PM ET

  2. AI SIG survey

Notes

Airflow for "bibliographic workflow"

  • Airflow is an open-source project

  • widely used, often in big data or machine learning

  • Stanford used Airflow for other workflows before adopting it for FOLIO

  • for FOLIO, they used it for data migration, mainly to support the bibliographic workflow

  • about 10 million MARC records from the previous ILS

  • what "workflow" means in this context: Airflow constructs a workflow as discrete tasks

  • consists of DAGs (DAG: Directed Acyclic Graph; edges run in one direction only, i.e. non-looping code), which complete a single specific task

  • tasks are grouped into task groups

  • instances are posted first, then holdings, then items

  • when finished, a separate process runs to check whether all records went into FOLIO

  • Owen in chat: So have I understood this correctly: the “workflow” is a chain of “DAGs” and the DAGs made up of non-looping code which complete a single specific task?

    • Jeremy: kind of; a DAG is a separate workflow, and different DAGs are connected for the migration

  • Charlotte in chat: Jeremy, is this work flow only intended for migration, or are you planning to extend it to also be used in production, maybe as a Data Import work flow?

    • answered in the recording from about minute 22

  • Owen in chat: Could you see this being used to bridge application within Folio? (e.g. if X happens in module Y, do A in module B?)

    • Jeremy: yes

  • Charlotte in chat: Do you have a link to the GitHub documentation?

  • Martina S.: what can we do with the knowledge we have gained?

    • Owen: there is some similarity between some of the tools

    • describing different scenarios and why the tools work for them seems important

    • e.g. why is a dataflow tool like Airflow useful for data loading?

    • what are the differences, and what are the similarities?

  • building structured data flows vs. using an external tool to integrate with FOLIO (request demo)

  • Maura: different tools for different kinds of processes

    • the presentations and shared knowledge give us a lot to take back to our institutions

  • Owen: can we make the solutions easier for the community to adopt?

    • do we need to work on making integration easier?

    • do we need SIGs for the different tools, or a workflow SIG?

  • Maura: would be good for more libraries to get the benefit of the presentations

    • maybe have a Slack channel where people can share tools and experiences

    • we should avoid reinventing the wheel again and again

  • Martina S.: should we ask Duke to present on workflow management with Trello?

    • yes, or other institutions using Trello

    • or MS Teams
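The DAG structure described in the Airflow notes above (discrete tasks, instances posted before holdings before items, then a separate check that all records reached FOLIO) can be illustrated without Airflow itself. The following is a minimal sketch using only Python's standard-library `graphlib` module. It is not Airflow's API and not the code in Stanford's libsys-airflow repository; the task names and the `missing_records` helper are hypothetical, chosen only to mirror the order described in the meeting.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names that mirror the order described in the notes:
# instances, then holdings, then items, then a separate verification step.
# Each key maps a task to the set of tasks it depends on (its upstream tasks).
MIGRATION_DAG: dict[str, set[str]] = {
    "post_instances": set(),
    "post_holdings": {"post_instances"},
    "post_items": {"post_holdings"},
    "verify_load": {"post_items"},
}


def run_order(dag: dict[str, set[str]]) -> list[str]:
    """Return an execution order that respects every dependency.

    TopologicalSorter raises CycleError if the graph loops, which is
    exactly what "acyclic" in Directed Acyclic Graph rules out.
    """
    return list(TopologicalSorter(dag).static_order())


def missing_records(source_ids: set[str], loaded_ids: set[str]) -> set[str]:
    """Hypothetical verification step: IDs from the old ILS not found in FOLIO."""
    return source_ids - loaded_ids


if __name__ == "__main__":
    print(run_order(MIGRATION_DAG))
    # ['post_instances', 'post_holdings', 'post_items', 'verify_load']
    print(missing_records({"a1", "a2", "a3"}, {"a1", "a3"}))
    # {'a2'}

    # A graph with a loop is rejected before anything runs.
    looping = {"a": {"b"}, "b": {"a"}}
    try:
        run_order(looping)
    except CycleError:
        print("rejected: graph contains a cycle")
```

In Airflow itself, the same upstream/downstream relationships are declared between tasks (for example with the `>>` operator), related tasks can be grouped with `TaskGroup`, and the scheduler runs them in dependency order.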

AI SIG survey

  • to improve the overall experience, I would ask you to answer the linked survey, which will only take a few minutes

  • the survey will be open until Feb 22nd

Chat

18:13:00 From Owen Stephens to Everyone:
    So have I understood this correctly: the “workflow” is a chain of “DAGs” and the DAGs made up of non-looping code which complete a single specific task?
18:18:50 From Charlotte Whitt to Everyone:
    Jeremy, is this work flow only intended for migration, or are you planning to extend it to also be used in production, maybe as a Data Import work flow?
18:19:40 From Owen Stephens to Everyone:
    +1 to that question Charlotte
18:23:01 From Owen Stephens to Everyone:
    Could you see this being used to bridge application within Folio? (e.g. if X happens in module Y, do A in module B?)
18:25:26 From Owen Stephens to Everyone:
    Thanks Jeremy
18:25:40 From Maura Byrne to Everyone:
    This is great.
18:25:53 From Charlotte Whitt to Everyone:
    It looks really amazing
18:26:48 From Charlotte Whitt to Everyone:
    Do you have a link to the GitHub documentation?
18:27:55 From Jeremy Nelson to Everyone:
    https://github.com/sul-dlss/libsys-airflow
18:28:06 From Charlotte Whitt to Everyone:
    Thanks a lot
18:28:46 From Owen Stephens to Everyone:
    It seems not completely dissimilar to the demonstration from last time
18:29:14 From Owen Stephens to Everyone:
    I’m just trying to remember what that was called :O
18:29:54 From Owen Stephens to Everyone:
    Thanks Martina - yes it seems similar to Prefect
18:30:31 From Kristin Martin to Everyone:
    https://folio-org.atlassian.net/wiki/display/REL/Team+vs+module+responsibility+matrix
18:45:10 From Owen Stephens to Everyone:
    100% on avoiding re-inventing the wheel
18:47:28 From Kristin Martin to Everyone:
    Or MS Teams
18:48:05 From Owen Stephens to Everyone:
    The “learning apis” slack channel has questions that come up in similar way, but I like the idea of an “Automation approaches” channel
18:48:18 From Maura Byrne to Everyone:
    +1
18:48:52 From Owen Stephens to Everyone:
    In fact some of the questions that come up in Learning APIs are absolutely automation questions
18:49:28 From Owen Stephens to Everyone:
    The discussion happening in there today is an example

Transcript

Future topics

  • Topic proposal by @Owen Stephens for October:

    • Use of shortcut keys and macros for more effective cross-app working - it would also be good to have UX and Stripes/dev knowledge for this discussion, I think. I know @Laura (she/they) uses macros, so might have insights into the potential for cross-app working

    • Potential for external 'workflow' solutions for cross-app interactions

      • I think 'workflow' is a dangerous term here - in this context it's more about automation than user workflows, although I think there is overlap

      • I was particularly struck by the solution in production at TAMU (Jeremy Huff and Sebastian Hammer presented, the recording is at https://prod-zoom-recordings-openlibraryfoundation-org.s3.amazonaws.com/50dc6c87-3912-43fa-8287-56ec73b12bbb%2Fshared_screen_with_speaker_view%28CC%29.mp4 starting at 3 hrs, 14 min) - I think getting someone from TAMU to talk about how this is used would be v interesting

      • There was also a presentation on the use of a tool called Airflow at Stanford for "bibliographic workflow" but I've not watched that yet so not 100% sure if it is completely applicable - I think the core use case there was systems migration but it may go beyond that

      • Jenn Colt on using Prefect

      • does not need to be workflow across apps

  • UX/UI and implementers topics

    • should be Wednesdays

  • Comprehensive look at where data is copied and stored as opposed to live data | how it is represented

  • Date filters and how they work in different apps

Attendees

Present | Name                    | Home Organization
        | Brooks Travis           | EBSCO
x       | Charlotte Whitt         | Index Data
        | Dennis Bridges          | EBSCO
x       | Dung-Lan Chen           | Skidmore College
        | Erin Nettifee           | Duke
        | Gill Osguthorpe         | UX/UI Designer - K-Int
        | Heather McMillan Thoele | TAMU
        | Ian Ibbotson            | Developer Lead - K-Int
        | Jag Goraya              | K-Int
x       | Jana Freytag            | VZG, Göttingen
        | Jenn Colt               | Cornell
        | Khalilah Gambrell       | EBSCO
        | Kimberly Pamplin        | TAMU
        | Kirstin Kemner-Heek     | VZG, Göttingen
        | Kristin Martin          | Chicago
        | Laura Daniels           | Cornell
        | Lloyd Chittenden        | Marmot Library Network
        | Marc Johnson            | K-Int
x       | Martina Schildt         | VZG, Göttingen
x       | Martina Tumulla         | hbz, Cologne
x       | Maura Byrne             | Chicago
        | Mike Gorrell            | Index Data
x       | Owen Stephens           | Product Owner - Owen Stephens Consulting
        | Patty Wanninger         | Product owner Users app
        | Rachel A Sneed          | TAMU
        | Sara Colglazier         | Five Colleges / Mount Holyoke College Library
x       | Susanne Schuster        | BSZ Konstanz
        | John Coburn             | EBSCO
        | Zak Burke               | EBSCO
        | Daniel Huang            | Lehigh
x       | Maccabee Levine         | Lehigh
        | Robert Scheier          | Holy Cross
x       | Jeremy Nelson           | Stanford
x       | Ingolf Kuss             | hbz

Action items