
Steffen Köhler 

jroot 

Florian Gleixner 

Ian Walls 

Philip Robinson 

...

Time | Item | Who | Notes

Find a note taker

30 min | Data Migration tests with Apache Airflow | jpnelson

Apache Airflow, originally developed at Airbnb.

A workflow for migrating MARC records from Symphony, using the Okapi inventory-storage module.

DAG = Directed Acyclic Graph
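As a rough illustration of what such a DAG can look like: the task names and bodies below are placeholders, not the actual migration code, and assume the Airflow 2.x TaskFlow API.

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule_interval=None, start_date=datetime(2022, 1, 1), catchup=False)
def symphony_to_folio():
    """Illustrative pipeline: extract MARC from Symphony, transform, post to FOLIO."""

    @task
    def extract_marc():
        # Placeholder: read the MARC records exported from Symphony.
        return ["<marc record>", "<marc record>"]

    @task
    def transform(records):
        # Placeholder: this is where the folio-migration-tools transformers
        # would be called to produce valid FOLIO instance JSON.
        return [{"title": "..."} for _ in records]

    @task
    def post_to_okapi(instances):
        # Placeholder: POST the instances to Okapi inventory-storage.
        pass

    # These dependencies are the edges that make the workflow a
    # Directed Acyclic Graph.
    post_to_okapi(transform(extract_marc()))


symphony_to_folio()
```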

Some libraries only import instances, not holdings and items

Calling transformers from folio-migration-tools

Converts to valid JSON
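The notes mention the transformers from folio-migration-tools; their exact API is not recorded here, so the sketch below uses pymarc as an illustrative stand-in to show the shape of the step (read MARC, emit minimal instance JSON). The field mapping and file name are toy examples, not the real mapping rules.

```python
import json
import uuid

from pymarc import MARCReader  # illustrative stand-in for the folio-migration-tools transformers


def marc_to_instances(path):
    """Toy transform: one minimal FOLIO-style instance dict per MARC record."""
    instances = []
    with open(path, "rb") as handle:
        for record in MARCReader(handle):
            if record is None:  # skip records pymarc could not parse
                continue
            fields = record.get_fields("245")
            subfields = fields[0].get_subfields("a") if fields else []
            title = subfields[0] if subfields else "Unknown title"
            # A real FOLIO instance needs more fields (source, instanceTypeId, ...).
            instances.append({"id": str(uuid.uuid4()), "title": title})
    return instances


if __name__ == "__main__":
    print(json.dumps(marc_to_instances("symphony_export.mrc")[:2], indent=2))
```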

POST the records in batches of 1,000
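A sketch of the batching step, assuming plain requests and the batch/synchronous endpoint of mod-inventory-storage; the URL handling, headers, and timeout are illustrative, only the batch size of 1,000 comes from the notes.

```python
import requests


def post_in_batches(okapi_url, tenant, token, instances, batch_size=1000):
    """POST instance records to Okapi inventory-storage in batches of 1,000."""
    headers = {
        "x-okapi-tenant": tenant,
        "x-okapi-token": token,
        "Content-Type": "application/json",
    }
    for start in range(0, len(instances), batch_size):
        batch = instances[start:start + batch_size]
        response = requests.post(
            f"{okapi_url}/instance-storage/batch/synchronous",
            json={"instances": batch},
            headers=headers,
            timeout=300,
        )
        # Fail the task (and trigger its retries) on any error response.
        response.raise_for_status()
```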

Each task in Airflow has its own log. You can configure retries.

The DAG reads environment variables such as FOLIO_USER/PASSWD, OKAPI_URL, ...
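A sketch combining the last two notes: retries are set once in default_args so every task inherits them (each attempt shows up in the per-task log), and credentials plus the Okapi URL are read from the environment. The variable names follow the note; everything else is illustrative.

```python
import os
from datetime import datetime, timedelta

from airflow import DAG

# Credentials and endpoint come from environment variables, as in the notes.
FOLIO_USER = os.environ["FOLIO_USER"]
FOLIO_PASSWD = os.environ["FOLIO_PASSWD"]
OKAPI_URL = os.environ["OKAPI_URL"]

# Every task in the DAG inherits these retry settings.
default_args = {
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
}

dag = DAG(
    dag_id="symphony_to_folio",
    default_args=default_args,
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,
    catchup=False,
)
```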

Most of the time is spent POSTing the records (even though the posting runs in parallel).

We have 2 parallel posting instances.

The bottleneck is Okapi.

It is not like a Unix pipeline process.

We get out-of-memory errors from Okapi if we increase the number of parallel processes.
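One way to keep the number of parallel posting tasks small enough that Okapi does not run out of memory is a DAG-level cap (an Airflow pool would work as well). This is a sketch; in Airflow releases before 2.2 the parameter is called concurrency instead of max_active_tasks.

```python
from datetime import datetime

from airflow import DAG

# Allow at most two tasks of this DAG to run at the same time,
# matching the two parallel posting instances mentioned above.
dag = DAG(
    dag_id="symphony_to_folio",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,
    catchup=False,
    max_active_tasks=2,
)
```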

Jeremy is migrating multiple libraries.

Phil: We are using Prefect.

Are you using this for data import processing? We use it for collections. We use Airflow to extract data and populate our Solr/Blacklight indexes.

Ian: Allowing the librarians to see what is happening here, and letting them make changes.

Lisa: What if you want to run only part of these processes? You can work with "failed" statuses; the DAG will stop and can be continued. You can re-run a single step. We have an Alma, a Symphony, and a FOLIO integration; all of those are managed by Airflow.

Jason: I have similar things going on with VuFind and our workflow engine, post-FOLIO migration.

Ian: I found that my migration toolkit environment ended up living on after the migration to handle data processing jobs.


Action items

  •