2022-02-25 - Sys Ops & Management SIG Agenda and Notes

Date

Attendees

Ingolf Kuss

Lisa Sjögren (EBSCO) 

Kyle Banerjee 

Tod Olson 

Hkaplanian 

jpnelson 

Nils Olof Paulsson

Brandon Tharp 

Anton Emelianov

Steffen Köhler 

jroot 

Florian Gleixner 

Ian Walls 

Philip Robinson 

Theodor Tolstoy (One-Group.se) 

Goals

Discussion items

Time | Item | Who | Notes

Find a note taker

30 | Data Migration tests with Apache Airflow | jpnelson

Apache Airflow, originally created at Airbnb.

A workflow for migrating MARC records from Symphony, using the Okapi inventory-storage APIs.

DAG = Directed Acyclic Graph
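To illustrate what "directed acyclic graph" means for a migration workflow, here is a minimal sketch using only Python's standard `graphlib` module, not Airflow itself. The task names are hypothetical; each task lists the tasks it depends on, and a topological sort gives an order in which every task runs after its dependencies.

```python
import graphlib

# Hypothetical migration tasks; each key maps to the set of tasks it depends on.
dag = {
    "extract_marc": set(),
    "transform_instances": {"extract_marc"},
    "transform_holdings": {"extract_marc"},
    "transform_items": {"transform_holdings"},
    "post_instances": {"transform_instances"},
    "post_holdings": {"transform_holdings", "post_instances"},
    "post_items": {"transform_items", "post_holdings"},
}


def run_order(graph):
    """Return one valid execution order.

    Raises graphlib.CycleError if the graph contains a cycle,
    i.e. if it is not a DAG.
    """
    return list(graphlib.TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(run_order(dag))
```

Airflow's scheduler does considerably more (scheduling, retries, task state), but the ordering constraint it enforces is the same idea.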

Some libraries only import instances, not holdings and items

Calling transformers from folio-migration-tools

Converts to valid JSON

POST the records in batches of 1,000
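A minimal sketch of posting in batches of 1,000, using only the standard library. It splits the records into batches and builds (but does not send) one POST request per batch. The endpoint path `/instance-storage/batch/synchronous`, the `{"instances": [...]}` body shape, and the tenant/token values are assumptions to verify against your FOLIO release; this is not the code shown in the meeting.

```python
import json
import urllib.request
from itertools import islice

BATCH_SIZE = 1000  # batch size mentioned in the meeting


def batched(records, size=BATCH_SIZE):
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while chunk := list(islice(it, size)):
        yield chunk


def build_batch_request(okapi_url, tenant, token, instances):
    """Build (but do not send) a POST request for one batch of instances.

    ASSUMPTION: endpoint path and body shape; check your mod-inventory-storage version.
    """
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url=f"{okapi_url}/instance-storage/batch/synchronous",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Okapi-Tenant": tenant,
            "X-Okapi-Token": token,
        },
    )
```

Sending each request is then a matter of `urllib.request.urlopen(req)` (or an HTTP client of your choice) inside a task.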

Each task in Airflow has its own log, and you can configure retries.
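In Airflow, retries are configured per task (for example with the `retries` and `retry_delay` operator arguments). The plain-Python sketch below only illustrates the idea of retrying a flaky step, such as a POST to a busy Okapi, with exponential backoff; the function name and defaults are hypothetical.

```python
import time


def call_with_retries(func, retries=3, delay_seconds=1.0, backoff=2.0, sleep=time.sleep):
    """Call func(); on an exception, retry up to `retries` more times.

    Waits delay_seconds * backoff**attempt between attempts and
    re-raises the last exception once the retries are used up.
    `sleep` is injectable so the behaviour can be tested without waiting.
    """
    attempt = 0
    while True:
        try:
            return func()
        except Exception:
            if attempt >= retries:
                raise
            sleep(delay_seconds * (backoff ** attempt))
            attempt += 1
```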

The DAG reads environment variables such as FOLIO_USER/PASSWD, OKAPI_URL, ...
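A small sketch of reading connection settings from the environment and failing fast when one is missing. `FOLIO_USER` and `OKAPI_URL` come from the notes; `FOLIO_PASSWD` is an assumed spelling of the password variable, and the `FolioConfig` class is hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FolioConfig:
    okapi_url: str
    username: str
    password: str


# Maps config fields to environment variable names.
# ASSUMPTION: "FOLIO_PASSWD" is a guessed name for the password variable.
ENV_VARS = {"okapi_url": "OKAPI_URL", "username": "FOLIO_USER", "password": "FOLIO_PASSWD"}


def load_config(env=os.environ):
    """Build a FolioConfig from `env`; raise ValueError listing any missing variables."""
    missing = [var for var in ENV_VARS.values() if not env.get(var)]
    if missing:
        raise ValueError(f"missing environment variables: {', '.join(missing)}")
    return FolioConfig(**{field: env[var] for field, var in ENV_VARS.items()})
```

Passing `env` explicitly keeps the function testable; in a DAG it would normally just read `os.environ`.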

Most of the time is spent posting the records, even though posting runs in parallel.

We have 2 parallel posting instances.

The bottleneck is Okapi.

It does not work like a Unix pipeline.

Okapi runs out of memory if we increase the number of parallel processes.
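Because Okapi is the bottleneck and runs out of memory under more load, it helps to cap the number of requests in flight. A minimal sketch with a thread pool limited to two workers, matching the two parallel posting instances mentioned above; `post_batch` is a hypothetical caller-supplied function that sends one batch.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL_POSTS = 2  # two parallel posting instances, as described in the meeting


def post_all(batches, post_batch, max_workers=MAX_PARALLEL_POSTS):
    """Send batches with at most `max_workers` calls to post_batch running at once.

    Results are returned in the same order as the input batches.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(post_batch, batches))
```

Raising `max_workers` increases throughput only until Okapi becomes the limit, which matches the out-of-memory behaviour reported in the meeting.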

Jeremy is migrating multiple libraries.

Phil: We are using Prefect. https://www.prefect.io

Are you using this for processing data imports? We use it for collections. We use Airflow to extract data and populate our Solr/Blacklight indexes.

Ian: This lets the librarians see what is happening and make changes.

Lisa: What if you want to run only part of these processes? You can work with "failed" statuses: the DAG stops at the failed step and can continue from there, and you can re-run a single step. We have Alma, Symphony, and FOLIO integrations, all managed by Airflow.
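The "stop on failure, then re-run from the failed step" behaviour can be illustrated in plain Python. This is a hypothetical sketch of the idea, not how Airflow is implemented; in Airflow the scheduler tracks task state, and clearing a failed task makes it run again.

```python
def run_pipeline(steps, state):
    """Run (name, func) steps in order, recording each step's status in `state`.

    Steps already marked "success" are skipped, so calling this again after a
    failure resumes at the failed step. Returns True if every step succeeded.
    """
    for name, func in steps:
        if state.get(name) == "success":
            continue  # completed in an earlier run
        try:
            func()
        except Exception:
            state[name] = "failed"
            return False  # stop here; downstream steps do not run
        state[name] = "success"
    return True
```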

Jason: I have similar things going on with VuFind and our workflow engine after the FOLIO migration.

Ian: I found that my migration toolkit environment lived on after the migration, running data-processing jobs.

The code is written in Python (bib_records.py) and transforms CSV to TSV. The custom code lives in the "FOLIO Plugin", which uses the EBSCO FOLIO migration tools to do that work.
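The CSV-to-TSV step is easy to get subtly wrong with naive string splitting, because CSV fields can contain quoted commas. A minimal standard-library sketch follows; it is not the contents of bib_records.py, and the function name is hypothetical.

```python
import csv
import io


def csv_to_tsv(src, dst):
    """Copy rows from a CSV text stream to a TSV text stream.

    csv.reader handles quoted fields, so a value like "Smith, John"
    stays one field instead of being split at the comma.
    """
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for row in csv.reader(src):
        writer.writerow(row)


if __name__ == "__main__":
    out = io.StringIO()
    csv_to_tsv(io.StringIO('id,title\n1,"Smith, John"\n'), out)
    print(out.getvalue(), end="")
```

For files on disk, open both with `newline=""` as the `csv` module documentation recommends.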

FOLIO is just a small plugin. Plugins are available for many different systems; they are provided ready to use.

The mapping is being done by the EBSCO transformers.

https://github.com/FOLIO-FSE/folio-migration-tools and the customer migration app have been checked out inside the Airflow container. The mapping (holdings, items, instances) lives in the migration app.

Lisa: And many of the reports that the migration scripts create are meant to help libraries understand their data: what it is (many things can be surfaced at this stage), where it goes in FOLIO, and so on.

Tod: Fantastic to reuse folio-migration-tools rather than re-coding everything.

Lisa: It looks great! And -- as Theodor is saying right now -- really exciting to see our tools used this way.

Jason: Awesome to see a container-driven local development environment. 🙂

Theodor: The customization of the migration tools is documented here: https://github.com/FOLIO-FSE/migration_repo_template

Jason: We have something similar and publish the results on a reporting site. ... I wish this had been available from Apache sooner! This looks pretty slick! We also ran Blacklight, and we run VuFind.

Tod: We are looking for some kind of environment ..., using Python code.

Jason: I like the ability to stop certain processes at a certain point.

Jeremy: We are currently looking at a remediation DAG.

Theodor: We are using deterministic UUIDs. 
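Deterministic UUIDs mean the same source record always gets the same FOLIO id, so a re-run produces the same identifiers and related records (for example holdings pointing at instances) can reference ids computed without a lookup. The notes do not describe the exact scheme used; the sketch below is a generic name-based (version 5) UUID approach with a hypothetical namespace and key format.

```python
import uuid

# Hypothetical namespace; any fixed value works as long as it never changes between runs.
MIGRATION_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://okapi.example.edu")


def deterministic_id(record_type, legacy_id, namespace=MIGRATION_NAMESPACE):
    """Derive a stable UUID string from the record type and legacy identifier.

    The same (record_type, legacy_id) pair always yields the same UUID, and
    including the record type keeps an instance and a holdings record that
    share a legacy id from colliding.
    """
    return str(uuid.uuid5(namespace, f"{record_type}:{legacy_id}"))
```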

Jason: A single Okapi instance was getting overwhelmed by the migration jobs. ... 3 would work OK on a "beefy" system.


Action items
