2022-02-25 - Sys Ops & Management SIG Agenda and Notes


Date

Attendees

Ingolf Kuss

Lisa Sjögren (EBSCO) 

Kyle Banerjee 

Tod Olson 

Hkaplanian 

jpnelson 

Nils Olof Paulsson

Brandon Tharp 

Anton Emelianov

Steffen Köhler 

jroot 

Florian Gleixner 

Ian Walls 

Philip Robinson 

Goals

Discussion items

Time | Item | Who | Notes

Find a note taker

30 | Data Migration tests with Apache Airflow | jpnelson

Apache Airflow, originally developed at Airbnb.

A workflow for migrating MARC records from Symphony, using Okapi's inventory-storage endpoints.

DAG = Directed a... graph

Some libraries only import instances, not holdings and items

Calling transformers from folio-migration-tools

Converts to valid JSON

POST the records in bunches of 1,000
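The batching step can be sketched in a few lines. This is only an illustration, not the code shown in the meeting: the helper name `batched` and the sample records are made up, and only the batch size of 1,000 comes from the notes.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(records: Iterable[dict], size: int = 1000) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical records; real ones would come from the transformers.
records = [{"id": str(n)} for n in range(2500)]
batch_sizes = [len(batch) for batch in batched(records)]
print(batch_sizes)  # [1000, 1000, 500]
```

Each yielded list would then become the body of one POST request.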

Each task in Airflow has its own log, and you can configure retries.
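As a rough sketch of the retry setup: Airflow tasks accept `retries` and `retry_delay` settings, which are commonly supplied through a `default_args` dictionary. The values below are assumptions; the notes do not record what was actually configured.

```python
from datetime import timedelta

# Assumed values; the meeting did not state how many retries were configured.
default_args = {
    "retries": 3,                         # re-run a failed task up to 3 times
    "retry_delay": timedelta(minutes=5),  # wait 5 minutes between attempts
}

# With Airflow installed, this dictionary would typically be passed as
# DAG(..., default_args=default_args) so that every task inherits it.
```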

DAG has env vars like FOLIO_USER/PASSWD, OKAPI_URL, ...
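A minimal sketch of reading such settings from the environment follows. `OKAPI_URL` and `FOLIO_USER` appear in the notes; `FOLIO_PASSWD` is an assumed spelling of the password variable, and the `FolioSettings` class and `load_settings` helper are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class FolioSettings:
    okapi_url: str
    user: str
    password: str


def load_settings(env: Mapping[str, str] = os.environ) -> FolioSettings:
    """Build settings from environment variables, failing early if one is missing."""
    # FOLIO_PASSWD is an assumed name for the password variable.
    names = ("OKAPI_URL", "FOLIO_USER", "FOLIO_PASSWD")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return FolioSettings(env["OKAPI_URL"], env["FOLIO_USER"], env["FOLIO_PASSWD"])


# Example with placeholder values instead of the real environment.
example = load_settings({
    "OKAPI_URL": "https://okapi.example.org",
    "FOLIO_USER": "migration_user",
    "FOLIO_PASSWD": "not-a-real-password",
})
print(example.okapi_url)
```

Failing early on a missing variable keeps a misconfigured run from getting as far as the posting step.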

Most of the time is spent posting the records (even though the posting runs in parallel).

We have 2 parallel posting instances.

The bottleneck is Okapi.

It is not like a Unix pipelining process

We get out-of-memory errors from Okapi if we increase the number of parallel processes.
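One way to keep the number of simultaneous posts fixed at two is a bounded worker pool. The sketch below uses Python's `ThreadPoolExecutor` for illustration, not the Airflow mechanism from the demo; `post_all` and `fake_post` are hypothetical names, and `fake_post` stands in for the real HTTP call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def post_all(batches: Sequence[List[dict]],
             post_batch: Callable[[List[dict]], int],
             max_workers: int = 2) -> List[int]:
    """Post batches with at most `max_workers` requests in flight at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input batches.
        return list(pool.map(post_batch, batches))


def fake_post(batch: List[dict]) -> int:
    """Stand-in for an HTTP POST; returns how many records it 'sent'."""
    return len(batch)


batches = [[{"id": "a"}, {"id": "b"}], [{"id": "c"}], [{"id": "d"}, {"id": "e"}]]
posted = post_all(batches, fake_post)
print(sum(posted))  # 5
```

Raising `max_workers` is the knob that, per the notes, leads to out-of-memory errors in Okapi.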

Jeremy is migrating multiple libraries.

Phil: We are using Prefect. https://www.prefect.io

Are you using this for processing data imports? We use it for collections. We use Airflow to extract and populate our Solr or Blacklight indexes.

Ian: This allows the librarians to see what is happening and make changes.

Lisa: What if you want to run only part of these processes? You can work with "failed statuses". The DAG will stop and continue. You can re-run one step. We have an Alma, a Symphony and a FOLIO integration. All of those are managed by Airflow.

Jason: I have similar things going on with VuFind and our workflow engine, post-FOLIO migration.

Ian: I found that my migration toolkit environment lived on after the migration to run data processing jobs.

The code is written in Python (bib_records.py). It transforms CSV to TSV. The custom code lives in a "FOLIO plugin" and uses the EBSCO FOLIO migration tools to do that work.

FOLIO is just a small plugin. Plugins are available for many different systems; they come ready-made.
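The CSV-to-TSV step could look roughly like the sketch below, which uses only Python's standard `csv` module. It does not reproduce bib_records.py; the function name `csv_to_tsv` and the sample data are made up.

```python
import csv
import io


def csv_to_tsv(csv_text: str) -> str:
    """Convert comma-separated text to tab-separated text."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    output = io.StringIO(newline="")
    writer = csv.writer(output, delimiter="\t", lineterminator="\n")
    writer.writerows(reader)
    return output.getvalue()


# The quoted comma in the title must survive as part of a single field.
sample = 'id,title\n1,"Smith, J.: Collected works"\n'
tsv_text = csv_to_tsv(sample)
print(tsv_text)
```

Parsing with `csv.reader` instead of replacing commas by hand keeps quoted fields that contain commas intact.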

The mapping is being done by the EBSCO transformers.

https://github.com/FOLIO-FSE/folio-migration-tools and the customer migration app have been checked out inside the Airflow container. The mapping (holdings, items, instances) is in the migration app.

Lisa: And many of those reports that the migration scripts create are really created with the purpose of helping libraries understand their data -- what it is (many things can be surfaced at this stage), where it goes in FOLIO....

Tod: Fantastic to reuse the folio-migration-tools instead of re-coding.

Lisa: It looks great! And -- as Theodor is saying right now -- really exciting to see our tools used this way.

Jason: Awesome to see a container-driven local development environment. 🙂

Theodor: The customization of the migration tools is documented here: https://github.com/FOLIO-FSE/migration_repo_template

Jason: We have a similar thing and publish on a reporting site. ... I wish that had been available from Apache sooner! This looks pretty slick! We also ran Blacklight, and we run VuFind.

Tod: We are looking for some kind of environment ..., using Python code.

Jason: I like the ability to stop certain processes at a certain point.

Jeremy: We are currently looking at a remediation DAG.

Theodor: We are using deterministic UUIDs. 
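To illustrate the idea of deterministic UUIDs: a name-based (version 5) UUID derived from a fixed namespace and the legacy identifier yields the same ID on every run, so re-running a migration step does not mint new IDs. This is only a sketch of the general technique and not necessarily the scheme the migration tools use; the namespace URL, the `deterministic_id` helper and the key format are assumptions.

```python
import uuid

# Assumed namespace for illustration only; any fixed UUID works as long as
# every run uses the same one.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://okapi.example.org")


def deterministic_id(object_type: str, legacy_id: str) -> str:
    """Derive a stable UUID from the record type and its legacy identifier."""
    return str(uuid.uuid5(NAMESPACE, f"{object_type}:{legacy_id}"))


first = deterministic_id("instances", "a123456")
second = deterministic_id("instances", "a123456")
other = deterministic_id("holdings", "a123456")

print(first == second)  # True: same input, same UUID
print(first == other)   # False: a different record type gives a different UUID
```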

Jason: One Okapi was getting overwhelmed (by the migration jobs). ... Three would work OK on a "beefy" system.


Action items
