2022-02-25 - Sys Ops & Management SIG Agenda and Notes
Date
Attendees
@Nils Olof Paulsson
Theodor Tolstoy (One-Group.se)
Meeting Link
- https://zoom.us/j/591934220
- Password: folio-lsp
Goals
Discussion items
Time | Item | Who | Notes |
---|---|---|---|
| | Find a note taker | | |
30 | Data migration tests with Apache Airflow | jpnelson | Apache Airflow (originally developed at Airbnb). A workflow for migrating MARC records from Symphony, using Okapi inventory-storage. DAG = Directed a... graph. Some libraries only import instances, not holdings and items. The DAG calls transformers from folio-migration-tools, converts the records to valid JSON, and POSTs them in batches of 1,000. Each task in Airflow has a log, and you can set up retries. The DAG has environment variables such as FOLIO_USER/PASSWD, OKAPI_URL, ... Most of the time is spent posting the records (although this runs in parallel). We have 2 parallel posting instances. The bottleneck is Okapi. It is not like a Unix pipelining process. We get out-of-memory errors from Okapi if we increase the number of parallel processes. Jeremy is migrating multiple libraries. Phil: We are using Prefect (https://www.prefect.io). Are you using this for processing data imports? We use it for collections. We use Airflow to extract data and populate our Solr or Blacklight indexes. Ian: This allows the librarians to see what is happening and to make changes. Lisa: What if you want to run only part of these processes? You can work with "failed statuses". The DAG will stop and continue. You can re-run one step. We have an Alma, a Symphony and a FOLIO integration. All of those are managed by Airflow. Jason: I have similar things going on with VuFind and our workflow engine, post FOLIO migration. Ian: I found that my migration toolkit environment lived on post-migration to run data processing jobs. The code is written in Python (bib_records.py). It transforms CSV to TSV. Custom code is in the "FOLIO Plugin". It uses the EBSCO FOLIO migration tools to do that work. FOLIO is just a small plugin. There are plugins available for many different systems; they are just given to you. The mapping is done by the EBSCO transformers. folio-migration-tools (https://github.com/FOLIO-FSE/folio-migration-tools) plus the customer migration app have been checked out inside the Airflow container. The mapping (holdings, items, instances) is in the migration app. Lisa: Many of the reports that the migration scripts create are really meant to help libraries understand their data: what it is (many things can be surfaced at this stage) and where it goes in FOLIO. Tod: Fantastic to use the folio-migration-tools and NOT re-code. Lisa: It looks great! And, as Theodor is saying right now, it is really exciting to see our tools used this way. Jason: Awesome to see a container-driven local development environment. 🙂 Theodor: The customization of the migration tools is documented here: https://github.com/FOLIO-FSE/migration_repo_template Jason: We have a similar thing and publish on a reporting site. ... I wish that had been available from Apache sooner! This looks pretty slick! We ran Blacklight as well, and we run VuFind. Tod: We are looking for some kind of environment ..., using Python code. Jason: I like the ability to stop certain processes at a certain point. Jeremy: We are currently looking at a remediation DAG. Theodor: We are using deterministic UUIDs. Jason: One Okapi was getting overwhelmed (by the migration jobs). ... 3 would work OK on a "beefy" system. |
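The batch posting described in the notes (records POSTed 1,000 at a time by two parallel posters) could look roughly like the sketch below. It is not the code shown in the meeting: `batched`, `post_all` and `fake_post_batch` are hypothetical names, and `fake_post_batch` only stands in for a real HTTP POST to Okapi.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List

BATCH_SIZE = 1000  # batch size mentioned in the notes
MAX_WORKERS = 2    # the notes mention two parallel posting instances


def batched(records: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def post_all(
    records: Iterable[dict],
    post_batch: Callable[[List[dict]], int],
    batch_size: int = BATCH_SIZE,
    max_workers: int = MAX_WORKERS,
) -> int:
    """Post every batch with at most `max_workers` concurrent calls.

    Returns the total number of records reported as posted.
    Note: Executor.map submits all batches up front, so this sketch
    keeps every batch in memory while the posts are running.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(post_batch, batched(records, batch_size)))


def fake_post_batch(batch: List[dict]) -> int:
    """Hypothetical stand-in for an HTTP POST to Okapi."""
    return len(batch)


if __name__ == "__main__":
    sample = [{"id": i} for i in range(2500)]
    print(post_all(sample, fake_post_batch))
```

Keeping `max_workers` small reflects the observation in the notes that Okapi, not the client, is the bottleneck, and that more parallel processes led to out-of-memory errors.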
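Theodor's remark about deterministic UUIDs can be illustrated with name-based (version 5) UUIDs from the Python standard library. This is a minimal sketch under assumptions, not necessarily how the migration tools derive their identifiers: the namespace URL, the `record_type:legacy_id` key format and the function name are all hypothetical.

```python
import uuid

# Hypothetical namespace: any fixed UUID works, as long as every run uses the same one.
MIGRATION_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://folio.example.org")


def deterministic_uuid(record_type: str, legacy_id: str) -> str:
    """Derive a stable UUID from the record type and the legacy system identifier."""
    return str(uuid.uuid5(MIGRATION_NAMESPACE, f"{record_type}:{legacy_id}"))


if __name__ == "__main__":
    # The same inputs always give the same UUID, so a re-run of a migration
    # step addresses the same record instead of creating a new one.
    print(deterministic_uuid("instances", "a1234"))
    print(deterministic_uuid("instances", "a1234"))
```

Because the identifier depends only on the inputs, a re-run or remediation job can recompute the UUID of a migrated record from its legacy identifier without a lookup table.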