Date
Attendees
@Nils Olof Paulsson
Meeting Link
- https://zoom.us/j/591934220
- Password: folio-lsp
Goals
Discussion items
Time | Item | Who | Notes |
---|---|---|---|
Find a note taker | |||
30 | Data Migration tests with Apache Airflow | jpnelson | Apache Airflow by Airbnb. A workflow for migrating MARC records from Symphony, using the Okapi inventory-storage. DAG = Directed a... graph. Some libraries only import instances, not holdings and items. The workflow calls transformers from folio-migration-tools, converts the records to valid JSON, and POSTs them in batches of 1,000. Each task in Airflow has a log. You can set up retries. The DAG has env vars like FOLIO_USER/PASSWD, OKAPI_URL, ... Most of the time is spent posting the records (even though this runs in parallel). We have 2 parallel posting instances. The bottleneck is Okapi. It is not like a Unix pipelining process. We get out-of-memory errors from Okapi if we increase the number of parallel processes. Jeremy is migrating multiple libraries. Phil: We are using Prefect. Are you using this for processing of data import? We use it for collections. We use Airflow to extract and populate our Solr or Blacklight indexes. Ian: This allows the librarians to see what is happening and make changes. Lisa: What if you want to run only part of these processes? You can work with "failed statuses". The DAG will stop and continue. You can re-run one step. We have an Alma, a Symphony and a FOLIO integration. All of those are being managed by Airflow. Jason: I have similar things going on with VuFind and our workflow engine, post FOLIO migration. Ian: I found that my migration toolkit environment ended up living on post-migration to do data processing jobs. |
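
The batching and retry behaviour described in the notes (posting records in batches of 1,000, with retries on failure) can be sketched roughly as below. This is a minimal illustration only, not the DAG that was presented: `batched`, `post_in_batches`, and the `post_batch` callable are hypothetical names, and `post_batch` stands in for whatever HTTP call actually sends a batch to Okapi. In Airflow itself, retries would normally be configured on the task rather than hand-written like this.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]


def batched(records: Iterable[Record], size: int = 1000) -> Iterator[List[Record]]:
    """Yield consecutive lists of at most `size` records."""
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, batch
        yield batch


def post_in_batches(
    records: Iterable[Record],
    post_batch: Callable[[List[Record]], None],  # hypothetical stand-in for the Okapi POST
    size: int = 1000,
    max_retries: int = 3,
) -> int:
    """Send records batch by batch, retrying a failed batch up to `max_retries` times.

    Returns the number of records successfully posted.
    """
    posted = 0
    for batch in batched(records, size):
        for attempt in range(1, max_retries + 1):
            try:
                post_batch(batch)
                break
            except Exception:
                if attempt == max_retries:
                    raise  # give up on this batch and surface the error
        posted += len(batch)
    return posted
```

Passing the posting function in as a parameter keeps the sketch independent of any particular HTTP client or Okapi endpoint; a real task would presumably read values such as OKAPI_URL and FOLIO_USER from the environment, as mentioned in the notes.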