2022-02-04 - Sys Ops & Management SIG Agenda and Notes

Date

Attendees

Ingolf Kuss, jroot, Florian Gleixner, Florian Kreft, Niels Olof Paulsson, Brandon Tharp, jpnelson, Chris Rutledge, Philip Robinson

Goals

Discussion items

Time | Item | Who | Notes

Find a note taker | Florian Gleixner

Experiences with production environment | Jason

Went live on January 10, 2022. The migration went well: one weekend of data migration followed by several days of testing. Running Juniper Hotfix 3.


15-node Rancher-provisioned Kubernetes cluster
3 etcd control plane nodes
Most worker nodes have 4 cores and 16 GB memory; some have 8 cores and 24 GB memory

The cluster is dedicated to Folio.
The database stack runs on VMware outside Kubernetes, with connection pooling and load balancing across one master and 2 replicas. Replication and backups run as asynchronous processes.
The database was kept out of Kubernetes because HA Postgres was not well tested at the time of the decision, and database administration best practice was to use VMware-virtualized DB hosts.

No outages so far (fantastic!), but some performance issues:
Regular searching in the Inventory app is slow; users should use Inventory-ES (Elasticsearch) instead.
Issues with Elasticsearch:

  • The initial indexing run did not index all instances.
  • A corrupt index forced a second reindex. Folio showed a "general search error" that was hard to track down in the logs.

Check-in/check-out and other circulation tasks were slow. Many modules had to be scaled to 5 pods each, which helped with the performance issues.
Reporting uses an LDP backend and an out-of-app MIS reporting site written in PHP; reports are triggered via a workflow engine.
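
As a rough illustration of the scaling step noted above (many modules scaled to 5 pods each), the sketch below only builds the `kubectl scale` command lines; it does not run them. The module list, the `folio` namespace, and the assumption that each deployment is named after its module are placeholders, not TAMU's actual setup.

```python
# Sketch only: the module list and namespace are assumptions,
# not the modules TAMU actually scaled.
MODULES = ["mod-circulation", "mod-circulation-storage", "mod-inventory"]
NAMESPACE = "folio"  # hypothetical namespace name


def scale_commands(modules, replicas=5, namespace=NAMESPACE):
    """Return one `kubectl scale` command line per module deployment.

    Assumes each deployment carries the same name as its module.
    """
    return [
        f"kubectl scale deployment/{name} --replicas={replicas} -n {namespace}"
        for name in modules
    ]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for command in scale_commands(MODULES):
        print(command)
```

Printing the commands first keeps the step reviewable; the same list could later feed whatever deployment tooling is in use.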

Upgrade plans: deploy everything in the same namespace, then upgrade the Folio modules. No blue-green deployment is needed for a single tenant. Hint: update Pubsub first!
TAMU uses a steering committee to decide which versions/updates have to be deployed.
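
The "update Pubsub first" hint could be encoded in an upgrade script roughly as follows. This is a minimal sketch, assuming Pubsub is deployed under the name `mod-pubsub` and that the remaining modules can be upgraded in their listed order; the module list in the example is illustrative only.

```python
def upgrade_order(modules, first="mod-pubsub"):
    """Return the modules with `first` moved to the front.

    The relative order of all other modules is left unchanged.
    """
    head = [m for m in modules if m == first]
    tail = [m for m in modules if m != first]
    return head + tail


if __name__ == "__main__":
    # Hypothetical module list, for illustration only.
    planned = ["mod-users", "mod-pubsub", "mod-inventory"]
    for name in upgrade_order(planned):
        print(name)
```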

Deployment is done using flat YAML files described in https://github.com/folio-org/folio-install/tree/tamu-r2-2021/alternative-install/kubernetes-rancher/TAMU
Eventually Ansible will be used for the deployment CI/CD workflow.



Ingolf

Went live with a single-server installation of Juniper, without Elasticsearch.

Issue reported by users: https://folio-org.atlassian.net/browse/UIIN-1902

A switch to Elasticsearch using docker-compose is planned.



Jeremy Nelson

Doing migration tests with Apache Airflow. The migration code is being built from EBSCO's code and uses some of EBSCO's classes. Each step has its own logging. The test is to migrate 500,000 records and see how it handles that.
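
To illustrate the "each step has its own logging" idea, here is a minimal sketch in plain Python using only the standard `logging` module. It is not Airflow code and not Jeremy's tool; the step names and functions (`drop_empty`, `normalize`) are hypothetical stand-ins for real migration steps.

```python
import logging


def run_steps(records, steps):
    """Run each (name, function) step in order, logging under its own logger."""
    for name, func in steps:
        # One named logger per step, so output can be filtered per step.
        log = logging.getLogger(f"migration.{name}")
        log.info("starting with %d records", len(records))
        records = func(records)
        log.info("finished with %d records", len(records))
    return records


# Hypothetical steps; the real pipeline's steps are not described in the notes.
def drop_empty(records):
    return [r for r in records if r]


def normalize(records):
    return [r.strip().lower() for r in records]


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(name)s: %(message)s")
    steps = [("drop_empty", drop_empty), ("normalize", normalize)]
    print(run_steps([" A ", "", "b"], steps))
```

Giving each step a named logger makes it easy to see which step dropped or changed records during a large test run.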

Uses Python for deployment. Jeremy can demonstrate the tool in one of the next meetings, maybe in two weeks?


Action items
