2025-06-04 Sys Ops & Management SIG Agenda and Meeting Notes

Date and time

Jun 4, 2025 12 EST

Zoom link

https://openlibraryfoundation.zoom.us/j/591934220?pwd=dXhuVFZoSllHU09qamZoZzZiTWhmQT09

Topics



Attendees

    • @Ingolf Kuss

    • @Shelley Doljack

    • @Florian Kreft

    • @Florian Gleixner

Time

Item

Who

Notes







Welcome



Shelley’s group is about to upgrade to Ramsons at the beginning of July.





FOLIO Multitenant Upgrade Problem

@Florian Gleixner

All

https://folio-org.atlassian.net/wiki/external/OGNlOTNhODgyZmM2NDA4NWI0YmUzOTU5MzcxOGJmMWU

  • Problem description

  • Things that should change (Proposal)

  • Discussion

Meeting Notes:

The group wants to create a separate RFC for this.

Strictly speaking, this issue only affects multi-tenant hosting. But if one changes the value of the ENV variable from release to release (which can make sense even in a single-tenant hosting), then it also affects that installation.

Ingolf has run into similar issues as Florian Gleixner. During release upgrades, some modules (especially mod-quick-marc, but also mod-search, mod-inventory-storage, and others) cannot consume Kafka messages. The log messages of those modules spam the Okapi log. Logs have to be erased every 15 minutes; otherwise the system would run out of disk space during the upgrade.

Problem with the Elasticsearch / OpenSearch index: after a release upgrade, the index is no longer accessible and has to be re-created.

Proposal: use different ENV variables for Kafka and Elasticsearch, or use a prefix for the ENV variable.

  • Carefully consider the upgrade situation; document what a system operator needs to do when the name of an environment variable has changed from one release to another.





Eureka variable naming conventions & naming (in)consistencies

@Kevin Day

All

Do we need to create additional Jira tickets for these?

Today’s Meeting notes:

We created a new RFC: 0012-Folio Deploy Environment Variable Naming Conventions


Last meeting’s notes:

Question to System Admins:

Where are the pain points? What problems have system admins encountered?

The ENV variable: Kafka topics pull that value in, and so does the Elasticsearch index. ES requires a reindex when the value changes. This is an unexpected dependency on the value of the ENV variable. Reindexing takes a lot of time for bigger libraries. For a multi-tenant system, setting ENV is required.

ENV = quesnelia

Maybe the ENV variable should be named better. Maybe it should be only for Kafka. It is not good to use the ENV variable for two purposes.

Maybe the purpose is that you could have multiple instances with the same tenant ID → ask the developers.

Upgrades break if you don't change the value of ENV between releases on a multi-tenant system.
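
The dependency on the ENV value discussed above can be illustrated with a small sketch. The naming patterns in it are assumptions chosen for illustration, not the confirmed schemes of the modules; the tenant ID "diku", the event name, and the second ENV value "ramsons" are hypothetical as well.

```python
"""Illustrative sketch: resource names derived from the ENV value.

The patterns below are ASSUMPTIONS for illustration only; the real
FOLIO modules may build their Kafka topic and index names differently.
"""


def kafka_topic(env: str, tenant: str, event: str) -> str:
    # Hypothetical pattern: <env>.<tenant>.<event>
    return f"{env}.{tenant}.{event}"


def search_index(env: str, tenant: str, resource: str) -> str:
    # Hypothetical pattern: <env>_<resource>_<tenant>
    return f"{env}_{resource}_{tenant}"


def derived_names(env: str, tenant: str) -> set[str]:
    # Every name a module would look for under a given ENV value.
    return {
        kafka_topic(env, tenant, "inventory.instance"),
        search_index(env, tenant, "instance"),
    }


before = derived_names("quesnelia", "diku")
after = derived_names("ramsons", "diku")

# Compare the two sets: topics and indices created under the old
# ENV value are looked up under different names after the change.
print(sorted(before))
print(sorted(after))
print("shared names:", before & after)
```

Under these assumed patterns the two sets share no name, which would match the observation that a changed ENV value requires a reindex, and that one variable serving both Kafka and Elasticsearch couples the two.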


Environment variables with dots, e.g.

okapi.url, tenant.url, kong.url, am.url
This is inconsistent with the environment variable naming schemes of Docker containers. In Linux, this can cause errors.
Allowed characters: uppercase letters, digits, underscore.

We recommend following the IEEE Std 1003.1 (POSIX) conventions for environment variable names: https://pubs.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap08.html
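
A minimal sketch of how such names could be checked against the recommended convention (uppercase letters, digits, underscore, no leading digit). The helper names is_portable and suggest_portable are hypothetical, and the suggested replacements are only mechanical conversions, not names that any FOLIO module is known to accept.

```python
import re

# POSIX convention: uppercase letters, digits, underscore; no leading digit.
PORTABLE_NAME = re.compile(r"[A-Z_][A-Z0-9_]*")


def is_portable(name: str) -> bool:
    # True only if the whole name follows the convention.
    return PORTABLE_NAME.fullmatch(name) is not None


def suggest_portable(name: str) -> str:
    # Replace every disallowed character with "_" and uppercase the rest.
    candidate = re.sub(r"[^A-Za-z0-9_]", "_", name).upper()
    # A leading digit is not allowed, so prefix an underscore in that case.
    if candidate and candidate[0].isdigit():
        candidate = "_" + candidate
    return candidate


for name in ["okapi.url", "tenant.url", "kong.url", "am.url", "ENV"]:
    status = "ok" if is_portable(name) else f"rename to {suggest_portable(name)}"
    print(f"{name}: {status}")
```

A check like this could run over a deployment descriptor before an upgrade, so that dotted or lowercase names are caught before they reach a container runtime or a shell.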


ERM modules take very long to spin up (in Quesnelia).

Many modules have to be adapted for more memory. mod-agreements needed an extreme memory increase for 15 tenants (using GOKb).

Many out-of-memory events and heap-space events.



For mod-fqm-manager, there are lowercase variables with dots.

Snapshot environments. The sample data are old and are erased from day to day.









Eureka Test Installation Experiences

All

Touched on the Vault issue: why use Vault? What is the reason for using another secret store in addition to Kubernetes Secrets? How can Eureka be used without Vault?





Chat Log





















Topics for next meetings



  • Eureka Testing Installation

  • Problems with RTR that Ingolf and colleagues encountered

Users sometimes get kicked out unexpectedly, before the expiration period has elapsed. Ingolf has captured log files for this failure.

    • Cleared up to >90%. Working with the same user in different browser windows causes the kick-out. It's not a bug, it's a feature.



Action items
