TC Subgroup: Distributed vs. Centralized Configuration


Subgroup Members:

Florian Gleixner 

Craig McNally 

Julian Ladisch 

Mike Taylor 

Meetings: Monday 10-11 EST (16-17 German time)

Next meeting 31 Jul. 2023

Meeting room: https://meet.lrz.de/tcdistributedvscentralized or Zoom / Slack Huddle.


Links to former proposals and documents

The journey to mod-settings

In chronological order:

  • Configuration and customization in FOLIO discusses the problems with mod-configuration and outlines possible approaches to resolving them (independent implementations, a Java library, a FOLIO module), tentatively concluding that fixes to mod-configuration might be the best approach.

  • Fixing the security problem in mod-configuration discusses in more detail what changes would be needed to mod-configuration in order for it to become a reasonable choice for configuration, making a concrete proposal and outlining how the old and new APIs would co-exist.

  • The mod-settings README describes what actually got implemented: because mod-configuration is based on the outdated RMB foundation and contains a lot of obsolescent code, it turned out to be easier to leave it as it is for the present (maintaining compatibility for code that uses it) and implement the new, secure, scope-based API in a new module instead.

  • Porting your module from mod-configuration to mod-settings is a detailed practical guide to switching from the old module to its replacement, based on Mike's experiences doing this with ui-ldp. It contains links to the updated code, which can act as an example for people porting other apps.

  • A user-interface for FOLIO customization is a proposal for how centralized configuration could give librarians and administrators a single place to configure FOLIO.

Scope of the group / deliverables

This group will limit its remit to configuration stored within FOLIO itself.

  • Settings stored in Okapi's /_/env APIs should be excluded.
  • Settings stored in Rancher's config maps and secrets should be excluded.
  • Settings stored in module container environment variables should be excluded.
  • Settings stored in the stripes front-end (stripes.config.js, etc.) should be excluded.
  • We should limit ourselves to settings that back-end modules store in the database.

Questions to be answered:

  • Shall tenant-level configuration be stored in a central module or distributed across modules?
  • Pros / cons of the approaches
  • Relative level of effort to accomplish the solution; what is the transition path?
  • Eventually give guidance on when to use central or distributed configuration.

RFC or only a DR?

Types of configurations

System settings (the same for all Tenants, all modules)

Examples:

  • Common Kafka URL and Access
  • Common Elasticsearch URL and Access
  • Common Database URL and Access

Module System/deployment settings

Examples:

  • specific Kafka URL and Access
  • specific Elasticsearch URL and Access
  • specific Database URL and Access
  • S3 Access


Tenant specific settings

Examples:

  • Tenant Default Language
  • Circulation Rules
  • Locations
  • Usergroups

Tenant and Module specific settings

Examples:

  • KB Credentials in mod-kb-ebsco-java

Module specific settings (the same for all Tenants)

Examples:

  • S3 Bucket Access

User/Usergroup specific settings

  • User Default Language
  • Circulation Desk

Configuration storage locations

Deployment Settings (central)

For deployment options like Ansible, Kubernetes/Helm.

Examples:

  • Common Kafka URL
  • Okapi URL

Module environment variables (distributed)

Examples:

  • Kafka URL
  • Database Access

Module configuration files (distributed)

Mounted at deploy time.

Examples:

  • edge SIP2 configuration files

Module managed settings (distributed)

The module stores configuration in its own database schema. The module may offer API endpoints to modify the configuration.

Examples:

  • Circulation rules


Configuration Module managed settings (central)

Examples:

Central vs. Distributed

Pros of central configuration:

  • No need to implement API endpoints and storage for module specific configuration.
  • All configuration variables of a tenant can be accessed for backup, cloning, or configuring a new tenant. But: differentiating between configuration and data depends on the point of view.
  • Possibility for a central UI for configuration
  • Can store configuration for values that have no "natural" owner module. Example: default tenant locale
  • Can handle shared configuration, but do we need an owner module?

Drawback of mod-configuration:

  • A big institution needs config write permissions with module granularity. One member of staff may be allowed to edit circulation config but not acquisition config. This is solved in mod-settings.
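
As an illustration of module-granular write permissions, here is a minimal sketch of a per-scope check. The permission naming scheme (`settings.write.<scope>`) and the `User` type are hypothetical and are not the actual mod-settings permission names; the mod-settings README is the authority on those.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    permissions: frozenset


def write_permission(scope: str) -> str:
    # Hypothetical naming scheme, NOT the real mod-settings permission names.
    return f"settings.write.{scope}"


def may_write(user: User, scope: str) -> bool:
    """Allow a write only if the user holds the permission for that scope."""
    return write_permission(scope) in user.permissions


# A staff member who may edit circulation settings but nothing else.
circ_admin = User("circ-admin", frozenset({write_permission("circulation")}))
```

With this model, `may_write(circ_admin, "circulation")` is true while `may_write(circ_admin, "acquisition")` is false, which is the separation a big institution asks for.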

Drawbacks of central configuration like mod-settings and mod-configuration

  • No validation. The central module cannot validate a POST or PUT request because it doesn't know the schema; only the module the setting belongs to knows it. Relevant use case: using curl/wget/Postman/...
  • No documentation. mod-configuration has no documentation of the stored values; one needs to search for it, and maybe the using module's README has some. A dedicated module API always publishes the API documentation at https://dev.folio.org/reference/api/
  • No explicit dependency. If more than one module uses a configuration, this dependency should be made explicit with an interface dependency in the ModuleDescriptor.
  • Performance. Requests to mod-configuration result in latency. If the config API belongs to the module the module can cache it and can invalidate the cache if the config is changed. Caching requests to mod-configuration will always result in a time period with outdated values. In mod-inventory-storage we've combined fetching the HRID config and HRID generation into a single SQL query.
  • Coupling. Modules should be loosely coupled and therefore each module should store its own configs.
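
The caching argument in the performance bullet above can be sketched as follows. Both classes are hypothetical illustrations rather than code from any FOLIO module: `OwnedConfig` stands for a module that owns its configuration and sees every write, and `RemoteConfigCache` stands for a module that caches values fetched from a central configuration module.

```python
import time


class OwnedConfig:
    """Config stored by the owning module: every write passes through this
    object, so the cache can be invalidated at the moment of the write."""

    def __init__(self, storage: dict):
        self._storage = storage  # stands in for the module's database table
        self._cache = {}

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self._storage[key]
        return self._cache[key]

    def put(self, key, value):
        self._storage[key] = value
        self._cache.pop(key, None)  # invalidate immediately


class RemoteConfigCache:
    """Config fetched from a central module: writes happen elsewhere, so the
    cache can only expire by time, and values may be outdated until then."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch  # callable that asks the central module
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # key -> (value, fetched_at)

    def get(self, key):
        now = self._clock()
        hit = self._cache.get(key)
        if hit is None or now - hit[1] >= self._ttl:
            hit = (self._fetch(key), now)
            self._cache[key] = hit
        return hit[0]
```

In the second class a changed value stays invisible until the time-to-live expires; shortening the time-to-live reduces that window but increases the number of requests to the central module.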

Pros of distributed configuration

  • Offers possibility for write-only configuration values like passwords
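
A minimal sketch of what a write-only value could look like inside a module. The class and its method names are hypothetical; the point is only that the public view never contains the secret, while the module itself can still use or compare it.

```python
import hashlib
import hmac


class WriteOnlySetting:
    """Holds a secret that can be set and used internally, but is never
    returned through the public view."""

    def __init__(self):
        self._value = None

    def set(self, value: str) -> None:
        self._value = value

    def public_view(self) -> dict:
        # What a GET on a (hypothetical) settings endpoint would return.
        return {"isSet": self._value is not None}

    def matches(self, candidate: str) -> bool:
        # Compare SHA-256 digests in constant time instead of exposing the value.
        if self._value is None:
            return False
        stored = hashlib.sha256(self._value.encode()).digest()
        given = hashlib.sha256(candidate.encode()).digest()
        return hmac.compare_digest(stored, given)
```

A central key-value store typically returns whatever was written to it, so this kind of behaviour is easier to provide in the module that owns the value.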

Drawbacks of distributed configuration

  • Adding configuration APIs adds interface dependencies, and the APIs are not consistent across modules; a guide for module configuration APIs is needed.


mod-configuration usage

Backend modules declaring mod-configuration usage: https://github.com/search?q=org%3Afolio-org+%22%5C%22configuration%5C%22%22+path%3A%2F%5Edescriptors%5C%2F%2F&type=code

Frontend modules declaring mod-configuration usage: https://github.com/search?q=org%3Afolio-org+%22\%22configuration\%22%22+path%3Apackage.json&type=code

Draft RFC bulletpoints

  • mod-configuration shall not be used any more due to security issues
    • module is already deprecated, no more configuration settings shall be added.
  • Distributed configuration is more native in microservice ecosystems and has many pros.
    • Distributed configuration shall be preferred
    • If a configuration is used by more than one module, and one module is authoritative for this configuration, then this module should hold this configuration. (Example: circulation rules)
    • A consistent way of getting/setting configurations should be established. API endpoints should look like /{module}/{configuration}/{entry} - Julian: I think this is not needed; maybe as a guideline, but not enforced
      • Legacy endpoints for getting/setting configuration shall be deprecated
  • Centralized configuration using mod-settings can or should be used
    • for non-sensitive information that is used by more than one module or is completely independent of any module, like locale settings
    • for settings that are specific to a user
  • Configurations for multiple tenants: Craig will talk to Olaminde
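
To make the proposed endpoint shape /{module}/{configuration}/{entry} from the bullet points above concrete, here is a small sketch that builds and parses such paths. The allowed character set and the example names are assumptions for illustration; the group has not decided on them, and per Julian's remark the pattern may end up as a guideline only.

```python
import re

# Assumed character set for one path segment (not specified by the group).
_SEGMENT = r"[A-Za-z0-9._-]+"
_PATTERN = re.compile(rf"/({_SEGMENT})/({_SEGMENT})/({_SEGMENT})")


def parse_path(path: str) -> tuple:
    """Split a configuration path into (module, configuration, entry)."""
    match = _PATTERN.fullmatch(path)
    if match is None:
        raise ValueError(f"not a /module/configuration/entry path: {path!r}")
    return match.groups()


def build_path(module: str, configuration: str, entry: str) -> str:
    """Build a configuration path and check that it follows the pattern."""
    path = f"/{module}/{configuration}/{entry}"
    parse_path(path)  # raises ValueError if a segment is empty or malformed
    return path
```

For example, `build_path("circulation", "rules", "default")` yields `/circulation/rules/default`, and `parse_path` reverses it.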


Action Item: Formulate a Draft RFC. No meeting on 24 Jul. 23, Next meeting 31 Jul. 2023


Draft RFC

Introduction

FOLIO relies on a tenant-aware, extensible microservice system. It is highly configurable and adaptable to all kinds of requirements. This RFC gives developers a guideline on where configuration should be stored.

Among all different kinds of configurations, this RFC does not deal with the following configurations, as their location and method to set/get them should not be changed:

  • Settings stored in Okapi's /_/env APIs.

  • Settings stored in Infrastructure (Kubernetes / Rancher) config maps and secrets.

  • Settings stored in module container environment variables.

  • Settings stored in the stripes front-end (stripes.config.js, etc.)

Decisions

mod-configuration will be dropped by the Quesnelia release

mod-configuration has been deprecated since March 2022 due to security problems. It shall no longer be used to add new configuration variables. Modules still using mod-configuration have to move to other solutions by the Quesnelia release.

Distributed configuration is preferred

Distributed configuration means that each module stores its configuration values itself, and offers API endpoints to query and store these values. Distributed configuration in a microservice architecture has some advantages:

  • The modules can validate the values according to format and dependencies
  • Modules do not depend on a configuration module, hence a better separation of microservices can be achieved
  • Since all API endpoints have to be documented, a basic documentation of possible configuration variables is mandatory
  • Configuration values can be cached, since no other module can change values.
  • Access to configuration values can be effectively controlled by permissions defined in the module.
  • Write-only configuration values are possible, like credentials. The module can offer operations other than reading values, such as comparing hashes (possible in central configuration too?)
  • Modules can handle changes of configuration variable names or values during module upgrades more flexibly
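
The first advantage in the list above, validation by format and dependencies, could look like the following sketch. The setting and its fields (`prefix`, `startNumber`, `zeroPad`, `width`) are invented for illustration and do not describe any existing FOLIO module's schema.

```python
def validate_identifier_settings(settings: dict) -> list:
    """Return a list of error messages; an empty list means the value is valid."""
    errors = []

    # Format check: short alphanumeric prefix.
    prefix = settings.get("prefix")
    if not isinstance(prefix, str) or not prefix.isalnum() or len(prefix) > 10:
        errors.append("prefix must be 1-10 alphanumeric characters")

    # Format check: positive integer (bool is excluded explicitly).
    start = settings.get("startNumber")
    if isinstance(start, bool) or not isinstance(start, int) or start < 1:
        errors.append("startNumber must be an integer >= 1")

    # Dependency check: one value requires another to be present.
    if settings.get("zeroPad") and "width" not in settings:
        errors.append("width is required when zeroPad is true")

    return errors
```

A module that owns the setting can run such a check on every POST or PUT and reject invalid values, whereas a generic central store accepts any value it is given.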

Even though distributed configuration also has some drawbacks, it is the preferred way to configure backend modules in FOLIO.

When to use central configuration

mod-settings solves the security problems of mod-configuration. It is the preferred module if configuration variables are to be stored centrally. Developing specialized modules as additional central configuration stores is not recommended.

Centralized configuration can be used for:

  • non-sensitive information that is used by more than one module or is completely independent of any module, like locale settings
  • settings that are specific to a user

These configurations can also be stored in a module; the developer decides where these values are stored.

Migration

For locale properties and other properties still residing exclusively in mod-configuration, access to these properties has to be moved to the mod-settings API by the Quesnelia release. Therefore a mod-configuration module offering only READ (and DELETE?) APIs will run in Quesnelia, and the modules still using mod-configuration have to transfer their properties to mod-settings or to a distributed configuration.
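
A rough sketch of what transferring one property could involve, assuming a mod-configuration entry carries `configName`, `code`, `value` and `userId`, and a mod-settings entry carries `id`, `scope`, `key`, `value` and `userId`. These field names reflect a reading of the two modules' schemas and should be checked against them; the scope name and the key-building rule are hypothetical choices that each migrating module would make for itself.

```python
import json
import uuid


def to_settings_entry(config_entry: dict, scope: str) -> dict:
    """Map one mod-configuration style entry to a mod-settings style entry."""
    # Hypothetical rule: join configName and code into a single key.
    key = config_entry["configName"]
    if config_entry.get("code"):
        key = f"{key}.{config_entry['code']}"

    # Old values are strings that often contain JSON. Decode only objects
    # and arrays; plain strings such as "123" or "true" are kept unchanged.
    value = config_entry.get("value")
    if isinstance(value, str):
        try:
            decoded = json.loads(value)
            if isinstance(decoded, (dict, list)):
                value = decoded
        except json.JSONDecodeError:
            pass

    entry = {"id": str(uuid.uuid4()), "scope": scope, "key": key, "value": value}
    if config_entry.get("userId"):
        entry["userId"] = config_entry["userId"]  # user-specific setting
    return entry
```

The converted entry would then be written through the mod-settings API, and the old entry removed from mod-configuration once all readers have switched.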