TC Subgroup: Distributed vs. Centralized Configuration
Subgroup Members:
Meetings: Monday 10-11 EST (16-17 German time)
Next meeting 31 Jul. 2023
Meeting room: https://meet.lrz.de/tcdistributedvscentralized or Zoom / Slack Huddle.
Links to former proposals and documents
- Distributed Configuration
- The old (and deprecated) centralized configuration module: https://github.com/folio-org/mod-configuration/blob/master/README.md
The journey to mod-settings
In chronological order:
- Configuration and customization in FOLIO discusses the problems with mod-configuration and outlines possible approaches to resolving them (independent implementations, a Java library, a FOLIO module), tentatively concluding that fixes to mod-configuration might be the best approach.
- Fixing the security problem in mod-configuration discusses in more detail what changes mod-configuration would need in order to become a reasonable choice for configuration, making a concrete proposal and outlining how the old and new APIs would co-exist.
- The mod-settings README describes what actually got implemented: because mod-configuration is based on the outdated RMB foundation and contains a lot of obsolescent code, it turned out to be easier to leave it as it is for now (maintaining compatibility for code that uses it) and implement the new, secure, scope-based API in a new module instead.
- Porting your module from mod-configuration to mod-settings is a detailed practical guide to switching from the old module to its replacement, based on Mike's experiences doing this with ui-ldp. It contains links to the updated code, which can serve as an example for people porting other apps.
- A user-interface for FOLIO customization is a proposal for how centralized configuration could give librarians and administrators a single place to configure FOLIO.
Scope of the group / deliverables
This group will limit its remit to configuration stored within FOLIO itself.
- Settings stored in Okapi's /_/env APIs should be excluded.
- Settings stored in Rancher's config maps and secrets should be excluded.
- Settings stored in module container environment variables should be excluded.
- Settings stored in the stripes front-end (stripes.config.js, etc.) should be excluded.
We should limit ourselves to settings that back-end modules store in the database.
Questions to be answered:
- Should tenant-level configuration be stored in a central module or distributed across modules?
- Pros / cons of each approach
- Relative effort required for each solution, and what the transition path looks like.
- Possibly give guidance on when to use central or distributed configuration.
RFC or only a DR?
Types of configurations
System settings (the same for all Tenants, all modules)
Examples:
- Common Kafka URL and Access
- Common Elasticsearch URL and Access
- Common Database URL and Access
Module System/deployment settings
Examples:
- specific Kafka URL and Access
- specific Elasticsearch URL and Access
- specific Database URL and Access
- S3 Access
Tenant specific settings
Examples:
- Tenant Default Language
- Circulation Rules
- Locations
- Usergroups
Tenant and Module specific settings
Examples:
- KB Credentials in mod-kb-ebsco-java
Module specific settings (the same for all Tenants)
Examples:
- S3 Bucket Access
User/Usergroup specific settings
Examples:
- User Default Language
- Circulation Desk
Configuration storage locations
Deployment Settings (central)
For deployment options like Ansible, Kubernetes/Helm.
Examples:
- Common Kafka URL
- Okapi URL
Module environment variables (distributed)
Examples:
- Kafka URL
- Database Access
Module configuration files (distributed)
Mounted at deploy time.
Examples:
- edge SIP2 configuration files
Module managed settings (distributed)
The module stores configuration in its own database schema and may offer API endpoints to modify it.
Examples:
- Circulation rules
Configuration Module managed settings (central)
Examples:
- mod-configuration
- Default tenant language
- mod-settings
- mod-service-interactions
- Dashboard settings
- Conceived as a central settings module, but apparently not used?
- No documentation at https://dev.folio.org/reference/api/ ?
Central vs. Distributed
Pros of central configuration:
- No need to implement API endpoints and storage for module specific configuration.
- All configuration variables of a tenant can be accessed in one place, e.g. for backup, cloning, or configuring a new tenant. But the distinction between configuration and data depends on one's point of view.
- Makes a central UI for configuration possible.
- Can store configuration for values that have no "natural" owner module. Example: default tenant locale
- Can handle shared configuration, but do we need an owner module?
Drawback of mod-configuration:
- A big institution needs config write permissions with module granularity: one member of staff may be allowed to edit circulation config but not acquisition config. Solved in mod-settings.
Drawbacks of central configuration like mod-settings and mod-configuration
- No validation. The central module cannot validate a POST or PUT request because it doesn't know the schema; only the module the configuration belongs to knows it. Relevant use case: changing configuration directly with curl/wget/Postman/...
- No documentation. Configuration stored in mod-configuration is not documented in any central place; one has to search, and maybe the module's README has some. A dedicated module API always has its API documentation published at https://dev.folio.org/reference/api/
- No explicit dependency. If more than one module uses a configuration, this dependency should be made explicit with an interface dependency in the ModuleDescriptor.
- Performance. Requests to mod-configuration add latency. If the config API belongs to the module, the module can cache values and invalidate the cache when the config changes. Caching responses from mod-configuration will always leave a time period with outdated values. In mod-inventory-storage we've combined fetching the HRID config and HRID generation into a single SQL query.
- Coupling. Modules should be loosely coupled, and therefore each module should store its own configuration.
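The performance point can be illustrated with a small sketch. It contrasts a module that owns its configuration (every write passes through it, so it can drop the cached value immediately) with a client that caches values fetched from a central configuration module behind a time-to-live (TTL). All class names, key names and the TTL approach are hypothetical; nothing here reflects an actual FOLIO or mod-configuration API.

```python
import time
from typing import Callable, Dict, Tuple


class OwnedConfig:
    """Hypothetical configuration owned by a module: all writes go through
    put(), so the cached copy is invalidated at write time."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}  # stands in for the module's own table
        self._cache: Dict[str, str] = {}
        self.store_reads = 0              # counts cache misses, for illustration

    def put(self, key: str, value: str) -> None:
        self._store[key] = value
        self._cache.pop(key, None)        # invalidate on write: no stale window

    def get(self, key: str) -> str:
        if key not in self._cache:
            self.store_reads += 1
            self._cache[key] = self._store[key]
        return self._cache[key]


class RemoteConfigCache:
    """Hypothetical client-side TTL cache in front of a central config module.
    The client is never told about writes made elsewhere, so it may serve an
    outdated value until the cached entry expires."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]                 # possibly stale
        value = self._fetch(key)          # remote call: adds latency
        self._cache[key] = (value, now)
        return value


if __name__ == "__main__":
    # The central store is modelled as a plain dict; the clock is simulated.
    central = {"hrid.prefix": "in"}
    fake_now = [0.0]
    client = RemoteConfigCache(central.__getitem__, ttl_seconds=60,
                               clock=lambda: fake_now[0])
    print(client.get("hrid.prefix"))  # in
    central["hrid.prefix"] = "inst"   # another party writes centrally
    print(client.get("hrid.prefix"))  # in (stale until the TTL expires)
    fake_now[0] = 61.0
    print(client.get("hrid.prefix"))  # inst
```

The owning module has no stale window because it sees every write; a client of a central store can only trade staleness against the number of remote calls by choosing the TTL.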
Pros of distributed configuration
- Offers possibility for write-only configuration values like passwords
Drawbacks of distributed configuration
- Configuration APIs add interface dependencies, and the APIs are not consistent across modules; a guideline for module configuration APIs is needed.
mod-configuration usage
Backend modules declaring mod-configuration usage: https://github.com/search?q=org%3Afolio-org+%22%5C%22configuration%5C%22%22+path%3A%2F%5Edescriptors%5C%2F%2F&type=code
Frontend modules declaring mod-configuration usage: https://github.com/search?q=org%3Afolio-org+%22\%22configuration\%22%22+path%3Apackage.json&type=code
Draft RFC bulletpoints
- mod-configuration shall not be used any more due to security issues
- The module is already deprecated; no more configuration settings shall be added.
- Distributed configuration is more native to microservice ecosystems and has many pros.
- Distributed configuration shall be preferred
- If a configuration is used by more than one module, and one module is authoritative for this configuration, then this module should hold this configuration (example: circulation rules).
- A consistent way of getting/setting configurations should be established. API endpoints should look like /{module}/{configuration}/{entry} (Julian: I think this is not needed; maybe as a guideline, but not enforced)
- Legacy endpoints for getting/setting configuration shall be deprecated
- Centralized configuration using mod-settings can or should be used
- for non-sensitive information that is used by more than one module or is completely independent of any module, like locale settings
- for settings that are specific to a user
- Configurations for multiple tenants: Craig will talk to Olaminde
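If the /{module}/{configuration}/{entry} pattern were adopted, even only as a guideline, it could be checked with helpers like the ones sketched below. The allowed segment characters (lowercase letters, digits, hyphens) and the example path are assumptions made for illustration, not an agreed FOLIO convention.

```python
import re
from typing import NamedTuple

# Assumed segment syntax: lowercase letters, digits and hyphens,
# not starting with a hyphen.
_SEGMENT = r"[a-z0-9][a-z0-9-]*"
_PATH = re.compile(rf"/({_SEGMENT})/({_SEGMENT})/({_SEGMENT})")


class ConfigPath(NamedTuple):
    module: str         # the module's path prefix
    configuration: str  # a group of related settings
    entry: str          # a single setting or record within the group


def build_path(p: ConfigPath) -> str:
    """Join the three parts and reject segments outside the assumed syntax."""
    path = f"/{p.module}/{p.configuration}/{p.entry}"
    if _PATH.fullmatch(path) is None:
        raise ValueError(f"segment does not match the assumed syntax: {path!r}")
    return path


def parse_path(path: str) -> ConfigPath:
    """Split a path of exactly three segments into its parts."""
    m = _PATH.fullmatch(path)
    if m is None:
        raise ValueError(
            f"not of the form /{{module}}/{{configuration}}/{{entry}}: {path!r}")
    return ConfigPath(*m.groups())
```

For example, `parse_path("/example-module/hrid-settings/instances")` yields `ConfigPath(module="example-module", configuration="hrid-settings", entry="instances")`, while a path with two or four segments raises `ValueError`.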
Action item: Formulate a draft RFC. No meeting on 24 Jul. 2023; next meeting 31 Jul. 2023.
Draft RFC
Introduction
FOLIO is built on a tenant-aware, extensible microservice system. It is highly configurable and adaptable to all kinds of requirements. This RFC gives developers guidance on where configuration should be stored.
Of the different kinds of configuration, this RFC does not deal with the following, as their location and the way they are set and read should not change:
- Settings stored in Okapi's /_/env APIs.
- Settings stored in infrastructure (Kubernetes / Rancher) config maps and secrets.
- Settings stored in module container environment variables.
- Settings stored in the stripes front-end (stripes.config.js, etc.)
Decisions
mod-configuration will be dropped by the Quesnelia release
mod-configuration has been deprecated since March 2022 due to security problems. It shall no longer be used for new configuration variables. Modules still using mod-configuration have to move to other solutions by the Quesnelia release.
Distributed configuration is preferred
Distributed configuration means that each module stores its own configuration values and offers API endpoints to query and store them. In a microservice architecture, distributed configuration has several advantages:
- The modules can validate the values according to format and dependencies
- Modules do not depend on a configuration module, hence a better separation of microservices can be achieved
- Since all API endpoints have to be documented, basic documentation of the available configuration variables is guaranteed.
- Configuration values can be cached: since no other module can change them, the module can invalidate its cache whenever it writes a value.
- Access to configuration values can be controlled effectively by permissions defined in the module.
- Write-only configuration values, such as credentials, are possible. Instead of reading values back, the module can offer other operations, such as comparing hashes (possible in central configuration too?)
- Modules can handle renamed configuration variables or changed values during module upgrades more flexibly.
Although distributed configuration also has some drawbacks, it is the preferred way to configure back-end modules in FOLIO.
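Several of the advantages listed above (validation against a known schema, write-only values checked by hash comparison) can be sketched in a few lines. The class, setting names and validation rules below are hypothetical; a real module would persist values in its database schema and expose them through documented API endpoints. Hashing only fits secrets the module has to verify; a credential the module must present to an external service, such as the KB credentials mentioned earlier, would need encrypted storage instead.

```python
import hashlib
import hmac
import os
from typing import Any, Callable, Dict, Tuple

_ITERATIONS = 100_000  # PBKDF2 work factor chosen for illustration only


class SettingsError(ValueError):
    """Raised when a setting is unknown or fails validation."""


def _int_in(lo: int, hi: int) -> Callable[[Any], bool]:
    # bool is a subclass of int in Python, so exclude it explicitly
    return lambda v: isinstance(v, int) and not isinstance(v, bool) and lo <= v <= hi


class ModuleSettings:
    """Hypothetical settings store owned by a single back-end module."""

    # The module knows the schema of its own settings, so it can validate writes.
    VALIDATORS: Dict[str, Callable[[Any], bool]] = {
        "loan-period-days": _int_in(1, 365),
        "hrid-prefix": lambda v: isinstance(v, str) and 0 < len(v) <= 10,
    }

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}
        self._secrets: Dict[str, Tuple[bytes, bytes]] = {}  # name -> (salt, digest)

    def put(self, key: str, value: Any) -> None:
        check = self.VALIDATORS.get(key)
        if check is None:
            raise SettingsError(f"unknown setting {key!r}")
        if not check(value):
            raise SettingsError(f"invalid value for {key!r}: {value!r}")
        self._values[key] = value

    def get(self, key: str) -> Any:
        return self._values[key]

    # Write-only values: they can be set and verified, but never read back.
    def set_secret(self, name: str, secret: str) -> None:
        salt = os.urandom(16)
        digest = hashlib.pbkdf2_hmac("sha256", secret.encode(), salt, _ITERATIONS)
        self._secrets[name] = (salt, digest)

    def verify_secret(self, name: str, candidate: str) -> bool:
        if name not in self._secrets:
            return False
        salt, digest = self._secrets[name]
        probe = hashlib.pbkdf2_hmac("sha256", candidate.encode(), salt, _ITERATIONS)
        return hmac.compare_digest(probe, digest)  # constant-time comparison
```

A central store cannot do the `put` check without knowing each module's schema, which is the validation drawback noted for mod-settings and mod-configuration.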
When to use central configuration
mod-settings solves the security problems of mod-configuration. It is the preferred module if configuration variables are to be stored centrally. Developing other specialized modules as central configuration stores is not recommended.
Centralized configuration can be used for:
- non-sensitive information that is used by more than one module or is completely independent of any module, like locale settings
- settings that are specific to a user
Since these configurations can also be stored in a module, the developer decides where they are stored.
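mod-settings ties its permissions to the scope of an entry, which is what gives the per-module write granularity missing from mod-configuration. The sketch below models that idea. The permission-name pattern `mod-settings.{global|users|owner}.{read|write}.<scope>` follows the mod-settings README as we understand it, and the rule that a user may access their own entries with either the `owner` or the `users` permission is an assumption; check the README before relying on either. The scope and key names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet, Optional


@dataclass(frozen=True)
class Entry:
    """Simplified settings entry: a value stored under (scope, key)."""
    scope: str
    key: str
    value: Any
    user_id: Optional[str] = None  # None means a tenant-wide ("global") entry


def accepted_permissions(entry: Entry, caller_id: str, op: str) -> FrozenSet[str]:
    """Permission names that would allow caller_id to perform op on entry."""
    if op not in ("read", "write"):
        raise ValueError(f"unsupported operation {op!r}")
    if entry.user_id is None:
        return frozenset({f"mod-settings.global.{op}.{entry.scope}"})
    accepted = {f"mod-settings.users.{op}.{entry.scope}"}  # any user's entries
    if entry.user_id == caller_id:
        accepted.add(f"mod-settings.owner.{op}.{entry.scope}")  # own entries only
    return frozenset(accepted)


def may(entry: Entry, caller_id: str, op: str, granted: FrozenSet[str]) -> bool:
    """True if at least one granted permission is accepted for this access."""
    return not accepted_permissions(entry, caller_id, op).isdisjoint(granted)
```

With `granted = frozenset({"mod-settings.global.write.example-circulation"})`, a staff member may write tenant-wide entries in the `example-circulation` scope but not in `example-acquisitions`, which is the circulation-versus-acquisition case described under the drawbacks of mod-configuration.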
Migration
For locale properties and other properties still residing exclusively in mod-configuration, access to these properties has to move to the mod-settings API by the Quesnelia release. To support this, Quesnelia will run a version of mod-configuration offering only READ (and DELETE?) APIs, and modules still using mod-configuration have to transfer their properties to mod-settings or to a distributed configuration.
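The transfer step could look roughly like the sketch below, which converts records read from mod-configuration (fields such as `configName`, `code`, `value`, `enabled`, `userId`) into entries shaped like mod-settings entries (`id`, `scope`, `key`, `value`, optional `userId`). The mapping rules (joining `configName` and `code` into the key, parsing string values as JSON when possible, skipping disabled records) and the target scope name are assumptions for illustration; each module has to decide its own mapping. Reading from mod-configuration and writing to mod-settings over HTTP is left out.

```python
import json
import uuid
from typing import Any, Dict, Iterable, List


def to_settings_entry(record: Dict[str, Any], scope: str) -> Dict[str, Any]:
    """Map one mod-configuration record to a mod-settings-style entry
    (hypothetical mapping)."""
    raw = record.get("value")
    try:
        value = json.loads(raw)  # some records store JSON encoded as a string
    except (TypeError, ValueError):
        value = raw              # keep plain strings (or a missing value) as-is
    key = record["configName"]
    if record.get("code"):
        key = f"{key}.{record['code']}"
    entry: Dict[str, Any] = {
        "id": str(uuid.uuid4()),
        "scope": scope,
        "key": key,
        "value": value,
    }
    if record.get("userId"):
        entry["userId"] = record["userId"]  # user-specific setting stays user-specific
    return entry


def migrate(records: Iterable[Dict[str, Any]], scope: str) -> List[Dict[str, Any]]:
    # Assumption: records explicitly disabled in mod-configuration are not carried over.
    return [to_settings_entry(r, scope) for r in records if r.get("enabled", True)]
```

Note that a string such as `"10"` or `"true"` is turned into a JSON number or boolean by this mapping; a module that needs the original strings should skip the JSON parsing for those keys.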