Kubernetes Example Deployment

Overview

Right at the beginning of this long journey, we highly recommend becoming familiar with the Folio Eureka Platform Overview document to understand the main concepts of the new platform.

Setting Up the Environment

Prerequisites:

  • Kubernetes Cluster (system for automating deployment, scaling, and management of containerized applications)

  • PostgreSQL (RDBMS used by Keycloak, Kong Gateway, Eureka modules)

  • Apache Kafka (distributed event streaming platform)

  • HashiCorp Vault (identity-based secret and encryption management system)

  • Keycloak (Identity and Access Management)

  • Kong Gateway (API Gateway)

  • MinIO (enterprise object store built for production environments; OPTIONAL)

  • Elasticsearch or OpenSearch (enterprise-grade search and observability suite)

 

MinIO is an implementation of object storage compatible with the AWS S3 service.

It also works the other way around: instead of MinIO, you are free to use the AWS S3 service without any problem.

 

To set up the Eureka Platform you should already have a Kubernetes cluster installed. Then create a new namespace within the cluster to assign and manage resources for your Eureka deployment.
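
For example, with kubectl (the namespace name eureka is only an illustration):

kubectl create namespace eureka
# optionally make the new namespace the default for the current context
kubectl config set-context --current --namespace=eureka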

You can have your cluster nodes on premises in a local data center or adopt any cloud provider (e.g. AWS, Azure, GCP) that best meets your planned or unplanned resource demand.

The Eureka Platform depends on a number of 3rd-party services (listed above) for its expected operation. Some of these services (PostgreSQL, Apache Kafka, OpenSearch, HashiCorp Vault) can be deployed as standalone services outside of the cluster namespace, but the others are almost never deployed outside.

For an initial Eureka deployment you will need about 30 GB of RAM. Such a setup incorporates all of the mentioned 3rd-party services in one Kubernetes namespace.

Extra resources (RAM, CPU, disk space, disk IOPS) may need to be assigned to the destination Kubernetes cluster if the prerequisite services are deployed into the same cluster namespace.

A Consortia deployment also requires extra resources to be assigned.

If you decide to keep everything in one place, pay particular attention to the disk IOPS required by the PostgreSQL, OpenSearch, and Apache Kafka services.

 

PostgreSQL RDBMS should be installed to the cluster namespace first, since it is a prerequisite for Kong Gateway and Keycloak Identity Manager.

Apache Kafka is used by Eureka for internal communication between modules, so it is very important to keep it in good shape.

HashiCorp Vault stores all secrets used within the platform. AWS SSM Parameters are now also supported as secret storage.

Keycloak provides authentication and authorization (granting access) for any kind of identity (users, roles, endpoints).

Kong Gateway, as the API Gateway, routes requests to modules and provides access to the Eureka REST APIs.

MinIO object storage keeps data for some modules to be used during platform operation.

The Elasticsearch instance contains a huge amount of information and indexes it for fast search. It is very important to maintain an appropriate level of performance for this service. It can also be installed outside of the Kubernetes cluster.

 

Expected Prerequisites deployment order:

  1. Hashicorp Vault

  2. PostgreSQL

  3. Apache Kafka

  4. Elasticsearch

  5. MinIO (Optional)

  6. Kong Gateway

  7. Keycloak Identity Manager

Cluster setup

Let's assume you are going to set up a Eureka Platform development environment on a Kubernetes cluster. For ease of resource scaling during workload spikes, it is worth using cloud services such as EKS (AWS), AKS (Azure), or GKE (GCP).

At the same time, to limit cloud vendor lock-in and cut down expenses, we are going to deploy all prerequisite services into the one cluster namespace, except the OpenSearch instance.

To deploy the prerequisite services, we recommend adopting the following container (Docker) images and Helm charts:

PostgreSQL container Image: hub.docker.com/bitnami/postgresql, Helm Chart: github.com/bitnami/charts/postgresql

architecture: standalone
readReplicas:
  replicaCount: 1
  resources:
    requests:
      memory: 8192Mi
    limits:
      memory: 10240Mi
  podAffinityPreset: soft
  persistence:
    enabled: true
    size: '20Gi'
    storageClass: gp2
  extendedConfiguration: |-
    shared_buffers = '2560MB'
    max_connections = '500'
    listen_addresses = '0.0.0.0'
    effective_cache_size = '7680MB'
    maintenance_work_mem = '640MB'
    checkpoint_completion_target = '0.9'
    wal_buffers = '16MB'
    default_statistics_target = '100'
    random_page_cost = '1.1'
    effective_io_concurrency = '200'
    work_mem = '1310kB'
    min_wal_size = '1GB'
    max_wal_size = '4GB'
image:
  tag: 13.13.0
auth:
  database: folio
  postgresPassword: secretDBpassword
  replicationPassword: secretDBpassword
  replicationUsername: postgres
  usePasswordFiles: false
primary:
  initdb:
    scripts:
      init.sql: |
        CREATE DATABASE kong;
        CREATE USER kong PASSWORD 'secretDBpassword';
        ALTER DATABASE kong OWNER TO kong;
        ALTER DATABASE kong SET search_path TO public;
        REVOKE CREATE ON SCHEMA public FROM public;
        GRANT ALL ON SCHEMA public TO kong;
        GRANT USAGE ON SCHEMA public TO kong;
        CREATE DATABASE keycloak;
        CREATE USER keycloak PASSWORD 'secretDBpassword';
        ALTER DATABASE keycloak OWNER TO keycloak;
        ALTER DATABASE keycloak SET search_path TO public;
        REVOKE CREATE ON SCHEMA public FROM public;
        GRANT ALL ON SCHEMA public TO keycloak;
        GRANT USAGE ON SCHEMA public TO keycloak;
        CREATE DATABASE ldp;
        CREATE USER ldpadmin PASSWORD 'someLdpPassword';
        CREATE USER ldpconfig PASSWORD 'someLdpPassword';
        CREATE USER ldp PASSWORD 'someLdpPassword';
        ALTER DATABASE ldp OWNER TO ldpadmin;
        ALTER DATABASE ldp SET search_path TO public;
        REVOKE CREATE ON SCHEMA public FROM public;
        GRANT ALL ON SCHEMA public TO ldpadmin;
        GRANT USAGE ON SCHEMA public TO ldpconfig;
        GRANT USAGE ON SCHEMA public TO ldp;
  persistence:
    enabled: true
    size: '20Gi'
    storageClass: gp2
  resources:
    requests:
      memory: 8192Mi
    limits:
      memory: 10240Mi
  podSecurityContext:
    fsGroup: 1001
  containerSecurityContext:
    runAsUser: 1001
  podAffinityPreset: soft
  extendedConfiguration: |-
    shared_buffers = '2560MB'
    max_connections = '5000'
    listen_addresses = '0.0.0.0'
    effective_cache_size = '7680MB'
    maintenance_work_mem = '640MB'
    checkpoint_completion_target = '0.9'
    wal_buffers = '16MB'
    default_statistics_target = '100'
    random_page_cost = '1.1'
    effective_io_concurrency = '200'
    work_mem = '1310kB'
    min_wal_size = '1GB'
    max_wal_size = '4GB'
volumePermissions:
  enabled: true
metrics:
  enabled: false
  resources:
    requests:
      memory: 1024Mi
    limits:
      memory: 3072Mi
  serviceMonitor:
    enabled: true
    namespace: monitoring
    interval: 30s
    scrapeTimeout: 30s

Apache Kafka container Image: hub.docker.com/bitnami/kafka, Helm Chart: github.com/bitnami/charts/kafka

image:
  tag: 3.5
metrics:
  kafka:
    enabled: true
    resources:
      limits:
        memory: 1280Mi
      requests:
        memory: 256Mi
  jmx:
    enabled: true
    resources:
      limits:
        memory: 2048Mi
      requests:
        memory: 1024Mi
  serviceMonitor:
    enabled: true
    namespace: monitoring
    interval: 30s
    scrapeTimeout: 30s
persistence:
  enabled: true
  size: 10Gi
  storageClass: gp2
resources:
  requests:
    memory: 2Gi
  limits:
    memory: 8192Mi
zookeeper:
  image:
    tag: 3.7
  enabled: true
  persistence:
    size: 5Gi
  resources:
    requests:
      memory: 512Mi
    limits:
      memory: 768Mi
livenessProbe:
  enabled: false
readinessProbe:
  enabled: false
replicaCount: 1
heapOpts: "-XX:MaxRAMPercentage=75.0"
extraEnvVars:
  - name: KAFKA_DELETE_TOPIC_ENABLE
    value: "true"

Hashicorp Vault container Image: hub.docker.com/bitnami/vault, Helm Chart: github.com/bitnami/charts/vault

global:
  enabled: true
server:
  ingress:
    enabled: false
  dev:
    enabled: true
  ha:
    enabled: false
  service:
    type: ClusterIP
    port: 8200
  dataStorage:
    enabled: true
  tls:
    enabled: false
  auto:
    enabled: false
  extraEnvironmentVars:
    VAULT_DEV_ROOT_TOKEN_ID: "root"
  resources:
    requests:
      memory: "256Mi"
      cpu: "100m"
    limits:
      memory: "1Gi"
      cpu: "1024m"
backup:
  enabled: false
logLevel: "debug"
dataStorage:
  enabled: false
auditLog:
  enabled: false
agentInjector:
  enabled: false
metrics:
  enabled: false
unsealConfig:
  enabled: false
ui:
  enabled: true

Keycloak container Image: hub.docker.com/folioci/folio-keycloak, Helm Chart: github.com/bitnami/charts/keycloak, Git Repository: github.com/folio-org/folio-keycloak

Kong Gateway container Image: hub.docker.com/folioci/folio-kong, Helm Chart: github.com/bitnami/charts/kong, Git Repository: github.com/folio-org/folio-kong

MinIO container Image: hub.docker.com/bitnami/minio, Helm Chart: github.com/bitnami/charts/minio

 

We also need a Module Descriptor Registry to be in place.

The Module Descriptor Registry service (MDR) is an HTTP server configured in a Kubernetes pod.

This service can also be hosted as a static website using Amazon S3.

The HTTP server holds and distributes module descriptors for Eureka instance installs and updates.

A module descriptor (see Module Descriptor Template) is generated during the continuous integration flow and is put into the Module Descriptor Registry on finish.

These module descriptors are used by the Eureka install and update flows.

 

Deploying EUREKA on Kubernetes

Once all prerequisites are met, we can proceed with deploying the mgr-* Eureka modules to the cluster namespace.

Deploy the mgr-* applications to the Kubernetes cluster:
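
A minimal sketch, assuming the folio-helm-v2 charts are checked out locally and the namespace is called eureka; module names, chart paths, and values files are illustrative:

# deploy the manager modules (names follow the mgr-* convention)
helm upgrade --install mgr-applications ./folio-helm-v2/mgr-applications \
  --namespace eureka -f values/mgr-applications.yaml
helm upgrade --install mgr-tenants ./folio-helm-v2/mgr-tenants \
  --namespace eureka -f values/mgr-tenants.yaml
helm upgrade --install mgr-tenant-entitlements ./folio-helm-v2/mgr-tenant-entitlements \
  --namespace eureka -f values/mgr-tenant-entitlements.yaml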

 

Eureka deployment flow:

Get a Master Auth token from Keycloak.

To run administrative REST API requests against the Eureka instance, we need to obtain a Master Access Token from Keycloak at the start.

We first need to know the request parameters (consider adopting the following example):

  • Keycloak FQDN: keycloak.example.org

  • Token Service Endpoint: /realms/master/protocol/openid-connect/token

  • Client ID: folio-backend-admin-client (this is the expected value and should not be changed)

  • Client Secret: SecretPhrase

  • Grant Type: client_credentials (Constant)

We need to save the returned Master Access Token to use with any administrative REST API call later.
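
For example, with CURL and jq, using the parameters above (the FQDN and client secret are the illustrative values from the list):

# request a client_credentials token and keep only the access_token field
TOKEN=$(curl -s -X POST "https://keycloak.example.org/realms/master/protocol/openid-connect/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "client_id=folio-backend-admin-client" \
  -d "client_secret=SecretPhrase" \
  -d "grant_type=client_credentials" | jq -r '.access_token')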

 

Register Applications Descriptors:

REST API Docs for "POST /applications" endpoint.

We need to register the application descriptor in the Eureka instance. The application descriptor is created from the github.com/folio-org/app-platform-full/sprint-quesnelia/app-platform-full.template.json file taken from the release branch.

Docs for the registerApplication REST API call - register a new application.

The descriptor is registered with a CURL command and related parameters:

  • Kong Gateway FQDN (http header): kong.example.org

  • Auth token (http header): 'Authorization: Bearer...'

  • Application Descriptor (http request body): JSON data file
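
For example, assuming the rendered application descriptor has been saved locally as app-platform-full.json (the file name is illustrative):

# register the application descriptor via Kong
curl -s -X POST "https://kong.example.org/applications" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d @app-platform-full.json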

 

Register Modules

REST API Docs for “GET /modules/discovery” endpoint.

Once the required application descriptors are registered in the instance, we proceed with the Module Discovery flow to register the modules in the system.

Docs for the searchModuleDiscovery REST API call - retrieving module discovery information by CQL query and pagination parameters.

Module discovery is started with a CURL command and related parameters:

  • Kong Gateway FQDN (HTTP header): kong.example.org

  • Auth token (http header): 'Authorization: Bearer...'

  • Module Discovery Info (http request body): JSON data file
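
A sketch, assuming the module discovery information (module IDs and their in-cluster locations) has been saved to module-discovery.json; the exact JSON schema is defined by the REST API docs referenced above:

# register module discovery information via Kong
curl -s -X POST "https://kong.example.org/modules/discovery" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d @module-discovery.json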

 

Deploy Backend Modules

Now we are ready to deploy the backend modules to the Kubernetes namespace with the Eureka instance.

Helm charts for the modules are taken from the GitHub repository folio-org/folio-helm-v2.

Variable values for the Helm charts are stored in a dedicated repository folder, folio-org/pipelines-shared-library/resources/helm.

For example:
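
A minimal sketch, assuming both repositories are checked out locally (module name, chart path, and values file are illustrative):

# deploy one backend module from a local chart with its values file
helm upgrade --install mod-users ./folio-helm-v2/mod-users \
  --namespace eureka \
  -f ./pipelines-shared-library/resources/helm/mod-users.yaml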

Create tenant

REST API Docs for “POST /tenants” endpoint.

At this point we are ready to create an application tenant in the Eureka instance.

First we need to take a look at the docs for the createTenant REST API call to create a new tenant.

Once we are sure about the required parameters, we issue a POST HTTP request to create the new tenant.

In our example we create a tenant with the name “diku” and description “magic happens here”:
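
A minimal sketch using the values above:

# create the "diku" tenant via Kong
curl -s -X POST "https://kong.example.org/tenants" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "diku", "description": "magic happens here"}'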

 

Set entitlement

REST API Docs for “POST /entitlements” endpoint.

Now that the application tenant is created, we can entitle registered applications to our tenant.

In other words, we enable the application(s) for the tenant.

As usual, we take a look at the docs for the create REST API call to install/enable an application for a tenant.

From the mentioned docs we can get information about the passed parameters and the returned value.

The following example shows how to enable an application for our tenant:
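
A hedged sketch; the exact request body is defined by the REST API docs above, and the tenant UUID and application ID are placeholders:

# entitle (enable) a registered application for the tenant
curl -s -X POST "https://kong.example.org/entitlements" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"tenantId": "<tenant-uuid>", "applications": ["app-platform-full-<version>"]}'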

 

Add User

REST API Docs for “POST /users-keycloak/users” endpoint.

At this stage we are ready to add the first user to the Eureka instance, to be granted administrative privileges later.

So we check the parameters in the docs for the createUser REST API call to create a new user.

Then we use a CURL command to run a POST HTTP request against the Eureka instance:
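
A hedged sketch; the username and personal data are illustrative, and the x-okapi-tenant header is included on the assumption that tenant-scoped calls carry it:

# create the first (admin) user in the "diku" tenant
curl -s -X POST "https://kong.example.org/users-keycloak/users" \
  -H "Authorization: Bearer $TOKEN" \
  -H "x-okapi-tenant: diku" \
  -H "Content-Type: application/json" \
  -d '{"username": "eureka_admin", "active": true, "personal": {"firstName": "Eureka", "lastName": "Admin"}}'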

Set User Password

REST API Docs for “POST /authn/credentials” endpoint.

With our user created, we can assign them a secret password to use on login.

Carefully look through the docs for the createCredentials REST API call to add a new login to the system:
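
A hedged sketch; the field names follow the createCredentials docs referenced above, and the password is illustrative:

# set login credentials for the user created in the previous step
curl -s -X POST "https://kong.example.org/authn/credentials" \
  -H "Authorization: Bearer $TOKEN" \
  -H "x-okapi-tenant: diku" \
  -H "Content-Type: application/json" \
  -d '{"username": "eureka_admin", "password": "SecretUserPassword"}'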

 

Create Role

REST API Docs for “POST /roles” endpoint.

We need to create a role to bundle the Eureka administrative capabilities with our admin user.

So, according to the docs for the createRole REST API call to create a new role, we need to run the following POST HTTP request:
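
A hedged sketch (the role name and description are illustrative):

# create a role to hold administrative capabilities
curl -s -X POST "https://kong.example.org/roles" \
  -H "Authorization: Bearer $TOKEN" \
  -H "x-okapi-tenant: diku" \
  -H "Content-Type: application/json" \
  -d '{"name": "adminRole", "description": "Role with Eureka administrative capabilities"}'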

 

Assign Capabilities to Role

REST API Docs for “POST /roles/capabilities” endpoint.

Then we simply attach the required Eureka application capabilities to our admin role.

Using the docs for the createRoleCapabilities REST API call, we create a new record associating one or more capabilities with the already created role.

To get a list of existing capabilities, we use the findCapabilities REST API call:
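
A hedged sketch; the UUIDs are placeholders, and the request body shape follows the createRoleCapabilities docs referenced above:

# list existing capabilities first (paging parameters are illustrative)
curl -s "https://kong.example.org/capabilities?limit=500" \
  -H "Authorization: Bearer $TOKEN" \
  -H "x-okapi-tenant: diku"

# then associate selected capabilities with the role
curl -s -X POST "https://kong.example.org/roles/capabilities" \
  -H "Authorization: Bearer $TOKEN" \
  -H "x-okapi-tenant: diku" \
  -H "Content-Type: application/json" \
  -d '{"roleId": "<role-uuid>", "capabilityIds": ["<capability-uuid-1>", "<capability-uuid-2>"]}'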

 

Add Roles to User

REST API Docs for “POST /roles/users” endpoint.

The last step in the row is assigning the Admin Role to the Admin User, granting them the superpowers needed to rule the Eureka world.

So, according to the existing docs for the assignRolesToUser REST API call to create a record associating a role with a user, we run a CURL command like the following:
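
A hedged sketch with placeholder UUIDs; the body shape follows the assignRolesToUser docs referenced above:

# associate the admin role with the admin user
curl -s -X POST "https://kong.example.org/roles/users" \
  -H "Authorization: Bearer $TOKEN" \
  -H "x-okapi-tenant: diku" \
  -H "Content-Type: application/json" \
  -d '{"userId": "<user-uuid>", "roleIds": ["<role-uuid>"]}'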

 

Deploy Edge modules

  • Render Ephemeral Properties

At this step we populate the Ephemeral Properties template file for every edge-* module found in the github.com/folio-org/platform-complete/snapshot/install.json file.

As an example for rendering, we have a properties file that bundles the module's tenant and admin credentials with the respective capabilities.

  • Create config map for every edge-* module

Completed Ephemeral Properties files have to be stored in the cluster namespace as configmaps:
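
For example (module, file, and namespace names are illustrative):

# store a rendered properties file as a configmap in the cluster namespace
kubectl create configmap edge-orders-ephemeral-properties \
  --namespace eureka \
  --from-file=ephemeral.properties=./rendered/edge-orders-ephemeral.properties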

  • Deploy edge-* modules to cluster namespace

At this point we deploy a set of edge-* modules (see the install.json file) to the cluster namespace:
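
For example (module name, chart path, and values file are illustrative):

# deploy one edge module from a local folio-helm-v2 chart
helm upgrade --install edge-orders ./folio-helm-v2/edge-orders \
  --namespace eureka \
  -f ./pipelines-shared-library/resources/helm/edge-orders.yaml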

Perform Consortia Deployment (if required)

REST API Docs for “POST /consortia” endpoint.

  • Set up a Consortia deployment with the given tenants (see the sketch after this list)

    • Create the consortia deployment instance according to the docs for the consortia REST API call to save the consortium configuration.

    • Add Consortia Central Tenant

    • Add Consortia Institutional Tenant
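
A hedged sketch of the consortium creation call; the UUID, name, and tenant header are placeholders, and the exact body shape is defined by the REST API docs above:

# save the consortium configuration
curl -s -X POST "https://kong.example.org/consortia" \
  -H "Authorization: Bearer $TOKEN" \
  -H "x-okapi-tenant: <central-tenant>" \
  -H "Content-Type: application/json" \
  -d '{"id": "<consortium-uuid>", "name": "Example Consortium"}'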

Perform indexing on Eureka resources

There is a comprehensive documentation piece on Search Indexing that we would highly recommend walking through to learn this magic more closely.
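
A hedged sketch of starting a reindex and checking its progress; the request body fields and the status endpoint follow the Search Indexing documentation and should be treated as assumptions here:

# start inventory reindexing; the response contains the reindex job ID
curl -s -X POST "https://kong.example.org/search/index/inventory/reindex" \
  -H "Authorization: Bearer $TOKEN" \
  -H "x-okapi-tenant: diku" \
  -H "Content-Type: application/json" \
  -d '{"recreateIndex": true, "resourceName": "instance"}'

# check reindex progress using the returned job ID
curl -s "https://kong.example.org/instance-storage/reindex/<reindex_job_id>" \
  -H "Authorization: Bearer $TOKEN" \
  -H "x-okapi-tenant: diku"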

Here reindex_job_id is the ID returned by the /search/index/inventory/reindex endpoint in the previous step.

Configure Edge modules

  • Create Eureka Users for the Eureka UI

    • UI modules expect respective Users to be created in the Eureka instance. Sufficient system capabilities have to be assigned to the UI Users to provide the desired level of access.

    • To get an idea of how UI modules are mapped to Eureka Accounts with the required capabilities, please take a look at the folio-org/pipelines-shared-library/resources/edge/config_eureka.yaml file.

    • So we need to create extra Eureka Accounts to be used by the UI Modules, for example (see the sketch after this list):

      • Create User Account:

      • Set Password for User:

      • Assign Capabilities to User Account:

      • Assign Capability Sets to User Account:
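
A hedged sketch of the whole sequence for one UI account; all names, passwords, and UUIDs are placeholders, and the /users/capabilities and /users/capability-sets paths are assumptions patterned on the role endpoints above:

# create the user account
curl -s -X POST "https://kong.example.org/users-keycloak/users" \
  -H "Authorization: Bearer $TOKEN" -H "x-okapi-tenant: diku" \
  -H "Content-Type: application/json" \
  -d '{"username": "edge_user", "active": true}'

# set a password for the user
curl -s -X POST "https://kong.example.org/authn/credentials" \
  -H "Authorization: Bearer $TOKEN" -H "x-okapi-tenant: diku" \
  -H "Content-Type: application/json" \
  -d '{"username": "edge_user", "password": "SecretEdgePassword"}'

# assign capabilities to the user account
curl -s -X POST "https://kong.example.org/users/capabilities" \
  -H "Authorization: Bearer $TOKEN" -H "x-okapi-tenant: diku" \
  -H "Content-Type: application/json" \
  -d '{"userId": "<user-uuid>", "capabilityIds": ["<capability-uuid>"]}'

# assign capability sets to the user account
curl -s -X POST "https://kong.example.org/users/capability-sets" \
  -H "Authorization: Bearer $TOKEN" -H "x-okapi-tenant: diku" \
  -H "Content-Type: application/json" \
  -d '{"userId": "<user-uuid>", "capabilitySetIds": ["<capability-set-uuid>"]}'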

Build FOLIO Eureka UI

Deploy Eureka UI

Kong fine-tuning

You can customize Kong's default behavior using environment variables. When the application starts, it uses environment variables to configure its embedded Nginx web server and Kong itself. To set Nginx parameters, use environment variables with the prefix KONG_NGINX_. For Kong-specific configuration, define variables with the prefix KONG_.
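
For example, timeouts could be raised on a Kubernetes-deployed Kong with kubectl (deployment name, namespace, and values are illustrative):

# set an Nginx-level and a Kong-level parameter via environment variables
kubectl set env deployment/kong --namespace eureka \
  KONG_NGINX_HTTP_KEEPALIVE_TIMEOUT=120s \
  KONG_UPSTREAM_KEEPALIVE_IDLE_TIMEOUT=120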

Post-Deployment Tasks

Monitoring and logging

Scaling and updates

Troubleshooting and Common Issues

  • InternalServerErrorException error 500. Connection refused - lack of resources.

Preamble: Working with complex operations such as application tenant entitlement may pose challenges due to the need for all modules to be available, direct requests between platform modules, the loosely coupled nature of K8S, and the resulting temporary unavailability of some modules.
Issue: Various errors like the following may occur during these complex operations due to incomplete execution within the required timeframe or unavailability of modules:
Enabling application for tenant failed: [errors:[[message:Flow 'd62cbd2c-9261-47df-bffd-e6a13871c59f' finished with status: FAILED, type:FlowExecutionException, code:service_error, parameters:[[key:mod-<some_module_name>-folioModuleInstaller, value:FAILED: [IntegrationException] Failed to perform doPostTenant call, parameters: [{key: cause, value: 500: {"errors":[{"type":"InternalServerErrorException","code":"service_error","message":"Failed to proxy request","parameters":[{"key":"cause","value":"Connection refused: localhost/127.0.0.1:8081"}]}],"total_records":1}}]]
Cause: The issue could be caused by resource throttling or module unavailability. If the allocated CPU or RAM limit is reached, the time needed to perform these operations significantly increases and exceeds the expected time limit. In other cases, in a self-rebalancing K8S cluster, pods for some modules may be evicted and moved to other nodes, leading to the inaccessibility of core modules or modules that play significant roles in these complex operations processes. If a core module like Kong, Keycloak, mgr-* modules, or very important modules like mod-roles-keycloak and mod-users-keycloak are affected, it could lead to breaking the invocation chain.
Resolution: To address this issue, you could use one of the following approaches, or both in conjunction:
- Provide node sizes that fit the total amount of module resource requests for CPU and RAM.
- Set resource limits for each module so that they are not moved by the cluster during heavyweight operations.

  • InternalServerErrorException error 500. Connection refused - deployment timing.

Issue: During the entitlement process, the following error message may appear:
Enabling application for tenant failed: [errors:[[message:Flow 'd62cbd2c-9261-47df-bffd-e6a13871c59f' finished with status: FAILED, type:FlowExecutionException, code:service_error, parameters:[[key:mod-<some_module_name>-folioModuleInstaller, value:FAILED: [IntegrationException] Failed to perform doPostTenant call, parameters: [{key: cause, value: 500: {"errors":[{"type":"InternalServerErrorException","code":"service_error","message":"Failed to proxy request","parameters":[{"key":"cause","value":"Connection refused: localhost/127.0.0.1:8081"}]}],"total_records":1}}]]
This error may appear even if there are enough resources available for the environment, as described in the “InternalServerErrorException error 500. Connection refused - lack of resources” topic, and module availability was ensured.
Cause: In some cases, this error can occur due to inappropriate deployment timing, especially when an automated deployment process is used. Even if enough resources have been provided, modules need time to become available after they start. Some heavyweight modules, such as mod-oa or mod-agreements, may need up to 5 minutes to start. Therefore, it is important to check module availability after deployment before starting any operations, such as instance entitlement on modules. Additionally, ensure the correct order of deployment: Kong and Keycloak, then the mgr-* components, then the modules.

  • The application is not entitled on tenant - sidecar vs Kafka
    Issue: The error The module is not entitled on tenant ... may occur during certain operations, especially during the entitlement process. You can find the full log of this issue in the related module's sidecar.
    Cause: This error happens due to communication issues between mgr-tenant-entitlement and the corresponding module, which is notified about the end of the entitlement process via Kafka. In some cases, the sidecar consumer connection can be marked as dead. The main reasons for that are Kafka heartbeat request and sidecar poll request settings that are not aligned with each other, or various networking issues that cause the poll request to be absent during the specific period. The module is not entitled on tenant ... errors then appear when the next portion of modules in the entitlement process sends requests to previously entitled modules whose sidecars have not received the Kafka message about the completion of the related module's entitlement.
    Resolution: If a sidecar loses its connection to Kafka during the entitlement process but the process itself has not been affected, simply restart the affected module, and its sidecar will get the entitlement information from the mgr-tenant-entitlement module. If the entitlement process fails, repeat it. Nevertheless, ensure a stable connection between Kafka and the sidecars, and align the Kafka heartbeat request and sidecar poll request periods.

  • The application is not entitled on tenant - sidecar vs mod-tenant-entitlement
    Issue: The error The module is not entitled on tenant ... may occur during certain operations.
    Cause: This error can happen when the mgr-tenant-entitlement and some module pods are redeployed simultaneously, and the module's sidecar becomes ready before mgr-tenant-entitlement, causing it to be unable to obtain information about the application entitlement from the MTE module and potentially providing an error response to other modules upon request.
    Resolution: To fix this issue, ensure the correct module redeployment order. If the issue occurs unexpectedly, simply restart the affected module.

  • The upstream server is timing out - Kong fine-tuning
    Issue: Some API requests may result in a 504 error code with the error message “upstream server is timing out”. This problem is primarily caused by Kong and typically occurs during long operations, such as assigning a capability to a role or user.
    Cause: When a request reaches a specific module, it always goes through Kong, which has two potential points of failure: Kong's Nginx and Kong itself.
    Resolution: To address this issue, you should adjust the upstream timeout of Kong's Nginx using the KONG_NGINX_HTTP_KEEPALIVE_TIMEOUT, KONG_NGINX_UPSTREAM_KEEPALIVE, and KONG_NGINX_HTTP_KEEPALIVE_REQUESTS environment variables. Additionally, consider modifying the following Kong variables: KONG_UPSTREAM_KEEPALIVE_IDLE_TIMEOUT, KONG_UPSTREAM_KEEPALIVE_POOL_SIZE, KONG_UPSTREAM_KEEPALIVE_MAX_REQUESTS, KONG_UPSTREAM_CONNECT_TIMEOUT, and KONG_RETRIES. See the Kong fine-tuning section above for how these variables are applied.

  • Some capabilities/capability sets are absent - Kafka messages processing period
    Issue: If you try to assign capabilities or capability sets to a role or user immediately after the entitlement process, you may encounter an issue with their absences.
    Cause: The predefined capabilities (also known as permissions) and capability sets are created just after the application entitlement process. This process takes time to complete. Here's how it works:

    • The mgr-tenant-entitlement module sends messages to mod-roles-keycloak via Kafka with a list of roles. This continues throughout the entitlement process as each module is enabled on a tenant.

    • The mod-roles-keycloak module starts processing the messages right after it has been entitled, which can be near the end of the entitlement process.

    • mod-roles-keycloak proceeds through the message queue in Kafka until it reaches the end.

Resolution: Before starting to assign capabilities or capability sets, it is important to check the Kafka module consumer message queue offset to ensure that the process has been completed. Alternatively, you should determine the appropriate amount of time to allow for the process to finish based on the performance of your environment.