Problem statement

There should be a platform-wide solution for log aggregation in order to quickly identify issues and find their root causes.

...

In order to include request information, RMB-based modules will be able to leverage FolioLoggingContext after implementation of RMB-709 (which is blocked by the Vert.x 4 release and migration). For now this is implemented only in OKAPI.

...

When a user request is routed to multiple different microservices and something goes wrong and the request fails, the request-id included with every log message allows the request to be tracked across those services.

Usage of Log4j 2 Layouts

An Appender in Log4j 2 uses a Layout to format a LogEvent into a form that meets the needs of whatever will be consuming the log event. The library provides a complete set of Layout implementations. It is recommended to use either the Pattern or the JSON layout in FOLIO backend modules. By default, the Pattern layout should be used for development, testing, and production environments. In environments where log aggregators are used, the JSON layout should be enabled explicitly for all backend modules by the support/deployment teams.

All FOLIO backend modules must provide configuration property files for both layouts (Pattern and JSON). In addition, recommended default logging configurations are provided by the respective libraries (folio-spring-base, RMB).
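
As an illustration, a deployment team could switch a module to the JSON layout by pointing Log4j 2 at the JSON configuration file shipped with the module. The fragment below is a sketch only: the module name and image are hypothetical, and it assumes the container entrypoint forwards JAVA_OPTIONS to the JVM; the log4j.configurationFile system property itself is standard Log4j 2.

  # Hypothetical fragment of a Kubernetes container spec for a FOLIO backend module.
  containers:
    - name: mod-example                     # hypothetical module name
      image: folioorg/mod-example:latest    # placeholder image
      env:
        - name: JAVA_OPTIONS                # assumes the image entrypoint passes this to the JVM
          value: "-Dlog4j.configurationFile=log4j2-json.properties"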

...

For any log aggregation solution, a JSON log format is preferable to plain text in terms of performance and simplicity of parsing. All FOLIO modules should use the same JSON format for logs, e.g. the following:

Default configuration for JSON Layout

For Spring Way modules: https://github.com/folio-org/folio-spring-base/blob/master/src/main/resources/log4j2-json.properties

For RMB modules: https://github.com/folio-org/raml-module-builder/blob/master/domain-models-runtime/src/main/resources/log4j2-json.properties
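
For illustration only, a JSON log event produced by such a configuration might look like the record below. The exact field set is defined by the linked log4j2-json.properties files; the field names here (including the request id propagated via FolioLoggingContext) are assumptions, not the authoritative format.

  {
    "timestamp": "2021-03-15T10:21:42.123Z",
    "level": "INFO",
    "thread": "vert.x-eventloop-thread-1",
    "logger": "org.folio.example.SomeService",
    "requestId": "077462",
    "message": "GET /instances completed"
  }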

Default configuration for Pattern Layout

For Spring Way modules: https://github.com/folio-org/folio-spring-base/blob/master/src/main/resources/log4j2.properties

For RMB modules: https://github.com/folio-org/raml-module-builder/blob/master/domain-models-runtime/src/main/resources/log4j2.properties

Logging aggregation stack

...

EFK

Cons:

  • There are tools with richer functionality (e.g. Datadog).

Alerting

There are many plugins available for watching and alerting on Elasticsearch indices in Kibana, e.g. X-Pack, SentiNL, ElastAlert. Alerting can be easily implemented in Kibana (see: https://www.elastic.co/blog/creating-a-threshold-alert-in-elasticsearch-is-simpler-than-ever).

ElastAlert is a simple and popular open source tool for alerting on anomalies, spikes, or other patterns of interest found in data stored in Elasticsearch. ElastAlert works with all versions of Elasticsearch.
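
As a sketch of what ElastAlert alerting could look like, the rule below fires when more than 50 ERROR-level events are written to the log index within 5 minutes. The index pattern, field name, threshold, and recipient are placeholders to be adapted to the actual Fluentd/Elasticsearch setup.

  # Hypothetical ElastAlert rule: alert on a spike of ERROR log events.
  name: folio-error-spike            # arbitrary rule name
  type: frequency                    # fire when num_events occur within timeframe
  index: folio-logs-*                # placeholder index pattern written by Fluentd
  num_events: 50
  timeframe:
    minutes: 5
  filter:
    - term:
        level: "ERROR"               # assumes the JSON log field is named "level"
  alert:
    - "email"
  email:
    - "ops@example.org"              # placeholder recipient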

Deployment options

K8s deployment

Using a node level logging agent


A separate kube-logging namespace should be created, into which the EFK stack components will be installed. This namespace also allows one to quickly clean up and remove the logging stack without any loss of function to the Kubernetes cluster. For cluster high availability, 3 Elasticsearch Pods should be deployed to avoid the “split-brain” issue (see A new era for cluster coordination in Elasticsearch and Voting configurations).
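
A minimal sketch of the namespace manifest is shown below; the Elasticsearch StatefulSet itself is not shown here, but it is where replicas: 3 would be set.

  # Dedicated namespace isolating the EFK stack from the rest of the cluster.
  apiVersion: v1
  kind: Namespace
  metadata:
    name: kube-logging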

K8s deployment: Kibana

To launch Kibana on Kubernetes, a Service called kibana should be created in the kube-logging namespace, together with a Deployment consisting of one Pod replica. The latest Kibana Docker image is located at docker.elastic.co/kibana/. A range of 0.1 vCPU - 1 vCPU should be guaranteed to the Pod.
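
A sketch of the Kibana Service and Deployment under these assumptions (the image tag is a placeholder, and 100m/1000m CPU corresponds to the 0.1 vCPU - 1 vCPU range):

  apiVersion: v1
  kind: Service
  metadata:
    name: kibana
    namespace: kube-logging
    labels:
      app: kibana
  spec:
    ports:
      - port: 5601                   # default Kibana port
    selector:
      app: kibana
  ---
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: kibana
    namespace: kube-logging
    labels:
      app: kibana
  spec:
    replicas: 1                      # one Pod replica, as described above
    selector:
      matchLabels:
        app: kibana
    template:
      metadata:
        labels:
          app: kibana
      spec:
        containers:
          - name: kibana
            image: docker.elastic.co/kibana/kibana:7.10.1    # placeholder version tag
            env:
              - name: ELASTICSEARCH_HOSTS    # assumes the Elasticsearch Service name used elsewhere in this document
                value: http://elasticsearch.kube-logging.svc.cluster.local:9200
            resources:
              requests:
                cpu: 100m            # 0.1 vCPU guaranteed
              limits:
                cpu: 1000m           # 1 vCPU cap
            ports:
              - containerPort: 5601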

K8s deployment: Fluentd 

Fluentd should be deployed as a DaemonSet, which is a Kubernetes workload type that runs a copy of a given Pod on each node in the Kubernetes cluster (see: https://kubernetes.io/docs/concepts/cluster-administration/logging/#using-a-node-logging-agent).

FOLIO modules should use a single common slf4j configuration for writing JSON logs on the nodes. The Fluentd Pod will tail these logs, filter log events, transform the log data, and ship it off to Elasticsearch. The Fluentd DaemonSet spec provided by the Fluentd maintainers should be used, along with the docs provided by the Fluentd maintainers: Kubernetes Fluentd.

A Service Account called fluentd, which the Fluentd Pods will use to access the Kubernetes API, should be created in the kube-logging namespace with the label app: fluentd (see: Configure Service Accounts for Pods in the official Kubernetes docs). A ClusterRole with get, list, and watch permissions on the pods and namespaces objects should be created.
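
A sketch of the RBAC objects described above (names follow the text; nothing beyond get/list/watch on pods and namespaces is granted):

  apiVersion: v1
  kind: ServiceAccount
  metadata:
    name: fluentd
    namespace: kube-logging
    labels:
      app: fluentd
  ---
  apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRole
  metadata:
    name: fluentd
    labels:
      app: fluentd
  rules:
    - apiGroups: [""]                        # core API group
      resources: ["pods", "namespaces"]
      verbs: ["get", "list", "watch"]
  ---
  apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRoleBinding
  metadata:
    name: fluentd
  roleRef:
    kind: ClusterRole
    name: fluentd
    apiGroup: rbac.authorization.k8s.io
  subjects:
    - kind: ServiceAccount
      name: fluentd
      namespace: kube-logging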

A NoSchedule toleration should be defined to match the equivalent taint on Kubernetes master nodes. This will ensure that the DaemonSet also gets rolled out to the Kubernetes masters (see: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/).
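
For example, the DaemonSet Pod spec could carry the following toleration; the taint key shown is the pre-1.24 Kubernetes default for master nodes and is an assumption about the target cluster.

  tolerations:
    - key: node-role.kubernetes.io/master    # default master taint key on older clusters
      operator: Exists
      effect: NoSchedule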

The Docker image https://hub.docker.com/r/fluent/fluentd-kubernetes-daemonset/ provided by the Fluentd maintainers should be used. The Dockerfile and contents of this image are available in Fluentd’s fluentd-kubernetes-daemonset GitHub repo.

The following environment variables should be configured for Fluentd (a DaemonSet sketch follows this list):

  • FLUENT_ELASTICSEARCH_HOST: Elasticsearch headless Service address defined earlier: elasticsearch.kube-logging.svc.cluster.local. This will resolve to a list of IP addresses for the 3 Elasticsearch Pods. The actual Elasticsearch host will most likely be the first IP address returned in this list. To distribute logs across the cluster, you will need to modify the configuration for Fluentd’s Elasticsearch Output plugin (see: Elasticsearch Output Plugin).
  • FLUENT_ELASTICSEARCH_PORT: 9200.
  • FLUENT_ELASTICSEARCH_SCHEME: http.
  • FLUENTD_SYSTEMD_CONF: disable.
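
Putting the pieces together, the Fluentd DaemonSet could be configured roughly as below. This is a sketch: the image tag is left to be chosen from the repository linked above, and the ServiceAccount and toleration from the previous sections are assumed to exist.

  apiVersion: apps/v1
  kind: DaemonSet
  metadata:
    name: fluentd
    namespace: kube-logging
    labels:
      app: fluentd
  spec:
    selector:
      matchLabels:
        app: fluentd
    template:
      metadata:
        labels:
          app: fluentd
      spec:
        serviceAccountName: fluentd          # ServiceAccount from the RBAC sketch above
        tolerations:
          - key: node-role.kubernetes.io/master
            operator: Exists
            effect: NoSchedule
        containers:
          - name: fluentd
            image: fluent/fluentd-kubernetes-daemonset    # pick an elasticsearch variant tag from the repo above
            env:
              - name: FLUENT_ELASTICSEARCH_HOST
                value: "elasticsearch.kube-logging.svc.cluster.local"
              - name: FLUENT_ELASTICSEARCH_PORT
                value: "9200"
              - name: FLUENT_ELASTICSEARCH_SCHEME
                value: "http"
              - name: FLUENTD_SYSTEMD_CONF
                value: "disable"
            volumeMounts:
              - name: varlog
                mountPath: /var/log          # node-level logs tailed by Fluentd
        volumes:
          - name: varlog
            hostPath:
              path: /var/log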

...