RANCHER-286 Log dashboards access using GitHub OAuth


Purpose/Overview:

Within the scope of this ticket, we need to investigate whether dev/test teams can be given access to Kibana (or an alternative data visualization tool) using GitHub OAuth/OIDC/SAML.

Possible solutions:

Paid tools:

At the moment there are many paid log management solutions, such as Splunk, Datadog Log Collection & Management, SolarWinds Security Event Manager, SolarWinds Papertrail, Zebrium, Sematext Logs, and Loggly.

Some of them offer a free trial period or a free tier, but those limits don't suit our needs. Others don't support access to dashboards using GitHub auth.

The most attractive are:

Zebrium

The most useful and attractive feature of Zebrium is its use of Artificial Intelligence (AI) to find issues and uncover their root cause automatically, while all other tools rely on users adding rules manually.


Pros:
Easy to start: just copy/paste a customized helm or kubectl command.

Automatic detection of problems and root cause without needing manual rules.

Can be used as a standalone log management tool or as an ML Add-on to your existing log management tool such as the ELK Stack.

Cons:
It's PAID.

The free plan is limited to 500 MB a day with 3-day retention.

Sematext

Sematext is a powerful tool that is not limited to K8s logs; it also provides monitoring and alerting for K8s (on metrics and logs). Collected logs are parsed/structured automatically for several known log formats, and users can provide patterns for custom logs. It also exposes the Elasticsearch API, so any tool that works with Elasticsearch, such as Filebeat and Logstash, can be used with Sematext too. You can use it as a variant of ELK or with the native Sematext ecosystem. The tool helps to create rules that monitor specific cases and catch anomalies. Clients can control and monitor all services through Sematext's real-time dashboard.


Pros:

Configurable overage controls cost by stopping logs from being accepted.

The flexibility of ELK.

Cons:

Sematext widgets and Kibana cannot be mixed on one dashboard.

Custom parsing needs to be done in the log shipper; Sematext parses only Syslog and JSON on the server side.

Weak tracing functionality, although they plan to improve it.

Free tools:

Grafana Loki

Loki is a multi-tenant, highly available log aggregation tool inspired by Prometheus. It helps to collect logs, but users need to build rules for it manually. Loki works with Grafana, Prometheus, and Kubernetes.
Loki gains much of its efficiency from not indexing the contents of your logs; it indexes only a set of labels for each log stream.


Pros:
Large ecosystem.

Rich visualization capabilities.

Efficiency due to not indexing log content.

Cons:
Not optimized for Kubernetes log management.

Lots of manual work for building rules.

Lack of content index potentially limits search performance.
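
To make the label-only indexing trade-off concrete, here is a minimal sketch in Python. It is a toy model and not Loki's actual implementation: the class name LabelIndexedStore and its methods are hypothetical. It shows why the index stays small (one entry per label pair, not per word) and why a content search still has to scan every line of the matched streams.

```python
from collections import defaultdict


class LabelIndexedStore:
    """Toy log store: indexes stream labels only, never the line content.

    Hypothetical illustration, not Loki's real data model or API.
    """

    def __init__(self):
        self.streams = {}                    # stream key -> list of log lines
        self.label_index = defaultdict(set)  # (label, value) -> stream keys

    def push(self, labels, line):
        # A stream is identified by its full, sorted label set.
        key = tuple(sorted(labels.items()))
        if key not in self.streams:
            self.streams[key] = []
            for pair in key:
                self.label_index[pair].add(key)
        self.streams[key].append(line)

    def query(self, selector, contains=""):
        # Step 1: the label index narrows the search to matching streams.
        candidates = set(self.streams)
        for pair in selector.items():
            candidates &= self.label_index.get(pair, set())
        # Step 2: content is not indexed, so every line of the
        # matched streams is scanned for the substring.
        result = []
        for key in sorted(candidates):
            result.extend(l for l in self.streams[key] if contains in l)
        return result


store = LabelIndexedStore()
store.push({"app": "api", "env": "test"}, "GET /health 200")
store.push({"app": "api", "env": "test"}, "error: timeout")
store.push({"app": "db", "env": "test"}, "error: disk full")

api_errors = store.query({"app": "api"}, contains="error")
```

In this sketch three log lines produce only three index entries, and the query for "error" in the api stream has to read both api lines to find the one that matches.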

EFK/ELK Stack

ELK is perhaps the best-known open-source tool for log management in general. ELK is an acronym for Elasticsearch, Logstash, and Kibana; each component takes care of a different part of the logging process. Elasticsearch is a powerful and scalable search system, Logstash aggregates and processes logs, and Kibana provides an analysis and visualization interface that helps users make sense of the data. Together they provide a comprehensive logging solution for K8s. Note that there are other variants of the ELK stack, such as the EFK stack (Elasticsearch, Fluentd, and Kibana).


Pros:
The tool is well-known and has a huge community.

Very broad platform support.

Rich analysis and visualization capabilities in Kibana.

Cons:
Requires complex parsing for logs and manually defined alert rules.

Difficult to maintain at scale.

Lots of tuning, particularly for large environments.

Heavy resource requirements.

Some features require a paid license.

Conclusions

The best and most convenient choice at the moment is the EFK/ELK stack, because it is the most popular tool, many dev/test teams likely have experience using it, and DevOps engineers have expertise in installing and configuring it.

The biggest drawback of the EFK/ELK stack is that authentication via SAML, OpenID Connect, Kerberos, and JWT is part of the paid Platinum subscription, so GitHub OAuth/OIDC/SAML login is not available without it.

For now we suggest using the EFK/ELK stack with an RBAC model: for every cluster, create an Admin user for administration purposes and a Dev/Test user with read-only (RO) rights for access to the log dashboards.
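
As a rough sketch of what the read-only Dev/Test role could look like, the Python snippet below builds a request body in the shape accepted by the Kibana role management API (PUT /api/security/role/<role name>). The role name logs_readonly and the index pattern logstash-* are assumptions for illustration only; the real index names depend on how Fluentd/Logstash is configured in each cluster. The snippet only builds and prints the JSON and does not call any server.

```python
import json

# Assumptions (not from the ticket): role name and index pattern.
ROLE_NAME = "logs_readonly"
LOG_INDEX_PATTERNS = ["logstash-*"]


def build_readonly_role(index_patterns):
    """Build a role body granting read-only access to log indices and Kibana."""
    return {
        "elasticsearch": {
            "cluster": [],  # no cluster-level privileges for Dev/Test users
            "indices": [
                {
                    "names": list(index_patterns),
                    # read-only data access plus index metadata for index patterns
                    "privileges": ["read", "view_index_metadata"],
                }
            ],
        },
        # Read-only access to Kibana features in all spaces.
        "kibana": [{"base": ["read"], "spaces": ["*"]}],
    }


role_body = build_readonly_role(LOG_INDEX_PATTERNS)
print(f"PUT /api/security/role/{ROLE_NAME}")
print(json.dumps(role_body, indent=2))
```

The Dev/Test user in each cluster would then be assigned this role, while the Admin user keeps a separate administrative role. The exact privileges should be verified against the Elastic Stack version deployed.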