Kafka Security

Problem

The FOLIO project uses Kafka intensively to transfer data between modules.

This data is often sensitive and must be protected both in transit and at rest.

Data of a single tenant must be restricted to tenant-specific credentials, so that tenant data cannot intersect at the Kafka level.

Some topics in Kafka might need 'global' access, meaning they are accessible by all tenants or by a group of tenants. These topics must still be protected by the solution.

Document outline

This solution addresses the following issues:

  1. Authentication protocol
  2. ACL control
  3. Consumer configuration
  4. Kafka configuration

It does not include:

  1. Key management (including rotation)
  2. Deployment / launch pipeline
  3. Key storage

These points are excluded because they are environment-bound.

Requirements

  1. The solution must follow the 'least privilege' principle.
  2. Each module using Kafka directly must authenticate to the Kafka cluster using provided credentials.
  3. Each topic in Kafka must be restricted to a particular tenant or treated as 'global'.
  4. The solution must support the majority of runtime environments (at least AWS and Kubernetes).
  5. Topic creation should not require manual operations for setting permissions/granting access.
  6. Kafka inter-broker communication must be secure.
  7. Kafka topics must be tenant-bound, and there must be a generic approach to configuring access for existing and future topics.

Solution overview

Consumer

There are several options available for consumer authentication in Kafka:

  1. SASL/PLAIN - plaintext password transfer. Not really secure on its own and requires additional transport-layer security.
  2. SASL/GSSAPI - requires a Kerberos setup. A complex solution, not suitable for requirement (4).
  3. SASL/OAUTHBEARER - requires a token service to be set up. Not suitable for requirement (4).
  4. SASL/SCRAM - stores password hashes in ZooKeeper. Considered secure, but requires additional actions to a) secure ZooKeeper access, and b) manage credentials via ZooKeeper (storing/updating password hashes).
  5. mTLS - TLS-based mutual authentication, providing both transport-level security and client authentication via certificates. Requires a private CA to be set up for certificate issuance.

The simplest way to implement consumer authentication is considered to be mTLS. It only requires a trusted CA to be set up within the system, and this CA could also be used for inter-service TLS communication. The other options require more management and/or are less secure (for example, they do not provide transport-level security by themselves).

The mTLS approach requires the following parameters to be configured at the consumer level:

  1. The keystore. Contains access certificates for all tenants. In a multi-tenant configuration these are pairs of <tenantId, certificate alias> (to be covered in the Implementation phases section).
  2. Keystore password (optional).
  3. Since each certificate in the keystore can be protected by an individual password, these passwords must be supplied along with the tenant data (<tenantId, certificate alias, certificate password>). This step is optional.
  4. Truststore containing the root CA of the installation (to verify the server identity and set up the TLS channel).
  5. Truststore password (optional).

Since modules are launched in various types of containers/VMs, we use three configuration options:

  1. Keystore/truststore passwords and locations are passed via environment variables.
  2. The keystore is provided as a JKS file mounted inside the container image.
  3. The truststore is provided as a JKS file mounted inside the container image.
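
For illustration, a container launch combining these options might look like the following sketch (the variable names, paths and image name are hypothetical, not mandated by this solution):

docker run \
  -e KAFKA_SECURITY_PROTOCOL=SSL \
  -e KAFKA_SSL_TRUSTSTORE_LOCATION=/etc/kafka/truststore.jks \
  -e KAFKA_SSL_TRUSTSTORE_PASSWORD=<password> \
  -e KAFKA_SSL_KEYSTORE_LOCATION=/etc/kafka/keystore.jks \
  -e KAFKA_SSL_KEYSTORE_PASSWORD=<password> \
  -v /host/secrets/truststore.jks:/etc/kafka/truststore.jks \
  -v /host/secrets/keystore.jks:/etc/kafka/keystore.jks \
  <module-image>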

The following settings are used by the Kafka consumer by default:

Truststore
security.protocol=SSL
ssl.truststore.location=<path>
ssl.truststore.password=<password>

These settings are separate from the standard Java truststore settings of the javax.net.ssl.* group.

Keystore
ssl.keystore.location=<path>
ssl.keystore.password=<password>
ssl.key.password=<key-password>


If the default parameter names are used, the Kafka implementation module will pick them up automatically and configure listeners accordingly.
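
For reference, here is a minimal sketch of wiring these settings into a plain Java consumer (the environment variable names are illustrative assumptions, matching the docker example above):

import java.util.Properties;
import org.apache.kafka.clients.CommonClientConfigs;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.config.SslConfigs;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SecureKafkaConsumerFactory {

    // Builds an mTLS-enabled consumer; reads SSL settings from the environment.
    public static KafkaConsumer<String, String> create(String bootstrapServers) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // Transport-level security and server identity verification (truststore)
        props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SSL");
        props.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, System.getenv("KAFKA_SSL_TRUSTSTORE_LOCATION"));
        props.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, System.getenv("KAFKA_SSL_TRUSTSTORE_PASSWORD"));

        // Client authentication (keystore holding the client certificate)
        props.put(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG, System.getenv("KAFKA_SSL_KEYSTORE_LOCATION"));
        props.put(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, System.getenv("KAFKA_SSL_KEYSTORE_PASSWORD"));
        props.put(SslConfigs.SSL_KEY_PASSWORD_CONFIG, System.getenv("KAFKA_SSL_KEY_PASSWORD"));

        return new KafkaConsumer<>(props);
    }
}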

To implement multi-tenant authentication, see the Implementation phases section.

Broker

Broker configuration is described in the Kafka documentation, available via the following links:

for a plain Kafka installation: https://kafka.apache.org/documentation/#security_configbroker

for an AWS MSK cluster: https://docs.aws.amazon.com/msk/latest/developerguide/msk-authentication.html


These documents also cover inter-broker communication (requirement 6).
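
As a rough sketch, the broker-side server.properties for an mTLS setup in a plain Kafka installation could look like this (hostnames, paths and passwords are placeholders):

listeners=SSL://broker1.example.com:9093
security.inter.broker.protocol=SSL
ssl.client.auth=required
ssl.keystore.location=/etc/kafka/ssl/broker.keystore.jks
ssl.keystore.password=<password>
ssl.key.password=<key-password>
ssl.truststore.location=/etc/kafka/ssl/broker.truststore.jks
ssl.truststore.password=<password>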

ACL configuration

Kafka has a guideline for implementing multi-tenancy, available at:

https://kafka.apache.org/documentation/#multitenancy


According to the documentation, Kafka supports prefix-based ACLs, which perfectly suit our requirements. This approach also allows rate limiting and disk quotas per tenant.

Client quotas: Kafka supports different types of (per-user principal) client quotas. Because a client's quotas apply irrespective of which topics the client is writing to or reading from, they are a convenient and effective tool to allocate resources in a multi-tenant cluster.
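
For example, per-tenant network quotas could be set with the kafka-configs CLI (the principal name 'tenant1' and the byte rates are placeholders; newer Kafka versions also accept --bootstrap-server instead of --zookeeper):

bin/kafka-configs.sh --zookeeper localhost:2181 --alter --add-config 'producer_byte_rate=1048576,consumer_byte_rate=2097152' --entity-type users --entity-name tenant1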


A number of examples for configuring prefixed ACLs are available at: https://kafka.apache.org/documentation/#security_authz_examples

Example
You can add acls on prefixed resource patterns, e.g. suppose you want to add an acl "Principal User:Jane is allowed to produce to any Topic whose name starts with 'Test-' from any host". You can do that by executing the CLI with following options: 

bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal User:Jane --producer --topic Test- --resource-pattern-type prefixed
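
Applied to this solution, each tenant principal could be granted access to all of its own topics in a single command. The sketch below assumes topic names are prefixed with the tenant id and the client certificate CN carries the tenant id (both naming conventions are assumptions to be fixed during implementation; the exact principal string depends on the certificate DN and ssl.principal.mapping.rules):

bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal "User:CN=tenant1" --operation Read --operation Write --topic folio.tenant1. --resource-pattern-type prefixed

Note that consumers additionally need Read access to their consumer groups, which can be granted the same way via the --group option.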


QA

  • AWS ECS with an MSK cluster and AWS SSM – a convenient way to inject keystores?
  • Security for non-Java clients (https://github.com/folio-org/mod-graphql - does it use Kafka?)
  • Do we need to cover and configure non-prod/testing environments (e.g. dev/staging)?

Implementation phases

The implementation should be split into three phases:

Phase one

Configure the broker for mTLS. Enable transport-layer and client security, without isolating tenant data yet (though prefixed topic naming is already in use).

This phase is easy to implement and provides basic security.

Enabling SSL and ACL for Kafka – a step-by-step guideline for broker configuration
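
In addition to the SSL settings shown in the Broker section, phase one requires the ACL authorizer to be enabled on the broker. A sketch for a plain Kafka installation (the admin principal is a placeholder):

# Kafka 2.4+; older brokers use kafka.security.auth.SimpleAclAuthorizer
authorizer.class.name=kafka.security.authorizer.AclAuthorizer
# deny access to resources that have no ACLs defined
allow.everyone.if.no.acl.found=false
super.users=User:CN=admin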

Jira issues

  1. Data import - MODPUBSUB-171
  2. Search - MSEARCH-105
  3. PUBSUB - MODPUBSUB-182
  4. Remote storage - MODRS-62
  5. AWS -
  6. Community - FOLIO-3173 + FOLIO-3174

Phase two

Implement multi-tenant topic management.

At this phase, more code changes are required:

  1. Implement methods to provide credentials per tenant: tenantId => certificate alias and password.
  2. Update modules to create tenant-configured listeners, adding an abstraction over the existing implementation (see the sketch after this list).
  3. Update broker ACLs.
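
A minimal sketch of such an abstraction is shown below. All names are hypothetical. Note that the stock Kafka client selects a key from the keystore by itself, so this sketch assumes one keystore file per tenant; selecting a specific certificate alias from a shared keystore would require a custom ssl.engine.factory.class implementation:

import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.CommonClientConfigs;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.config.SslConfigs;

public class TenantAwareConsumerFactory {

    // Hypothetical per-tenant credential holder, supplied by the credentials provider from step 1.
    public record TenantSslCredentials(String keystorePath, String keystorePassword, String keyPassword) {}

    private final Properties baseConfig;                      // common settings: bootstrap servers, deserializers, truststore
    private final Map<String, TenantSslCredentials> tenants;  // tenantId -> credentials

    public TenantAwareConsumerFactory(Properties baseConfig, Map<String, TenantSslCredentials> tenants) {
        this.baseConfig = baseConfig;
        this.tenants = tenants;
    }

    // Creates a consumer authenticated with the given tenant's certificate.
    public <K, V> KafkaConsumer<K, V> consumerFor(String tenantId) {
        TenantSslCredentials creds = tenants.get(tenantId);
        if (creds == null) {
            throw new IllegalArgumentException("No Kafka credentials registered for tenant " + tenantId);
        }
        Properties props = new Properties();
        props.putAll(baseConfig);
        props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SSL");
        props.put(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG, creds.keystorePath());
        props.put(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, creds.keystorePassword());
        props.put(SslConfigs.SSL_KEY_PASSWORD_CONFIG, creds.keyPassword());
        return new KafkaConsumer<>(props);
    }
}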

Phase three

Introduce centralized secrets storage and audit.

This step might require module updates to allow configuration from different sources (so the abstraction implemented in phase two must be a separate library, designed accordingly).