Reporting: Analytics and Audit Data Logging for External Reporting (UXPROD-330)

[UXPROD-340] SPIKE: Prototype Kafka Message Queue Created: 05/Mar/18  Updated: 19/Jan/19  Resolved: 15/Oct/18

Status: Closed
Project: UX Product
Components: None
Affects versions: None
Fix versions: None
Parent: Reporting: Analytics and Audit Data Logging for External Reporting

Type: Story Priority: P3
Reporter: VBar Assignee: Matt Reno
Resolution: Done Votes: 0
Labels: analytics, kafka
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Relates
relates to UXPROD-332 Message Queue for Data Extraction Closed
relates to UXPROD-1219 Implement Interim post- filter for da... Closed
Epic Link: Reporting: Analytics and Audit Data Logging for External Reporting
Front End Estimator: VBar
Back End Estimate: Medium < 5 days
Back End Estimator: VBar
Development Team: EBSCO - FSE

 Description   

Stand up an Apache Kafka instance for the purposes of receiving transaction data from Okapi.

  • Clustered mode
  • ZooKeeper
  • Access control

Provides:

  • a containerized (Docker) deliverable
  • reference documentation
  • pre-analysis of transaction data for optimization purposes


 Comments   
Comment by Hongwei Ji [ 05/Oct/18 ]

BTW, during an earlier data capture POC, we used the following to set up Kafka:
https://hub.docker.com/r/wurstmeister/kafka/
https://github.com/wurstmeister/kafka-docker

Comment by Matt Reno [ 15/Oct/18 ]

For this spike we installed a Kafka cluster via a Docker image on 3 EC2 instances. Additionally, a ZooKeeper ensemble was installed on the same 3 instances.

EC2 instances: t2.micro
• Installed docker/docker-compose (instructions on the interwebs)
• Installed git (instructions on the interwebs)
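For a reproducible record of the "instructions on the interwebs" step, one possible install sequence on Amazon Linux would be the following sketch (package names and the docker-compose version are assumptions, not what was actually run):

```shell
# Sketch: installing docker, docker-compose, and git on an Amazon Linux
# t2.micro. Exact package names/versions are assumptions.
sudo yum install -y docker git
sudo service docker start
sudo usermod -aG docker ec2-user   # allow ec2-user to run docker without sudo

# docker-compose is not in the default repos; fetch a release binary
# (version 1.22.0 is an assumption for the 2018 timeframe).
sudo curl -L "https://github.com/docker/compose/releases/download/1.22.0/docker-compose-$(uname -s)-$(uname -m)" \
  -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
```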

Kafka: https://github.com/wurstmeister/kafka-docker
• git clone https://github.com/wurstmeister/kafka-docker.git

Edited docker-compose.yml file:

[ec2-user@ip-10-23-33-103 kafka-docker]$ cat docker-compose.yml

version: '2'
services:
  zookeeper:
    image: 31z4/zookeeper
    restart: always
    hostname: zoo1
    ports:
      - 2181:2181
      - 3888:3888
      - 2888:2888
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=10.23.33.117:2888:3888 server.3=10.23.33.112:2888:3888
  kafka:
    build: .
    ports:
      - 9092:9092
    environment:
      KAFKA_DEFAULT_REPLICATION_FACTOR: 3
      KAFKA_HEAP_OPTS: "-Xmx256M -Xms128M"
      KAFKA_ADVERTISED_HOST_NAME: 10.23.33.103
      KAFKA_ZOOKEEPER_CONNECT: 10.23.33.103:2181,10.23.33.117:2181,10.23.33.112:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

The other servers have different (incremented) "ZOO_MY_ID" values. The "KAFKA_HEAP_OPTS" setting accommodates the small EC2 instance hosting the Kafka Docker image: by default Kafka tried to allocate 1 GB of heap, which caused an out-of-memory error on startup. The replication factor was also increased to 3 to provide some redundancy.
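For reference, the zookeeper service on the second host (10.23.33.117) would differ only in its ID, hostname, and which server entry is bound locally. This is a sketch inferred from the compose file above, not the actual file from that host:

```yaml
# Sketch of the zookeeper service on the second host (10.23.33.117),
# inferred from the compose file above; only the ID/hostname/local
# binding change.
  zookeeper:
    image: 31z4/zookeeper
    restart: always
    hostname: zoo2
    ports:
      - 2181:2181
      - 3888:3888
      - 2888:2888
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=10.23.33.103:2888:3888 server.2=0.0.0.0:2888:3888 server.3=10.23.33.112:2888:3888
```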

mod-aes
• Launched via AesVerticle with the following args:
○ -Dhttp.port=8081 -Dkafka.url=10.23.33.103:9092,10.23.33.117:9092,10.23.33.112:9092 -Dvertx.logger-delegate-factory-class-name=io.vertx.core.logging.SLF4JLogDelegateFactory
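Put together as a full command line, the launch might look like the following sketch (the fat-jar name is an assumption; the -D system properties are the ones recorded above):

```shell
# Hypothetical launch command for mod-aes; the jar name "mod-aes-fat.jar"
# is an assumption. The -D properties match the args listed above.
java \
  -Dhttp.port=8081 \
  -Dkafka.url=10.23.33.103:9092,10.23.33.117:9092,10.23.33.112:9092 \
  -Dvertx.logger-delegate-factory-class-name=io.vertx.core.logging.SLF4JLogDelegateFactory \
  -jar mod-aes-fat.jar
```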

Curl was used to make a request to mod-aes:

• $ curl -X POST http://localhost:8081/test \
    -H 'Content-Type: application/json' \
    -H 'X-Okapi-Tenant: test' \
    -H 'x-okapi-filter: pre' \
    -d '{"test": "some value"}'
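One way to confirm the message actually reached the cluster is to read the topic back with the console consumer shipped inside the wurstmeister image. This is a sketch; the container name and topic name ("test") are assumptions:

```shell
# Sanity-check sketch: tail the topic from inside the Kafka container.
# Container name "kafka-docker_kafka_1" and topic "test" are assumptions.
docker exec -it kafka-docker_kafka_1 \
  kafka-console-consumer.sh \
  --bootstrap-server 10.23.33.103:9092 \
  --topic test --from-beginning
```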

Next steps:
• Securing kafka/zookeeper (to be another spike for a later sprint)
○ SSL connections
○ Kafka client authentication
○ Zookeeper client authentication
○ Kafka data-at-rest encryption
• Cluster size optimization
○ Zookeeper and Kafka
• Topic configuration (per tenant?)
○ Replication/partitioning/etc.
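A per-tenant topic layout could be sketched with explicit topic creation, e.g. (topic naming is an assumption; the --zookeeper flag matches the Kafka versions current at the time of this spike, and replication factor 3 mirrors KAFKA_DEFAULT_REPLICATION_FACTOR above):

```shell
# Sketch of explicit per-tenant topic creation; the topic name is a
# hypothetical naming scheme, not something decided in this spike.
docker exec -it kafka-docker_kafka_1 \
  kafka-topics.sh --create \
  --zookeeper 10.23.33.103:2181,10.23.33.117:2181,10.23.33.112:2181 \
  --topic tenant-diku-transactions \
  --partitions 3 --replication-factor 3
```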

Generated at Fri Feb 09 00:07:27 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.