RANCHER-415 FOLIO cluster using Graviton EC2 instances
Purpose/Overview:
The scope of this ticket is to investigate whether we can provide Rancher clusters running on AWS EC2 instances with arm64 Graviton CPUs.
At the moment the community CI process is not configured to build arm64 Docker images, so we would need to enable this for all FOLIO modules.
Steps required before we can support the arm64 platform:
1. Since x86 and arm64 are completely different CPU architectures, the current build flow for modules needs to be reworked:
1.1. First, we need to add an additional Jenkins agent with a Graviton arm64 CPU and configure it to build arm64-based Docker images.
1.2. Right now the community CI process uses the eclipse-temurin:11-jre-alpine Docker image, with small customizations, as the base for every Java 11 module. There is no arm64 version of this image, so we can't use it for multi-platform builds.
https://hub.docker.com/_/eclipse-temurin/tags?page=1&name=11-jre-alpine
https://github.com/folio-org/folio-tools/blob/master/folio-java-docker/openjdk11/Dockerfile
https://github.com/folio-org/okapi/blob/master/okapi-core/Dockerfile
At the moment there is no official stable Alpine-based Java 11 Docker image that covers both arm64 and amd64, and we see several ways to resolve this problem:
1) Build the required arm64 Alpine Docker image from scratch with all the necessary tools (Java 11, etc.).
https://github.com/adoptium/containers/blob/main/11/jre/alpine/Dockerfile.releases.full
2) Migrate the base image to a Debian-based one: eclipse-temurin:11-jre (based on Ubuntu 22.04), or openjdk:11-jre or openjdk:11 (based on Debian 11). These support both linux/amd64 and linux/arm64/v8.
https://hub.docker.com/r/arm64v8/eclipse-temurin/tags?page=1&name=11-jre
https://hub.docker.com/_/openjdk/tags?page=1&name=11-jre
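Once a multi-arch base image is in place, a single build could target both platforms. The following is only a minimal sketch of how a CI step might assemble a `docker buildx build` command for amd64 and arm64; the helper name `buildx_command`, the image name `example/mod-sample` and the tag are hypothetical and are not part of the existing FOLIO CI.

```python
import shlex


def buildx_command(image, tag, platforms=("linux/amd64", "linux/arm64"),
                   context=".", push=True):
    """Return the argument list for a multi-platform `docker buildx build`.

    The image name and tag are placeholders (assumptions for illustration).
    """
    cmd = [
        "docker", "buildx", "build",
        "--platform", ",".join(platforms),  # one manifest, several architectures
        "--tag", f"{image}:{tag}",
    ]
    if push:
        # Multi-platform results are normally pushed straight to a registry.
        cmd.append("--push")
    cmd.append(context)
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(buildx_command("example/mod-sample", "1.0.0")))
```

Building on a native arm64 Jenkins agent, as described in 1.1, avoids relying on emulation for the arm64 half of such a build.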
2. At the moment Bitnami doesn't support the ARM64 architecture for its container images (and there is no specific ETA for this).
https://github.com/bitnami/charts/issues/7305
Because of this we need to rework the deployment of the self-managed services that we use in the folio-dev and folio-testing environments:
1) Kafka and Zookeeper - Strimzi operator - https://github.com/strimzi/strimzi-kafka-operator (supports arm64)
2) PostgreSQL - it looks like we need to customize the official image, which supports arm64, so that it can be used with the Bitnami chart. (Stolon operator - needs investigation)
https://hub.docker.com/_/postgres/tags?page=1&name=12.8
https://artifacthub.io/packages/helm/stolon/stolon
3) Minio - https://artifacthub.io/packages/helm/minio-official/minio
4) Logging solution (EFK, ELK etc.) per environment - https://artifacthub.io/packages/olm/community-operators/elastic-cloud-eck
5) ExternalDNS addon - set the chart's image value to the official Kubernetes image, which supports arm64 and is compatible with the Bitnami chart: registry.k8s.io/external-dns/external-dns:v0.12.2
For folio-perf environments, we could use AWS managed services.
Kubecost, OpenSearch, PgAdmin - these tools work fine on arm64, so we don't need to update their deployment flow.
Interesting comparison of AMD/Intel/Graviton CPU performance:
https://www.percona.com/blog/comparing-graviton-arm-performance-to-intel-and-amd-for-mysql-part-3/
https://www.servethehome.com/aws-ec2-m6-instance-intel-ice-lake-and-graviton-2-acceleration-matters/
Performance testing by Martin Tran:
https://docs.google.com/document/d/1xr8BMW9NZi9Ljs6FZIn_NK2_vQiGrhL87vTCRptYd-w/edit
Graviton EC2 instances are about 20% cheaper than comparable Intel/AMD instances.
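To put the ~20% figure in context, the arithmetic can be sketched as below. The hourly price, the instance count and the 730 hours per month are placeholder assumptions for illustration only, not actual AWS prices or FOLIO cluster sizes.

```python
def monthly_cost(hourly_price, instance_count, hours=730):
    """Cost of running `instance_count` instances for one month."""
    return hourly_price * instance_count * hours


def graviton_savings(x86_hourly_price, instance_count, discount=0.20, hours=730):
    """Estimated monthly savings if Graviton is `discount` cheaper per hour.

    `discount=0.20` mirrors the ~20% figure above; all other inputs are
    hypothetical values supplied by the caller.
    """
    x86 = monthly_cost(x86_hourly_price, instance_count, hours)
    arm = monthly_cost(x86_hourly_price * (1 - discount), instance_count, hours)
    return x86 - arm


if __name__ == "__main__":
    # Placeholder inputs: 10 instances at 0.10 USD per hour on x86.
    print(round(graviton_savings(0.10, 10), 2))
```

The savings scale linearly with the instance count and with the hours each environment is kept running.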
Conclusions
At the moment we can't provide a Rancher environment based on Graviton arm64 EC2 instances, since the community CI doesn't support multi-platform builds.
Once the required changes to the CI process are applied, we could provide environments backed by AWS managed services. If we then want to use Graviton for our folio-dev and folio-testing environments, we need to rework the deployment flow to avoid large expenses.