RANCHER-582 Investigate memory optimised instances usage for Rancher clusters

Difference between instance types

Comparison of the technical characteristics of EC2 instance types that could be used in our infrastructure.

The results of the investigation are shown in the table below. I tested four instance types on our infrastructure to compare how many nodes are created for the test application at different scale-down utilization thresholds.

Currently in use: m5ad.xlarge, m5.xlarge, m5a.xlarge, m5d.xlarge

| Class | Instance type | vCPU | Memory (GiB) | Network bandwidth (Gigabit) | Storage | Processor | On Demand price per hour per instance | Node count at scale-down-utilization-threshold=0.5 | Node count at scale-down-utilization-threshold=0.75 |
|---|---|---|---|---|---|---|---|---|---|
| m5 | m5ad.xlarge | 4 | 16 | 10 | 1 x 150 NVMe SSD | AMD EPYC 7000 | 0.206 | - | - |
| m5 | m5.xlarge | 4 | 16 | 10 | EBS only | Intel Xeon® Platinum 8175M | 0.192 | 5 | 5 |
| m5 | m5a.xlarge | 4 | 16 | 10 | EBS only | AMD EPYC 7000 | 0.172 | - | - |
| m5 | m5d.xlarge | 4 | 16 | 10 | 1 x 150 NVMe SSD | Intel Xeon® Platinum 8175M | 0.226 | - | - |
| r5 | r5.large | 2 | 16 | 10 | EBS only | Intel Xeon® Platinum 8000 (Skylake 8175M or Cascade Lake 8259CL) | 0.126 | 7 | 6 |
| r5 | r5.xlarge | 4 | 32 | 10 | EBS only | Intel Xeon® Platinum 8000 (Skylake 8175M or Cascade Lake 8259CL) | 0.252 | - | - |
| r5 | r5a.xlarge | 4 | 32 | 10 | EBS only | AMD EPYC 7000 (AMD EPYC 7571) | 0.226 | 3 | 3 |
| r5 | r5ad.xlarge | 4 | 32 | 10 | 1 x 150 NVMe SSD | AMD EPYC 7000 (AMD EPYC 7571) | 0.262 | - | - |
| r5 | r5d.xlarge | 4 | 32 | 10 | 1 x 150 NVMe SSD | Intel Xeon® Platinum 8000 (Skylake 8175M or Cascade Lake 8259CL) | 0.288 | - | - |
| r6 | r6a.large | 2 | 16 | 12.5 | EBS only | AMD EPYC (AMD EPYC 7R13) | 0.113 | - | - |
| r6 | r6a.xlarge | 4 | 32 | 12.5 | EBS only | AMD EPYC (AMD EPYC 7R13) | 0.226 | - | - |
| r6 | r6i.xlarge | 4 | 32 | 12.5 | EBS only | Intel Xeon Scalable (Ice Lake 8375C) | 0.252 | 3 | 3 |
| r6 | r6i.large | 2 | 16 | 12.5 | EBS only | Intel Xeon Scalable (Ice Lake 8375C) | 0.126 | - | - |
| r6 | r6in.large | 2 | 16 | 25 | EBS only | Intel Xeon Scalable (Ice Lake 8375C) | 0.174 | - | - |
| r6 | r6in.xlarge | 4 | 32 | 30 | EBS only | Intel Xeon Scalable (Ice Lake 8375C) | 0.348 | - | - |

A "-" in the node count columns means the instance type was not part of the test.

Features of the different instance classes

The information below is taken from the official AWS documentation: https://aws.amazon.com/ec2/instance-types/?nc1=h_ls

m5

  • Up to 3.1 GHz Intel Xeon Scalable processor (Skylake 8175M or Cascade Lake 8259CL) with new Intel Advanced Vector Extension (AVX-512) instruction set
  • Up to 25 Gbps network bandwidth using Enhanced Networking
  • Instance storage offered via EBS or NVMe SSDs that are physically attached to the host server

m5a

  • AMD EPYC 7000 series processors (AMD EPYC 7571) with an all core turbo clock speed of 2.5 GHz
  • Up to 20 Gbps network bandwidth using Enhanced Networking
  • Instance storage offered via EBS or NVMe SSDs that are physically attached to the host server

r5

  • Up to 3.1 GHz Intel Xeon® Platinum 8000 series processors (Skylake 8175M or Cascade Lake 8259CL) with new Intel Advanced Vector Extension (AVX-512) instruction set
  • With R5d instances, local NVMe-based SSDs are physically connected to the host server and provide block-level storage that is coupled to the lifetime of the R5 instance

r5a

  • AMD EPYC 7000 series processors (AMD EPYC 7571) with an all core turbo clock speed of 2.5 GHz
  • Up to 20 Gbps network bandwidth using Enhanced Networking
  • Instance storage offered via EBS or NVMe SSDs that are physically attached to the host server
  • With R5ad instances, local NVMe-based SSDs are physically connected to the host server and provide block-level storage that is coupled to the lifetime of the R5a instance

r6a

  • Up to 3.6 GHz 3rd generation AMD EPYC processors (AMD EPYC 7R13)
  • Up to 35% better compute price performance over R5a instances
  • Up to 50 Gbps of networking speed

r6i

  • Up to 3.5 GHz 3rd generation Intel Xeon Scalable processors (Ice Lake 8375C)
  • Up to 15% better compute price performance over R5 instances
  • Up to 20% higher memory bandwidth per vCPU compared to R5 instances
  • Up to 50 Gbps of networking speed
  • With R6id instances, up to 7.6 TB of local NVMe-based SSDs are physically connected to the host server and provide block-level storage that is coupled to the lifetime of the R6i instance

r6in

  • Up to 3.5 GHz 3rd Generation Intel Xeon Scalable processors (Ice Lake 8375C)
  • Up to 20% higher memory bandwidth per vCPU compared to R5n and R5dn instances
  • Up to 200 Gbps of networking speed, which is up to 2x compared to R5n and R5dn instances
  • Up to 80 Gbps of EBS bandwidth, which is up to 1.3x more than R5b instances
  • With R6idn instances, up to 7.6 TB of local NVMe-based SSDs are physically connected to the host server and provide block-level storage that is coupled to the R6idn instance lifetime

Total price comparison for the tested instance types

Total price for 1 hour

The table below shows the total hourly price of running one test application on the tmp cluster for each instance type. The total for m5.xlarge is the baseline, i.e. the price we are paying now.


Currently in use: m5.xlarge

| Instance type | On Demand price per hour per instance | Total node count | Total price per hour |
|---|---|---|---|
| m5.xlarge | 0.192 | 5 | 0.96 |
| r5.large | 0.126 | 7 | 0.882 |
| r5a.xlarge | 0.226 | 3 | 0.678 |
| r6i.xlarge | 0.252 | 3 | 0.756 |
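
The totals in this table are the hourly price of one instance multiplied by the node count. A minimal Python sketch of that arithmetic, using only the prices and node counts from the table (the names `tested` and `total_hourly_price` are illustrative, not part of any tooling we use):

```python
# Hourly On Demand price and node count per instance type, copied from the table.
tested = {
    "m5.xlarge": (0.192, 5),
    "r5.large": (0.126, 7),
    "r5a.xlarge": (0.226, 3),
    "r6i.xlarge": (0.252, 3),
}


def total_hourly_price(price_per_hour: float, nodes: int) -> float:
    """Hourly cluster price = price of one instance * number of nodes."""
    return round(price_per_hour * nodes, 3)


totals = {name: total_hourly_price(price, nodes) for name, (price, nodes) in tested.items()}

for name, total in totals.items():
    print(f"{name}: {total}")
```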

Total discount

The table below shows the total discount we would get by running the test application on a different instance type. For r5a.xlarge we would pay about 30% less, because AMD instances are about 10% cheaper than Intel ones and fewer nodes are created than with m5.xlarge, which gives us an opportunity to save money.

| Instance type | Discount compared with m5.xlarge |
|---|---|
| r5.large | ~10% |
| r5a.xlarge | ~30% |
| r6i.xlarge | ~20% |
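
The discounts can be recomputed from the hourly totals in the previous section. The sketch below assumes the discount is defined as `1 - total / baseline`, with the m5.xlarge total (0.96) as the baseline. The unrounded results are roughly 8%, 29% and 21%, which the table rounds to the nearest ten.

```python
# Hourly cluster totals from the "Total price for 1 hour" table.
BASELINE_TOTAL = 0.96  # m5.xlarge, currently in use
candidate_totals = {
    "r5.large": 0.882,
    "r5a.xlarge": 0.678,
    "r6i.xlarge": 0.756,
}


def discount_percent(total: float, baseline: float = BASELINE_TOTAL) -> float:
    """Relative saving against the baseline, in percent."""
    return (1 - total / baseline) * 100


discounts = {name: discount_percent(total) for name, total in candidate_totals.items()}

for name, pct in discounts.items():
    print(f"{name}: {pct:.1f}%")
```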

Utilization comparison in Rancher

Files with screenshots of the CPU/RAM utilization shown in Rancher.

Total cost for 3 years for different EC2 payment plans

Total cost for 3 years for 1 instance

The table below shows the total cost of 1 instance over 3 years under the different EC2 payment options.



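| Instance type | On Demand | Standard RI, no upfront | Standard RI, partial upfront | Standard RI, all upfront | Savings Plan, no upfront | Savings Plan, partial upfront | Savings Plan, all upfront |
|---|---|---|---|---|---|---|---|
| m5.xlarge | 4976.64 | 2181 | 2007.64 | 1897 | 2549.16 | 2365.2 | 2312.64 |
| r5.large | 3265.92 | 1419 | 1319 | 1245 | 1734.48 | 1603.26 | 1576.8 |
| r5a.xlarge | 5857.92 | 2575 | 2370.62 | 2333 | 3101.04 | 2864.44 | 2811.96 |
| r6i.xlarge | 6531.84 | 3004 | 2781.68 | 2615 | 3454.56 | 3198.48 | 3134.68 |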
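
The On Demand column is consistent with the hourly price multiplied by 25,920 hours, i.e. 36 months of 30 days. That hour count is inferred from the numbers, not stated in the investigation, so treat it as an assumption. The Reserved Instance and Savings Plan columns depend on AWS rates that are not listed here and are not reproduced in this sketch.

```python
# Assumption inferred from the table: 3 years = 36 months * 30 days * 24 hours.
HOURS_IN_3_YEARS = 36 * 30 * 24  # 25920

# Hourly On Demand prices from the instance comparison table.
on_demand_hourly = {
    "m5.xlarge": 0.192,
    "r5.large": 0.126,
    "r5a.xlarge": 0.226,
    "r6i.xlarge": 0.252,
}


def on_demand_cost(price_per_hour: float, hours: int = HOURS_IN_3_YEARS) -> float:
    """On Demand cost of one instance over the given number of hours."""
    return round(price_per_hour * hours, 2)


three_year_cost = {name: on_demand_cost(price) for name, price in on_demand_hourly.items()}

for name, cost in three_year_cost.items():
    print(f"{name}: {cost}")
```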

Total cost for 3 years for the required number of instances

The node counts are taken from the investigation, in which we ran our test cluster on the different instance types.

| Instance type | Node count |
|---|---|
| m5.xlarge | 5 |
| r5.large | 7 |
| r5a.xlarge | 3 |
| r6i.xlarge | 3 |

The table below shows the total cost over 3 years of running the test application on the folio-tmp cluster under the different EC2 payment options.



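| Instance type | On Demand | Standard RI, no upfront | Standard RI, partial upfront | Standard RI, all upfront | Savings Plan, no upfront | Savings Plan, partial upfront | Savings Plan, all upfront |
|---|---|---|---|---|---|---|---|
| m5.xlarge | 24883.2 | 10906 | 10038.2 | 9485 | 12745.8 | 11826 | 11563.2 |
| r5.large | 22861.44 | 9934 | 9233 | 8715 | 12141.36 | 11222.82 | 11037.6 |
| r5a.xlarge | 17573.76 | 7726 | 7111.86 | 6999 | 9303.12 | 8593.32 | 8435.88 |
| r6i.xlarge | 19595.52 | 9013 | 8345.04 | 7845 | 10363.68 | 9595.44 | 9404.04 |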
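
Each value in this table is the per-instance 3-year cost multiplied by the node count. The sketch below checks this for two of the columns, On Demand and Standard RI all upfront; the dictionary layout and names are illustrative. Some Standard RI no-upfront values in the table differ by 1 from this product (for example 10906 versus 2181 * 5 = 10905), presumably because the per-instance figures were rounded first.

```python
# Per-instance 3-year costs from the single-instance table (two columns only).
per_instance = {
    "m5.xlarge": {"on_demand": 4976.64, "ri_all_upfront": 1897},
    "r5.large": {"on_demand": 3265.92, "ri_all_upfront": 1245},
    "r5a.xlarge": {"on_demand": 5857.92, "ri_all_upfront": 2333},
    "r6i.xlarge": {"on_demand": 6531.84, "ri_all_upfront": 2615},
}

# Node counts observed on the test cluster.
node_count = {"m5.xlarge": 5, "r5.large": 7, "r5a.xlarge": 3, "r6i.xlarge": 3}


def cluster_cost(instance_type: str, plan: str) -> float:
    """3-year cluster cost = per-instance cost * node count."""
    return round(per_instance[instance_type][plan] * node_count[instance_type], 2)


for instance_type in per_instance:
    on_demand = cluster_cost(instance_type, "on_demand")
    reserved = cluster_cost(instance_type, "ri_all_upfront")
    print(f"{instance_type}: on demand {on_demand}, RI all upfront {reserved}")
```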


Conclusion

As a result of the investigation, to save money we should move to the memory optimized instance classes r5 or r6. They offer twice as much memory per vCPU as m5 (for example 32 GiB instead of 16 GiB on an xlarge instance), which lets us run fewer EC2 instances than we are running now.
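
One way to see why the memory optimized classes come out cheaper is the hourly price per GiB of memory. This metric was not part of the original investigation; the sketch below only derives it from the prices and memory sizes in the instance comparison table.

```python
# (hourly On Demand price, memory in GiB) from the instance comparison table.
instances = {
    "m5.xlarge": (0.192, 16),
    "r5.large": (0.126, 16),
    "r5a.xlarge": (0.226, 32),
    "r6i.xlarge": (0.252, 32),
}


def price_per_gib(price_per_hour: float, memory_gib: int) -> float:
    """Hourly price paid for each GiB of memory."""
    return price_per_hour / memory_gib


per_gib = {name: price_per_gib(price, mem) for name, (price, mem) in instances.items()}

# Print from cheapest to most expensive per GiB.
for name, value in sorted(per_gib.items(), key=lambda item: item[1]):
    print(f"{name}: {value:.5f} per GiB-hour")
```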

Another recommendation is to use AMD processors, because they are about 10% cheaper than Intel. As for the number of CPU cores, I think we should use at least 4: although 2-core instances are cheaper than 4-core ones, we would have to run more EC2 instances to handle the load on our application, and it would run a little slower than on 4 cores. Moreover, I found that increasing the scale-down utilization threshold (from 0.5 to 0.75) does not give any visible advantage.
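
The "about 10%" figure can be checked against the hourly prices in the instance comparison table. The pairing of AMD and Intel instance types below is my own choice of comparable sizes within each class.

```python
# (AMD type, AMD price, Intel type, Intel price), hourly On Demand prices from the table.
pairs = [
    ("m5a.xlarge", 0.172, "m5.xlarge", 0.192),
    ("r5a.xlarge", 0.226, "r5.xlarge", 0.252),
    ("r6a.xlarge", 0.226, "r6i.xlarge", 0.252),
]


def amd_saving_percent(amd_price: float, intel_price: float) -> float:
    """How much cheaper the AMD instance is than the Intel one, in percent."""
    return (1 - amd_price / intel_price) * 100


savings = {amd: amd_saving_percent(amd_price, intel_price)
           for amd, amd_price, _intel, intel_price in pairs}

for amd, _amd_price, intel, _intel_price in pairs:
    print(f"{amd} vs {intel}: {savings[amd]:.1f}% cheaper")
```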




Main files containing the results of the investigation: