RANCHER-664 Calculate overall costs and forecast migration from m5 to r5 of Rancher EKS


Overview of current EC2 instances for EKS clusters

AWS instances information

Clusters "folio-eks-2-us-west-2-folio-eks-2-ng-us-west2a-Node" and "folio-eks-2-us-west-2-folio-eks-2-ng-us-west2b-Node" are going to be deprecated. So, all investigation we will provide with next clusters: "folio-dev", "folio-perf", "folio-testing".

Current count of instances for different clusters

| Instance type | Count of instances | Total vCPU | Total memory (GiB) | Cluster |
|---|---|---|---|---|
| m5.xlarge | 10 | 40 | 160 | folio-eks-2-us-west-2-folio-eks-2-ng-us-west2a-Node |
| m5.xlarge | 8 | 32 | 128 | folio-eks-2-us-west-2-folio-eks-2-ng-us-west2b-Node |
| m5d.xlarge, m5ad.xlarge, m5a.xlarge, m5.xlarge | 42 | 168 | 672 | folio-dev |
| m5d.xlarge, m5ad.xlarge, m5a.xlarge, m5.xlarge | 17 | 68 | 272 | folio-perf |
| m5d.xlarge, m5ad.xlarge, m5a.xlarge, m5.xlarge | 20 | 80 | 320 | folio-testing |
| m5.large | 3 | 6 | 24 | rancher |

Total node count with their CPU and memory.

| Instance type | Count of instances | Total vCPU | Total memory (GiB) | Clusters |
|---|---|---|---|---|
| m5d.xlarge, m5ad.xlarge, m5a.xlarge, m5.xlarge | 79 | 316 | 1264 | folio-testing, folio-perf, folio-dev |

For this investigation we will use m5.xlarge as the base instance type. Much more detail about the instance types we currently use can be found on the page RANCHER-582 Investigate memory optimised instances usage for Rancher clusters - Folio Development Teams - FOLIO Wiki.

Total cost for the On-Demand model will be 79 * 0.192 = $15.168 per hour.
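A minimal sketch of this hourly On-Demand calculation, assuming the $0.192 per instance-hour m5.xlarge rate used on this page:

```python
# Hourly On-Demand cost of the current m5.xlarge fleet
# (79 instances across folio-dev, folio-perf and folio-testing).
M5_XLARGE_HOURLY_USD = 0.192  # On-Demand price per instance-hour used on this page
instance_count = 79

hourly_cost = instance_count * M5_XLARGE_HOURLY_USD
print(f"{instance_count} x ${M5_XLARGE_HOURLY_USD} = ${hourly_cost:.3f} per hour")
# -> 79 x $0.192 = $15.168 per hour
```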

Rancher capacity for each cluster 

The capacity information for each cluster was taken from Rancher -> Cluster.

folio-dev

| Cores (Reserved) | Cores (Used) | Memory (Reserved) | Memory (Used) | Pods (Used) |
|---|---|---|---|---|
| 63.72 / 164.64 | 26.2 / 168 | 592 / 600 GiB | 366 / 641 GiB | 1310 / 2436 |

folio-perf

| Cores (Reserved) | Cores (Used) | Memory (Reserved) | Memory (Used) | Pods (Used) |
|---|---|---|---|---|
| 11.73 / 66.64 | 5.7 / 68 | 231 / 243 GiB | 116 / 259 GiB | 473 / 986 |

folio-testing

| Cores (Reserved) | Cores (Used) | Memory (Reserved) | Memory (Used) | Pods (Used) |
|---|---|---|---|---|
| 25.18 / 78.4 | 7.21 / 80 | 261 / 286 GiB | 150 / 306 GiB | 510 / 1160 |

Total

| Cores (Reserved) | Cores (Used) | Memory (Reserved) | Memory (Used) | Pods (Used) |
|---|---|---|---|---|
| 100.63 / 309.68 | 39.11 / 316 | 1084 / 1129 GiB | 632 / 1206 GiB | 2293 / 4582 |

Analysis of the approximate number of EC2 instances if we move to r5a.xlarge, and their cost

AWS instance count based on the total memory we are currently using

Some technical information about "r5a.xlarge" and "m5.xlarge".

| Instance type | vCPU | Memory (GiB) | Network bandwidth (Gigabit) | Storage | Processor | Price for 1 hour for 1 instance (On-Demand) |
|---|---|---|---|---|---|---|
| r5a.xlarge | 4 | 32 | 10 | EBS only | AMD EPYC 7000 (AMD EPYC 7571) | $0.226 |
| m5.xlarge | 4 | 16 | 10 | EBS only | Intel Xeon® Platinum 8175M | $0.192 |

Comparison of the number of instances for the two instance types.


| Cluster | m5.xlarge: count | m5.xlarge: total vCPU | m5.xlarge: total memory (GiB) | r5a.xlarge: count | r5a.xlarge: total vCPU | r5a.xlarge: total memory (GiB) |
|---|---|---|---|---|---|---|
| folio-dev | 42 | 168 | 672 | 21 | 84 | 672 |
| folio-perf | 17 | 68 | 272 | 9 | 36 | 288 |
| folio-testing | 20 | 80 | 320 | 10 | 40 | 320 |

Total number of r5a.xlarge nodes and their CPU and memory.

| Instance type | Count of instances | Total vCPU | Total memory (GiB) | Clusters |
|---|---|---|---|---|
| r5a.xlarge | 40 | 160 | 1280 | folio-testing, folio-perf, folio-dev |

Total cost for r5a.xlarge with the On-Demand model will be 40 * 0.226 = $9.04 per hour.
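The r5a.xlarge counts above follow from the memory each cluster currently has (32 GiB per r5a.xlarge vs 16 GiB per m5.xlarge). Below is a minimal sketch of that memory-driven sizing and the resulting hourly cost, assuming the per-cluster memory totals and the $0.226 On-Demand rate from the tables above:

```python
import math

# Memory-driven sizing: each r5a.xlarge provides 32 GiB, so a cluster needs at least
# ceil(current total memory / 32) instances to keep the same memory capacity.
R5A_MEMORY_GIB = 32
R5A_HOURLY_USD = 0.226  # On-Demand price per instance-hour from the table above

current_memory_gib = {  # total memory of the current m5-family nodes per cluster
    "folio-dev": 672,
    "folio-perf": 272,
    "folio-testing": 320,
}

r5a_counts = {cluster: math.ceil(mem / R5A_MEMORY_GIB)
              for cluster, mem in current_memory_gib.items()}
total_instances = sum(r5a_counts.values())

print(r5a_counts)
# -> {'folio-dev': 21, 'folio-perf': 9, 'folio-testing': 10}
print(f"total: {total_instances} instances, ${total_instances * R5A_HOURLY_USD:.2f} per hour")
# -> total: 40 instances, $9.04 per hour
```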

Calculate the total price for 1 year for the different instance types and price models

Here you can find how much (in USD) we would pay under the different price models and for the different instance counts.

For 1 instance for 1 year 




| Instance type | Count of instances | On-Demand | RI: no upfront | RI: partial upfront | RI: all upfront | Savings Plan: no upfront | Savings Plan: partial upfront | Savings Plan: all upfront |
|---|---|---|---|---|---|---|---|---|
| m5.xlarge | 1 | 1681.92 | 1059.96 | 1013.08 | 989 | 1235.16 | 1173.84 | 1156.32 |
| r5a.xlarge | 1 | 1979.76 | 1243.92 | 1189.68 | 1164 | 1436.64 | 1366.28 | 1340.28 |

For the instance counts obtained in the previous sections




| Instance type | Count of instances | On-Demand | RI: no upfront | RI: partial upfront | RI: all upfront | Savings Plan: no upfront | Savings Plan: partial upfront | Savings Plan: all upfront |
|---|---|---|---|---|---|---|---|---|
| m5.xlarge | 79 | 132,871.68 | 83,736.84 | 80,033.32 | 78,131 | 97,577.64 | 92,733.36 | 91,349.28 |
| r5a.xlarge | 40 | 79,190.40 | 49,756.80 | 47,587.20 | 46,560 | 57,465.60 | 54,651.20 | 53,611.20 |
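The fleet-level figures above are simply the per-instance yearly prices scaled by the instance counts; the On-Demand yearly price matches the hourly rate multiplied by 8760 hours, and the RI and Savings Plan per-instance prices are taken as given from the first table. A minimal sketch of that scaling, using the On-Demand rates and the RI "all upfront" per-instance prices shown above:

```python
HOURS_PER_YEAR = 24 * 365  # 8760

# Per-instance yearly prices in USD. On-Demand is the hourly rate * 8760;
# the RI "all upfront" values are taken from the per-instance table above.
per_instance_yearly = {
    "m5.xlarge":  {"on_demand": 0.192 * HOURS_PER_YEAR, "ri_all_upfront": 989},
    "r5a.xlarge": {"on_demand": 0.226 * HOURS_PER_YEAR, "ri_all_upfront": 1164},
}
fleet_size = {"m5.xlarge": 79, "r5a.xlarge": 40}

for itype, prices in per_instance_yearly.items():
    count = fleet_size[itype]
    for model, price in prices.items():
        print(f"{itype} x{count} {model}: ${count * price:,.2f} per year")
# -> m5.xlarge x79 on_demand: $132,871.68 per year
# -> m5.xlarge x79 ri_all_upfront: $78,131.00 per year
# -> r5a.xlarge x40 on_demand: $79,190.40 per year
# -> r5a.xlarge x40 ri_all_upfront: $46,560.00 per year
```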


Conclusion

| Instance type | Count of instances | Price for 1 hour for 1 instance (On-Demand) | Total price for 1 hour |
|---|---|---|---|
| m5.xlarge | 79 | $0.192 | $15.168 |
| r5a.xlarge | 40 | $0.226 | $9.04 |


As a result of this investigation, I can say that with the On-Demand price model we will save about 40% of the cost, because we reduce the number of instances needed to deploy our clusters.
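To double-check the "about 40%" figure, a short sketch comparing the On-Demand hourly totals from the conclusion table:

```python
# Relative saving of the proposed r5a.xlarge fleet vs the current m5.xlarge fleet,
# based on the On-Demand hourly totals above.
m5_hourly = 79 * 0.192   # $15.168 per hour
r5a_hourly = 40 * 0.226  # $9.04 per hour

saving = (m5_hourly - r5a_hourly) / m5_hourly
print(f"On-Demand saving: {saving:.1%}")
# -> On-Demand saving: 40.4%
```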


As we can see in the section "Rancher capacity for each cluster", we do not use most of the CPU cores, but we do use a lot of the memory of our EC2 instances. So, if we move to memory-optimized EC2 instances, we will save money, since they have twice as much memory per instance and we will run fewer nodes.