CPU units deployment recommendations

Objectives

PERF-1213: Test CPU settings for all modules and sidecars for Sunflower release (Closed)

To check whether the system could fail when heavily used tasks are placed on the same EC2 instance and the tasks' CPU is set to 0

To check whether performance stays consistent across different CPU unit values for modules and sidecars

To recommend CPU values for sidecars and modules

Assumptions

  • AWS ECS CPU units serve only as a reservation and as a task placement constraint (https://docs.aws.amazon.com/AmazonECS/latest/developerguide/capacity-tasksize.html?utm_source=chatgpt.com). Because this is a soft limit, performance should not be affected by the CPU units value.

  • The only possible effect on performance comes from random task placement on EC2 instances: several key, heavily used modules may land on the same instance and consume most of its CPU. With such an unlucky placement of heavily used modules and sidecars, the hosting EC2 instance may reach 100% CPU usage and performance would degrade.

Investigation

Placing “heavily used” tasks on the same EC2 instance

To check whether placing multiple heavily used tasks on the same EC2 instance can drive that instance to 100% CPU under load, PTF brought the environment up manually in the following order:

  • Start DB

  • Set the Auto Scaling group (ASG) to a maximum capacity of 3 EC2 instances.

  • Manually adjust task counts on the heavily used modules to be on these 3 instances: mod-users, sidecar-mod-users, kong, sidecar-mod-users-bl, mod-users-bl, sidecar-mod-consortia-keycloak, mod-consortia-keycloak, sidecar-mod-source-record-manager, mod-source-record-manager, mod-search, sidecar-mod-search, folio-keycloak, mgr-tenant-entitlements, mod-circulation-bff, sidecar-mod-circulation-bff, mod-inventory, sidecar-mod-inventory, edge-users, mgr-tenants, mgr-applications, sidecar-mod-roles-keycloak, mod-roles-keycloak, sidecar-mod-circulation-item, mod-circulation-item, mod-login-keycloak, sidecar-mod-login-keycloak, sidecar-mod-users-keycloak, mod-users-keycloak, mod-source-record-storage, sidecar-mod-source-record-storage, mod-entities-links, sidecar-mod-entities-links, mod-configuration, sidecar-mod-configuration, sidecar-mod-permissions, mod-permissions, sidecar-mod-settings, mod-settings, sidecar-mod-circulation, mod-circulation, sidecar-mod-feesfines, mod-feesfines, mod-inventory-storage, sidecar-mod-inventory-storage, sidecar-mod-circulation-storage, mod-circulation-storage.

  • Finally, trigger the “adjust tasks count” job on Jenkins to place the remaining tasks on other EC2 instances.
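The ASG and task-count steps above could be scripted roughly as follows. This is only a sketch, not the procedure PTF actually used: the function name, the parameter names, and the idea that each ECS service is named after its module are assumptions. The two client arguments are expected to behave like boto3's `autoscaling` and `ecs` clients, which provide `update_auto_scaling_group` and `update_service`.

```python
# Hypothetical subset of the heavily used services listed above.
HEAVY_SERVICES = ["mod-users", "sidecar-mod-users", "kong", "folio-keycloak"]


def pin_heavy_services(autoscaling, ecs, asg_name, cluster, services,
                       max_instances=3, desired_count=1):
    """Cap the ASG size, then start the heavy services while only those instances exist."""
    # Limit the group so that every task started next must land on these instances.
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        MaxSize=max_instances,
    )
    started = []
    for service in services:
        # Assumption: the ECS service name equals the module name.
        ecs.update_service(cluster=cluster, service=service,
                           desiredCount=desired_count)
        started.append(service)
    return started
```

With boto3 installed, the clients would be created with `boto3.client("autoscaling")` and `boto3.client("ecs")`; the ASG and cluster names are environment specific and are not given in this report.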

With this setup, PTF ran multiple workflow tests to stress the system.

Across all tests, CPU usage never exceeded 70% (and reached that level only during a short spike); in almost all tests it stayed at 50-60%.

image-20251106-102724.png

Moreover, PTF checked EC2 CPU usage on most production clusters in us-east-1. For most clusters, CPU usage did not exceed 40-50% (the exceptions were two clusters where CPU on one of the EC2 instances was around 95% due to a bug in mod-email).

Conclusion

Even with “heavy” tasks deliberately placed on a few EC2 instances and under load, the instances did not reach 100% CPU and the system did not crash.

Performance check with various CPU units

According to https://docs.aws.amazon.com/AmazonECS/latest/developerguide/capacity-tasksize.html?utm_source=chatgpt.com, performance should not be affected by “small changes” in CPU units (0, 64, 128).

Results of testing here

Conclusion

Based on the test results, performance fluctuates between runs and shows neither improvement nor degradation across CPU unit values of 0, 64, and 128.

CPU values recommendations

If something goes wrong and an EC2 instance reaches a critical level of CPU usage, having a CPU value set for services and sidecars ensures that some CPU time is reserved for these containers, keeping the system stable.

Based on all tests and the investigation, PTF recommends 64 CPU units as the default for all services and sidecars, except for the following key modules, which need more CPU allocated than the others:
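The effect of a non-zero CPU value on a saturated instance can be illustrated with a simplified model. On Linux container instances, ECS passes the container CPU value to Docker as relative CPU shares, and the ECS documentation states that values of 0 or 1 are passed as 2 shares. The sketch below assumes every container is competing for CPU at the same time and splits CPU time in proportion to shares; the function name and the module names are hypothetical.

```python
def cpu_share_under_contention(cpu_units):
    """Return each container's fraction of CPU time when all containers are busy."""
    # Values of 0 or 1 are treated as 2 shares, the minimum described in the ECS docs.
    shares = {name: max(units, 2) for name, units in cpu_units.items()}
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


# Hypothetical instance: one container left at 0 CPU units, three set to 64.
example = cpu_share_under_contention(
    {"mod-a": 0, "mod-b": 64, "mod-c": 64, "mod-d": 64}
)
for name, fraction in example.items():
    print(f"{name}: {fraction:.1%}")
```

In this model the container left at 0 keeps only about 1% of the CPU under full contention, while each container with 64 units keeps roughly a third. When the instance is not saturated, the shares do not restrict anything, which is consistent with the test results above.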

  • folio-keycloak: 1536

  • mod-consortia-keycloak: 512

  • mod-login-keycloak: 128

  • mod-roles-keycloak: 128

  • mod-users-keycloak: 128

  • Sidecars: 64, except for mod-inventory and mod-inventory-storage: 512

  • All modules' task placement strategies: Spread by Availability Zone and InstanceId
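The recommendations above can be captured as a small lookup, for example to generate or validate task definitions. This is a sketch under assumptions: the constant and function names are hypothetical, and the 512 exception tied to mod-inventory and mod-inventory-storage is not encoded here and would need to be added. The placement strategy uses the ECS `spread` type with the `attribute:ecs.availability-zone` and `instanceId` fields, and the vCPU conversion relies on 1 vCPU being 1024 CPU units.

```python
DEFAULT_CPU_UNITS = 64

# Key modules that get more than the default, as listed above.
MODULE_CPU_UNITS = {
    "folio-keycloak": 1536,
    "mod-consortia-keycloak": 512,
    "mod-login-keycloak": 128,
    "mod-roles-keycloak": 128,
    "mod-users-keycloak": 128,
}

# Spread by Availability Zone first, then by instance.
PLACEMENT_STRATEGY = [
    {"type": "spread", "field": "attribute:ecs.availability-zone"},
    {"type": "spread", "field": "instanceId"},
]


def recommended_cpu_units(name):
    """Return the recommended CPU units for a module or sidecar."""
    return MODULE_CPU_UNITS.get(name, DEFAULT_CPU_UNITS)


def reserved_vcpus(names):
    """Total CPU reservation for the given containers, in vCPUs (1024 units each)."""
    return sum(recommended_cpu_units(name) for name in names) / 1024
```

For instance, `reserved_vcpus(["folio-keycloak", "mod-consortia-keycloak"])` gives 2.0, which helps when checking whether a set of tasks fits on a given instance type.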