
Overview

  • In this report, PTF investigates the impact of setting CPU allocations to 0 units across all tasks within an AWS ECS cluster. The purpose of this study is to determine whether removing CPU constraints reveals the actual CPU usage of the tasks and to assess how this adjustment affects overall performance. By comparing key workflows across different environments, we aim to identify any potential changes in efficiency, throughput, or resource utilization that may result from setting CPU = 0. The findings from these tests will help inform best practices for resource allocation and performance optimization within ECS clusters.
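As an illustration of the change under test, the sketch below sets the container-level `cpu` field to 0 for every container in an ECS task definition represented as a plain dictionary. It assumes that "CPU = 0" refers to the `cpu` parameter of each container definition; the report does not state how the setting was applied. The task definition shown (family, image, memory) is hypothetical and used for illustration only.

```python
import copy


def with_zero_cpu(task_definition: dict) -> dict:
    """Return a copy of a task definition with cpu=0 on every container.

    The input is not modified, so the original allocation stays
    available for comparison or rollback.
    """
    updated = copy.deepcopy(task_definition)
    for container in updated.get("containerDefinitions", []):
        # Assumption: "CPU = 0" means no CPU units reserved per container.
        container["cpu"] = 0
    return updated


# Hypothetical task definition; all names and values are illustrative.
example = {
    "family": "mod-example",
    "containerDefinitions": [
        {
            "name": "mod-example",
            "image": "example/mod-example:latest",
            "cpu": 1024,
            "memory": 2048,
        },
    ],
}

patched = with_zero_cpu(example)
```

The patched dictionary could then be registered as a new task definition revision with whatever deployment tooling the environment uses.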

...

Columns: Workflow, then for each test an "Average (milliseconds)" value followed by an "Errors" percentage, in this order:

  • Test 1: Baseline Env configuration
  • Test 2: CPU=0
  • Test 3: x2gd.xlarge, CPU=0, instances=6
  • Test 4: x2gd.xlarge, CPU=0, instances=8
  • Test 5: x2gd.large, CPU=0, instances=10
  • Test 6: r6g.xlarge, CPU=2, instances=12(14)
  • Test 7: r6g.xlarge, CPU=2, instances=14, placement strategy (one task per host)
  • Test 8: x2gd.large, CPU=2, instances=14, placement strategy (one task per host)

AIE_TC: Create Invoices7682100%8106100%50468100%15554100%92805100%10191100%9716100%91611100%
AIE_TC: Invoices Approve2818100%3016100%24781100%7720100%51554100%4512100%4210100%51626100%
AIE_TC: Paying Invoices2882100%3164100%32994100%9788100%64642100%5430100%4902100%65029100%
CICO_TC_Check-In Controller20000%22140%190750%60380%329090%36590%31940%308310%
CICO_TC_Check-Out Controller35480%37990%292241%105600%573470%64390%55950%555030%
CSI_TC:Share local instance1305019%1307319%1426516%161920%173612%1378214%1333917%174601%
DE_Exporting MARC Bib records custom workflow847070%590990%38249156%1168690%56738983%1278850%885410%50075971%
DE_Exporting MARC Bib records workflow741260%430930%41798257%976564%62361091%821260%830460%45721258%
EVA_TC: View Account5190%5650%1603419%17360%406192%11381%9721%376691%
ILR_TC: Create ILR14220%14990%116071%38820%236760%24140%19970%223350%
MSF_TC: mod search by auth query7552%6680%18880%15000%61800%10800%9100%58940%
MSF_TC: mod search by boolean query2051%1590%5940%4270%20220%2580%2380%18710%
MSF_TC: mod search by contributors4401%3940%8560%8390%34540%6160%5300%33000%
MSF_TC: mod search by filter query3020%2840%5120%5390%20180%4160%3620%19240%
MSF_TC: mod search by keyword query3100%2800%5210%5370%20130%4160%3610%19120%
MSF_TC: mod search by subject query4480%4060%7970%7780%30760%6200%5190%29170%
MSF_TC: mod search by title query10901%10250%14350%13870%36870%14320%11490%35430%
OPIH_/oai/records53300%54040%93300%76770%68810%33270%69900%83350%
POO_TC: Add Order Lines521420%541930%2821920%792060%4120040%577350%577490%3999170%
POO_TC: Approve Order406560%425230%2117470%564460%2651670%439300%438340%2759350%
POO_TC Create Order307340%316520%1073180%429400%492340%321210%438340%1756430%
RTAC_TC: edge-rtac37350%38280%162950%13870%571950%42050%39500%555950%
SDIC_Single Record Import (Create)1327919%1389419%4502416%173050%790532%1465014%1453117%742011%
SDIU_Single Record Import (Update)184660%194320%218270100%282070%1183990%217360%209650%1157770%
TC: Receiving-an-Order-Line43765100%46104100%218270100%65242100%325230100%49024100%48538100%322267100%
Serials-Receiving-Workflow45694100%47336100%198116100%68545100%302203100%49873100%50028100%295508100%
Unreceiving-a-Piece7823100%7757100%40059100%13335100%64155100%9018100%8717100%60295100%
ULR_TC: Users loan Renewal Transaction28100%30780%226021%78290%383830%46730%40300%361430%
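When comparing a workflow's average response time between the baseline and another test, a relative change is easier to read than two raw millisecond values. The helper below is a minimal sketch of that calculation; the function name and the sample values are hypothetical and are not taken from the results above.

```python
def percent_change(baseline_ms: float, test_ms: float) -> float:
    """Relative change of test_ms against baseline_ms, in percent.

    Positive values mean the test was slower than the baseline,
    negative values mean it was faster.
    """
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return (test_ms - baseline_ms) * 100.0 / baseline_ms


# Hypothetical averages in milliseconds, for illustration only.
slower = percent_change(2000, 2200)
faster = percent_change(2000, 1500)
```

A positive result indicates degradation relative to the baseline, and a negative result indicates improvement.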

...

No module shows any sign of a memory leak. Memory usage shows a stable trend.



Kafka metrics


DB CPU Utilization

DB CPU averaged 99% with ERW: Exporting Receiving Information.

...

No module shows any sign of a memory leak. Memory usage shows a stable trend.



Kafka metrics





DB CPU Utilization

DB CPU averaged 99% with ERW: Exporting Receiving Information.

...

Resource utilization for Test №3

The Baseline MCPT Environment configuration was applied, the instance type was changed to x2gd.xlarge, the number of instances was changed to 6, and CPU=0 was set for all services.

Service CPU Utilization

Here we can see that mod-permissions used 20% of the absolute CPU power of the container instance.

...

No module shows any sign of a memory leak. Memory usage shows a stable trend.



Kafka metrics






DB CPU Utilization

DB CPU was 64% maximum.

...

Resource utilization for Test №4

The Baseline MCPT Environment configuration was applied, the instance type was changed to x2gd.xlarge, the number of instances was changed to 8, and CPU=0 was set for all services.

Service CPU Utilization

Here we can see that okapi used 20% of the absolute CPU power of the container instance.

...

No module shows any sign of a memory leak. Memory usage shows a stable trend.



Kafka metrics






DB CPU Utilization

DB CPU was 91%.

...

Resource utilization for Test №5

The Baseline MCPT Environment configuration was applied, the instance type was changed to x2gd.large, the number of instances was changed to 10, and CPU=0 was set for all services.

Service CPU Utilization

...

No module shows any sign of a memory leak. Memory usage shows a stable trend.



Kafka metrics




DB CPU Utilization

DB CPU was 42%.

...

Resource utilization for Test №6

The Baseline MCPT Environment configuration was applied, the instance type was changed to r6g.xlarge, the number of instances was changed to 14 but 12 were used, and CPU=2 was set for all services.

Service CPU Utilization

Here we can see that okapi used 46000% of the unit CPU power.

...

Instance CPU Utilization


Kafka metrics






DB CPU Utilization

DB CPU was 98%.

...

Resource utilization for Test №7

The Baseline MCPT Environment configuration was applied, the instance type was changed to r6g.xlarge, the number of instances was changed to 14, placement strategy was updated to "one task per host", and CPU=2 was set for all services.
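The report does not say how the "one task per host" placement was configured. One way to get this behaviour for an ECS service on EC2 is the `distinctInstance` placement constraint, which places each task of the service on a different container instance. The sketch below only builds the service parameters as a dictionary; the function name, cluster, service and task definition values are hypothetical.

```python
def one_task_per_host_service(
    cluster: str,
    service_name: str,
    task_definition: str,
    desired_count: int,
) -> dict:
    """Build ECS service parameters that spread tasks one per instance.

    Assumption: "one task per host" is implemented with the
    distinctInstance placement constraint, which applies per service.
    """
    return {
        "cluster": cluster,
        "serviceName": service_name,
        "taskDefinition": task_definition,
        "desiredCount": desired_count,
        "launchType": "EC2",
        "placementConstraints": [{"type": "distinctInstance"}],
    }


# Hypothetical values, for illustration only.
params = one_task_per_host_service(
    cluster="example-cluster",
    service_name="okapi",
    task_definition="okapi:1",
    desired_count=4,
)
```

With this constraint the desired task count of a service cannot exceed the number of container instances, and the constraint applies per service, so different services can still share a host.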

Service CPU Utilization

Here we can see that okapi used 44000% of the unit CPU power.


Service Memory Utilization

No module shows any sign of a memory leak. Memory usage shows a stable trend.


...

Instance CPU Utilization


Kafka metrics






DB CPU Utilization

DB CPU was 98%.


DB Connections

Max number of DB connections was 5150.


DB load

                                                                                                                     

Top SQL-queries



Resource utilization for Test №8

The Baseline MCPT Environment configuration was applied, the instance type was changed to x2gd.large, the number of instances was changed to 14, placement strategy was updated to "one task per host", and CPU=2 was set for all services.

Service CPU Utilization

Here we can see that okapi used 38% of the unit CPU power.


Service Memory Utilization

No module shows any sign of a memory leak. Memory usage shows a stable trend.



Kafka metrics






DB CPU Utilization

DB CPU was 53% maximum.


DB Connections

Max number of DB connections was 3842.


DB load

                                                                                                                     

Top SQL-queries



...

PTF - Baseline MCPT environment configuration

  • 14 m6g.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
  • 1 database instance, writer

    Name             Memory    vCPUs
    db.r6g.4xlarge   128 GiB   16


  • OpenSearch ptf-test
    • Data nodes
      • Instance type - r6g.2xlarge.search
      • Number of nodes - 4
      • Version: OpenSearch_2_7_R20240502
    • Dedicated master nodes
      • Instance type - r6g.large.search
      • Number of nodes - 3
  • MSK tenant
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3

...

MOBIUS Tests: scenarios were started by a JMeter script from the load generator. The error rate was 100% for the AIE_TC: Create Invoices, AIE_TC: Invoices Approve, AIE_TC: Paying Invoices, TC: Receiving-an-Order-Line, Serials-Receiving-Workflow and Unreceiving-a-Piece workflows because the test data was not regenerated.

...

Baseline MCPT Environment configuration, according to the tuned environment from the previous report: task count 4 for the services mod-permissions, mod-search, mod-patron, mod-inventory, mod-inventory-storage, mod-circulation, mod-circulation-storage, mod-order, mod-order-storage, mod-invoice, mod-invoice-storage; for

...

 mod-users and mod-authtoken task count 6. Parameter srs.marcIndexers.delete.interval.seconds=86400 for mod-source-record-storage. Instance type: m6g.2xlarge. Instances count: 14. Database: r6g.4xlarge. Amazon OpenSearch Service ptf-test: r6g.2xlarge.search (4 nodes).

  • Test 1: The Baseline MCPT Environment configuration was applied, CPU=0 was set for all modules, and the Fixed Load (average case) MOBIUS test was run.
  • Test 2: The Baseline MCPT Environment configuration was applied, CPU=0 was set for all modules, and the Fixed Load (high load case) MOBIUS test was run.
  • Test 3: The Baseline MCPT Environment configuration was applied, CPU=0 was set for all modules, and the Fixed Load (average case) MOBIUS test was run (rerun of Test 1).