PTF - Performance testing of CPU=0 and tasks placement strategy for Services (QCP1)

Overview

  • In this report, PTF investigates the impact of setting CPU=0 and disabling the distinctInstance task placement strategy on FOLIO’s performance, with and without New Relic/OpenTelemetry enabled. Previous observations suggested that this configuration negatively affects system performance. The goal of these tests was to reproduce the issue and evaluate whether performance could be improved by adjusting CPU allocations and enabling distinctInstance placement. Through a series of experiments, we analyzed how these settings influence system behavior, ensuring that CPU allocation and task distribution strategies are optimized for stability and efficiency.
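For readers less familiar with Amazon ECS, the two settings varied in these tests correspond to the container-level `cpu` value in a task definition and the `distinctInstance` service placement constraint. The sketch below is a minimal illustration that assumes the EC2 launch type. The cluster, service and image names and the memory values are hypothetical. The dictionaries mirror the request parameters of boto3's `register_task_definition` and `create_service` calls; they are not taken from the PTF tooling.

```python
# Sketch only: request shapes for an EC2-backed ECS service. All names and
# memory values are hypothetical; nothing here calls AWS.


def container_definition(name: str, image: str, cpu_units: int,
                         hard_mib: int, soft_mib: int) -> dict:
    """One entry for a task definition's containerDefinitions list."""
    return {
        "name": name,
        "image": image,
        # CPU units reserved for the container (1024 = 1 vCPU). With 0,
        # no CPU is reserved, so CPU does not constrain task placement.
        "cpu": cpu_units,
        "memory": hard_mib,             # hard memory limit, MiB
        "memoryReservation": soft_mib,  # soft memory limit, MiB
        "essential": True,
    }


def service_request(cluster: str, service: str, task_definition: str,
                    desired_count: int, distinct_instance: bool) -> dict:
    """Keyword arguments for an ECS create_service call."""
    request = {
        "cluster": cluster,
        "serviceName": service,
        "taskDefinition": task_definition,
        "desiredCount": desired_count,
    }
    if distinct_instance:
        # Place every task of the service on a different container instance.
        request["placementConstraints"] = [{"type": "distinctInstance"}]
    return request


if __name__ == "__main__":
    print(container_definition("mod-rtac", "example.org/mod-rtac:3.6.1", 0, 1024, 896))
    print(service_request("qcp1-cluster", "mod-rtac", "mod-rtac:1", 2, distinct_instance=True))
```

With boto3, such dictionaries would be passed as `ecs.register_task_definition(family=..., containerDefinitions=[...])` and `ecs.create_service(**request)`; the sketch does not execute those calls.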

PERF-1071

Summary

  • The performance tests conducted in this report indicate that neither the distinctInstance placement strategy nor the use of New Relic/OpenTelemetry had a significant impact on system performance. Across all tests, performance variations remained within a 5% margin. The slight difference observed in Test №3 was attributed to a lower instance count rather than the placement strategy itself. Further tests (№4 and №5) confirmed these findings, showing consistent results regardless of the number of instances used. Additionally, disabling New Relic did not affect performance, suggesting that monitoring overhead was negligible. Overall, the experiments demonstrate that the tested configurations do not introduce meaningful performance differences.
  • It appears that increasing the number of virtual users in our test led to contention in the database, specifically due to a high volume of concurrent updates on the same row in the auth_attempts table by mod-login. The query:
    UPDATE fs09000000_mod_login.auth_attempts SET jsonb = $1::jsonb WHERE id='9883ca16-ef27-41f7-81d7-6693b79cddad'
    This query suggests that multiple sessions are attempting to modify the same record simultaneously, leading to row-level locking. As a result, transactions wait in the background for the lock to be released, which can degrade performance or cause request timeouts.
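The effect of many sessions updating one row can be approximated with a simple queueing sketch. It assumes that every session arrives at the same moment, acquires the row lock in FIFO order, and holds it for a fixed, hypothetical time before committing. Under those assumptions the k-th session waits k × hold time, so the average wait grows linearly with the number of concurrent sessions. Real PostgreSQL behavior also depends on arrival times, transaction length and other work, so this is an illustration only, not a model of mod-login.

```python
def row_lock_waits(sessions: int, hold_ms: float) -> list[float]:
    """Queue time of each session waiting for the same row lock.

    Simplified model (an assumption, not measured data): all sessions arrive
    at once, are served in FIFO order, and each holds the lock for hold_ms.
    """
    if sessions < 0 or hold_ms < 0:
        raise ValueError("sessions and hold_ms must be non-negative")
    # The first session gets the lock immediately; each later one waits for
    # every session ahead of it to commit.
    return [position * hold_ms for position in range(sessions)]


def mean_wait_ms(sessions: int, hold_ms: float) -> float:
    """Average queue time across all sessions (0 when there are none)."""
    waits = row_lock_waits(sessions, hold_ms)
    return sum(waits) / len(waits) if waits else 0.0


if __name__ == "__main__":
    # hold_ms=5 is a hypothetical lock hold time, chosen only for illustration.
    for users in (10, 50, 100):
        print(f"{users:>3} sessions: mean wait {mean_wait_ms(users, 5):.1f} ms")
```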

Test Runs

Test # | Description | Status
Test 1 | New Relic/OpenTelemetry enabled, OTEL value for mod-inventory set to TRUE, distinctInstance placement strategy ON, CPU values set for the list of modules | Completed
Test 2 | New Relic/OpenTelemetry enabled, OTEL value for mod-inventory set to TRUE, distinctInstance placement strategy OFF, CPU values set for the list of modules | Completed
Test 3 | New Relic/OpenTelemetry enabled, OTEL value for mod-inventory set to TRUE, distinctInstance placement strategy OFF, CPU set to 0 for all services | Completed
Test 4 | New Relic/OpenTelemetry enabled, OTEL value for mod-inventory and mod-circulation set to TRUE, distinctInstance placement strategy OFF, CPU set to 0 for all services | Completed
Test 5 | New Relic/OpenTelemetry enabled, OTEL value for mod-inventory and mod-circulation set to TRUE, distinctInstance placement strategy OFF, CPU values set for the list of modules | Completed
Test 6 | New Relic/OpenTelemetry disabled, distinctInstance placement strategy OFF, CPU set to 0 for all services | Completed
Test 7 | New Relic/OpenTelemetry disabled, distinctInstance placement strategy OFF, CPU values set for the list of modules | Completed

Test Results

This table contains average response times (ms) for the Check-In/Check-Out and RTAC tests.

Average response time, ms

Request | Test №1 | Test №2 | Test №3 | Test №4 | Test №5 | Test №6 | Test №7
Check-In Controller | 995 | 965 | 1057 | 886 | 922 | 929 | 917
Check-Out Controller | 1518 | 1502 | 1637 | 1352 | 1392 | 1431 | 1447
RTAC | 1174 | 1136 | 1258 | 1087 | 1069 | 1085 | 1091
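Relative differences between runs are easier to judge as percentages than as raw milliseconds. The sketch below copies the averages from the table above and computes the percentage change of any test against a chosen baseline (Test №1 by default). The helper name `pct_diff` is hypothetical and not part of the PTF tooling.

```python
# Average response times (ms) from the results table, Tests №1..№7 in order.
AVG_RESPONSE_MS = {
    "Check-In Controller": [995, 965, 1057, 886, 922, 929, 917],
    "Check-Out Controller": [1518, 1502, 1637, 1352, 1392, 1431, 1447],
    "RTAC": [1174, 1136, 1258, 1087, 1069, 1085, 1091],
}


def pct_diff(request: str, test: int, baseline: int = 1) -> float:
    """Percentage change of `test` relative to `baseline` (tests numbered from 1)."""
    values = AVG_RESPONSE_MS[request]
    base = values[baseline - 1]
    return (values[test - 1] - base) / base * 100


if __name__ == "__main__":
    for request in AVG_RESPONSE_MS:
        deltas = [f"{pct_diff(request, t):+.1f}%" for t in range(2, 8)]
        print(f"{request:<22}", "  ".join(deltas))
```

Matched pairs can be compared the same way, for example `pct_diff("Check-Out Controller", 6, baseline=4)` for Test №6 (New Relic off) against Test №4 (New Relic on), which otherwise share the CPU=0, 5-instance setup.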



Test №1-2-3

Test №1: Modules are assigned specific CPU values, and the distinctInstance placement strategy is ON.
Goal: Establish baseline performance metrics for comparison with subsequent configurations.

Test №2: Modules are assigned specific CPU values, and the distinctInstance placement strategy is OFF.
Goal: Evaluate whether disabling distinctInstance changes performance compared to the baseline (Test №1).

Test №3: Modules have CPU = 0, and the distinctInstance placement strategy is OFF.
Goal: Evaluate whether setting CPU = 0 for all services changes performance compared to the previous tests.

Results: Performance remained nearly the same across the tests, with differences of less than 5% between Tests №1 and №2. The slight variance in Test №3 was attributed to its running on 5 instances instead of the 6 used in Tests №1 and №2. The results indicate no negative impact from disabling the distinctInstance placement strategy.

Service CPU Utilization

Test №1 (CPU values set for the list of modules): mod-rtac CPU utilization reached 112%.

Test №2 (CPU values set for the list of modules): mod-rtac CPU utilization reached 109%.

Test №3 (CPU = 0 for all services): mod-nginx-okapi and okapi used 12% of instance CPU.
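Utilization percentages from the CPU=VALUE and CPU=0 runs are not directly comparable. Amazon ECS reports service `CPUUtilization` as the CPU units used divided by the CPU units reserved for the service's tasks, so it can exceed 100% when containers use idle host CPU beyond their reservation. With CPU=0 nothing is reserved, and the CPU=0 figures are read here as a share of one EC2 instance; that reading is an assumption based on the wording "Instances CPU". The sketch converts both kinds of figure to vCPUs. It uses the ECS convention of 1,024 CPU units per vCPU, the 128-unit value for mod-rtac from the Methodology list, and the 8 vCPUs of an r7g.2xlarge instance.

```python
CPU_UNITS_PER_VCPU = 1024  # Amazon ECS convention: 1 vCPU = 1,024 CPU units
R7G_2XLARGE_VCPUS = 8      # vCPUs on one r7g.2xlarge EC2 instance


def vcpus_from_reservation_pct(reserved_units: int, utilization_pct: float) -> float:
    """Average vCPUs used per task when utilization is reported against the task's CPU reservation."""
    return reserved_units * utilization_pct / 100 / CPU_UNITS_PER_VCPU


def vcpus_from_instance_pct(utilization_pct: float,
                            instance_vcpus: int = R7G_2XLARGE_VCPUS) -> float:
    """vCPUs used when utilization is reported against a whole instance (assumed for CPU=0 runs)."""
    return instance_vcpus * utilization_pct / 100


if __name__ == "__main__":
    # mod-rtac in Test №1: 112% of a 128-unit reservation per task.
    print(f"mod-rtac, Test 1: {vcpus_from_reservation_pct(128, 112):.2f} vCPU per task")
    # nginx-okapi/okapi in Test №3: 12% of one instance (assumed denominator).
    print(f"okapi, Test 3: {vcpus_from_instance_pct(12):.2f} vCPU")
```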

Service Memory Utilization

Memory usage showed a stable trend for all modules.


Kafka metrics

OpenSearch Data Nodes metrics

DB CPU Utilization

Average DB CPU utilization was 85%.

DB Connections

Max number of DB connections was 950.

DB load

Top SQL queries

Test №4-5

Goal: Repeat Test №3 (as Test №4) and Test №2 (as Test №5), with OTEL also enabled for mod-circulation, to validate the previous results.

Results: The results remained consistent with the previous tests even though Test №4 ran on 5 instances and Test №5 on 6, confirming no negative impact from disabling the distinctInstance placement strategy.

Service CPU Utilization

Test №4 (CPU = 0 for all services): mod-nginx-okapi and okapi used 10% of instance CPU.

Test №5 (CPU values set for the list of modules): mod-rtac CPU utilization reached 131%.

Service Memory Utilization

No signs of memory leaks were observed in any module; memory usage remained stable.


Kafka metrics


OpenSearch Data Nodes metrics

DB CPU Utilization

DB CPU utilization was 85%.

DB Connections

Max number of DB connections was 1600.

DB load

Top SQL queries


Test №6-7

Goal: Disable New Relic and repeat Test №4 and Test №5 to observe any effects.

Results: The results were consistent with all previous tests, showing that enabling or disabling New Relic had no measurable impact.

Service CPU Utilization

Test №6 (CPU = 0 for all services): mod-nginx-okapi and okapi used 12% of instance CPU.

Test №7 (CPU values set for the list of modules): mod-rtac CPU utilization reached 101%.

Service Memory Utilization

No signs of memory leaks were observed in any module; memory usage remained stable.


Kafka metrics

OpenSearch Data Nodes metrics

DB CPU Utilization

Maximum DB CPU utilization was 85%.

DB Connections

Max number of DB connections was 1050.

DB load

Top SQL queries

Appendix

Infrastructure

PTF - QCP1 environment configuration (the configuration changed during testing)

  • 5-6 r7g.2xlarge EC2 instances located in US East (N. Virginia) us-east-1
  • 1 database instance (writer)
    • Instance: db.r6g.xlarge
    • Memory: 32 GiB
    • vCPUs: 4
  • OpenSearch ptf-test
    • Data nodes
      • Instance type - r6g.2xlarge.search
      • Number of nodes - 4
      • Version: OpenSearch_2_7_R20240502
    • Dedicated master nodes
      • Instance type - r6g.large.search
      • Number of nodes - 3
  • MSK fse-tenant
    • kafka.m7g.xlarge brokers in 2 zones
    • Apache Kafka version 3.7.x 

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3


 qcp1 modules memory and CPU parameters

Cluster Resources

qcp1-pvt (Thu Feb 20 09:58:24 UTC 2025)

ModuleTask Definition RevisionModule VersionTask CountMem Hard LimitMem Soft LimitCPU UnitsXmxMetaspace SizeMax Metaspace Size
mod-remote-storage135mod-remote-storage:3.2.024920447203960512512
mod-finance-storage135mod-finance-storage:8.6.121024896070088128
mod-sudoc55mod-sudoc:1.021024896076888128
mod-ebsconet135mod-ebsconet:2.2.02124810240700128256
edge-sip2115edge-sip2:3.2.721024896076888128
mod-tags135mod-tags:2.2.021024896076888128
edge-courses135edge-courses:1.4.521024896076888128
mod-authtoken185mod-authtoken:2.15.2214401152092288128
mod-inventory-update135mod-inventory-update:3.3.121024896076888128
mod-notify135mod-notify:3.2.021024896076888128
mod-configuration135mod-configuration:5.10.021024896076888128
edge-caiasoft135edge-caiasoft:2.2.521024896076888128
mod-login-saml135mod-login-saml:2.8.421024896076888128
mod-licenses135mod-licenses:6.0.522480231201792384512
mod-gobi135mod-gobi:2.8.121024896070088128
mod-graphql155mod-graphql:1.12.221024896076888128
mod-erm-usage135mod-erm-usage:4.8.022800255001800384512
mod-batch-print145mod-batch-print:1.2.021024896076888128
mod-copycat135mod-copycat:1.6.021024512076888128
mod-entities-links145mod-entities-links:3.0.22259224800144001024
pub-edge85pub-edge:2023.06.1421024896076800
mod-orders145mod-orders:12.8.1022048174001024384512
edge-patron145edge-patron:5.1.221024896076888128
edge-ncip135edge-ncip:1.10.121024896076888128
edge-inn-reach95edge-inn-reach:3.1.1-SNAPSHOT.4521024896076888128
mod-users-bl135mod-users-bl:7.7.5214401152092288128
mod-oa115mod-oa:2.1.0-SNAPSHOT.6221024896076888128
mod-inventory-storage175mod-inventory-storage:27.1.544096369003076384512
mod-invoice155mod-invoice:5.8.3214401152092288128
mod-user-import135mod-user-import:3.8.121024896076888128
mod-sender145mod-sender:1.12.021024896076888128
mod-data-export-worker135mod-data-export-worker:3.2.423072280002048384512
mod-circulation-storage155mod-circulation-storage:17.2.642880259201814384512
mod-calendar135mod-calendar:3.1.0220481536076888128
mod-event-config135mod-event-config:2.7.121024896076888128
mod-courses135mod-courses:1.4.1121024896076888128
mod-circulation-item135mod-circulation-item:1.0.0210248960000
mod-email135mod-email:1.17.022800255001800384512
mod-pubsub135mod-pubsub:2.13.12153614400922384512
mod-circulation195mod-circulation:24.2.642880259201814384512
mod-di-converter-storage135mod-di-converter-storage:2.2.321024896076888128
edge-rtac135edge-rtac:2.7.321024896076888128
edge-orders135edge-orders:3.0.321024896076888128
mod-users145mod-users:19.3.221024896076888128
mod-template-engine135mod-template-engine:1.20.021024896076888128
mod-audit135mod-audit:2.9.021024896076888128
mod-source-record-manager155mod-source-record-manager:3.8.725600500003500384512
mod-quick-marc155mod-quick-marc:5.1.112288217601664384512
nginx-okapi85nginx-okapi:2023.06.14210248960000
mod-feesfines145mod-feesfines:19.1.021024896076888128
edge-users35edge-users:1.5.021024896076888128
mod-dcb125mod-dcb:1.1.021024896076888128
mod-service-interaction135mod-service-interaction:4.0.222048184401290384512
mod-patron155mod-patron:6.1.021024896076888128
edge-connexion135edge-connexion:1.3.121024896076888128
mod-data-export-spring135mod-data-export-spring:3.2.212048184401536384512
mod-organizations-storage135mod-organizations-storage:4.7.021024896070088128
mod-login145mod-login:7.11.22144012980768384512
edge-erm35edge-erm:1.2.121024896076888128
mod-ncip135mod-ncip:1.14.521024896076888128
mod-agreements135mod-agreements:7.0.102159214880000
mod-organizations135mod-organizations:1.9.221024896070088128
mod-consortia95mod-consortia:1.1.0251364776044165121024
mod-serials-management135mod-serials-management:1.0.422480231201792384512
mod-settings135mod-settings:1.0.321024896076888128
mod-data-import195mod-data-import:3.1.112048184401292384512
edge-dematic135edge-dematic:2.2.511024896076888128
mod-search135mod-search:3.2.7225922480014405121024
mod-inn-reach95mod-inn-reach:3.2.0-SNAPSHOT.86236003240028805121024
edge-inventory35edge-inventory:1.6.121024896076888128
mod-orders-storage145mod-orders-storage:13.7.421024896070088128
mod-erm-usage-harvester135mod-erm-usage-harvester:4.5.021024896076888128
mod-password-validator135mod-password-validator:3.2.12144012980768384512
mod-bulk-operations135mod-bulk-operations:2.0.223072260001536384512
mod-fqm-manager135mod-fqm-manager:2.0.521024896076888128
edge-dcb115edge-dcb:1.1.021024896076888128
mod-finance135mod-finance:4.9.121024896070088128
mod-lists135mod-lists:2.0.621024896076888128
mod-permissions235mod-permissions:6.5.021684154401024384512
mod-marc-migrations105mod-marc-migrations:1.0.0-SNAPSHOT.521024896076888128
edge-ea-data-export35edge-ea-data-export:4.2.021024896076888128
edge-oai-pmh135edge-oai-pmh:2.9.221512136001440384512
mod-rtac155mod-rtac:3.6.121024896076888128
mod-task-list35mod-task-list:1.9.421024896076888128
mod-source-record-storage285mod-source-record-storage:5.8.1025600500003500384512
mod-inventory195mod-inventory:20.2.1142880259201814384512
mod-patron-blocks135mod-patron-blocks:1.10.021024896076888128
edge-fqm135edge-fqm:2.0.321024896076888128
nginx-edge85nginx-edge:2023.06.14210248960000
okapi-b115okapi:5.3.03168414400922384512
mod-invoice-storage135mod-invoice-storage:5.8.221872153601024384512
mod-data-export285mod-data-export:5.0.412592248001440881024
mod-oai-pmh135mod-oai-pmh:3.13.224096369003076384512
mod-kb-ebsco-java135mod-kb-ebsco-java:4.0.021024896076888128
mod-notes135mod-notes:5.2.0210248960952384512
pub-okapi85pub-okapi:2023.06.1421024896076800
mod-eusage-reports115mod-eusage-reports:2.1.121024896076888128


Methodology/Approach

Check-In/Check-Out (CICO) test scenarios were run with 100 users, concurrently with RTAC scenarios with 10 users, using a JMeter script executed from a load generator.

QCP1 configuration and steps to configure ECS infrastructure:

  • Instance type r7g.2xlarge

  • When New Relic/OpenTelemetry was enabled, the OTEL value in mod-inventory's task definition was set to true (OTEL_INSTRUMENTATION_RUNTIME_TELEMETRY_ENABLED=true); in Tests 4 and 5 it was also set for mod-circulation.

  • The mod-oa-b service was turned off for all tests.
  • In tests with CPU values set for the list of modules, the following CPU units were applied:
    • mod-inventory: 3072
    • mod-inventory-storage: 2048
    • mod-circulation: 1536
    • mod-circulation-storage: 1536
    • mod-feesfines: 256
    • mod-orders: 1024
    • mod-orders-storage: 512
    • mod-login: 1024
    • mod-source-record-storage: 2048
    • mod-rtac: 128
    • mod-patron: 128
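The CPU values above are ECS CPU units. A rough sketch of what they reserve, assuming each value applies once per running task and using 1,024 units per vCPU: the four modules that ran 4 tasks in every test together reserve 32 vCPUs, against 40 vCPUs on 5 r7g.2xlarge instances or 48 on 6. The module-to-units mapping is copied from the list; treating it as a per-task reservation is an assumption, and the helper names are hypothetical.

```python
CPU_UNITS_PER_VCPU = 1024  # Amazon ECS: 1 vCPU = 1,024 CPU units
R7G_2XLARGE_VCPUS = 8      # vCPUs on one r7g.2xlarge instance

# CPU units applied in the "CPU values set for list of modules" runs.
CPU_UNITS = {
    "mod-inventory": 3072,
    "mod-inventory-storage": 2048,
    "mod-circulation": 1536,
    "mod-circulation-storage": 1536,
    "mod-feesfines": 256,
    "mod-orders": 1024,
    "mod-orders-storage": 512,
    "mod-login": 1024,
    "mod-source-record-storage": 2048,
    "mod-rtac": 128,
    "mod-patron": 128,
}

# Modules that ran 4 tasks in every test (see the per-test descriptions).
FOUR_TASK_MODULES = ["mod-inventory", "mod-inventory-storage",
                     "mod-circulation", "mod-circulation-storage"]


def reserved_units(modules: list[str], tasks_per_module: int) -> int:
    # Assumption: each CPU value is reserved once per running task.
    return sum(CPU_UNITS[m] * tasks_per_module for m in modules)


def cluster_units(instances: int, vcpus_per_instance: int = R7G_2XLARGE_VCPUS) -> int:
    return instances * vcpus_per_instance * CPU_UNITS_PER_VCPU


if __name__ == "__main__":
    used = reserved_units(FOUR_TASK_MODULES, 4)
    for n in (5, 6):
        total = cluster_units(n)
        print(f"{n} instances: {used / CPU_UNITS_PER_VCPU:.0f} of "
              f"{total / CPU_UNITS_PER_VCPU:.0f} vCPUs reserved by the 4-task modules")
```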


Test 1:
The QCP1 environment was configured with 6 instances, New Relic/OpenTelemetry enabled, the OTEL value for mod-inventory set to TRUE, the distinctInstance placement strategy ON, CPU values set for the list of modules, and 4 tasks each for mod-inventory, mod-inventory-storage, mod-circulation, and mod-circulation-storage. Check-In/Check-Out tests (100 users) and RTAC tests (10 users) ran concurrently on the main tenant for 35 minutes.

Test 2:
The QCP1 environment was configured with 6 instances, New Relic/OpenTelemetry enabled, the OTEL value for mod-inventory set to TRUE, the distinctInstance placement strategy OFF, CPU values set for the list of modules, and 4 tasks each for mod-inventory, mod-inventory-storage, mod-circulation, and mod-circulation-storage. Check-In/Check-Out tests (100 users) and RTAC tests (10 users) ran concurrently on the main tenant for 35 minutes.

Test 3:
The QCP1 environment was configured with 5 instances, New Relic/OpenTelemetry enabled, the OTEL value for mod-inventory set to TRUE, the distinctInstance placement strategy OFF, CPU set to 0 for all services, and 4 tasks each for mod-inventory, mod-inventory-storage, mod-circulation, and mod-circulation-storage. Check-In/Check-Out tests (100 users) and RTAC tests (10 users) ran concurrently on the main tenant for 35 minutes.

Test 4:
The QCP1 environment was configured with 5 instances, New Relic/OpenTelemetry enabled, the OTEL value for mod-inventory and mod-circulation set to TRUE, the distinctInstance placement strategy OFF, CPU set to 0 for all services, and 4 tasks each for mod-inventory, mod-inventory-storage, mod-circulation, and mod-circulation-storage. Check-In/Check-Out tests (100 users) and RTAC tests (10 users) ran concurrently on the main tenant for 35 minutes.

Test 5:
The QCP1 environment was configured with 6 instances, New Relic/OpenTelemetry enabled, the OTEL value for mod-inventory and mod-circulation set to TRUE, the distinctInstance placement strategy OFF, CPU values set for the list of modules, and 4 tasks each for mod-inventory, mod-inventory-storage, mod-circulation, and mod-circulation-storage. Check-In/Check-Out tests (100 users) and RTAC tests (10 users) ran concurrently on the main tenant for 35 minutes.

Test 6:
The QCP1 environment was configured with 5 instances, New Relic/OpenTelemetry disabled, the distinctInstance placement strategy OFF, CPU set to 0 for all services, and 4 tasks each for mod-inventory, mod-inventory-storage, mod-circulation, and mod-circulation-storage. Check-In/Check-Out tests (100 users) and RTAC tests (10 users) ran concurrently on the main tenant for 35 minutes.

Test 7:
The QCP1 environment was configured with 6 instances, New Relic/OpenTelemetry disabled, the distinctInstance placement strategy OFF, CPU values set for the list of modules, and 4 tasks each for mod-inventory, mod-inventory-storage, mod-circulation, and mod-circulation-storage. Check-In/Check-Out tests (100 users) and RTAC tests (10 users) ran concurrently on the main tenant for 35 minutes.

Test artifacts:

