
...

Compare results against the baseline tests and record the KPIs and other observations such as response times or errors in a report. 

Summary

  • The db.r6g.xlarge DB instance cannot handle the load of 61 tenants, even in a test with 1 user running the Check-In/Check-Out (CICO) workflow, because of the high number of connections being created. To handle this number of concurrent DB connections, the instance type should be at least db.r6g.4xlarge, which has an acceptable connection limit. Even that may not be enough: in the latest tests db.r6g.4xlarge often reached its limit of available connections. A shared connection pool may have a positive effect here.
  • Ticket created for performance degradation: MODINVSTOR-1124
  • nginx-okapi CPU spiked up to 400% in combined tests, so its CPU units should be increased to at least 512 (currently 128).
  • Kafka CPU usage stayed at about 60% during the whole test (35-40% when idle). Adding 2 Kafka brokers had a positive effect on Data Import (DI); however, while DI performance improved, CICO was affected. As we observed, higher DI throughput loads the DB more and has a negative effect on CICO response times.
    • io.vertx.core.impl.NoStackTraceThrowable: Connection is not active now, current status: CLOSED
    • or io.netty.channel.StacklessClosedChannelException
  • The first DI job (typically the primary tenant's) runs fastest. Each subsequent tenant's job runs slower, taking up to 3 hr.
  • In the last test, 5 DI jobs completed with errors due to the same issues mentioned above. One job did not even start because of a 500 Internal Server Error on the POST call that starts a job.
  • OpenSearch CPU usage was at 90% during the whole test. This is likely because DI jobs require indexing of each record created. The indexing is done asynchronously, so it does not affect the overall DI duration, but it likely affects the performance of other workflows.
  • No memory leaks were found.
  • The improvements in tests #7-8 (adding 2 more brokers to the Kafka cluster and changing CPU units on nginx-edge to 512) did make Data Import faster; however, they also affected CICO, increasing CI and CO response times by 200 ms on average.
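
The connection pressure described in the first bullet can be approximated with simple arithmetic. The sketch below is illustrative only: the module count, instance count, and pool sizes are hypothetical placeholders, not values measured in these tests; only the tenant count of 61 comes from this report. It assumes each module instance keeps a separate connection pool per tenant, and compares that with one pool shared across all tenants.

```python
def per_tenant_pool_connections(tenants, modules, instances_per_module, pool_size):
    """Upper bound when every module instance keeps one pool per tenant."""
    return tenants * modules * instances_per_module * pool_size


def shared_pool_connections(modules, instances_per_module, shared_pool_size):
    """Upper bound when every module instance shares one pool across all tenants."""
    return modules * instances_per_module * shared_pool_size


# Hypothetical inputs; only TENANTS is taken from the report.
TENANTS = 61
MODULES = 20
INSTANCES_PER_MODULE = 2
POOL_SIZE = 5
SHARED_POOL_SIZE = 50

per_tenant = per_tenant_pool_connections(TENANTS, MODULES, INSTANCES_PER_MODULE, POOL_SIZE)
shared = shared_pool_connections(MODULES, INSTANCES_PER_MODULE, SHARED_POOL_SIZE)
print(f"per-tenant pools: up to {per_tenant} connections")
print(f"shared pools:     up to {shared} connections")
```

Under these assumptions the per-tenant ceiling grows linearly with the number of tenants, while the shared-pool ceiling does not depend on tenant count at all. This is one plausible reading of why a shared pool might relieve the connection limit.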

Recommendations & Jiras

  • Original ticket: PERF-639 Preliminary Testing of Mobius-like Env
  • Ticket to improve resources and retest: PERF-670
  • Increase the DB instance type to at least db.r6g.4xlarge on environments with 61 tenants (all tests below were performed with this instance type).
  • Increase CPU units on nginx-okapi to at least 512.
  • Scale Kafka (either instance type or number of brokers) because of high CPU usage.
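
The 512-unit figure for nginx-okapi can be sanity-checked against the observed spike. The sketch below assumes the reported CPU percentage is relative to the container's allocated CPU units (ECS-style accounting, where 1024 units equal one vCPU); the report does not state this explicitly. The `headroom` parameter and the 1.25 factor are hypothetical additions for illustration.

```python
import math


def units_needed(allocated_units, peak_utilization_pct, headroom=1.0):
    """CPU units required to cover an observed peak, with an optional headroom factor."""
    return math.ceil(allocated_units * peak_utilization_pct / 100 * headroom)


current_units = 128  # current nginx-okapi allocation, from the report
peak_pct = 400       # observed spike in combined tests, from the report

print(units_needed(current_units, peak_pct))        # 512
print(units_needed(current_units, peak_pct, 1.25))  # 640
```

Under that assumption, 512 units only match the observed peak, which fits the "at least" wording of the recommendation; any headroom pushes the figure higher.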

...

| Test # | Test Conditions | Duration (min) | Load generator size | Load generator Memory (GiB) | Notes |
|---|---|---|---|---|---|
| 1 | 2 tenants, 2 users each, CICO | 30 | t3.2xlarge | 3 | |
| 2 | 61 tenants, 5 users each, CICO | 30 | t3.2xlarge | 3 | |
| 3 | 61 tenants, 5 users each, CICO + 10k MARC BIB Create on 5 tenants | 30 | t3.2xlarge | 3 | |
| 4 | 5 users CI/CO on 61 tenants + DI 10k MARC BIB Create on 15 tenants + Search workflow 1 user 61 tenants | 60 | t3.2xlarge | 3 | |
| 5 | 5 users CI/CO on 61 tenants + DI 10k MARC BIB Create on 15 tenants + Search workflow 1 user 61 tenants | 60 | t3.2xlarge | 3 | |
| 6 | 5 users CI/CO on 61 tenants + DI 10k MARC BIB Create on 30 tenants + Search workflow 1 user 61 tenants | 90 | t3.2xlarge | 12 | |
| 7 | 5 users CI/CO on 61 tenants + DI 10k MARC BIB Create on 15 tenants + Search workflow 1 user 61 tenants | 60 | t3.2xlarge | 10 | Test with CPU units changed to 512 and 2 more brokers added to Kafka |
| 8 | 5 users CI/CO on 61 tenants + DI 10k MARC BIB Create on 15 tenants + Search workflow 1 user 61 tenants (retest) | 60 | t3.2xlarge | 10 | Test with CPU units changed to 512 and 2 more brokers added to Kafka |

...

  • io.vertx.pgclient.PgException: FATAL: remaining connection slots are reserved for non-replication superuser connections (53300): on mod-inventory, mod-login, mod-authtoken, mod-permissions, mod-source-record-manager, mod-source-record-storage, mod-users, mod-circulation.
  • io.vertx.pgclient.PgException: FATAL: sorry, too many clients already (53300): on mod-inventory only.
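
When triaging module logs for these failures, it can help to tally connection-exhaustion errors per module. The following is a minimal sketch, assuming log lines have been pre-tagged in a `module: message` form; that format and the sample lines are hypothetical, and only the error messages and SQLSTATE 53300 (PostgreSQL `too_many_connections`) come from the report.

```python
import re
from collections import Counter

# Matches the PgException text for SQLSTATE 53300 and captures the message.
PATTERN = re.compile(r"PgException: FATAL: (?P<msg>.+?) \(53300\)")


def tally_connection_errors(lines):
    """Count 53300 errors per (module, message) pair."""
    counts = Counter()
    for line in lines:
        # Hypothetical line format: "<module>: <raw log message>"
        module, _, rest = line.partition(": ")
        match = PATTERN.search(rest)
        if match:
            counts[(module, match.group("msg"))] += 1
    return counts


# Hypothetical sample lines built from the error messages in the report.
sample = [
    "mod-inventory: io.vertx.pgclient.PgException: FATAL: sorry, too many clients already (53300)",
    "mod-users: io.vertx.pgclient.PgException: FATAL: remaining connection slots are reserved "
    "for non-replication superuser connections (53300)",
    "mod-users: request completed in 120 ms",
]

for (module, msg), n in sorted(tally_connection_errors(sample).items()):
    print(module, n, msg)
```

Grouping by both module and message keeps the two variants of the error separate, which matters here because one of them was seen on mod-inventory only.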


...

PTF environment ompt-pvt

  • 11 m6g.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
  • 1 db.r6g.4xlarge database instance
  • MSK ptf-mobius-testing
    • 2 kafka.m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 250 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • num.partitions=2
  • OpenSearch fse
    • version - OpenSearch 2.7
    • instance type r6g.xlarge.search
    • 4 data nodes
    • EBS volume 500 GiB
    • Dedicated Master nodes 3 X r6g.large.search


Modules memory and CPU parameters

...