Bulk Edit Items App report [Orchid] 08/03/2023

Overview

This report establishes a performance baseline for bulk updates of items (PERF-406) in the Orchid release, which includes the architectural changes implemented in UXPROD-3842. The goal is to confirm that performance has not deteriorated compared to the Nolana release. The following questions help determine the performance and stability of the new Bulk Edits implementation:

  • How long does it take to export 100, 1000, 10k, and 100K records?
  • Can it be used with up to 5 concurrent users? 
  • How do four consecutive jobs perform, each editing 10k item records?
  • How do four simultaneous jobs perform, each editing 10k item records?
  • Is there a trend in memory or CPU usage?

Summary 

Test report for Bulk Edits items-app functionality 2023-03-08. 

The Orchid release is about 30% faster than Nolana for bulk editing of 10k items.

Its stability is approximately the same as that of Nolana.

  • For 1 concurrent job, 100 records can be edited in 1 min 9 s, which is 3 times slower than in Nolana; 1000 records can be edited in approximately the same time as in Nolana (2 min 40 s); and bulk editing of 10k records is about 30% faster.
  • With 5 simultaneous users and 10k records per user (50k records total), records can be uploaded and edited in about 20 minutes, which is about 8 min faster than in Nolana (about 28 min).
  • The memory of mod-inventory-storage was high at 109% but stable (it was at 109% even before the test). No memory leaks were found.
  • CPU usage of mod-users reached up to 125% (5 concurrent jobs, each updating 10k records), an increase over Nolana (about 40% with the same configuration). This needs to be investigated in further testing. CPU usage of all other modules did not exceed 65% in any test.
  • For all record counts (100, 1k, 10k) and up to 5 concurrent jobs, RDS CPU utilization stayed at about 60% (maximum 61%).


Recommendations & Jiras

More than 50% of jobs FAILED after about 28-33 min with the error "Connection reset (SocketException)". See PERF-334.

Occasionally a job FAILED with an error from the S3 bucket. See MODBULKOPS-76.

The high CPU usage of mod-users (up to 125%) needs to be investigated.

Results

Test Runs

1# One (concurrent) Job
Number of records | Duration | Comments
100 | 1 min 9 s |
1000 | 2 min 36 s |
10k | 17 min 50 s |
50k | 1 hour 58 min | or error after about 28-33 min: Connection reset (SocketException), PERF-334
100k | always FAILED | error after about 28-33 min: Connection reset (SocketException), PERF-334
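
The single-job durations above can be turned into a rough throughput figure. The following is a minimal sketch, not part of the original test tooling: the name `single_job_runs` and the helper `records_per_second` are illustrative, and the 50k entry uses only the successful run.

```python
# Durations copied from the single-job table above, as (records, seconds).
single_job_runs = [
    (100, 1 * 60 + 9),         # 1 min 9 s
    (1_000, 2 * 60 + 36),      # 2 min 36 s
    (10_000, 17 * 60 + 50),    # 17 min 50 s
    (50_000, 118 * 60),        # 1 hour 58 min (successful run only)
]


def records_per_second(records: int, seconds: int) -> float:
    """Average number of records processed per second of job time."""
    return records / seconds


throughput = {
    records: records_per_second(records, seconds)
    for records, seconds in single_job_runs
}

for records, rate in throughput.items():
    print(f"{records:>6} records: {rate:5.2f} records/s")
```

Under these numbers, the per-record cost is highest for the 100-record job, which suggests a fixed per-job overhead; that reading is an interpretation rather than a measured result.
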

2# Items App 10k records 3, 4, and 5 concurrent jobs
10k records for each job
Number of concurrent jobs | Duration
1 | 17 min 50 s
3 | 18 min 50 s
4 | 19 min 10 s
5 | 20 min 20 s
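
To see how the duration grows with concurrency, the figures above can be compared against the single-job run. This is a small sketch under the assumption that every job processed exactly 10k records; the variable names are illustrative.

```python
# Durations copied from the table above: concurrent jobs -> seconds per run.
RECORDS_PER_JOB = 10_000
duration_by_jobs = {
    1: 17 * 60 + 50,   # 17 min 50 s
    3: 18 * 60 + 50,   # 18 min 50 s
    4: 19 * 60 + 10,   # 19 min 10 s
    5: 20 * 60 + 20,   # 20 min 20 s
}

baseline = duration_by_jobs[1]

# Extra time relative to a single job, in percent.
slowdown_pct = {
    jobs: (seconds - baseline) / baseline * 100
    for jobs, seconds in duration_by_jobs.items()
}

# Total records processed per second across all concurrent jobs.
aggregate_rate = {
    jobs: jobs * RECORDS_PER_JOB / seconds
    for jobs, seconds in duration_by_jobs.items()
}

for jobs in duration_by_jobs:
    print(f"{jobs} job(s): +{slowdown_pct[jobs]:.1f}% duration, "
          f"{aggregate_rate[jobs]:.1f} records/s in total")
```
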
3# Editing four jobs consecutively with 10k item records each
Job # | Job duration (run 2) | Job duration (run 1)
1 | 17 min 47 s | 18 min 49 s
2 | 17 min 53 s | 18 min 26 s
3 | 17 min 45 s | 20 min 44 s
4 | 18 min 5 s | ERROR occurs: We encountered an internal error. Please try again. (Service: S3, Status Code: 500, Request ID: 5W7F75FMHHH3KDWT, Extended Request ID: 36K8tkhFQHS1Mjt7sZc4jYrBduBWO/psei+33ZIIOnhrytq7Eie3mjDALtBplhZxSJv4CfrZpnw8Z6nqmz03ZB7b3yiRdecyXfZ/ZtEmN4g=) (S3Excepti
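
The run-to-run stability of the consecutive jobs can be summarized with a mean and a spread. The sketch below uses only the durations reported above; job 4 of run 1 is excluded because it failed, and the helper `summarize` is an illustrative name.

```python
from statistics import mean

# Durations in seconds, copied from the consecutive-jobs results above.
run_2 = [17 * 60 + 47, 17 * 60 + 53, 17 * 60 + 45, 18 * 60 + 5]
run_1 = [18 * 60 + 49, 18 * 60 + 26, 20 * 60 + 44]  # job 4 failed, so it is omitted


def summarize(durations: list[int]) -> dict[str, float]:
    """Mean duration and max-min spread, both in seconds."""
    return {
        "mean_s": mean(durations),
        "spread_s": max(durations) - min(durations),
    }


summary = {"run 2": summarize(run_2), "run 1": summarize(run_1)}

for name, stats in summary.items():
    print(f"{name}: mean {stats['mean_s']:.1f} s, spread {stats['spread_s']} s")
```
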

4# 5 Concurrent Item Apps jobs
"BARCODE". Records number per 1 user | Orchid (Total Time) | Nolana (Total Time) | Morning Glory (Total Time)
100 | 1 min 10 s | 18 sec | 25-27 sec
1000 | 2 min 57 s | 3 min | 4 min
10k | 20 min 20 s | 28 min | 30 min
25k | Results are not representative because of the error after about 28-33 min: Connection reset (SocketException), PERF-334 | 1 hour 3 min | 50 min
50k | Results are not representative because of the error after about 28-33 min: Connection reset (SocketException), PERF-334; about 2 hours for successful jobs | - |

 * "-" test was not performed due to errors that occurred
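
The Orchid and Nolana totals for 5 concurrent jobs can be compared as a percentage change. This is a rough sketch: the Nolana values are reported at coarse precision (for example "3 min"), so the results are approximate, and the variable names are illustrative.

```python
# Total times in seconds for 5 concurrent jobs, copied from the results above:
# records per user -> (Orchid, Nolana).
five_job_runs = {
    "100": (1 * 60 + 10, 18),
    "1000": (2 * 60 + 57, 3 * 60),
    "10k": (20 * 60 + 20, 28 * 60),
}

# Negative values mean Orchid is faster than Nolana.
change_pct = {
    records: (orchid - nolana) / nolana * 100
    for records, (orchid, nolana) in five_job_runs.items()
}

for records, change in change_pct.items():
    direction = "slower" if change > 0 else "faster"
    print(f"{records:>5} records per user: Orchid is {abs(change):.1f}% {direction}")
```
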

Memory usage

For all test runs

The memory of mod-inventory-storage was high at 109% but stable (It was 109% even before the test). No memory leaks were found.


Instance CPU utilization

Run #1

Run #2 & #3

Service CPU utilization

Run #1

Run #2 & #3

RDS CPU utilization

Run #1




Run #2 & #3

Maximum RDS CPU utilization is 61% for 5 concurrent jobs with 10k item records.

RDS CPU usage rises with the number of concurrent jobs. Based on these results, it looks like the database should handle up to 7 concurrent jobs without any issues. The maximum number of concurrent jobs will be investigated.

Appendix

Infrastructure

PTF environment: ncp5

  • 8 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
  • 2 db.r6.xlarge database instances: writer and reader
  • MSK ptf-kakfa-3
    • 4 kafka.m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3


Modules memory and CPU parameters:

Module | SoftLimit | XMX | Revision | Version | desiredCount | CPUUnits | RWSplitEnabled | HardLimit | Metaspace | MaxMetaspaceSize
mod-inventory-storage-b | 1952 | 1440 | 3 | mod-inventory-storage:26.1.0-SNAPSHOT.644 | 2 | 1024 | False | 2208 | 384 | 512
mod-inventory-b | 2592 | 1814 | 7 | mod-inventory:20.0.0-SNAPSHOT.392 | 2 | 1024 | False | 2880 | 384 | 512
okapi-b | 1440 | 922 | 1 | okapi:5.1.0-SNAPSHOT.1352 | 3 | 1024 | False | 1684 | 384 | 512
mod-users-b | 896 | 768 | 4 | mod-users:19.2.0-SNAPSHOT.584 | 2 | 128 | False | 1024 | 88 | 128
mod-data-export-worker | 2600 | 2048 | 3 | mod-data-export-worker:3.0.0-SNAPSHOT.104 | 2 | 1024 | False | 3072 | 384 | 512
mod-data-export-spring | 1844 | 1292 | 3 | mod-data-export-spring:2.0.0-SNAPSHOT.67 | 1 | 256 | False | 2048 | 200 | 256
mod-bulk-operations | 3864 | 0 | 8 | mod-bulk-operations:1.0.1 | 2 | 400 | False | 4096 | 384 | 512
mod-notes | 896 | 322 | 3 | mod-notes:5.1.0-SNAPSHOT.245 | 2 | 128 | False | 1024 | 128 | 128
mod-agreements | 2580 | 2048 | 3 | mod-agreements:5.6.0-SNAPSHOT.117 | 2 | 128 | False | 3096 | 384 | 512
nginx-okapi | 896 | 0 | 3 | nginx-okapi:2022.03.02 | 2 | 128 | False | 1024 | 0 | 0