Overview
Bulk Edits: establish a performance baseline for Items bulk updates (PERF-406) in the Orchid release, which includes the architectural changes implemented in UXPROD-3842. The goal is to make sure that performance has not deteriorated compared to the Nolana release. The following questions help determine the performance and stability of the new Bulk Edits implementation:
- How long does it take to export 100, 1000, 10k, and 100K records?
- Can it be used with up to 5 concurrent users?
- Run consecutively four jobs editing 10k item records
- Run simultaneously four jobs editing 10k item records
- Look for a memory trend and CPU usage
...
Test report for Bulk Edits items-app functionality 2023-03-08.
Orchid release works 30% faster for 10k items bulk editing than Nolana.
Its stability is approximately the same as Nolana's.
- For 1 concurrent job, 100 records can be edited in 1 min 9 s, which is 3 times slower than in Nolana; 1000 records can be edited in approximately the same time as in Nolana (2 min 40 s); and 10k records bulk editing is about 30% faster.
- 10k records per user, 5 users simultaneously (50k records total) can be uploaded and edited in about 20 minutes which is about 8 min faster compared to Nolana (about 28 min).
- The memory of mod-inventory-storage was high at 109% but stable (It was 109% even before the test). No memory leaks were found.
- CPU for mod-users reached up to 125% (5 concurrent jobs updating 10k records each), an increase compared to Nolana (about 40% with the same configuration). This needs to be investigated in further testing. CPU for all other modules did not exceed 65% in any of the tests.
- For all record counts (100, 1k, 10k) and up to 5 concurrent jobs, RDS CPU utilization did not exceed 60%.
Recommendations & Jiras
More than 50% of jobs FAILED in about 28-33 min with the error "Connection reset (SocketException)".
Jobs intermittently FAILED with an error from the S3 bucket.
The high CPU usage of mod-users (up to 125%) needs to be investigated.
Results
Test Runs
1#
...
One (concurrent) Job
Number of records | Duration | Comments |
---|---|---|
100 | 1 min 9 s | |
1000 | 2 min 36 s | |
10k | 17 min 50 s | |
50k | 1 hour 58 min | or FAILED after about 28-33 min with the error Connection reset (SocketException) |
100k | always FAILED | FAILED after about 28-33 min with the error Connection reset (SocketException) |
2#
Items App 10k records 3, 4, and 5 concurrent jobs
10k records for each job

Number of concurrent jobs | Duration |
---|---|
1 | 17 min 50 s |
3 | 18 min 50 s |
4 | 19 min 10 s |
5 | 20 min 20 s |
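
To make the scaling in the table above easier to read, the durations can be converted into aggregate throughput. The following is a minimal sketch, not part of the test tooling: the helper names `parse_duration` and `throughput_per_min` are hypothetical, and the only inputs are the durations reported in the table.

```python
import re

# Seconds per unit for the duration strings used in this report.
UNIT_SECONDS = {"hour": 3600, "min": 60, "sec": 1, "s": 1}
RECORDS_PER_JOB = 10_000

# Durations copied from the table above (concurrent jobs -> duration).
DURATIONS = {1: "17 min 50 s", 3: "18 min 50 s", 4: "19 min 10 s", 5: "20 min 20 s"}


def parse_duration(text: str) -> int:
    """Convert a string such as '17 min 50 s' or '1 hour 58 min' to seconds."""
    parts = re.findall(r"(\d+)\s*(hour|min|sec|s)\b", text)
    return sum(int(value) * UNIT_SECONDS[unit] for value, unit in parts)


def throughput_per_min(jobs: int, duration: str) -> float:
    """Total records processed per minute across all concurrent jobs."""
    return jobs * RECORDS_PER_JOB / (parse_duration(duration) / 60)


if __name__ == "__main__":
    for jobs, duration in DURATIONS.items():
        print(f"{jobs} job(s): {throughput_per_min(jobs, duration):.0f} records/min")
```

With these numbers, total throughput grows almost linearly with the number of jobs, because the duration per job increases only from 17 min 50 s to 20 min 20 s.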
3#
...
Editing four jobs
...
consecutively with 10k item records each
Job # | Job duration (run 2) | Job duration (run 1) |
---|---|---|
1 | 17 min 47 s | 18 min 49 s |
2 | 17 min 53 s | 18 min 26 s |
3 | 17 min 45 s | 20 min 44 s |
4 | 18 min 5 s | ERROR occurs: We encountered an internal error. Please try again. (Service: S3, Status Code: 500, Request ID: 5W7F75FMHHH3KDWT, Extended Request ID: 36K8tkhFQHS1Mjt7sZc4jYrBduBWO/psei+33ZIIOnhrytq7Eie3mjDALtBplhZxSJv4CfrZpnw8Z6nqmz03ZB7b3yiRdecyXfZ/ZtEmN4g=) (S3Excepti |
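
The two consecutive runs can be summarized by their mean duration and spread. This is a small illustrative sketch using only the values in the table above; the `summarize` helper is hypothetical, and the failed job in run 1 is excluded from the statistics.

```python
from statistics import mean

# (minutes, seconds) per job, copied from the table; None marks the failed job.
RUN_1 = [(18, 49), (18, 26), (20, 44), None]
RUN_2 = [(17, 47), (17, 53), (17, 45), (18, 5)]


def summarize(run):
    """Return count, mean, and max-min spread (seconds) of completed jobs."""
    seconds = [job[0] * 60 + job[1] for job in run if job is not None]
    return {
        "completed": len(seconds),
        "mean_s": mean(seconds),
        "spread_s": max(seconds) - min(seconds),
    }


if __name__ == "__main__":
    for name, run in (("run 1", RUN_1), ("run 2", RUN_2)):
        stats = summarize(run)
        print(f"{name}: {stats['completed']} completed, "
              f"mean {stats['mean_s']:.0f} s, spread {stats['spread_s']} s")
```

Run 2 is noticeably tighter than run 1, although with only three or four jobs per run these figures are indicative rather than statistically meaningful.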
4#
...
Items App 5 concurrent jobs
...
5 Concurrent Item Apps jobs
"BARCODE". Number of records per user | Orchid (Total Time) | Nolana (Total Time) | Morning Glory (Total Time) |
---|---|---|---|
100 | 1 min 10 s | 18 sec | 25-27 sec |
1000 | 2 min 57 s | 3 min | 4 min |
10k | 20 min 20 s | 28 min | 30 min |
25k | Results are not representative because of | 1 hour 3 min | 50 min |
50k | Results are not representative because of | about 2 hours for successful jobs. | - |

* "-" test was not performed due to errors that occurred
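
The release-to-release differences for the rows with results in all compared releases can be computed directly from the table. The sketch below is illustrative only: `percent_change` is a hypothetical helper, and the inputs are the rounded Orchid and Nolana totals shown above, so the results are approximate.

```python
# Total time in seconds for 5 concurrent jobs, from the table above:
# records per user -> (Orchid, Nolana)
TOTALS = {
    "100": (1 * 60 + 10, 18),
    "1000": (2 * 60 + 57, 3 * 60),
    "10k": (20 * 60 + 20, 28 * 60),
}


def percent_change(orchid_s: int, nolana_s: int) -> float:
    """Relative change of Orchid versus Nolana; negative means Orchid is faster."""
    return (orchid_s - nolana_s) / nolana_s * 100


if __name__ == "__main__":
    for records, (orchid_s, nolana_s) in TOTALS.items():
        change = percent_change(orchid_s, nolana_s)
        label = "slower" if change > 0 else "faster"
        print(f"{records}: Orchid is {abs(change):.1f}% {label} than Nolana")
```

For 10k records this gives roughly 27% faster, in line with the "about 30%" figure in the summary, while the 100-record case is several times slower.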
...
The memory of mod-inventory-storage was high at 109% but stable (It was 109% even before the test). No memory leaks were found.
...
RDS CPU utilization
Run #1
Run #2 & #3
Maximum RDS CPU utilization is 61% for 5 concurrent jobs with 10k item records.
The more concurrent jobs are running, the higher the RDS CPU usage. It looks like the database should handle up to 7 concurrent jobs without any issues. The maximum number of jobs will be investigated.
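
As a rough way to reason about headroom, the single reported maximum (61% at 5 concurrent jobs) can be extrapolated. The sketch below assumes that RDS CPU grows proportionally with the number of jobs from a zero baseline. That linear model and the 90% ceiling are assumptions made here for illustration; they were not measured in these tests and are not necessarily how the estimate above was derived.

```python
# Measured in this report: maximum RDS CPU of 61% with 5 concurrent 10k-record jobs.
MEASURED_JOBS = 5
MEASURED_CPU_PERCENT = 61.0

# ASSUMPTION: CPU scales linearly with the number of concurrent jobs.
CPU_PER_JOB = MEASURED_CPU_PERCENT / MEASURED_JOBS


def projected_cpu(jobs: int) -> float:
    """Projected RDS CPU (%) for a given number of concurrent jobs."""
    return CPU_PER_JOB * jobs


def max_jobs(cpu_ceiling_percent: float) -> int:
    """Largest job count whose projected CPU stays at or below the ceiling."""
    return int(cpu_ceiling_percent // CPU_PER_JOB)


if __name__ == "__main__":
    for jobs in (5, 6, 7, 8):
        print(f"{jobs} jobs: about {projected_cpu(jobs):.1f}% RDS CPU")
    # 90% is a hypothetical safety ceiling, not a value from the report.
    print("Jobs under a 90% ceiling:", max_jobs(90))
```

Real database load rarely scales perfectly linearly, so any such projection should be confirmed by the planned test of the maximum number of jobs.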
Appendix
Infrastructure
PTF environment: ncp5
- 8 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
- 2 db.r6.xlarge database instances: writer and reader
- MSK ptf-kakfa-3
- 4 kafka.m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
Modules memory and CPU parameters:
Module | SoftLimit | XMX | Revision | Version | desiredCount | CPUUnits | RWSplitEnabled | HardLimit | Metaspace | MaxMetaspaceSize |
---|---|---|---|---|---|---|---|---|---|---|
mod-inventory-storage-b | 1952 | 1440 | 3 | mod-inventory-storage:26.1.0-SNAPSHOT.644 | 2 | 1024 | False | 2208 | 384 | 512 |
mod-inventory-b | 2592 | 1814 | 7 | mod-inventory:20.0.0-SNAPSHOT.392 | 2 | 1024 | False | 2880 | 384 | 512 |
okapi-b | 1440 | 922 | 1 | okapi:5.1.0-SNAPSHOT.1352 | 3 | 1024 | False | 1684 | 384 | 512 |
mod-users-b | 896 | 768 | 4 | mod-users:19.2.0-SNAPSHOT.584 | 2 | 128 | False | 1024 | 88 | 128 |
mod-data-export-worker | 2600 | 2048 | 3 | mod-data-export-worker:3.0.0-SNAPSHOT.104 | 2 | 1024 | False | 3072 | 384 | 512 |
mod-data-export-spring | 1844 | 1292 | 3 | mod-data-export-spring:2.0.0-SNAPSHOT.67 | 1 | 256 | False | 2048 | 200 | 256 |
mod-bulk-operations | 3864 | 0 | 8 | mod-bulk-operations:1.0.1 | 2 | 400 | False | 4096 | 384 | 512 |
mod-notes | 896 | 322 | 3 | mod-notes:5.1.0-SNAPSHOT.245 | 2 | 128 | False | 1024 | 128 | 128 |
mod-agreements | 2580 | 2048 | 3 | mod-agreements:5.6.0-SNAPSHOT.117 | 2 | 128 | False | 3096 | 384 | 512 |
nginx-okapi | 896 | 0 | 3 | nginx-okapi:2022.03.02 | 2 | 128 | False | 1024 | 0 | 0 |