Overview
Bulk Edits: establish a performance baseline for bulk updates (PERF-408) in the Orchid release, which includes the architectural changes implemented in UXPROD-3842. The goal is to make sure that performance did not deteriorate in comparison to the Nolana release. The following questions help determine the performance and stability of the new Bulk Edits implementation:
- How long does it take to export 100, 1000, 10k, and 100k records?
- Can it be used with up to 5 concurrent users?
- Run four consecutive jobs editing 10k holdings records each
- Run four simultaneous jobs editing 10k holdings records each
- Look for memory trends and CPU usage (a scripted sketch of the concurrency scenarios follows this list)
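The concurrency scenarios above can be scripted. Below is a minimal sketch, assuming a FOLIO/Okapi gateway: POST /authn/login is the standard Okapi authentication endpoint, while the /bulk-operations/upload path, tenant id, credentials, and file names are illustrative placeholders rather than the confirmed mod-bulk-operations API.

```python
# A minimal sketch of the concurrency scenarios above, assuming a FOLIO/Okapi
# gateway. POST /authn/login is the standard Okapi authentication endpoint;
# the /bulk-operations/upload path, tenant id, credentials, and file names
# are illustrative placeholders, not the confirmed mod-bulk-operations API.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

OKAPI_URL = "https://okapi.example.org"  # placeholder gateway URL
TENANT = "perf_tenant"                   # placeholder tenant id


def login(username: str, password: str) -> str:
    resp = requests.post(
        f"{OKAPI_URL}/authn/login",
        json={"username": username, "password": password},
        headers={"x-okapi-tenant": TENANT},
    )
    resp.raise_for_status()
    return resp.headers["x-okapi-token"]


def run_bulk_edit_job(token: str, csv_path: str) -> float:
    # Hypothetical upload call; substitute the real mod-bulk-operations
    # endpoints (and the subsequent preview/commit steps) for your release.
    headers = {"x-okapi-tenant": TENANT, "x-okapi-token": token}
    started = time.monotonic()
    with open(csv_path, "rb") as f:
        resp = requests.post(
            f"{OKAPI_URL}/bulk-operations/upload",
            files={"file": f},
            headers=headers,
        )
    resp.raise_for_status()
    return time.monotonic() - started


token = login("perf_user", "perf_password")  # placeholder credentials
# Five simultaneous jobs, one file per simulated user, as in test run #4.
with ThreadPoolExecutor(max_workers=5) as pool:
    durations = pool.map(
        lambda path: run_bulk_edit_job(token, path),
        [f"holdings_10k_{i}.csv" for i in range(5)],
    )
print([f"{d:.1f} s" for d in durations])
```

The same harness covers the consecutive-job scenario by calling run_bulk_edit_job in a plain loop instead of the thread pool.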
Summary
Test report for Bulk Edits holdings-app functionality, 2023-03-20.
The Orchid release is about 40% slower for holdings bulk editing than Nolana. One possible root cause of the degradation is the long time needed to get a preview of changes (MODBULKOPS-86).
Stability is approximately the same as in Nolana.
- For 1 concurrent job, 100 records can be edited in 1 min 9 s, which is 19 s slower than Nolana (50 s); 1000 records can be edited in 2 min 54 s, which is 44 s slower than Nolana (2 min 10 s); and 10k records bulk editing is about 36% slower.
- 10k records per user with 5 concurrent users (50k records total) can be uploaded and edited in about 22 minutes, which is about 9 min 30 s slower than Nolana (about 12 min 25 s). The slowness could be a result of the changes in UXPROD-3842 and MODBULKOPS-86.
- The memory utilization of mod-bulk-operations increased from 20% to 23% (the service was updated before the test and is probably still reaching a steady state; the memory trend will be investigated in further testing). No memory leaks are suspected for the other modules.
- CPU utilization did not exceed 56% for any module in any test. Compared to Nolana, mod-data-export-worker no longer shows spikes, and the average CPU utilization of the other modules is approximately the same, except nginx-okapi, which is about 15% higher.
- For all record counts (100, 1k, 10k) and up to 5 concurrent jobs, RDS CPU utilization did not exceed 41%, which is better than Nolana (up to 50%).
Recommendations & Jiras
More than 50% of jobs with 10k+ records failed in about 30 min to 1 hour with the error "Connection reset (SocketException)" (PERF-334).
Results
Test Runs
1# One (concurrent) Job
Number of records | Orchid (Total Time) | Nolana (Total Time) |
---|---|---|
100 | 1 min 9 s | 50 s |
1000 | 2 min 54 s | 2 min 10 s |
10k | 19 min 33 s | 12 min 25 s |
100k | 2 hours 17 min | - |
2# Holdings App, 10k records, 1, 4, and 5 concurrent jobs
10k records per job:
Number of concurrent jobs | Duration |
---|---|
1 | 19 min 33 s |
4 | 20 min 40 s |
5 | 22 min 2 s |
3# Four consecutive jobs editing 10k holdings records each
Job # | Job duration |
---|---|
1 | 19 min 23 s |
2 | 19 min 53 s |
3 | 19 min 47 s |
4 | 20 min 5 s |
4# 5 Concurrent Holdings App jobs
Records per user (matched by "BARCODE") | Orchid (Total Time) | Nolana (Total Time) |
---|---|---|
100 | 1 min 9 s | 49 s |
1000 | 3 min 4 s | 2 min 25 s |
10k | 22 min 2 s | 12 min 25 s |
100k | - | - |
* "-": the test was not performed due to errors that occurred
Memory usage
For all test runs
The memory utilization of mod-bulk-operations increased from 20% to 23% (the service was updated before the test and is probably still reaching a steady state; the memory trend will be investigated in further testing).
The memory utilization of mod-data-export-worker was 58% at the beginning of the tests, grew after the tests finished, and then returned to normal, ending at 55% at the end of testing (this looks like garbage-collector activity).
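As a cross-check on these observations, the per-service memory trend can be pulled directly from CloudWatch. Below is a minimal sketch, assuming the modules run as ECS services reporting the AWS/ECS MemoryUtilization metric; the cluster and service names are placeholders for this environment:

```python
# Sketch: fetch the memory-utilization trend of one module from CloudWatch.
# Cluster and service names are placeholders.
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/ECS",
    MetricName="MemoryUtilization",
    Dimensions=[
        {"Name": "ClusterName", "Value": "ncp5"},                 # placeholder
        {"Name": "ServiceName", "Value": "mod-bulk-operations"},  # module under test
    ],
    StartTime=datetime.utcnow() - timedelta(hours=6),
    EndTime=datetime.utcnow(),
    Period=300,  # 5-minute buckets
    Statistics=["Average", "Maximum"],
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], f"avg {point['Average']:.1f}%", f"max {point['Maximum']:.1f}%")
```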
Instance CPU utilization
Did not exceed 17%.
Service CPU utilization
CPU utilization did not exceed 56% for any module in any test.
RDS CPU utilization
Maximum RDS CPU utilization was 41%, observed with 5 concurrent jobs of 10k holdings records each.
The more concurrent jobs are running, the higher the RDS CPU usage. The maximum number of concurrent jobs will be investigated.
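The same CloudWatch approach applies to the database tier: the AWS/RDS namespace exposes CPUUtilization per DB instance. A minimal sketch for extracting the peak value over a test window; the instance identifier is a placeholder:

```python
# Sketch: peak RDS CPU utilization over the last test window.
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "ncp5-writer"}],  # placeholder
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=60,  # 1-minute resolution to catch short spikes
    Statistics=["Maximum"],
)
peak = max((p["Maximum"] for p in resp["Datapoints"]), default=0.0)
print(f"Peak RDS CPU over the window: {peak:.1f}%")
```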
Appendix
Infrastructure
PTF environment: ncp5
- 8 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
- 2 db.r6.xlarge database instances: writer and reader
- MSK cluster ptf-kafka-3
  - 4 kafka.m5.2xlarge brokers in 2 availability zones
  - Apache Kafka version 2.8.0
  - EBS storage volume per broker: 300 GiB
  - auto.create.topics.enable=true
  - log.retention.minutes=480
  - default.replication.factor=3
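The broker settings above can be verified against the running cluster. A minimal sketch using the confluent-kafka Python client; the bootstrap address and broker id are placeholders, and an MSK cluster will typically also need TLS settings in the client config:

```python
# Sketch: read selected broker configs back from the cluster.
from confluent_kafka.admin import AdminClient, ConfigResource

admin = AdminClient({"bootstrap.servers": "BROKER_HOST:9092"})  # placeholder address
resource = ConfigResource(ConfigResource.Type.BROKER, "1")      # broker id 1

futures = admin.describe_configs([resource])
config = futures[resource].result()  # dict of config name -> ConfigEntry
for key in ("auto.create.topics.enable",
            "log.retention.minutes",
            "default.replication.factor"):
    entry = config.get(key)
    print(key, "=", entry.value if entry else "<not set>")
```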
Modules memory and CPU parameters:
Module | SoftLimit | XMX | Revision | Version | desiredCount | CPUUnits | RWSplitEnabled | HardLimit | Metaspace | MaxMetaspaceSize |
---|---|---|---|---|---|---|---|---|---|---|
mod-inventory-storage-b | 1952 | 1440 | 3 | mod-inventory-storage:26.1.0-SNAPSHOT.644 | 2 | 1024 | False | 2208 | 384 | 512 |
mod-inventory-b | 2592 | 1814 | 7 | mod-inventory:20.0.0-SNAPSHOT.392 | 2 | 1024 | False | 2880 | 384 | 512 |
okapi-b | 1440 | 922 | 1 | okapi:5.1.0-SNAPSHOT.1352 | 3 | 1024 | False | 1684 | 384 | 512 |
mod-users-b | 896 | 768 | 4 | mod-users:19.2.0-SNAPSHOT.584 | 2 | 128 | False | 1024 | 88 | 128 |
mod-data-export-worker | 2600 | 2048 | 3 | mod-data-export-worker:3.0.0-SNAPSHOT.104 | 2 | 1024 | False | 3072 | 384 | 512 |
mod-data-export-spring | 1844 | 1292 | 3 | mod-data-export-spring:2.0.0-SNAPSHOT.67 | 1 | 256 | False | 2048 | 200 | 256 |
mod-bulk-operations | 3864 | 0 | 10 | mod-bulk-operations:1.0.2 | 2 | 400 | False | 4096 | 384 | 512 |
mod-notes | 896 | 322 | 3 | mod-notes:5.1.0-SNAPSHOT.245 | 2 | 128 | False | 1024 | 128 | 128 |
mod-agreements | 2580 | 2048 | 3 | mod-agreements:5.6.0-SNAPSHOT.117 | 2 | 128 | False | 3096 | 384 | 512 |
nginx-okapi | 896 | 0 | 3 | nginx-okapi:2022.03.02 | 2 | 128 | False | 1024 | 0 | 0 |