...
Table of Contents | ||
---|---|---|
|
Overview
This is a report for a series of Check-in-check-out test runs against the Honeysuckle release.
Jira Legacy | ||||||
---|---|---|---|---|---|---|
|
...
- 61 back-end modules deployed in 110 ECS services
- 3 okapi ECS services
- 8 m5.large EC2 instances
- 2 db.r5.xlarge AWS RDS instance
- INFO logging level
High Level Summary
- Check-out: Honeysuckle is slower by 9%-28% than Goldenrod
- Check-in: 4%-22% slower than Goldenrod
- APIs turned slower in Honeysuckle: GET /automated-patron-blocks/{id} (150% slower) and GET /circulation/loans (60%). These are covered by MODPATBLK-70 and CIRC-1014, respectively
- Okapi v4.3.3 seem to be using 2x-3x CPU cycles than in v1.3.2 (Goldenrod). Potential issue found with the logging methods OKAPI-964
- mod-pubsub has a memory leak that would drag down performance under high loads (see section on longevity test): MODPUBSUB-136
- Caching Okapi tokens in Okapi reduced mod-authtoken's CPU usage by over 90%
- Database's memory usage improved dramatically from Goldenrod's - little memory consumptions observed.
Test Runs
Test | Virtual Users | Duration | OKAPI log level |
1. | 1 | 30 mins | INFO |
2. | 5 | 30 mins | INFO |
3. | 8 | 30 mins | INFO |
4. | 20 | 30 mins | INFO |
5. | 20 | 24 Hours | INFO |
Results
Response Times
Average (seconds) | 50th %tile (seconds) | 75th %tile (seconds) | 95th %tile (seconds) | |||||
Check-in | Check-out | Check-in | Check-out | Check-in | Check-out | Check-in | Check-out | |
1 user | 0.967 | 1.989 | 0.889 | 1.832 | 0.984 | 2.201 | 1.254 | 2.815 |
5 users | 1.053 | 2.171 | 0.981 | 1.969 | 1.114 | 2.253 | 1.528 | 3.370 |
8 users | 1.193 | 2.244 | 1.076 | 2.022 | 1.339 | 2.372 | 1.895 | 3.544 |
20 users | 2.391 | 3.901 | 1.639 | 3.073 | 2.263 | 4.12 | 4.811 | 8.784 |
...
- Subsequent investigations (
ANDJira Legacy server System JiraJIRA serverId 01505d01-b853-3c2e-90f1-ee9b165564fc key PERF-140
) on GET /circulation/loans do not show degradations by the API itself. We hypothesize that other API calls that were executed during the test run may have dragged down the response time, particularly if it was trying to read and write to the same rows in the database at the same time.Jira Legacy server System JiraJIRA serverId 01505d01-b853-3c2e-90f1-ee9b165564fc key CIRC-1014
...
- Services Modules Memory utilizations
- No modules exhibited memory leaks except for mod-pubsub
- Although there were two instances of mod-pubsub running on two different ec2 instances, mod-pubsub's traffic seemed to have been stickied to one instance. Here are graphs showing mod-pubsub's on one instance using up memory and CPU resources, and on another instance not showing much activities:
- mod-pubsub and Okapi on another node - Okapi's CPU utilization dwindles while mod-pubsub does not seem to be busy at all
CPUs and Memories
Okapi was profiled because of the apparent 3x CPU utilization compared to the Goldenrod runs.
...
- mod-authtoken uses much less CPU in Honeysuckle, over 90% reduction across all tests! This is because of the token caching functionality that was added to Okapi 4.x
- mod-circulation's CPU utilization in Honeysuckle averages over 20% lower than in Goldenrod.
- mod-circulation's CPU utilization in Honeysuckle is about 10-30% higher than in Goldenrod
- mod-inventory's CPU utilization in Honeysuckle averages 30% more than in Goldenrod
- mod-inventory-storage's CPU utilization in Honeysuckle averages 20% more than in Goldenrod
- mod-pubsub's CPU utilization in Honeysuckle is about 15% less than in Goldenrod
- mod-patron-blocks CPU utilization in Honeysuckle is at least 30% less than in Goldenrod
JVM Profiling
Because Okapi's CPU utilization in Honeysuckle seemed to have averaged 2x to 3x higher than in Goldenrod, it was profiled to get more insights of what happened inside it.
...
Note that the AbstractLogger.Info method in Okapi 4.3.3 total CPU time is about 3x higher than in Goldenrod. This is confirmed by Okapi 4.3.3's metrics showing ProxyContext.logRequest and ProxyContext.logResponse methods' response times degrade over time. These two methods need to be investigated.
Database
The database CPU utilizations are about the same between the Honeysuckle and Goldenrod
...
Goldenrod's memory profile shows quick claims of memory over 30 minutes tests runs.
Missing Indexes
Honeysuckle tests revealed the following missing indexes:
...
Code Block |
---|
WARNING: Doing LIKE search without index for accounts.jsonb->>'userId', CQL >>> SQL: userId == e96618a9-04ee-4fea-aa60-306a8f4dd89b >>> lower(f_unaccent(accounts.jsonb->>'userId')) LIKE lower(f_unaccent('e96618a9-04ee-4fea-aa60-306a8f4dd89b')) WARNING: Doing LIKE search without index for accounts.jsonb->'status'>>'name', CQL >>> SQL: status.name <> Closed >>> lower(f_unaccent(accounts.jsonb>'status'->>'name')) NOT LIKE lower(f_unaccent('Closed')) WARNING: Doing LIKE search without index for manualblocks.jsonb->>'userId', CQL >>> SQL: userId == a79b533d-8f29-4be1-9415-5f5cd936623b >>> lower(f_unaccent(manualblocks.jsonb->>'userId')) LIKE lower(f_unaccent('a79b533d-8f29-4be1-9415-5f5cd936623b')) |
Results for okapi-4.5.2
Results for okapi-4.5.2 for 1,5,8,20 users for 30 minute run. From the response times below, the average Check-out for 20 users is slower. On average 60% slower than okapi-4.3.3.
...
Average (seconds) | 50th %tile (seconds) | 75th %tile (seconds) | 95th %tile (seconds) | |||||
Check-in | Check-out | Check-in | Check-out | Check-in | Check-out | Check-in | Check-out | |
1 user | 0.971 | 2.072 | 0.92 | 1.906 | 1.013 | 2.093 | 1.326 | 2.905 |
5 users | 1.003 (1.092 ) + | 2.114 (2.584 ) - | 0.925 (0.978 ) + | 1.947 (2.323 ) + | 1.055 (1.16 ) + | 2.235 (2.746 ) + | 1.458 (1.622 ) + | 3.149 (4.021 ) - |
8 users | 1.217 (1.429 ) - | 2.467 (3.057 ) - | 1.099 (1.285 ) - | 2.207 (2.747 ) - | 1.357 (1.62 ) - | 2.648 (33.354 ) - | 1.931 (2.415 ) - | 4.095 (5.079 ) - |
20 users | 2.409 (3.073 ) + | 5.213 (7.877 ) - | 2.141 (2.595 ) + | 4.478 (6.307 ) + | 2.763 (3.411 ) + | 5.682 (8.287 ) + | 4.233 (6.409 ) + | 8.484 (14.703 ) + |
CPUs and Memories
Service CPU Utilization:
CPU Utilization gradually increases as the number of users increase to 20 but this behavior is similar to okapi-4.3.3
...
Memory Utilization is a little high for mod-circulation 105% but for all other modules, it is relatively stable across all test runs for all users.
RDS CPU Utilization
RDS CPU Utilization is around 50% more compared to okapi-4.3.3
Comparison okapi-4.5.2 vs okapi-4.6.1
okapi-4.6.1 is slower than okapi-4.5.2. Checkin is 3.66% slower and Checkout is 9.48% slower. See below comparison for 8 Users 30-minute test run.
Results for okapi-4.6.1
From the response times below, okapi-4.6.1, checkin-checkout for 1 user is a little slower but for 5, 8, 20 users, checkin-checkout is much faster comparing to okapi-4.3.3.
...
Average (seconds) | 50th %tile (seconds) | 75th %tile (seconds) | 95th %tile (seconds) | |||||||||
Check-in | Check-in performance with okapi-4.3.3 | Check-out | Check-out performance with okapi-4.3.3 | Check-in | Check-out | Check-in | Check-in performance with okapi-4.3.3 | Check-out | Check-out performance with okapi-4.3.3 | Check-in | Check-out | |
1 user | 1.041 | -9.7% | 2.332 | -7% | 0.957 | 2.139 | 1.06 | -8.5% | 2.369 | -8.10% | 1.378 | 3.394 |
5 users | 1.057 | +6.5% | 2.374 | +8.4% | 0.978 | 2.176 | 1.133 | +6.88% | 2.532 | +10.18% | 1.524 | 3.624 |
8 users | 1.277 | +2.8% | 2.814 | +4.7% | 1.144 | 2.512 | 1.44 | +3.2% | 3.074 | +6.50% | 2.112 | 4.718 |
20 users | 2.374 | +36.9% | 5.927 | +26.4% | 2.137 | 5.246 | 2.716 | +32.7 | 6.552 | +26.87 | 4.188 | 9.426 |
CPUs and Memories
Service CPU Utilization:
Compared to okapi-4.3.3, CPU Utilization for okapi-4.6.1 around the same.
...
Compared to okapi-4.3.3, Service Memory Utilization for okapi-4.6.1 around the same.
RDS CPU Utilization
RDS CPU Utilization is normal for 1, 5, and 8 Users. For 20 Users, the CPU is higher almost 95% but considering the large load, it is normal as well.
8 Hours Longevity test run for 20 Users
Service CPU Utilization:
Okapi-4.6.1 consumes less CPU gradually from 1st hour into 8th hour. However, at the same time, mod-pubsub consumes more CPU by gradually increasing from 50% to almost 160%
...
Memory usage for okapi-4.6.1 and other modules is relatively constant and stable throughout the run. mod-circulation memory consumption increases to 125% and then stabilizes.
Comparison okapi-4.3.3 vs okapi-4.6.1 (okapi metrics enabled)
Below results are for 8 Users, 30 minutes against Checkin-Checkout workflow.
Grafana Performance Dashboard
Okapi-4.6.1 is around 71% faster than Okapi-4.3.3. Okapi-4.6.1 can process more requests and still perform better. In 30 minutes test run, okapi-4.6.1 was able to process 25% more requests with an average request per second(RPS) is 40.
...
Okapi-4.6.1 Grafana performance dashboard:
Checkin-Checkout API level comparison
For Okapi-4.6.1, Check-in is 71% faster and Checkout is around 65% faster.
Log request/response comparison
For Okapi-4.6.1, Log request has improved from 3.16 second to 0.243 seconds. Log request is faster by 1200% faster. Log response has improved from 3.25 seconds to 0.266 seconds which is 1100% faster.
...
Okapi-4.6.1 log request/response comparison:
Service CPU Utilization
Okapi-4.6.1 consumes less CPU and hence more efficient. Average CPU Utilization for okapi-4.6.1 is around 380% vs okapi-4.3.3 which is 600%.
...
Okapi-4.6.1 Service CPU Utilization:
Appendix
https://issuesfolio-org.folioatlassian.orgnet/browse/MODPATBLK-70
https://issuesfolio-org.folioatlassian.orgnet/browse/OKAPI-964
https://issuesfolio-org.folioatlassian.orgnet/browse/OKAPI-965