Check-in-check-out Test Report (Honeysuckle)
Overview
This is a report for a series of Check-in-check-out test runs against the Honeysuckle release. PERF-135: Rerun Check-in-Check-out workflow test (Honeysuckle)Closed
Backend:
mod-circulation-19.2.7
mod-circulation-storage-12.1.4
mod-inventory-16.1.3
mod-inventory-storage-19.4.4
mod-authtoken-2.6.0
mod-pubsub-1.3.3
okapi-4.3.3 (also with 4.2.2)
Frontend:
folio_circulation-4.0.1
Item Check-in (folio_checkin-4.0.1)
Item Check-out (folio_checkout-5.0.1)
Environment:
61 back-end modules deployed in 110 ECS services
3 okapi ECS services
8 m5.large EC2 instances
2 db.r5.xlarge AWS RDS instance
INFO logging level
High Level Summary
Check-out: Honeysuckle is slower by 9%-28% than Goldenrod
Check-in: 4%-22% slower than Goldenrod
APIs turned slower in Honeysuckle: GET /automated-patron-blocks/{id} (150% slower) and GET /circulation/loans (60%). These are covered by MODPATBLK-70 and CIRC-1014, respectively
Okapi v4.3.3 seem to be using 2x-3x CPU cycles than in v1.3.2 (Goldenrod). Potential issue found with the logging methods OKAPI-964
mod-pubsub has a memory leak that would drag down performance under high loads (see section on longevity test): MODPUBSUB-136
Caching Okapi tokens in Okapi reduced mod-authtoken's CPU usage by over 90%
Database's memory usage improved dramatically from Goldenrod's - little memory consumptions observed.
Test Runs
Test | Virtual Users | Duration | OKAPI log level |
1. | 1 | 30 mins | INFO |
2. | 5 | 30 mins | INFO |
3. | 8 | 30 mins | INFO |
4. | 20 | 30 mins | INFO |
5. | 20 | 24 Hours | INFO |
Results
Response Times
| Average (seconds) | 50th %tile (seconds) | 75th %tile (seconds) | 95th %tile (seconds) | ||||
| Check-in | Check-out | Check-in | Check-out | Check-in | Check-out | Check-in | Check-out |
1 user | 0.967 | 1.989 | 0.889 | 1.832 | 0.984 | 2.201 | 1.254 | 2.815 |
5 users | 1.053 | 2.171 | 0.981 | 1.969 | 1.114 | 2.253 | 1.528 | 3.370 |
8 users | 1.193 | 2.244 | 1.076 | 2.022 | 1.339 | 2.372 | 1.895 | 3.544 |
20 users | 2.391 | 3.901 | 1.639 | 3.073 | 2.263 | 4.12 | 4.811 | 8.784 |
The following table shows the slow 75th percentile APIs taking more than 100 ms to return, and also comparing them against Goldenrod's. Other than the 1-user test, starting with the 5-users test, all listed APIs are slower with GET automated-patron-blocks leading the way at 150% slower, while GET circulation/loans regressed up to 60%
Note: GR = Goldenrod build, HS = Honeysuckle build
API | 1 user GR (75th %tile) | 1 user HS (75th %tile) | 5 users GR (75th %tile) | 5 users HS (75th %tile) | 8 users GR (75th %tile) | 8 users HS (75th %tile) | 20 users GR (75th %tile) | 20 users HS (75th %tile) |
|---|---|---|---|---|---|---|---|---|
GET circulation/loans | 0.345 | 0.349 | 0.365 | 0.406 | 0.075 | 0.122 | 0.654 | 0.784 |
GET inventory/items | 0.208 | 0.186 | 0.208 | 0.222 | 0.225 | 0.244 | 0.312 | 0.375 |
POST checkin-by-barcode | 0.682 | 0.593 | 0.631 | 0.664 | 0.815 | 0.874 | 1.296 | 1.467 |
POST checkout-by-barcode | 0.750 | 0.717 | 0.688 | 0.784 | 0.733 | 0.877 | 1.205 | 1.469 |
GET automated-patron-blocks | 0.069 | 0.163 | 0.085 | 0.180 | 0.079 | 0.197 | 0.118 | 0.296 |
| Average | 50th Percentile | 75th percentile | 95th percentile | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Check-in GR | Check-in HS | Check-out GR | Check-out HS | Check-in GR | Check-in HS | Check-out GR | Check-out HS | Check-in GR | Check-in HS | Check-out GR | Check-out HS | Check-in GR | Check-in HS | Check-out GR | Check-out HS | |
1 user | 1.06 | 0.967 | 1.994 | 1.989 | 0.977 | 0.889 | 1.82 | 1.832 | 1.111 | 0.984 | 2.004 | 2.021 | 1.323 | 1.254 | 2.811 | 2.815 |
5 users | 1.011 | 1.053 | 1.924 | 2.171 | 0.975 | 0.981 | 1.814 | 1.969 | 1.067 | 1.114 | 1.977 | 2.253 | 1.251 | 1.528 | 2.625 | 3.37 |
8 users | 1.142 | 1.193 | 2.044 | 2.244 | 1.061 | 1.076 | 1.899 | 2.022 | 1.274 | 1.339 | 2.093 | 2.372 | 1.569 | 1.895 | 3.107 | 3.544 |
20 users | 1.702 | 2.391 | 3.02 | 3.901 | 1.49 | 1.639 | 2.652 | 3.073 | 1.936 | 2.263 | 3.273 | 4.12 | 2.953 | 4.811 | 5.352 | 8.784 |
This table shows that the average response time of the checkout transaction is 9% to 28% slower than in Goldenrod, and check-in in Honeysuckle likewise is about 4-22% slower (data aggregated from 5 to 20 users test runs).
Update 1/14/2011
Subsequent investigations (
PERF-140: Investigate - GET /circulation/loans Regressed in HoneysuckleClosed AND
CIRC-1014: GET /circulation/loans Regressed in HoneysuckleClosed) on GET /circulation/loans do not show degradations by the API itself. We hypothesize that other API calls that were executed during the test run may have dragged down the response time, particularly if it was trying to read and write to the same rows in the database at the same time.
Longevity test
A 24-hour longevity test was performed with 20 concurrent users and running with Okapi 4.2.2. Key observations:
Performance degraded after 2 hours.
Average response time in the first two hours:
Check-in: 2.28 sec
Check-out: 4.76 sec
Average response time in the first 12 hours:
Check-in: 2.709 sec
Check-out: 7.898 sec
Average response time in the last 12 hours:
Check-in: 4.605 sec
Check-out: 17.297 sec
Together the average response time for the entire test run:
Check-in: 3.414 sec
Check-out: 11.419 sec
Throughput degrades over time, unsurprisingly
Modules Services CPU utilizations (The black lines indicate the start and finish time of the test run)
With the load of 20 users, Okapi started out working hardest, but its CPU utilization drops down over time. This is because mod-pubsub's is leaking HTTP Client and taking up more resources therefore slowing down, causing Okapi to not need to route messages as much. Clearly mod-pubsub's CPU utilization went up. The other modules follow Okapi's pattern.
Services Modules Memory utilizations
No modules exhibited memory leaks except for mod-pubsub
Although there were two instances of mod-pubsub running on two different ec2 instances, mod-pubsub's traffic seemed to have been stickied to one instance. Here are graphs showing mod-pubsub's on one instance using up memory and CPU resources, and on another instance not showing much activities:
mod-pubsub and Okapi on another node - Okapi's CPU utilization dwindles while mod-pubsub does not seem to be busy at all
CPUs and Memories
Okapi was profiled because of the apparent 3x CPU utilization compared to the Goldenrod runs.
Goldenrod test runs:
Honeysuckle test runs:
Clearly Okapi is using more CPU cycles in Honeysuckle than in Goldenrod, 1 user
Other relevant modules' CPU utilizations in Goldenrod
Same modules in Honeysuckle:
A few things to note:
mod-authtoken uses much less CPU in Honeysuckle, over 90% reduction across all tests! This is because of the token caching functionality that was added to Okapi 4.x
mod-circulation's CPU utilization in Honeysuckle averages over 20% lower than in Goldenrod.
mod-circulation's CPU utilization in Honeysuckle is about 10-30% higher than in Goldenrod
mod-inventory's CPU utilization in Honeysuckle averages 30% more than in Goldenrod
mod-inventory-storage's CPU utilization in Honeysuckle averages 20% more than in Goldenrod
mod-pubsub's CPU utilization in Honeysuckle is about 15% less than in Goldenrod
mod-patron-blocks CPU utilization in Honeysuckle is at least 30% less than in Goldenrod
JVM Profiling
Because Okapi's CPU utilization in Honeysuckle seemed to have averaged 2x to 3x higher than in Goldenrod, it was profiled to get more insights of what happened inside it.
The slowest methods in Honeysuckle are once again the Logger and Jackson serialization methods
Compared to Goldenrod:
Note that the AbstractLogger.Info method in Okapi 4.3.3 total CPU time is about 3x higher than in Goldenrod. This is confirmed by Okapi 4.3.3's metrics showing ProxyContext.logRequest and ProxyContext.logResponse methods' response times degrade over time. These two methods need to be investigated.