Check-in-check-out Test Report (Juniper)
Overview
This is a report for a series of Check-in-check-out test runs against the Juniper release.
Backend:
mod-circulation-22.0.0
mod-circulation-storage-13.0.0
mod-inventory-17.0.1
mod-inventory-storage-21.0.0
mod-authtoken-2.8.0
mod-pubsub-2.3.1
mod-patron-blocks-1.3.0
mod-feesfines-16.1.0
okapi-4.8.2
Frontend:
folio_circulation-5.1.0
Item Check-in (folio_checkin-5.1.0)
Item Check-out (folio_checkout-6.1.0)
Environment:
72 back-end modules deployed in 136 ECS services
3 okapi ECS services
12 m5.large EC2 instances
2 db.r5.xlarge AWS RDS instances
INFO logging level
High Level Summary
GET automated-patron-blocks response time gets worse even with 1 user (46%) and progressively worse with 5 users (53%), 8 users (78%), and 20 users (259%)
MODPATBLK-94: GET /automated-patron-blocks: Slow performance in Juniper (Closed)
The database quickly runs out of memory, consuming 2GB per 2 hours in an 8-user test; this could be due to mod-patron-blocks queries.
RMB-863: Free streamGet PreparedStatement fixing Aggressive DB Memory Consumption (Closed)
Check-in time improved compared to Iris.
See below for API calls consuming high CPU utilization percentage.
mod-inventory-storage appears to have a slow gradual memory leak
MODINVSTOR-751: Memory Leaks: io.vertx.core.impl.DuplicatedContext (Closed)
Test Runs
| Test | Virtual Users | Duration |
|---|---|---|
| 1 | 1 | 30 mins |
| 2 | 5 | 30 mins |
| 3 | 8 | 30 mins |
| 4 | 20 | 30 mins |
| 5 | 1 | 30 mins |
| 6 | 5 | 30 mins |
| 7 | 8 | 30 mins |
| 8 | 20 | 30 mins |
| 9 | 1 | 30 mins |
| 10 | 5 | 30 mins |
| 11 | 8 | 30 mins |
| 12 | 20 | 30 mins |
| 13 | 20 | 1 day |
Results
Response Times (Average of all tests listed above, in seconds)
| | Check-in Average | Check-out Average | Check-in 50th %tile | Check-out 50th %tile | Check-in 75th %tile | Check-out 75th %tile | Check-in 95th %tile | Check-out 95th %tile |
|---|---|---|---|---|---|---|---|---|
| 1 user | 0.94 | 2.582 | 0.85 | 2.40 | 1.106 | 2.75 | 1.42 | 3.83 |
| 5 users | 1.00 | 2.98 | 0.91 | 2.72 | 1.13 | 3.36 | 1.68 | 4.9 |
| 8 users | 1.43 | 3.93 | 1.29 | 3.62 | 1.67 | 4.55 | 2.54 | 6.69 |
| 20 users | 4.75 | 13.35 | 4.63 | 12.53 | 5.95 | 16.50 | 8.21 | 23.67 |
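For reference, percentile figures like those in the table can be derived from raw per-request latencies with Python's standard library. A minimal sketch; the sample latencies below are made up for illustration, not data from these test runs:

```python
import statistics

# Hypothetical check-in latencies in seconds (illustrative only).
latencies = [0.7, 0.8, 0.85, 0.9, 0.95, 1.0, 1.1, 1.2, 1.4, 2.1]

avg = statistics.mean(latencies)
# quantiles(n=100) returns the 99 percentile cut points;
# index k-1 corresponds to the k-th percentile.
pcts = statistics.quantiles(latencies, n=100, method="inclusive")
p50, p75, p95 = pcts[49], pcts[74], pcts[94]

print(f"avg={avg:.2f}s  p50={p50:.2f}s  p75={p75:.2f}s  p95={p95:.2f}s")
```

Note how a long tail pulls the average above the median, the same pattern visible in the report's check-out numbers.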
The following table shows the slow APIs (75th percentile response time above 100 ms) and compares them against Iris. GET automated-patron-blocks is slower at every load level, up to 259% slower at 20 users, and POST checkout-by-barcode and GET inventory/items regress sharply at 20 users, while POST checkin-by-barcode and GET circulation/loans improve throughout.
Average Response Time in Seconds. Note: IR = Iris build, JP = Juniper build.
| API | 1 user IR (75th %tile) | 1 user JP (75th %tile) | 5 users IR (75th %tile) | 5 users JP (75th %tile) | 8 users IR (75th %tile) | 8 users JP (75th %tile) | 20 users IR (75th %tile) | 20 users JP (75th %tile) |
|---|---|---|---|---|---|---|---|---|
| POST checkout-by-barcode | 1.08 | 1.08 | 1.43 | 1.37 | 1.80 | 1.74 | 3.59 | 5.62 |
| POST checkin-by-barcode | 0.68 | 0.58 | 0.98 | 0.63 | 1.41 | 0.93 | 2.89 | 1.83 |
| GET automated-patron-blocks | 0.28 | 0.41 | 0.40 | 0.61 | 0.50 | 0.89 | 1.06 | 3.81 |
| GET circulation/loans | 0.36 | 0.35 | 0.60 | 0.52 | 0.78 | 0.62 | 1.57 | 1.14 |
| GET inventory/items | 0.35 | 0.24 | 0.44 | 0.29 | 0.55 | 0.42 | 1.13 | 3.97 |
POST checkout-by-barcode response time IMPROVES up to 8 users; at 20 users it is much worse (5.62 vs. 3.59 in Iris)
POST checkin-by-barcode response time IMPROVES throughout
GET automated-patron-blocks response time gets worse even with 1 user (46%) and progressively worse with 5 users (53%), 8 users (78%), and 20 users (259%)
GET circulation/loans response time IMPROVES throughout
GET inventory/items response time IMPROVES up to 8 users; at 20 users it is much worse (3.97 vs. 1.13 in Iris)
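The percentages quoted for GET automated-patron-blocks follow directly from the 75th-percentile columns in the table above; a quick check in Python:

```python
# 75th-percentile response times in seconds for GET automated-patron-blocks,
# copied from the Iris (IR) vs. Juniper (JP) table above.
iris    = {1: 0.28, 5: 0.40, 8: 0.50, 20: 1.06}
juniper = {1: 0.41, 5: 0.61, 8: 0.89, 20: 3.81}

for users in sorted(iris):
    delta = (juniper[users] - iris[users]) / iris[users] * 100
    print(f"{users} users: {delta:.1f}% slower")  # ~46%, 53%, 78%, 259%
```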
| | Check-in IR Avg | Check-in JP Avg | Check-out IR Avg | Check-out JP Avg | Check-in IR 50th | Check-in JP 50th | Check-out IR 50th | Check-out JP 50th | Check-in IR 75th | Check-in JP 75th | Check-out IR 75th | Check-out JP 75th | Check-in IR 95th | Check-in JP 95th | Check-out IR 95th | Check-out JP 95th |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 user | 1.16 | 0.93 | 2.56 | 2.58 | 1.02 | 0.85 | 2.35 | 2.40 | 1.17 | 1.02 | 2.64 | 2.76 | 1.83 | 1.42 | 3.78 | 3.83 |
| 5 users | 1.49 | 1.00 | 3.42 | 2.98 | 1.27 | 0.91 | 2.84 | 2.72 | 1.65 | 1.13 | 3.55 | 3.35 | 2.91 | 1.69 | 5.89 | 4.9 |
| 8 users | 1.91 | 1.43 | 3.90 | 3.93 | 1.63 | 1.29 | 3.44 | 3.62 | 2.28 | 1.68 | 4.48 | 4.55 | 3.85 | 2.54 | 7.01 | 6.69 |
| 20 users | 3.73 | 4.75 | 7.36 | 13.33 | 3.34 | 4.63 | 6.79 | 12.53 | 4.60 | 5.95 | 8.88 | 16.50 | 7.10 | 8.20 | 13.06 | 23.68 |
Check-in's response time improves up to 8 users by as much as 25%.
Check-out's response time is about the same as in Iris up to 8 users.
With 20 users, check-in's response time grew to 27% worse, and check-out's response time went from 1% worse (at 8 users) to 81% worse.
More tests were conducted to see the overall trend. Response times of the 14- and 25-user tests also confirm the steep rise in check-out response time seen in the 20-user test.
Response time graph shows a sharp rise in check-out response time (in seconds) when there are more than 8 users (i.e., 14, 20, 25 users).
The Delta % vs. Iris graph shows the difference in response time compared to Iris. Both check-in and check-out hold up well against Iris until after 8 users, when the difference percentage takes off, reaching nearly 20% for check-in and 100% for check-out at 20 users.
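The 27% and 81% regressions at 20 users can be reproduced from the average columns of the IR-vs-JP table above:

```python
# Average response times in seconds at 20 users, from the table above.
checkin_iris,  checkin_juniper  = 3.73, 4.75
checkout_iris, checkout_juniper = 7.36, 13.33

def pct_worse(iris: float, juniper: float) -> float:
    """Percentage increase of Juniper's response time over Iris'."""
    return (juniper - iris) / iris * 100

print(f"check-in:  {pct_worse(checkin_iris, checkin_juniper):.0f}% worse")    # ~27%
print(f"check-out: {pct_worse(checkout_iris, checkout_juniper):.0f}% worse")  # ~81%
```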
Longevity test
A 24-hour longevity test was performed with 20 concurrent users. Key observations:
Catastrophic failure after 14.5 hours. All API calls ceased to respond.
The catastrophic failure can be explained by the database struggling to keep up with the 20-user load: the DB CPU utilization rate was off the chart and the DB's freeable memory was quickly exhausted after 2+ hours of testing. For more on the impact on the database, see the DB analysis below. As the DB lagged behind, the modules started to time out, and instability such as stale/dead connections began to appear.
As in Iris, Okapi started out using a high level of CPU but gradually used less. Meanwhile, mod-inventory-storage slowly used more CPU as the test progressed. Other modules' CPU utilization was nominal.
mod-inventory-storage's increased CPU usage may be tied to its increasing memory usage, which indicates a memory leak. A heap dump shows a leak of io.vertx.core.impl.DuplicatedContext; see MODINVSTOR-751: Memory Leaks: io.vertx.core.impl.DuplicatedContext (Closed). Other modules did not exhibit similar signs of leakage.
Modules CPUs and Memory
The graph below compares the CPU utilization of the modules involved in the check-in/out workflow across six short 30-minute tests.
Notable observations:
Okapi's CPU utilization is rather high starting at 5 users: with 5 concurrent users Okapi already uses nearly 200% CPU, and with 8 users almost 300%.
mod-patron-blocks is the second worst offender, consuming a high level of CPU.
Other modules are nominal in their CPU utilization.
The graph below shows the memory usage of the check-in/out modules in the short 30-minute tests. No memory leaks were observed in these short tests.
Database
Juniper seems to use memory quite aggressively in the 8-user tests and certainly in the 20-user tests. Below is the freeable-memory graph comparing the 1-, 5-, 8-, and 20-user 30-minute tests. During each test there is a steep drop-off followed by a gradual decline in freeable memory; when the test is over, memory returns to the pre-test level of 7GB. Note that in 30 minutes, even a 5-user test consumes more than 1GB of memory, an 8-user test nearly 2GB, and a 20-user test 2GB.
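As a rough illustration, extrapolating these initial 30-minute burn rates linearly against the ~7GB baseline gives a lower bound on time to memory exhaustion. This is only a sketch; the consumption rate flattens after the initial steep drop, so actual exhaustion takes longer, consistent with the longevity test surviving 14.5 hours:

```python
# Freeable memory consumed in a 30-minute test, approximate readings
# per the graph described above.
baseline_gb = 7.0
consumed_gb_per_30_min = {5: 1.0, 8: 2.0, 20: 2.0}

for users, gb in sorted(consumed_gb_per_30_min.items()):
    rate_gb_per_hour = gb / 0.5  # initial consumption rate
    hours_to_exhaust = baseline_gb / rate_gb_per_hour
    print(f"{users} users: {rate_gb_per_hour:.0f} GB/h -> "
          f"~{hours_to_exhaust:.2f} h to exhaustion at this rate")
```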