IN-PROGRESS
Overview
Backend:
- mod-circulation-19.2.7
- mod-circulation-storage-12.1.4
- mod-inventory-16.1.3
- mod-inventory-storage-19.4.4
- mod-authtoken-2.6.0
- mod-pubsub-1.3.3
- okapi-4.3.3 (the 24-hour longevity test also ran with 4.2.2)
Frontend:
- folio_circulation-4.0.1
- Item Check-in (folio_checkin-4.0.1)
- Item Check-out (folio_checkout-5.0.1)
Environment:
- 61 back-end modules deployed in 110 ECS services
- 3 okapi ECS services
- 8 m5.large EC2 instances
- 2 db.r5.xlarge AWS RDS instances
- INFO logging level
High Level Summary
Test Runs
Test | Virtual Users | Duration | OKAPI log level |
---|---|---|---|
1 | 1 | 30 mins | INFO |
2 | 5 | 30 mins | INFO |
3 | 8 | 30 mins | INFO |
4 | 20 | 30 mins | INFO |
5 | 20 | 24 hours | INFO |
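Each run above models N concurrent virtual users looping the check-in/check-out workflow against Okapi for the stated duration. Below is a minimal, hypothetical sketch of that model; the host, tenant, token, and request body are placeholders, and the actual runs were driven by a dedicated load-testing tool rather than this code:
```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CheckInOutLoad {
  public static void main(String[] args) throws Exception {
    int virtualUsers = 20;                       // e.g. test runs 4 and 5
    Duration duration = Duration.ofMinutes(30);  // 24 hours for the longevity run
    HttpClient http = HttpClient.newHttpClient();
    long deadline = System.nanoTime() + duration.toNanos();

    ExecutorService pool = Executors.newFixedThreadPool(virtualUsers);
    for (int u = 0; u < virtualUsers; u++) {
      pool.submit(() -> {
        while (System.nanoTime() < deadline) {
          // Placeholder host/tenant/token/body; a real check-in payload also
          // needs servicePointId and checkInDate.
          HttpRequest req = HttpRequest.newBuilder(
                  URI.create("https://okapi.example.org/circulation/check-in-by-barcode"))
              .header("X-Okapi-Tenant", "diku")
              .header("X-Okapi-Token", "TOKEN")
              .header("Content-Type", "application/json")
              .POST(HttpRequest.BodyPublishers.ofString("{\"itemBarcode\":\"BARCODE\"}"))
              .build();
          try {
            long t0 = System.nanoTime();
            http.send(req, HttpResponse.BodyHandlers.discarding());
            long ms = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - t0);
            System.out.println("check-in took " + ms + " ms");
          } catch (Exception e) {
            // A real run would count this as an error sample.
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(duration.toMinutes() + 5, TimeUnit.MINUTES);
  }
}
```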
Results
Response Times
Load | Average Check-in (s) | Average Check-out (s) | 50th %tile Check-in (s) | 50th %tile Check-out (s) | 75th %tile Check-in (s) | 75th %tile Check-out (s) | 95th %tile Check-in (s) | 95th %tile Check-out (s) |
---|---|---|---|---|---|---|---|---|
1 user | 0.967 | 1.989 | 0.889 | 1.832 | 0.984 | 2.201 | 1.254 | 2.815 |
5 users | 1.053 | 2.171 | 0.981 | 1.969 | 1.114 | 2.253 | 1.528 | 3.370 |
8 users | 1.193 | 2.244 | 1.076 | 2.022 | 1.339 | 2.372 | 1.895 | 3.544 |
20 users | 2.391 | 3.901 | 1.639 | 3.073 | 2.263 | 4.12 | 4.811 | 8.784 |
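For reference, here is a minimal sketch of how percentile figures like those above can be computed from raw response-time samples, using the nearest-rank definition; this is an assumption, as the load tool's exact interpolation may differ:
```java
import java.util.Arrays;

public class Percentile {
  // Nearest-rank percentile: the smallest sample such that at least p percent
  // of all samples are less than or equal to it.
  static double percentile(double[] samples, double p) {
    double[] sorted = samples.clone();
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(p / 100.0 * sorted.length);
    return sorted[Math.max(0, rank - 1)];
  }

  public static void main(String[] args) {
    // Illustrative check-in response times in seconds (not measured data).
    double[] responseTimes = {0.9, 1.1, 1.3, 0.8, 2.4, 1.0, 1.6, 0.95};
    System.out.printf("median: %.3f s%n", percentile(responseTimes, 50));
    System.out.printf("75th:   %.3f s%n", percentile(responseTimes, 75));
    System.out.printf("95th:   %.3f s%n", percentile(responseTimes, 95));
  }
}
```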
The following table shows the slow APIs, those whose 75th-percentile response times exceed 100 ms, and compares them against Goldenrod (GR = Goldenrod, HS = Honeysuckle; times in seconds). Apart from the 1-user test, every listed API is slower from the 5-user test onward, with GET automated-patron-blocks leading the way at about 150% slower (0.296 s vs. 0.118 s at 20 users), while GET circulation/loans regressed by up to 20%.
API | 1 user GR (75th %tile) | 1 user HS (75th %tile) | 5 users GR (75th %tile) | 5 users HS (75th %tile) | 8 users GR (75th %tile) | 8 users HS (75th %tile) | 20 users GR (75th %tile) | 20 users HS (75th %tile) |
---|---|---|---|---|---|---|---|---|
GET circulation/loans | 0.345 | 0.349 | 0.365 | 0.406 | 0.391 | 0.422 | 0.654 | 0.784 |
GET inventory/items | 0.208 | 0.186 | 0.208 | 0.222 | 0.225 | 0.233 | 0.312 | 0.375 |
POST checkin-by-barcode | 0.682 | 0.593 | 0.631 | 0.664 | 0.815 | 0.849 | 1.296 | 1.467 |
POST checkout-by-barcode | 0.750 | 0.717 | 0.688 | 0.784 | 0.733 | 0.817 | 1.205 | 1.469 |
GET automated-patron-blocks | 0.069 | 0.163 | 0.085 | 0.180 | 0.079 | 0.188 | 0.118 | 0.296 |
The table below compares overall check-in and check-out response times, in seconds, between Goldenrod (GR) and Honeysuckle (HS):
Load | Average Check-in GR | Average Check-in HS | Average Check-out GR | Average Check-out HS | 50th %tile Check-in GR | 50th %tile Check-in HS | 50th %tile Check-out GR | 50th %tile Check-out HS | 75th %tile Check-in GR | 75th %tile Check-in HS | 75th %tile Check-out GR | 75th %tile Check-out HS | 95th %tile Check-in GR | 95th %tile Check-in HS | 95th %tile Check-out GR | 95th %tile Check-out HS |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 user | 1.06 | 0.967 | 1.994 | 1.989 | 0.977 | 0.889 | 1.82 | 1.832 | 1.111 | 0.984 | 2.004 | 2.021 | 1.323 | 1.254 | 2.811 | 2.815 |
5 users | 1.011 | 1.053 | 1.924 | 2.171 | 0.975 | 0.981 | 1.814 | 1.969 | 1.067 | 1.114 | 1.977 | 2.253 | 1.251 | 1.528 | 2.625 | 3.37 |
8 users | 1.142 | 1.193 | 2.044 | 2.244 | 1.061 | 1.076 | 1.899 | 2.022 | 1.274 | 1.339 | 2.093 | 2.372 | 1.569 | 1.895 | 3.107 | 3.544 |
20 users | 1.702 | 2.391 | 3.02 | 3.901 | 1.49 | 1.639 | 2.652 | 3.073 | 1.936 | 2.263 | 3.273 | 4.12 | 2.953 | 4.811 | 5.352 | 8.784 |
This table shows that the average check-out response time in Honeysuckle is roughly 10% to 29% slower than in Goldenrod, for example (3.901 − 3.020) / 3.020 ≈ 29% at 20 users, and that check-in is likewise about 4% to 40% slower (data aggregated from the 5-, 8-, and 20-user test runs).
Longevity Test
A 24-hour longevity test was performed (with Okapi 4.2.2). Key observations:
- Performance degraded after 2 hours.
- Average response time in the first two hours:
- Check-in: 2.28 sec
- Check-out: 4.76 sec
- Average response time in the first 12 hours:
- Check-in: 2.709 sec
- Check-out: 7.898 sec
- Average response time in the last 12 hours:
- Check-in: 4.605 sec
- Check-out: 17.297 sec
- Overall, the average response time for the entire test run:
- Check-in: 3.414 sec
- Check-out: 11.419 sec
- Throughput degraded over time, unsurprisingly.
- Module services' CPU utilization (the black lines indicate the start and finish times of the test run):
- Under the 20-user load, Okapi started out working the hardest, but its CPU utilization dropped over time. This is because mod-pubsub leaks HTTP clients and consumes more and more resources, slowing itself down so that Okapi has fewer messages to route. Accordingly, mod-pubsub's CPU utilization clearly went up, while the other modules followed Okapi's downward pattern (see the sketch of the leak pattern after this list).
- Module services' memory utilization:
- No module exhibited a memory leak except mod-pubsub.
- Although two instances of mod-pubsub were running on two different EC2 instances, mod-pubsub's traffic appears to have been pinned ("sticky") to one of them. The graphs show mod-pubsub on one instance consuming ever more memory and CPU, while the other instance shows little activity:
- mod-pubsub and Okapi on another node: Okapi's CPU utilization dwindles while mod-pubsub does not appear to be busy at all.
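mod-pubsub, like most FOLIO backend modules, is built on Vert.x. The sketch below is hypothetical and is not mod-pubsub's actual code; it only illustrates the general leak pattern described above, where a fresh HTTP client is created per delivered event and never closed, alongside the usual fix of one shared client (Vert.x 3.x API assumed):
```java
import io.vertx.core.Vertx;
import io.vertx.core.http.HttpClient;

public class EventDelivery {

  private final Vertx vertx;
  private final HttpClient sharedClient; // created once, reused for every event

  public EventDelivery(Vertx vertx) {
    this.vertx = vertx;
    this.sharedClient = vertx.createHttpClient();
  }

  // Leaky pattern: every delivery spins up a new HttpClient (with its own
  // connection pool) and never calls client.close(), so pools, sockets, and
  // buffers accumulate; CPU and memory climb the way the graphs show.
  public void deliverLeaky(int port, String host, String path, String body) {
    HttpClient client = vertx.createHttpClient();
    client.post(port, host, path, response -> { /* handle response */ })
        .end(body); // client is never closed
  }

  // Fixed pattern: reuse the single shared client for every delivery.
  public void deliver(int port, String host, String path, String body) {
    sharedClient.post(port, host, path, response -> { /* handle response */ })
        .end(body);
  }
}
```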
JVM Profiling
Database
The following queries took more than 100 ms to execute under load:
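As a hedged illustration of how such statements can be surfaced, the sketch below reads PostgreSQL's pg_stat_statements view over JDBC, assuming the extension is enabled on the RDS instances; the connection details are placeholders, and the mean_time column applies to PostgreSQL 12 and earlier (13+ renames it mean_exec_time):
```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SlowQueryReport {
  public static void main(String[] args) throws Exception {
    // Placeholder connection details.
    String url = "jdbc:postgresql://db.example.org:5432/folio";
    try (Connection conn = DriverManager.getConnection(url, "folio", "secret");
         Statement st = conn.createStatement();
         ResultSet rs = st.executeQuery(
             "SELECT query, calls, mean_time " +
             "FROM pg_stat_statements " +
             "WHERE mean_time > 100 " +          // milliseconds
             "ORDER BY mean_time DESC LIMIT 20")) {
      while (rs.next()) {
        System.out.printf("%8.1f ms  x%-6d %s%n",
            rs.getDouble("mean_time"), rs.getLong("calls"), rs.getString("query"));
      }
    }
  }
}
```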
CPU and Memory Utilization
The following table shows average CPU and memory utilization percentages.
Service | 1 user | 5 users | 8 users | 20 users |
---|---|---|---|---|
Okapi CPU | | | | |
Okapi Memory | | | | |
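As an illustration of how averages like these can be gathered, the sketch below pulls an ECS service's average CPU utilization from CloudWatch using the AWS SDK for Java v1; the cluster and service names are placeholder assumptions:
```java
import com.amazonaws.services.cloudwatch.AmazonCloudWatch;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder;
import com.amazonaws.services.cloudwatch.model.Datapoint;
import com.amazonaws.services.cloudwatch.model.Dimension;
import com.amazonaws.services.cloudwatch.model.GetMetricStatisticsRequest;
import java.util.Date;

public class ServiceCpuAverage {
  public static void main(String[] args) {
    AmazonCloudWatch cw = AmazonCloudWatchClientBuilder.defaultClient();
    long now = System.currentTimeMillis();

    GetMetricStatisticsRequest request = new GetMetricStatisticsRequest()
        .withNamespace("AWS/ECS")
        .withMetricName("CPUUtilization")
        .withDimensions(
            new Dimension().withName("ClusterName").withValue("folio-cluster"), // placeholder
            new Dimension().withName("ServiceName").withValue("okapi"))         // placeholder
        .withStartTime(new Date(now - 30 * 60 * 1000)) // a 30-minute test window
        .withEndTime(new Date(now))
        .withPeriod(60)                                // one datapoint per minute
        .withStatistics("Average");

    double sum = 0;
    int n = 0;
    for (Datapoint dp : cw.getMetricStatistics(request).getDatapoints()) {
      sum += dp.getAverage();
      n++;
    }
    System.out.printf("Okapi average CPU over the window: %.1f%%%n",
        n == 0 ? 0 : sum / n);
  }
}
```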
Appendix
Related issues:
- https://issues.folio.org/browse/MODPATBLK-70
- https://issues.folio.org/browse/OKAPI-964
- https://issues.folio.org/browse/OKAPI-965