Overview
This is a report for a series of Check-in-check-out test runs against the Kiwi release.
Back End:
- mod-circulation-22.1.0
- mod-circulation-storage-13.1.0
- mod-inventory-18.0.0
- mod-inventory-storage-22.0.0
- mod-authtoken-2.9.0
- mod-pubsub-2.4.0
- mod-patron-blocks-1.4.0
- mod-feesfines-17.0.0
- okapi-4.9.0
Front End:
- folio_circulation-6.0.0
- Item Check-in (folio_checkin-6.0.0)
- Item Check-out (folio_checkout-7.0.0)
Infrastructure:
- 71 back-end modules deployed in 141 ECS tasks
- 3 okapi ECS services
- 6 m5.xlarge EC2 instances
- 2 db.r6g.xlarge AWS RDS instances
- INFO logging level
High Level Summary
- Kiwi performs much better than Juniper GA: check-in response times are under 900ms and check-outs around 1.5s, with little variation between 1 and 20 users
- Database performance is better, with much lower CPU utilization than Juniper GA
- The worst-performing APIs are still POST /checkin-by-barcode and POST /checkout-by-barcode; their response times are still about 500ms. GET /circulation/loans also takes more than 200ms. GET /inventory/items (by barcode) now takes less than 100ms.
- The longevity test shows response times worsening over time, probably due to growing DB CPU utilization. CIRCSTORE-304 could potentially address this.
Test Runs
Test | Virtual Users | Duration |
---|---|---|
1. | 1 | 30 mins |
2. | 5 | 30 mins |
3. | 8 | 30 mins |
4. | 20 | 30 mins |
5. | 1 (repeat) | 30 mins |
Results
Response Times (in seconds) for the test runs listed above
Average (seconds) | 50th %tile (seconds) | 75th %tile (seconds) | 95th %tile (seconds) | |||||
---|---|---|---|---|---|---|---|---|
Check-in | Check-out | Check-in | Check-out | Check-in | Check-out | Check-in | Check-out | |
1 user | 0.838 | 1.582 | 0.767 | 1.464 | 0.915 | 1.716 | 0.915 | 1.716 |
5 users | 0.73 | 1.376 | 0.676 | 1.272 | 0.769 | 1.448 | 1.052 | 2.061 |
8 users | 0.758 | 1.392 | 0.674 | 1.228 | 0.839 | 1.44 | 1.202 | 2.343 |
20 users | 0.899 | 1.506 | 0.731 | 1.303 | 0.931 | 1.535 | 1.531 | 2.504 |
Response times are consistently good: sub-second for check-ins and around 1.5 seconds for check-outs, even at the 75th percentile. There is not much variation between 1 and 20 users.
Comparisons to Last Release
The following tables compare Juniper HF3 test results against the Kiwi general release (deployed at Bugfest).
Response Time Comparison
In general there is no regression in performance. The response times for Kiwi and Juniper are very close to each other under the 1-8 user loads, except in the 95th percentile and under the 20-user load, where Kiwi clearly outperforms Juniper. In the tables below, the Delta columns express the difference between the Juniper and Kiwi releases as a percentage: Delta = (JP - KW) / JP * 100, so a positive delta means Kiwi is faster (for example, the 1-user average check-in delta is (0.944 - 0.838) / 0.944 ≈ 11.23%). Any delta within +/-5% is within the margin of error and not statistically significant. It is also noteworthy that Kiwi seems to invoke GET /automated-patron-blocks three times instead of once (UICHKOUT-755). This call averages 25ms under all loads, so if two of those three calls are unnecessary (why would the UI call it three times?), Kiwi's average check-out response times could improve by another ~50ms.
Average | 50th percentile | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Check-in JP | Check-in KW | Delta | Check-out JP | Check-out KW | Delta | Check-in JP | Check-in KW | Delta | Check-out JP | Check-out KW | Delta | |
1 user | 0.944 | 0.838 | 11.23% | 1.579 | 1.582 | -0.19% | 0.835 | 0.767 | 8.14% | 1.411 | 1.464 | -3.76% |
5 users | 0.811 | 0.73 | 9.99% | 1.359 | 1.376 | -1.25% | 0.750 | 0.676 | 9.87% | 1.23 | 1.272 | -3.41% |
8 users | 0.889 | 0.758 | 14.74% | 1.425 | 1.392 | 2.32% | 0.785 | 0.674 | 14.14% | 1.262 | 1.228 | 2.69% |
20 users | 1.386 | 0.899 | 35.14% | 2.21 | 1.506 | 31.86% | 1.172 | 0.731 | 37.63% | 1.887 | 1.303 | 30.95% |
75th percentile | 95th percentile | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Check-in JP | Check-in KW | Delta | Check-out JP | Check-out KW | Delta | Check-in JP | Check-in KW | Delta | Check-out JP | Check-out KW | Delta | |
1 user | 0.987 | 0.915 | 7.29% | 1.604 | 1.716 | -6.98% | 1.533 | 1.214 | 20.81% | 2.63 | 2.486 | 5.48% |
5 users | 0.867 | 0.769 | 11.30% | 1.438 | 1.448 | -0.70% | 1.179 | 1.052 | 10.77% | 2.111 | 1.448 | 31.41% |
8 users | 0.958 | 0.839 | 12.42% | 1.481 | 1.44 | 2.77% | 1.434 | 1.202 | 16.18% | 2.343 | 2.343 | 0.00% |
20 users | 1.589 | 0.931 | 41.41% | 2.5 | 1.535 | 38.60% | 2.755 | 1.531 | 44.43% | 4.369 | 2.504 | 42.69% |
Comparing Kiwi response times against the Juniper results documented in the Check-in-check-out Test Report (Juniper), Kiwi's times are clearly better than Juniper's. This is because the numbers in those tests came from the Juniper general release, before GET /automated-patron-blocks was optimized. HF3's modules are closer to Kiwi and therefore perform much better than Juniper GA.
"Worst" API Comparisons
The APIs identified in the table below are the ones that took over 100ms to execute in Juniper. In Kiwi these APIs perform much better, especially under greater load (8 and 20 users) where, for example, POST /checkout-by-barcode and POST /checkin-by-barcode times are almost 50% better. GET /inventory/items also improved greatly due to the implementation of the unique barcode index on the mod_inventory_storage.item table (see the illustrative DDL after the table).
Average Response Time in milliseconds. Note: JP = Juniper build, KW = Kiwi build
API | 1 user JP (75th %tile) | 1 user KW (75th %tile) | 5 users JP (75th %tile) | 5 users KW (75th %tile) | 8 users JP (75th %tile) | 8 users KW (75th %tile) | 20 users JP (75th %tile) | 20 users KW (75th %tile) |
---|---|---|---|---|---|---|---|---|
POST checkout-by-barcode | 526 | 550 | 476 | 436 | 479 | 428 | 861 | 461 |
POST checkin-by-barcode | 457 | 444 | 402 | 368 | 489 | 438 | 852 | 504 |
GET circulation/loans | 246 | 305 | 245 | 262 | 272 | 267 | 516 | 303 |
GET inventory/items | 171 | 124 | 166 | 96 | 170 | 92 | 250 | 99 |
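For context on the GET /inventory/items improvement, the barcode lookup now goes through a unique index on the item table rather than a sequential scan. The DDL below is only a hand-written approximation; the actual index is generated from mod-inventory-storage's schema definition, and the index name, schema name, and exact expression are assumptions:

-- Illustrative only: approximate shape of a unique barcode index on the item table.
-- The real index is generated by the module; names and normalization (e.g. lower/f_unaccent) may differ.
CREATE UNIQUE INDEX IF NOT EXISTS item_barcode_idx_unique
  ON fs09000000_mod_inventory_storage.item ((jsonb->>'barcode'));

With such an index in place, the item-by-barcode lookup can be resolved with an index scan, which is consistent with the sub-100ms times observed for Kiwi in the table above.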
Longevity Test
The longevity test shows that response times increased as time went on.
Check In | Check Out | |
---|---|---|
1st Hour | 0.821s | 1.476s |
16th Hour | 1.129s | 3.343s |
25th Hour | 0.999s | 3.758s |
In the response time graph below, the Checkout Controller time (which aggregates all check-out API response times) increased over a 16-hour window, from under 2s to over 5s.
The DB CPU utilization percentage also increased over time.
It is unclear what caused the DB to use more CPU resources as the test progressed (possibly the DB connections), but looking at the logs, one query stood out as being slow:
SELECT jsonb, id
  FROM fs09000000_mod_circulation_storage.loan
 WHERE (lower(fs09000000_mod_circulation_storage.f_unaccent(loan.jsonb->>'userId'))
          LIKE lower(fs09000000_mod_circulation_storage.f_unaccent('<user UUID>')))
   AND (lower(fs09000000_mod_circulation_storage.f_unaccent(loan.jsonb->'status'->>'name'))
          NOT LIKE lower(fs09000000_mod_circulation_storage.f_unaccent('Closed')))
This query at first took a little more than 100ms to run, but toward the end of the test it got to over 500ms on average and spiked to over a second.
Doing an EXPLAIN ANALYZE on the query shows that it was not using indexes in the query plan despite having the indexes to look up loans by userId and loan status name.
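For reference, the captured plan below can be reproduced by prefixing the slow query with EXPLAIN ANALYZE; the user UUID here is the one visible in the plan's filter, and any real user UUID can be substituted:

EXPLAIN ANALYZE
SELECT jsonb, id
  FROM fs09000000_mod_circulation_storage.loan
 WHERE lower(fs09000000_mod_circulation_storage.f_unaccent(jsonb->>'userId'))
         LIKE lower(fs09000000_mod_circulation_storage.f_unaccent('664e4002-9eb2-4c9e-addd-04f308a8062c'))
   AND lower(fs09000000_mod_circulation_storage.f_unaccent(jsonb->'status'->>'name'))
         NOT LIKE lower(fs09000000_mod_circulation_storage.f_unaccent('Closed'));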
Gather  (cost=1000.00..60093.80 rows=780 width=966) (actual time=192.948..200.814 rows=1 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on loan  (cost=0.00..59015.80 rows=325 width=966) (actual time=145.247..191.854 rows=0 loops=3)
        Filter: ((lower(fs09000000_mod_circulation_storage.f_unaccent((jsonb ->> 'userId'::text))) ~~ '664e4002-9eb2-4c9e-addd-04f308a8062c'::text) AND (lower(fs09000000_mod_circulation_storage.f_unaccent(((jsonb -> 'status'::text) ->> 'name'::text))) !~~ 'closed'::text))
        Rows Removed by Filter: 53759
Planning Time: 0.212 ms
Execution Time: 200.834 ms
Due to the left() function (more details in CIRCSTORE-304), the indexes were not applied. After removing the left() function the indexes were applied, and when running another longevity test the DB CPU utilization was constant and under 20%, as seen in the diagram below.
Both before and after fixing the left() function there were spikes at half-hour intervals, but these were due to background jobs processing loans (age-to-lost, patron notices, etc.). As more loans were created during the test, the spikes got higher.
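To illustrate the left() issue described above: the generated index wrapped the indexed expression in left(..., N), while the query predicate did not, so the planner could not match them and fell back to a sequential scan. The DDL below is a hand-written sketch; the actual index definitions and truncation length live in mod-circulation-storage and CIRCSTORE-304, and the index name and length used here are assumptions:

-- Sketch of the original index: the left(..., 600) truncation makes the indexed
-- expression differ from the WHERE clause expression, so the index is not used.
CREATE INDEX loan_userid_idx ON fs09000000_mod_circulation_storage.loan
  (left(lower(fs09000000_mod_circulation_storage.f_unaccent(jsonb->>'userId')), 600));

-- Sketch of the fixed index: without left(), the indexed expression matches the
-- query predicate lower(f_unaccent(jsonb->>'userId')) LIKE ... exactly.
DROP INDEX IF EXISTS fs09000000_mod_circulation_storage.loan_userid_idx;
CREATE INDEX loan_userid_idx ON fs09000000_mod_circulation_storage.loan
  (lower(fs09000000_mod_circulation_storage.f_unaccent(jsonb->>'userId')));

Re-running EXPLAIN ANALYZE after such a change should show an index scan on the loan table instead of the parallel sequential scan captured earlier.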
Here is a view of the CPU utilization. A couple of observations:
- mod-users seems to take up CPU resources in a cyclical way.
- Okapi uses about 450% CPU on average.
Here is the Service CPU Utilization graph without Okapi for better clarity of other modules' behaviors.
(Note that at around 8:00 on the graph there are extended spikes for all the modules, Okapi, and the DB. This is due to tooling: the tool for some reason added another 20 concurrent users to the test. The results for this period have been discarded.)
There do not appear to be any memory leak issues in Kiwi.
Only mod-inventory-storage shows memory spikes, but they were not prolonged and the processes did not crash.
Module CPU and Memory Utilization
Overall, the relevant services consume CPU resources at nominal levels.
20-user tests | Avg | Max |
---|---|---|
mod-users | 41% | 47% |
mod-pubsub | 50% | 63% |
okapi | 400% | 439% |
mod-circulation | 50% | 55% |
mod-circulation-storage | 33% | 43% |
mod-inventory | 45% | 50% |
mod-inventory-storage | 98% | 162% |
mod-patron-blocks | 13% | 17% |
mod-feesfines | 23% | 26% |
In the 8-user tests, mod-inventory spiked to almost 400% while averaging around 175%. This is an anomaly. We have seen mod-inventory spike while idle or during test runs of other workflows before, mostly due to DI events_cache topic processing. In other tests this is not a problem, so it does not seem to be a bad trend here.
Kiwi Okapi's CPU percentage for 20 users is rather high compared to the results documented in the Juniper test run, in which Okapi averaged about 325% and spiked to around 380%. However, in the 5- and 8-user tests, Juniper Okapi was using more CPU.
Services' memory usage seems stable during the test runs, although these short runs do not reveal much. mod-inventory-storage appears to accumulate memory as it goes along, but during these short tests this behavior did not have any effect on the results.
Avg | Max | |
---|---|---|
mod-users | 60% | 60% |
mod-pubsub | 49% | 52% |
okapi | 98% | 103% |
mod-circulation | 54% | 58% |
mod-circulation-storage | 47% | 53% |
mod-inventory | 26% | 31% |
mod-inventory-storage | 68%, 97%, 115% | 115% |
mod-patron-blocks | 87% | 90% |
mod-feesfines | 87% | 88% |
Database
The freeable memory metric shows how much memory is being consumed and how much remains during a test run. In the graph below, memory usage is good and stable throughout the test runs, compared to Juniper GA, which quickly drained memory because client connections were not being closed.
Users Tested | MB consumed |
---|---|
1 | 800 |
5 | 1200 |
8 | 1300 |
20 | 1300 |
Database average CPU utilization is also improved and steady compared to Juniper GA.
Users Tested | Kiwi % | Juniper GA % |
---|---|---|
1 | 9 | 15 |
5 | 13 | 50 |
8 | 14 (with a 78% spike) | 78 |
20 | 24 | 100 |
Average database connections:
Users Tested | Kiwi |
---|---|
1 | 478 |
5 | 580 |
8 | 580 |
20 | 608 |
Miscellaneous
- Raw test data: