Overview
This is a report for a series of Check-in-check-out test runs against the Kiwi release.
Back End:
- mod-circulation-22.1.0
- mod-circulation-storage-13.1.0
- mod-inventory-18.0.0
- mod-inventory-storage-22.0.0
- mod-authtoken-2.9.0
- mod-pubsub-2.4.0
- mod-patron-blocks-1.4.0
- mod-feesfines-17.0.0
- okapi-4.9.0
Front End:
- folio_circulation-6.0.0
- Item Check-in (folio_checkin-6.0.0)
- Item Check-out (folio_checkout-7.0.0)
Infrastructure:
- 71 back-end modules deployed in 141 ECS tasks
- 3 okapi ECS services
- 6 m5.xlarge EC2 instances
- 2 db.r6g.xlarge AWS RDS instances
- INFO logging level
High Level Summary
- Kiwi performs much better than Juniper GA: response times for check-ins are <900ms and for check-outs <1500ms, with not much variation between 1 and 20 users
- Database performance is better and uses much less CPU compared to Juniper GA
- The worst-performing APIs are still POST /checkin-by-barcode and POST /checkout-by-barcode; their response times are still about 500ms. GET /circulation/loans also takes more than 200ms. GET /inventory/items (by barcode) now takes less than 100ms.
- The longevity test shows response times worsening over time, probably due to growing DB CPU utilization. Fixing the slow loan query identified in the Longevity Test section below could potentially address this situation.
Test Runs
Test | Virtual Users | Duration
---|---|---
1 | 1 | 30 mins
2 | 5 | 30 mins
3 | 8 | 30 mins
4 | 20 | 30 mins
5 | 1 (repeat) | 30 mins
Results
Response Times (Average of all tests listed above, in seconds)
 | Check-in Avg | Check-out Avg | Check-in 50th %tile | Check-out 50th %tile | Check-in 75th %tile | Check-out 75th %tile | Check-in 95th %tile | Check-out 95th %tile
---|---|---|---|---|---|---|---|---
1 user | 0.838 | 1.582 | 0.767 | 1.464 | 0.915 | 1.716 | 0.915 | 1.716
5 users | 0.73 | 1.376 | 0.676 | 1.272 | 0.769 | 1.448 | 1.052 | 2.061
8 users | 0.758 | 1.392 | 0.674 | 1.228 | 0.839 | 1.44 | 1.202 | 2.343
20 users | 0.899 | 1.506 | 0.731 | 1.303 | 0.931 | 1.535 | 1.531 | 2.504
Response times are consistently good: sub-second for check-ins and around 1.5 seconds for check-outs, even at the 75th percentile. There is not much variation between 1 and 20 users.
Comparisons to Last Release
The following tables compare Juniper HF3 test results against the Kiwi General release (deployed at Bugfest).
Response Time Comparison
In general there is no regression in performance. Response times for Kiwi and Juniper are very close to each other at the 1-8 user loads, except in the 95th percentile group and the 20-user load, where Kiwi clearly outperforms Juniper. In the tables below, the Delta columns express the difference between the Juniper and Kiwi releases as a percentage; any difference within +/-5% is within the margin of error and not statistically significant. It is also noteworthy that Kiwi seems to invoke GET /automated-patron-blocks three times instead of once. This call averages 25ms under all loads, so if two of these three calls were not needed (why would the UI call it thrice?), Kiwi's check-out average response times could improve by another 50ms.
Average and 50th percentile response times (in seconds):
 | Check-in JP (Avg) | Check-in KW (Avg) | Delta | Check-out JP (Avg) | Check-out KW (Avg) | Delta | Check-in JP (50th) | Check-in KW (50th) | Delta | Check-out JP (50th) | Check-out KW (50th) | Delta
---|---|---|---|---|---|---|---|---|---|---|---|---
1 user | 0.944 | 0.838 | 11.23% | 1.579 | 1.582 | -0.19% | 0.835 | 0.767 | 8.14% | 1.411 | 1.464 | -3.76%
5 users | 0.811 | 0.73 | 9.99% | 1.359 | 1.376 | -1.25% | 0.750 | 0.676 | 9.87% | 1.23 | 1.272 | -3.41%
8 users | 0.889 | 0.758 | 14.74% | 1.425 | 1.392 | 2.32% | 0.785 | 0.674 | 14.14% | 1.262 | 1.228 | 2.69%
20 users | 1.386 | 0.899 | 35.14% | 2.21 | 1.506 | 31.86% | 1.172 | 0.731 | 37.63% | 1.887 | 1.303 | 30.95%
75th and 95th percentile response times (in seconds):
 | Check-in JP (75th) | Check-in KW (75th) | Delta | Check-out JP (75th) | Check-out KW (75th) | Delta | Check-in JP (95th) | Check-in KW (95th) | Delta | Check-out JP (95th) | Check-out KW (95th) | Delta
---|---|---|---|---|---|---|---|---|---|---|---|---
1 user | 0.987 | 0.915 | 7.29% | 1.604 | 1.716 | -6.98% | 1.533 | 1.214 | 20.81% | 2.63 | 2.486 | 5.48%
5 users | 0.867 | 0.769 | 11.30% | 1.438 | 1.448 | -0.70% | 1.179 | 1.052 | 10.77% | 2.111 | 1.448 | 31.41%
8 users | 0.958 | 0.839 | 12.42% | 1.481 | 1.44 | 2.77% | 1.434 | 1.202 | 16.18% | 2.343 | 2.343 | 0.00%
20 users | 1.589 | 0.931 | 41.41% | 2.5 | 1.535 | 38.60% | 2.755 | 1.531 | 44.43% | 4.369 | 2.504 | 42.69%
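For reference, the Delta values above appear to be computed as (JP - KW) / JP x 100, i.e. the improvement relative to Juniper. For example, for the 1-user check-in average, (0.944 - 0.838) / 0.944 ≈ 11.23%; a negative value, such as the 1-user check-out average (-0.19%), means Kiwi was marginally slower.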
When comparing Kiwi response times against the Juniper results documented in the Check-in-check-out Test Report (Juniper), Kiwi's times are clearly better than Juniper's. This is because the numbers in those tests came from the Juniper General release, before GET /automated-patron-blocks was optimized. HF3's modules are closer to Kiwi and therefore perform much better than Juniper GA.
"Worst" API Comparisons
The APIs identified in the table below were the ones that took over 100ms to execute in Juniper. In Kiwi these APIs perform much better, especially under greater load (8 and 20 users) where, for example, POST /checkout-by-barcode and POST /checkin-by-barcode times are almost 50% better. GET /inventory/items also improved greatly due to the implementation of the unique barcode index on the mod_inventory_storage.item table (a sketch of such an index is shown after the table).
Average Response Time in milliseconds. Note: JP = Juniper build, KW = Kiwi build.
API | 1 user JP (75th %tile) | 1 user KW (75th %tile) | 5 users JP (75th %tile) | 5 users KW (75th %tile) | 8 users JP (75th %tile) | 8 users KW (75th %tile) | 20 users JP (75th %tile) | 20 users KW (75th %tile) |
---|---|---|---|---|---|---|---|---|
POST checkout-by-barcode | 526 | 550 | 476 | 436 | 479 | 428 | 861 | 461 |
POST checkin-by-barcode | 457 | 444 | 402 | 368 | 489 | 438 | 852 | 504 |
GET circulation/loans | 246 | 305 | 245 | 262 | 272 | 267 | 516 | 303 |
GET inventory/items | 171 | 124 | 166 | 96 | 170 | 92 | 250 | 99 |
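For illustration only, here is a minimal sketch of the kind of unique expression index described above; the index name, tenant schema (taken from the pattern seen in the loan query later in this report), and exact DDL shipped in mod-inventory-storage may differ.

```sql
-- Illustrative sketch (not the exact mod-inventory-storage DDL): a unique expression index
-- on the item barcode so that barcode lookups can use an index scan instead of a full table scan.
CREATE UNIQUE INDEX IF NOT EXISTS item_barcode_idx_unique
  ON fs09000000_mod_inventory_storage.item
  (lower(fs09000000_mod_inventory_storage.f_unaccent(jsonb ->> 'barcode')));
```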
Longevity Test
Longevity test shows that the response time increased as time went on.
 | Check In | Check Out
---|---|---
1st Hour | 0.821s | 1.476s
16th Hour | 1.129s | 3.343s
25th Hour | 0.999s | 3.758s
In the response time graph below, the Checkout Controller time (which aggregates all check-out API response times) increased over a 16-hour window, from under 2s to over 5s.
The DB CPU utilization percentage also increased over time.
It is unclear exactly what caused the DB to use more CPU resources as the test progressed (possibly DB connections), but looking at the logs, one query stood out as being slow:
```sql
SELECT jsonb, id
FROM fs09000000_mod_circulation_storage.loan
WHERE (lower(fs09000000_mod_circulation_storage.f_unaccent(loan.jsonb ->> 'userId'))
         LIKE lower(fs09000000_mod_circulation_storage.f_unaccent('<user UUID>')))
  AND (lower(fs09000000_mod_circulation_storage.f_unaccent(loan.jsonb -> 'status' ->> 'name'))
         NOT LIKE lower(fs09000000_mod_circulation_storage.f_unaccent('Closed')))
```
This query at first took a little more than 100ms to run, but toward the end of the test it got to over 500ms on average and spiked to over a second.
Doing an EXPLAIN ANALYZE on the query shows that it was not using indexes in the query plan despite having the indexes to look up loans by userId and loan status name.
```
Gather  (cost=1000.00..60093.80 rows=780 width=966) (actual time=192.948..200.814 rows=1 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on loan  (cost=0.00..59015.80 rows=325 width=966) (actual time=145.247..191.854 rows=0 loops=3)
        Filter: ((lower(fs09000000_mod_circulation_storage.f_unaccent((jsonb ->> 'userId'::text))) ~~ '664e4002-9eb2-4c9e-addd-04f308a8062c'::text) AND (lower(fs09000000_mod_circulation_storage.f_unaccent(((jsonb -> 'status'::text) ->> 'name'::text))) !~~ 'closed'::text))
        Rows Removed by Filter: 53759
Planning Time: 0.212 ms
Execution Time: 200.834 ms
```
Due to the left() function (more details in the related Jira issue), the indexes were not applied. After removing the left() function, the indexes were used, and in another longevity test run the DB CPU utilization stayed constant and under 20%, as seen in the diagram below. Both before and after fixing the left() function there were spikes at half-hour intervals, but these were due to background jobs processing the loans (age-to-lost, patron notices, etc.). As more loans got created during the test, the spikes got higher.
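To make the fix concrete, here is a minimal sketch of expression indexes that match the WHERE clauses of the slow query above without the left() truncation; the index names are illustrative and the actual DDL in mod-circulation-storage may differ.

```sql
-- Illustrative only: expression indexes matching lower(f_unaccent(...)) on userId and status name,
-- defined without left(), so the planner can choose an index scan instead of a parallel seq scan.
CREATE INDEX IF NOT EXISTS loan_userid_idx
  ON fs09000000_mod_circulation_storage.loan
  (lower(fs09000000_mod_circulation_storage.f_unaccent(jsonb ->> 'userId')));

CREATE INDEX IF NOT EXISTS loan_status_name_idx
  ON fs09000000_mod_circulation_storage.loan
  (lower(fs09000000_mod_circulation_storage.f_unaccent(jsonb -> 'status' ->> 'name')));
```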
Here is a view of the CPU utilization. A couple of observations:
- mod-users seems to take up CPU resources in a cyclical way.
- Okapi uses about 450% CPU on average.
Here is the Service CPU Utilization graph without Okapi for better clarity of other modules' behaviors.
(Note that at around 8:00 on the graph there are extended spikes for all the modules, Okapi, and the DB. This is due to tooling: the tool for some reason added another 20 concurrent users to the test. The results for this period were discarded.)
There do not appear to be any memory leak issues in Kiwi.
Only mod-inventory-storage seems to have spikes, but they were not prolonged and the processes did not crash.
Module CPU and Memory Utilization
Overall, the relevant services consume CPU resources at nominal levels.
20-users tests | Avg | Max |
---|---|---|
mod-users | 41% | 47% |
mod-pubsub | 50% | 63% |
okapi | 400% | 439% |
mod-circulation | 50% | 55% |
mod-circulation-storage | 33% | 43% |
mod-inventory | 45% | 50% |
mod-inventory-storage | 98% | 162% |
mod-patron-blocks | 13% | 17% |
mod-feesfines | 23% | 26% |
In the 8-user tests, mod-inventory spiked to almost 400% while averaging around 175%. This is an anomaly that can be disregarded: we have seen mod-inventory spike while idle or during test runs of other workflows before, mostly due to DI events_cache topic processing. In other tests this was not a problem, so it does not appear to be a bad trend here.
Kiwi Okapi's CPU % for 20 users is rather high compared to the results documented in the Juniper test run, in which it averaged about 325% and spiked to around 380%. However, in the 5- and 8-user tests, Juniper Okapi used more CPU.
Services' memory usage seems stable during the test runs, although these short test runs do not reveal much. mod-inventory-storage seems to accumulate memory as it goes along, but during these short tests this behavior did not have any effect on the results.
 | Avg | Max
---|---|---
mod-users | 60% | 60% |
mod-pubsub | 49% | 52% |
okapi | 98% | 103% |
mod-circulation | 54% | 58% |
mod-circulation-storage | 47% | 53% |
mod-inventory | 26% | 31% |
mod-inventory-storage | 68%, 97%, 115% | 115% |
mod-patron-blocks | 87% | 90% |
mod-feesfines | 87% | 88% |
Database
The freeable memory metric shows how much memory is being consumed and how much remains during a test run. In the graph below, memory usage is good and stable throughout the test runs, compared to Juniper GA, which quickly drained memory due to client connections not being closed.
Users Tested | MB consumed |
---|---|
1 | 800 |
5 | 1200 |
8 | 1300 |
20 | 1300 |
Database average CPU utilization is also improved and steady compared to Juniper GA.
Users Tested | Kiwi % | Juniper GA % |
---|---|---|
1 | 9 | 15 |
5 | 13 | 50 |
8 | 14 (with a 78% spike) | 78 |
20 | 24 | 100 |
Average database connections:
Users Tested | Kiwi |
---|---|
1 | 478 |
5 | 580 |
8 | 580 |
20 | 608 |
Miscellaneous
- Raw test data: