Check-in-check-out Test Report (Lotus)
Overview
This is a report for a series of Check-in-check-out test runs against the Lotus release.
Back End:
- mod-circulation-23.0.1
- mod-circulation-storage-14.0.0
- mod-inventory-18.1.4
- mod-inventory-storage-23.0.2
- mod-authtoken-2.9.1
- mod-pubsub-2.5.0
- mod-patron-blocks-1.5.0
- mod-feesfines-17.1.0
- okapi-4.13.0
Front End:
- Item Check-in (folio_checkin-7.0.0)
- Item Check-out (folio_checkout-8.0.0)
Infrastructure:
- 75 back-end modules deployed in 159 ECS tasks
- 3 okapi ECS services
- 9 m5.xlarge EC2 instances
- 2 db.r6g.xlarge AWS RDS instances
- INFO logging level
High Level Summary
- Lotus performs much better than Kiwi: response times for check-ins are in the mid-600ms range and check-outs 900ms-1000ms, with little variation between 1 and 20 users.
- Database performance is better and uses less CPU compared to Kiwi.
- The worst-performing APIs are still only POST /checkin-by-barcode and POST /checkout-by-barcode. They have improved since Kiwi, now averaging in the mid-300ms range even in the 20-user scenario.
- GET /circulation/loans takes 185ms in the 20 users case.
Test Runs
Test | Virtual Users | Duration |
---|---|---|
1. | 1 (with 1 mod-inventory and 1 mod-circulation task) | 30 mins |
2. | 5 (with 1 mod-inventory and 1 mod-circulation task) | 30 mins |
3. | 8 (with 1 mod-inventory and 1 mod-circulation task) | 30 mins |
4. | 20 (with 1 mod-inventory and 1 mod-circulation task) | 30 mins |
5. | 1 (repeat with 2 inventory and circ tasks) | 30 mins |
6. | 5 (repeat with 2 inventory and circ tasks) | 30 mins |
7. | 8 (repeat with 2 inventory and circ tasks) | 30 mins |
8. | 20 (repeat with 2 inventory and circ tasks) | 30 mins |
9. | 20 users longevity | 24 hours |
Results
Response Times (Average of all tests listed above, in seconds)
 | Average (seconds) | | 50th %tile (seconds) | | 75th %tile (seconds) | | 95th %tile (seconds) | |
 | Check-in | Check-out | Check-in | Check-out | Check-in | Check-out | Check-in | Check-out |
---|---|---|---|---|---|---|---|---|
1 user | 0.73 | 1.188 | 0.682 | 1.084 | 0.774 | 1.24 | 1 | 1.812 |
5 users | 0.62 | 0.976 | 0.582 | 0.915 | 0.662 | 1.02 | 0.844 | 1.44 |
8 users | 0.615 | 0.971 | 0.581 | 0.904 | 0.654 | 1.006 | 0.836 | 1.428 |
20 users | 0.661 | 1.052 | 0.62 | 0.967 | 0.702 | 1.107 | 0.918 | 1.593 |
Response times are consistently good: sub-second for check-ins and a little over 1 second for check-outs, even at the 75th percentile. The variation between 1 and 20 users continues to shrink compared to previous releases.
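The average and percentile summaries in these tables can be reproduced from raw per-request timings; a minimal sketch (the sample values are made up for illustration, not the actual test data):

```python
from statistics import mean, quantiles

def summarize(samples_s):
    """Summarize response-time samples (in seconds) the way the tables
    above do: average plus 50th/75th/95th percentiles."""
    cuts = quantiles(samples_s, n=100, method="inclusive")  # 99 cut points
    return {
        "avg": round(mean(samples_s), 3),
        "p50": round(cuts[49], 3),
        "p75": round(cuts[74], 3),
        "p95": round(cuts[94], 3),
    }

# Made-up check-in samples for illustration only -- not the actual test data.
checkins = [0.58, 0.60, 0.62, 0.64, 0.66, 0.70, 0.75, 0.84, 0.90, 1.10]
print(summarize(checkins))
```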
Comparisons to Last Release
The following tables compare Kiwi test results against Lotus General release (deployed at Lotus Bugfest).
Response Time Comparison
In the tables below, the Delta columns express the differences between Kiwi and Lotus releases in percentage.
In general there are no performance regressions; instead there are improvements in response times across the board. Check-in times improved by at least 15% and check-out times consistently by around 30%. The 20-user response times are also very similar to the 8-user ones, especially for check-outs, which means the system is very stable from 1 to 20 users.
Note: LT = Lotus build, KW = Kiwi build
Average | 50th percentile | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Check-in KW | Check-in LT | Delta | Check-out KW | Check-out LT | Delta | Check-in KW | Check-in LT | Delta | Check-out KW | Check-out LT | Delta | |
1 user | 0.838 | 0.73 | 12.89% | 1.582 | 1.188 | 24.91% | 0.767 | 0.682 | 11.08% | 1.464 | 1.084 | 25.96% |
5 users | 0.73 | 0.62 | 15.07% | 1.376 | 0.976 | 29.07% | 0.676 | 0.582 | 13.91% | 1.272 | 0.915 | 28.07% |
8 users | 0.758 | 0.615 | 18.87% | 1.392 | 0.971 | 30.24% | 0.674 | 0.581 | 13.80% | 1.228 | 0.904 | 26.38% |
20 users | 0.899 | 0.661 | 26.47% | 1.506 | 1.052 | 30.15% | 0.731 | 0.62 | 15.18% | 1.303 | 0.967 | 25.79% |
75th percentile | 95th percentile | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Check-in KW | Check-in LT | Delta | Check-out KW | Check-out LT | Delta | Check-in KW | Check-in LT | Delta | Check-out KW | Check-out LT | Delta |
1 user | 0.915 | 0.774 | 15.41% | 1.716 | 1.24 | 27.74% | 1.214 | 1.000 | 17.63% | 2.486 | 1.812 | 27.11% |
5 users | 0.769 | 0.662 | 13.91% | 1.448 | 1.02 | 29.56% | 1.052 | 0.844 | 19.77% | 1.448 | 1.440 | 0.55% |
8 users | 0.839 | 0.654 | 22.05% | 1.44 | 1.006 | 30.14% | 1.202 | 0.836 | 30.45% | 2.343 | 1.428 | 39.05% |
20 users | 0.931 | 0.702 | 24.60% | 1.535 | 1.107 | 27.88% | 1.531 | 0.918 | 40.04% | 2.504 | 1.593 | 36.38% |
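For reference, the Delta values in the comparison tables are the relative improvement from Kiwi to Lotus, which can be computed with a small helper like this sketch:

```python
def delta_pct(kiwi, lotus):
    """Delta column value: relative improvement from Kiwi to Lotus,
    (Kiwi - Lotus) / Kiwi * 100, rounded to two decimals."""
    return round((kiwi - lotus) / kiwi * 100, 2)

print(delta_pct(0.838, 0.730))  # 1-user check-in average -> 12.89
print(delta_pct(1.582, 1.188))  # 1-user check-out average -> 24.91
```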
"Worst" API Comparisons
The APIs identified in the table below are those that took over 100ms to execute at the 75th percentile in Lotus and Kiwi. In Lotus these APIs are generally faster, especially with 5 or more concurrent users. Impressively, the response times of POST checkout-by-barcode and POST checkin-by-barcode are all under 400ms! GET inventory/items response times are now well under 100ms, compared to being on the border in the Kiwi release.
Response times in milliseconds (75th percentile).
API | 1 user KW (75th %tile) | 1 user LT (75th %tile) | 5 users KW (75th %tile) | 5 users LT (75th %tile) | 8 users KW (75th %tile) | 8 users LT (75th %tile) | 20 users KW (75th %tile) | 20 users LT (75th %tile) |
---|---|---|---|---|---|---|---|---|
POST checkout-by-barcode | 550 | 435 | 436 | 330 | 428 | 349 | 461 | 384 |
POST checkin-by-barcode | 444 | 386 | 368 | 298 | 438 | 315 | 504 | 348 |
GET circulation/loans | 305 | 222 | 262 | 161 | 267 | 172 | 303 | 196 |
GET inventory/items | 124 | 97 | 96 | 71 | 92 | 67 | 99 | 75 |
Longevity Test
The longevity test shows that check-out response time increased as time went on, but not as aggressively as in Kiwi.
Check In | Check Out | |
---|---|---|
1st Hour | 0.572s | 0.995s |
13th Hour | 0.526s | 1.248s |
19th Hour | 0.541s | 1.368s |
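The slowdown over the longevity run can be quantified as a relative increase; a small sketch using the check-out figures from the table above:

```python
def growth_pct(first, later):
    """Relative increase of a response time over the longevity run."""
    return round((later - first) / first * 100, 2)

# Check-out averages (seconds) from the table above: 1st hour vs 19th hour.
print(growth_pct(0.995, 1.368))  # -> 37.49
```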
In the response-time graph below, the Checkout Controller time (which aggregates all check-out API response times) increased over the 24-hour window, from just under 1s to 1.4s. The vertical bands are not due to response times spiking but to the number of users dropping to 0 at times; it is unclear whether this behavior is caused by the tooling. Toward the last hour the check-out times spiked because the testing tool erratically spun up another Perfmeter running the same test.
The DB CPU utilization percentage also increased over time, from 16% to 23% by the end of the test. There are huge spikes every 20 minutes, caused by background tasks that run periodically; as the number of loans grew, so did the physical resources needed to process them all.
The number of DB connections also rose over time, from 344 to 383. It is unclear whether these additional connections are what caused the DB to use more CPU resources as the test progressed.
The database's memory dipped a bit, but showed no symptoms of a memory leak; the memory level bounced back up right after the test finished.
Looking at the DB logs, one query stood out as being slow:

    SELECT jsonb,id FROM fs09000000_mod_circulation_storage.loan
    WHERE ((((lower(f_unaccent(loan.jsonb->'status'->>'name')) LIKE lower(f_unaccent('Open')))
      AND (CASE WHEN length(lower('2022-04-04T04:09:56.031Z')) <= 600
           THEN left(lower(loan.jsonb->>'dueDate'),600) < lower('2022-04-04T04:09:56.031Z')
           ELSE left(lower(loan.jsonb->>'dueDate'),600) < left(lower('2022-04-04T04:09:56.031Z'),600)
                AND lower(loan.jsonb->>'dueDate') <= lower('2022-04-04T04:09:56.031Z')
           END))
      AND (CASE WHEN length(lower(f_unaccent('Claimed returned'))) <= 600
           THEN left(lower(f_unaccent(loan.jsonb->>'itemStatus')),600) NOT LIKE lower(f_unaccent('Claimed returned'))
           ELSE left(lower(f_unaccent(loan.jsonb->>'itemStatus')),600) NOT LIKE left(lower(f_unaccent('Claimed returned')),600)
                OR lower(f_unaccent(loan.jsonb->>'itemStatus')) NOT LIKE lower(f_unaccent('Claimed returned'))
           END))
      AND (CASE WHEN length(lower(f_unaccent('Aged to lost'))) <= 600
           THEN left(lower(f_unaccent(loan.jsonb->>'itemStatus')),600) NOT LIKE lower(f_unaccent('Aged to lost'))
           ELSE left(lower(f_unaccent(loan.jsonb->>'itemStatus')),600) NOT LIKE left(lower(f_unaccent('Aged to lost')),600)
                OR lower(f_unaccent(loan.jsonb->>'itemStatus')) NOT LIKE lower(f_unaccent('Aged to lost'))
           END))
      AND (CASE WHEN length(lower(f_unaccent('Declared lost'))) <= 600
           THEN left(lower(f_unaccent(loan.jsonb->>'itemStatus')),600) NOT LIKE lower(f_unaccent('Declared lost'))
           ELSE left(lower(f_unaccent(loan.jsonb->>'itemStatus')),600) NOT LIKE left(lower(f_unaccent('Declared lost')),600)
                OR lower(f_unaccent(loan.jsonb->>'itemStatus')) NOT LIKE lower(f_unaccent('Declared lost'))
           END)
    ORDER BY left(lower(f_unaccent(loan.jsonb->>'dueDate')),600), lower(f_unaccent(loan.jsonb->>'dueDate'))
    LIMIT 500 OFFSET 14500
This query took a little more than 1000ms to run.
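Slow statements like this can be surfaced from the Postgres logs when `log_min_duration_statement` is enabled; a minimal, hypothetical sketch for filtering them out of a log file:

```python
import re

# Matches Postgres log lines such as:
#   LOG:  duration: 1042.318 ms  statement: SELECT jsonb,id FROM ...
DURATION_RE = re.compile(r"duration: (\d+(?:\.\d+)?) ms\s+statement: (.*)")

def slow_statements(log_lines, threshold_ms=1000.0):
    """Yield (duration_ms, statement) pairs at or above the threshold."""
    for line in log_lines:
        m = DURATION_RE.search(line)
        if m and float(m.group(1)) >= threshold_ms:
            yield float(m.group(1)), m.group(2)

sample = [
    "LOG:  duration: 12.5 ms  statement: SELECT 1",
    "LOG:  duration: 1042.318 ms  statement: SELECT jsonb,id FROM loan ...",
]
for duration, stmt in slow_statements(sample):
    print(duration, stmt[:40])
```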
Modules CPU Utilization During Longevity Test
Here is a view of the CPU utilization. A couple of observations:
- mod-users and mod-patron-blocks seem to take up CPU resources in an identical cyclical way.
- Like Kiwi, Okapi uses about 450-470% CPU on average.
- Other modules used less than 100% CPU on average.
- mod-circulation spikes periodically due to processing background tasks.
- mod-search and mod-inventory-storage seem to behave the same way, spiking toward the end of the test.
Here is the Service CPU Utilization graph without Okapi for better clarity of other modules' behaviors.
There do not appear to be any memory leak issues in Lotus. Only mod-inventory-storage shows spikes, but they were not prolonged and the processes did not crash.
The timeline highlighted below encompasses all 4 test runs (1, 5, 8, 20 users).
Modules CPUs and Memory Utilization
The relevant services overall seem to occupy CPU resources nominally.
20-users tests | Avg | Max |
---|---|---|
mod-users | 35% | 35% |
mod-pubsub | 70% | 71% |
okapi | 456% | 456% |
mod-circulation | 48% | 52% |
mod-circulation-storage | 36% | 38% |
mod-inventory | 45% | 50% |
mod-inventory-storage | 113% | 156% |
mod-patron-blocks | 17% | 19% |
mod-feesfines | 21% | 21% |
Lotus Okapi's CPU % for 20 users is a little higher than in Kiwi; the other relevant modules actually consume less CPU.
Services' memory seems stable during the test runs, although these short runs do not reveal much. mod-inventory-storage seems to hog more memory as it goes along, but during these short tests this behavior did not affect the results.
 | Avg | Max |
---|---|---|
mod-users | 68% | 68% |
mod-pubsub | 55% | 55% |
okapi | 62% | 64% |
mod-circulation | 62% | 62% |
mod-circulation-storage | 72% | 72% |
mod-inventory | 31% | 31% |
mod-inventory-storage | 118% | 119% |
mod-patron-blocks | 76% | 76% |
mod-feesfines | 75% | 75% |
Database
The freeable memory metric shows how much memory is consumed and how much remains during a test run. In the graph below, memory usage is stable throughout the test runs, trending slightly down during each run and bouncing back almost to the starting point after each test. This does not indicate a memory leak, as the 24-hour longevity test did not reveal any leaks (see above).
Users Tested | MB consumed |
---|---|
1 | 222 |
5 | 372 |
8 | 374 |
20 | 535 |
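The MB-consumed figures above correspond to the drop in the RDS FreeableMemory metric from the start of a run to its trough; a sketch with illustrative (not actual) samples:

```python
def mb_consumed(freeable_mb):
    """MB consumed during a run: starting freeable memory minus the trough."""
    return freeable_mb[0] - min(freeable_mb)

# Illustrative FreeableMemory samples (MB) over one run -- not real data.
run = [12000, 11900, 11820, 11778, 11850, 11990]
print(mb_consumed(run))  # -> 222
```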
Database average CPU utilization also improved and was steady, compared to Kiwi GA.
Users Tested | Kiwi % | Lotus % |
---|---|---|
1 | 9 | 7 |
5 | 13 | 10 |
8 | 14 (with a 78% spike) | 12 |
20 | 24 | 18 |
Database connection averages are lower than those in Kiwi.
Users Tested | Kiwi | Lotus |
---|---|---|
1 | 478 | 350 |
5 | 580 | 375 |
8 | 580 | 375 |
20 | 608 | 400 |
Miscellaneous