Check-in-check-out Test Report (Nolana)
Overview
This report covers a series of check-in/check-out (CICO) test runs against the Nolana release.
Infrastructure
PTF environment: ncp3
- 9 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
- 2 db.r6.xlarge database instances: one reader and one writer
- MSK ptf-kakfa-3
  - 4 m5.2xlarge brokers in 2 zones
  - Apache Kafka version 2.8.0
  - EBS storage volume per broker: 300 GiB
  - auto.create.topics.enable=true
  - log.retention.minutes=480
  - default.replication.factor=3
Modules memory and CPU parameters
Modules | Version | Task Definition | Running Tasks | CPU (CPU units) | Memory (MiB) | MemoryReservation (MiB) | MaxMetaspaceSize | Xmx |
---|---|---|---|---|---|---|---|---|
mod-inventory | 19.0.1 | 1 | 2 | 1024 | 2880 | 2592 | 512m | 1814m |
mod-inventory-storage | 25.0.1 | 1 | 2 | 1024 | 2208 (1872 in MG) | 1952 (1684 in MG) | 512m | 1440m |
okapi | 4.14.7 | 1 | 3 | 1024 | 1684 (1512 in MG) | 1440 (1360 in MG) | 512m | 922m |
mod-feesfines | 18.1.0 | 1 | 2 | 128 | 1024 | 896 | 128 | 768 |
mod-patron-blocks | 1.7.1 | 1 | 2 | 1024 | 1024 | 896 | 128 | 768 |
mod-pubsub | 2.7.0 | 1 | 2 | 1024 | 1536 (1440 in MG) | 1440 (1296 in MG) | 512 | 922 |
mod-authtoken | 2.12.0 | 1 | 2 | 512 (128 in MG) | 1440 (1024 in MG) | 1152 (896 in MG) | 128 | 922 (768 in MG) |
mod-circulation-storage | 15.0.0 | 1 | 2 | 1024 | 1536 (1152 in MG) | 1440 (1024 in MG) | 512 | 896 |
mod-circulation | 23.3.0 | 1 | 2 | 1024 | 1024 | 896 | 128 | 768 |
mod-configuration | 5.9.0 | 1 | 2 | 128 | 1024 | 896 | 128m | 768m |
mod-users | 19.0.0 | 1 | 2 | 258 | 1024 | 896 | 128m | 768m |
mod-remote-storage | 1.7.0 | 1 | 2 | 128 | 1872 | 1684 | 512m | 1178m |
MG = Morning Glory release
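As a rough sanity check on these settings (assuming the Memory and MemoryReservation columns are the ECS hard limit and soft reservation in MiB, and that MaxMetaspaceSize and Xmx are the standard JVM flags): for mod-inventory, the 1814 MiB heap plus the 512 MiB metaspace comes to about 2326 MiB, leaving roughly 266 MiB of the 2592 MiB reservation for thread stacks, code cache, and other native memory before the 2880 MiB hard limit is reached.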
Front End:
- Item Check-in (folio_checkin-7.2.0)
- Item Check-out (folio_checkout-8.2.0)
High-Level Summary
- In general, there are insignificant performance regressions in the Nolana release compared to Morning Glory. Check-in and check-out times regressed by about 10%, but are still about 20% better than in the Lotus release. The 25-user response times are very similar to the 5-user times, which indicates that the system is very stable from 1 to 25 users. The 20-user test has the best response times and the smallest difference from the Morning Glory release.
- Services' memory utilization increases slightly during the test runs. This is probably a result of running the tests on a new cluster, and memory utilization will likely grow over time until it reaches a steady state. A table comparing Nolana and Morning Glory memory utilization has been added. A longevity test has to be performed to confirm whether a memory leak is present.
- Overall, the relevant services occupy CPU resources nominally. Only mod-authtoken shows spikes, but the processes did not crash. CPU usage of all modules did not exceed 31%.
- RDS CPU utilization did not exceed 20%.
- The longevity test shows response times worsening over time.
- In the test with R/W Split enabled, the 2 database instances (reader and writer) work simultaneously. CPU usage of the writer instance decreases, so it can potentially handle a higher load.
- CICO response times are almost the same (about 1.5% better overall) while R/W Split is enabled.
- Service CPU usage is similar whether R/W Split is enabled or disabled. For mod-configuration, CPU usage increases over time (once memory usage reaches its maximum level) in both the R/W Split enabled and disabled tests. A ticket was created for investigation: PERF-375.
- Tickets were also created for the "WARN CQL2PgJSON Doing FT search without indexes" warnings (full-text searches that run without a supporting index and therefore fall back to scanning the table) in the following modules:
mod-users https://folio-org.atlassian.net/browse/PERF-376
mod-feesfines https://folio-org.atlassian.net/browse/PERF-377
mod-circulation-storage https://folio-org.atlassian.net/browse/PERF-378
mod-configuration https://folio-org.atlassian.net/browse/PERF-380
Test Runs
Test | Virtual Users | Duration | Load generator size (recommended) | Load generator Memory (GiB) (recommended) |
---|---|---|---|---|
1. | 5 users | 30 mins | t3.medium | 3 |
2. | 8 users | 30 mins | t3.medium | 3 |
3. | 20 users | 30 mins | t3.medium | 4 |
4. | 25 users | 30 mins | t3.medium | 4 |
Results
Response Times (Average of all tests listed above, in seconds)
Users | Check-in Avg (s) | Check-out Avg (s) | Check-in 50th %tile (s) | Check-out 50th %tile (s) | Check-in 75th %tile (s) | Check-out 75th %tile (s) | Check-in 95th %tile (s) | Check-out 95th %tile (s) |
---|---|---|---|---|---|---|---|---|
5 users | 0.465 | 0.719 | 0.447 | 0.683 | 0.480 | 0.727 | 0.562 | 0.902 |
8 users | 0.456 | 0.698 | 0.441 | 0.672 | 0.471 | 0.709 | 0.534 | 0.839 |
20 users | 0.425 | 0.658 | 0.414 | 0.638 | 0.430 | 0.661 | 0.489 | 0.758 |
25 users | 0.436 | 0.676 | 0.430 | 0.661 | 0.447 | 0.682 | 0.496 | 0.773 |
Response times are consistently good, and there is no response time over 1 second. The 20 users test has the best response times.
Comparisons to the Last Release
The following tables compare Nolana's test results against Morning Glory.
Response Time Comparison
In the tables below, the Delta columns express the percentage differences between the Nolana and Morning Glory releases.
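The report does not state the Delta formula, but the values are consistent with Delta = (Nolana − MG) / MG × 100%; for example, the 5-user Check-in average gives (0.465 − 0.418) / 0.418 ≈ +11.2%, matching the table below.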
In general, there are insignificant regressions in response times across the board. Check-in and check-out times regressed by about 10%, but are still about 20% better than in the Lotus release. The 25-user response times are very similar to the 5-user times, which indicates that the system is very stable from 1 to 25 users. The 20-user test has the best response times and the smallest difference from the Morning Glory release.
Note: Nolana = Nolana build, MG = Morning Glory build
Users | Check-in MG (Avg) | Check-in Nolana (Avg) | Delta | Check-out MG (Avg) | Check-out Nolana (Avg) | Delta | Check-in MG (50th) | Check-in Nolana (50th) | Delta | Check-out MG (50th) | Check-out Nolana (50th) | Delta |
---|---|---|---|---|---|---|---|---|---|---|---|---|
5 users | 0.418 | 0.465 | 11.2% | 0.646 | 0.719 | 11.3% | 0.408 | 0.447 | 9.5% | 0.624 | 0.683 | 9.4% |
8 users | 0.411 | 0.456 | 10.9% | 0.627 | 0.698 | 11.3% | 0.4 | 0.441 | 10.2% | 0.606 | 0.672 | 10.8% |
20 users | 0.401 | 0.425 | 5.98% | 0.615 | 0.658 | 6.99% | 0.393 | 0.414 | 5.3% | 0.6 | 0.638 | 6.3% |
25 users | - | 0.436 | - | - | 0.676 | - | - | 0.430 | - | - | 0.661 | - |
Users | Check-in MG (75th) | Check-in Nolana (75th) | Delta | Check-out MG (75th) | Check-out Nolana (75th) | Delta | Check-in MG (95th) | Check-in Nolana (95th) | Delta | Check-out MG (95th) | Check-out Nolana (95th) | Delta |
---|---|---|---|---|---|---|---|---|---|---|---|---|
5 users | 0.433 | 0.480 | 10.8% | 0.672 | 0.727 | 8.1% | 0.493 | 0.562 | 13.9% | 0.766 | 0.902 | 17.7% |
8 users | 0.424 | 0.471 | 11% | 0.646 | 0.709 | 9.7% | 0.478 | 0.534 | 11.7% | 0.741 | 0.839 | 13.2% |
20 users | 0.413 | 0.430 | 4.1% | 0.632 | 0.661 | 4.5% | 0.456 | 0.489 | 7.2% | 0.704 | 0.758 | 7.6% |
25 users | - | 0.447 | - | - | 0.682 | - | - | 0.496 | - | - | 0.773 | - |
"Worst" API Comparisons
The APIs identified in the table below are the ones that took over 100 ms to execute at the 75th percentile in Lotus and Morning Glory. In Nolana, these APIs are slightly slower, especially with 5 concurrent users, but the response times of POST checkout-by-barcode and POST checkin-by-barcode are all still under 300 ms. GET inventory/items response times are now well under 47 ms, compared to 60 ms in Morning Glory and borderline values in Lotus, so that API has been dropped from the tables.
Average Response Time in milliseconds.
API | 5 users MG (75th %tile) | 5 users Nolana (75th %tile) | 8 users MG (75th %tile) | 8 users Nolana (75th %tile) | 20 users MG (75th %tile) | 20 users Nolana (75th %tile) | 25 users MG (75th %tile) | 25 users Nolana (75th %tile) |
---|---|---|---|---|---|---|---|---|
POST checkout-by-barcode | 217 | 245 | 211 | 237 | 204 | 226 | - | 228 |
POST checkin-by-barcode | 199 | 236 | 199 | 228 | 189 | 214 | - | 217 |
GET circulation/loans | 113 | 127 | 109 | 122 | 107 | 120 | - | 123 |
Memory Utilization
Services' memory utilization increases slightly during the test runs. This is probably a result of running the tests on a new cluster, and memory utilization will likely grow over time until it reaches a steady state. A table comparing Nolana and Morning Glory memory utilization has been added below. The min memory utilization represents the level at the beginning of all tests, and the max is the highest memory usage observed during all the baseline tests. For Nolana, memory utilization is 10-30% lower than for Morning Glory (memory limits were increased for some modules). A longevity test has to be performed to confirm whether a memory leak is present.
Memory Utilization comparison table
Module | Morning Glory Avg | Morning Glory Max | Nolana Min | Nolana Max |
---|---|---|---|---|
mod-users | 36% | 36% | 25% | 26% |
mod-pubsub | 45% | 45% | 28% | 31% |
okapi | 50% | 50% | 45% | 45% |
mod-circulation | 80% | 80% | 50% | 57% |
mod-circulation-storage | 55% | 56% | 23% | 25% |
mod-inventory | 91% | 91% | 69.0% | 69.8% |
mod-inventory-storage | 71% | 74% | 31% | 32% |
mod-patron-blocks | 44% | 44% | 33% | 34% |
mod-feesfines | 39% | 39% | 28% | 28% |
mod-authtoken | 30% | 30% | 15% | 15% |
Modules CPU Utilization
Overall, the relevant services occupy CPU resources nominally. Only mod-authtoken shows spikes, but the processes did not crash. CPU usage of all modules did not exceed 31%.
Database and network
Database connection averages are similar to those in Morning Glory.
Users Tested | Morning Glory | Nolana |
---|---|---|
5 | 336 | 337 |
8 | 346 | 342 |
20 | 351 | 355 |
25 | - | 360 |
Longevity Test
The longevity test shows that Check Out response time increased as time went on.
Hour | Check-In | Check-Out |
---|---|---|
1st Hour | 0.442s | 0.850s |
12th Hour | 0.484s | 1.086s |
24th Hour | 0.568s | 1.485s |
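Expressed as growth over the run (an arithmetic note derived from the table above): the Check-Out average rose from 0.850 s in the 1st hour to 1.485 s in the 24th, roughly a 75% increase ((1.485 − 0.850) / 0.850 ≈ 0.75), while Check-In rose from 0.442 s to 0.568 s, about 29%.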
In the response time graph below, the Checkout Controller time (which aggregates all check-out API response times) increased over the 24-hour window, from 0.850s to 1.485s.
The DB CPU utilization percentage increased by 7% over time and was about 21% by the end of the test. There are huge spikes every 30 minutes; these are due to background tasks that run periodically, and they became larger as the number of loans grew.
The number of connections also rose over time, from 360 to 370. It is unclear what (perhaps the DB connections) caused the DB to use more CPU resources as the test progressed.
The database's memory dipped a bit, but there were no symptoms of a memory leak. The memory level bounced back up right after the test finished.
Modules CPU Utilization During Longevity Test
Here is a view of the CPU utilization. A couple of observations:
- mod-authtoken seems to take up CPU resources cyclically, ranging from 9% to 12%.
- Okapi uses only about 20% CPU on average, compared to about 450-470% on average in Lotus.
- mod-users CPU utilization grew from 46% to 70%, dropped rapidly to 38%, then grew again to about 80% before decreasing.
- mod-inventory-storage CPU utilization grew from 10% to 36%, dropped rapidly to 15%, then grew again and reached 25%.
- mod-configuration CPU utilization grew from 24% to 38%, dropped rapidly to 23%, and stayed at that level until the end of the test.
- mod-feesfines CPU utilization grew from 15% to 35% and spiked periodically up to 50% every 30 minutes.
- Other modules used less than 20% CPU on average.
Here is the Service CPU Utilization graph with the main modules involved.
There do not appear to be any memory leak issues in Nolana. There were no spikes and the processes did not crash.
CICO Tests with R/W Split enabled
- PERF-362
Modules that had R/W split enabled
1 | mod-inventory |
2 | mod-inventory-storage |
3 | mod-feesfines |
4 | mod-patron-blocks |
5 | mod-pubsub |
6 | mod-authtoken |
7 | mod-circulation-storage |
8 | mod-circulation |
9 | mod-configuration |
10 | mod-users |
11 | mod-remote-storage |
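For context, here is a minimal sketch of how the split is typically wired up for these modules; the environment variable names are an assumption based on common FOLIO/RMB deployments, not taken from this report. Writes keep going to the primary endpoint, while reads are routed to the reader endpoint:
- DB_HOST=<writer endpoint>
- DB_PORT=5432
- DB_HOST_READER=<reader endpoint>
- DB_PORT_READER=5432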
Results
Response Times (Average of all tests listed above, in seconds)
Users | Check-in Avg (s) | Check-out Avg (s) | Check-in 50th %tile (s) | Check-out 50th %tile (s) | Check-in 75th %tile (s) | Check-out 75th %tile (s) | Check-in 95th %tile (s) | Check-out 95th %tile (s) |
---|---|---|---|---|---|---|---|---|
5 users | 0.424 | 0.772 | 0.406 | 0.737 | 0.444 | 0.800 | 0.531 | 0.967 |
8 users | 0.391 | 0.694 | 0.372 | 0.666 | 0.404 | 0.711 | 0.526 | 0.860 |
20 users | 0.372 | 0.667 | 0.355 | 0.639 | 0.376 | 0.670 | 0.474 | 0.845 |
25 users | 0.382 | 0.686 | 0.365 | 0.661 | 0.386 | 0.695 | 0.526 | 0.877 |
Response times are consistently good, and there is no response time over 1 second. The 20 users test has the best response times.
Comparisons to the Baseline Nolana Tests
The following tables compare Nolana's test results against Nolana with R/W Split enabled.
Response Time Comparison
In the tables below, the Delta columns express the percentage differences between baseline Nolana and Nolana with R/W Split enabled; negative values mean the R/W Split run was faster.
In general, Check-In response times improve across the board by about 15%. Check-Out times regressed by 1-7% in general, mostly due to increased times of OPTIONS API calls. The 8-user test shows the best improvement, with better average times for both Check-In and Check-Out. The 25-user response times are better than the 5-user times, which indicates that the system is very stable from 1 to 25 users.
Note: Nolana = Nolana build, R/W = Nolana with R/W Split enabled
Users | Check-in R/W (Avg) | Check-in Nolana (Avg) | Delta | Check-out R/W (Avg) | Check-out Nolana (Avg) | Delta | Check-in R/W (50th) | Check-in Nolana (50th) | Delta | Check-out R/W (50th) | Check-out Nolana (50th) | Delta |
---|---|---|---|---|---|---|---|---|---|---|---|---|
5 users | 0.424 | 0.465 | -9.7% | 0.772 | 0.719 | 6.8% | 0.406 | 0.447 | -10.1% | 0.737 | 0.683 | 7.3% |
8 users | 0.391 | 0.456 | -16.6% | 0.694 | 0.698 | -0.57% | 0.372 | 0.441 | -18.5% | 0.666 | 0.672 | -0.9% |
20 users | 0.372 | 0.425 | -14.2% | 0.667 | 0.658 | 1.3% | 0.355 | 0.414 | -16.6% | 0.639 | 0.638 | 0.2% |
25 users | 0.382 | 0.436 | -14.1% | 0.686 | 0.676 | 1.5% | 0.365 | 0.430 | -17.8% | 0.661 | 0.661 | 0.0% |
Users | Check-in R/W (75th) | Check-in Nolana (75th) | Delta | Check-out R/W (75th) | Check-out Nolana (75th) | Delta | Check-in R/W (95th) | Check-in Nolana (95th) | Delta | Check-out R/W (95th) | Check-out Nolana (95th) | Delta |
---|---|---|---|---|---|---|---|---|---|---|---|---|
5 users | 0.444 | 0.480 | -8.1% | 0.800 | 0.727 | 9.1% | 0.531 | 0.562 | -5.8% | 0.967 | 0.902 | 6.7% |
8 users | 0.404 | 0.471 | -16.6% | 0.711 | 0.709 | 0.28% | 0.526 | 0.534 | -1.5% | 0.860 | 0.839 | 2.4% |
20 users | 0.376 | 0.430 | -14.4% | 0.670 | 0.661 | 1.3% | 0.474 | 0.489 | -3.2% | 0.845 | 0.758 | 10.3% |
25 users | 0.386 | 0.447 | -15.8% | 0.695 | 0.682 | 1.87% | 0.526 | 0.496 | 5.7% | 0.877 | 0.773 | 11.9% |
Resource usage comparison for Test 2 with R/W Split enabled
In this test, the 2 database instances (reader and writer) work simultaneously while R/W Split is enabled. CPU usage of the writer instance decreases, so it can potentially handle a higher load.
CICO response times are almost the same (about 1.5% better overall) while R/W Split is enabled.
Service CPU usage is similar whether R/W Split is enabled or disabled. For mod-configuration, CPU usage increases over time (once memory usage reaches its maximum level). A ticket was created for investigation: PERF-375.
Miscellaneous
- Grafana baseline test data: http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&from=1668782380120&to=1668794190269&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_nolana&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All
- Grafana longevity test data: http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&from=1669621859633&to=1669624226753&var-percentile=95&var-test_type=longevity&var-test=circulation_checkInCheckOut_nolana&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All
- Response time comparison for CICO tests with R/W Split enabled: https://docs.google.com/spreadsheets/d/1ZP0zaH8jIgoUdjSPP6HyT8Le6NKJ8AtX2LCTFO7n_5o/edit#gid=736692737