Check-in-check-out Test Report (Nolana)

Modules CPU Utilization (20-users test)

Module | Avg | Max
mod-users | 21% | 21%
mod-pubsub | 5% | 5%
okapi | 14% | 14%
mod-circulation | 4% | 4%
mod-circulation-storage | 5% | 5%
mod-inventory | 7% | 7%
mod-inventory-storage | 7% | 7%
mod-patron-blocks | 1% | 1%
mod-feesfines | 17% | 17%
mod-authtoken | 31% | 78%

Overview

This is a report for a series of Check-in-check-out test runs against the Nolana release. 

Infrastructure

PTF environment: ncp3

  • 9 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
  • 2 db.r6.xlarge database instances, one reader and one writer
  • MSK ptf-kakfa-3
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3
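
For illustration only, the three broker settings above could be registered as an Amazon MSK configuration with boto3; this is a minimal sketch, not taken from the PTF tooling, and the configuration name is a hypothetical placeholder:

    # Sketch: registering the broker settings listed above as an MSK
    # configuration. The configuration name is hypothetical.
    import boto3

    msk = boto3.client("kafka", region_name="us-east-1")

    server_properties = b"""
    auto.create.topics.enable=true
    log.retention.minutes=480
    default.replication.factor=3
    """

    response = msk.create_configuration(
        Name="ptf-kafka-3-config",  # hypothetical name
        KafkaVersions=["2.8.0"],
        ServerProperties=server_properties,
    )
    print(response["Arn"])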

Modules memory and CPU parameters

Module | Version | Task Definition | Running Tasks | CPU | Memory | MemoryReservation | MaxMetaspaceSize | Xmx
mod-inventory | 19.0.1 | 1 | 2 | 1024 | 2880 | 2592 | 512m | 1814m
mod-inventory-storage | 25.0.1 | 1 | 2 | 1024 | 2208 (1872 in MG) | 1952 (1684 in MG) | 512m | 1440m
okapi | 4.14.7 | 1 | 3 | 1024 | 1684 (1512 in MG) | 1440 (1360 in MG) | 512m | 922m
mod-feesfines | 18.1.0 | 1 | 2 | 128 | 1024 | 896 | 128 | 768
mod-patron-blocks | 1.7.1 | 1 | 2 | 1024 | 1024 | 896 | 128 | 768
mod-pubsub | 2.7.0 | 1 | 2 | 1024 | 1536 (1440 in MG) | 1440 (1296 in MG) | 512 | 922
mod-authtoken | 2.12.0 | 1 | 2 | 512 (128 in MG) | 1440 (1024 in MG) | 1152 (896 in MG) | 128 | 922 (768 in MG)
mod-circulation-storage | 15.0.0 | 1 | 2 | 1024 | 1536 (1152 in MG) | 1440 (1024 in MG) | 512 | 896
mod-circulation | 23.3.0 | 1 | 2 | 1024 | 1024 | 896 | 128 | 768
mod-configuration | 5.9.0 | 1 | 2 | 128 | 1024 | 896 | 128m | 768m
mod-users | 19.0.0 | 1 | 2 | 258 | 1024 | 896 | 128m | 768m
mod-remote-storage | 1.7.0 | 1 | 2 | 128 | 1872 | 1684 | 512m | 1178m

MG = Morning Glory release
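
As a rough consistency check of the table above (a sketch, not part of the test tooling), the JVM heap (Xmx) plus MaxMetaspaceSize should fit within each task's MemoryReservation, leaving headroom for stack and native memory. Values in MiB are taken from the Nolana columns:

    # Rough sanity check: Xmx + MaxMetaspaceSize vs. container MemoryReservation.
    # Values (MiB) come from the Nolana columns of the table above.
    modules = {
        # name: (xmx, max_metaspace, memory_reservation)
        "mod-inventory": (1814, 512, 2592),
        "okapi": (922, 512, 1440),
        "mod-authtoken": (922, 128, 1152),
        "mod-circulation": (768, 128, 896),
    }

    for name, (xmx, metaspace, reservation) in modules.items():
        headroom = reservation - (xmx + metaspace)
        print(f"{name}: {headroom} MiB headroom for stack/native/off-heap")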

Front End:

  • Item Check-in (folio_checkin-7.2.0)
  • Item Check-out (folio_checkout-8.2.0)

High-Level Summary

  • In general, there are insignificant regressions in performance in the Nolana release compared to Morning Glory. Check-in and check-out times regressed by 10%, but are still about 20% better than in the Lotus release. The 25-user response times are very similar to the 5-user times, which means the system is very stable from 5 to 25 users. The 20-users test has the best response times and the smallest difference from the Morning Glory release.
  • Services' memory utilization increased slightly during the test runs. This is probably a result of running the tests on a new cluster; memory utilization will likely grow over time until it reaches a steady state. A table comparing Nolana to Morning Glory memory utilization has been added. A longevity test is needed to confirm whether a memory leak is present.
  • Overall, the relevant services occupy CPU resources nominally. Only mod-authtoken shows spikes, but the processes did not crash. Average CPU usage of all modules did not exceed 31%.
  • RDS CPU utilization did not exceed 20%.
  • The longevity test shows that response times worsen over time.

Test Runs

Test | Virtual Users | Duration | Load generator size (recommended) | Load generator memory, GiB (recommended)
1. | 5 users | 30 mins | t3.medium | 3
2. | 8 users | 30 mins | t3.medium | 3
3. | 20 users | 30 mins | t3.medium | 4
4. | 25 users | 30 mins | t3.medium | 4
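
The report does not name the load tool, so purely as an illustration, a check-in/check-out scenario for one such virtual user could be sketched in Locust as below. The barcodes and service point ID are hypothetical placeholders, and Okapi auth headers are omitted:

    # Generic CICO load-scenario sketch (illustrative only). Endpoints are the
    # FOLIO circulation APIs cited later in this report.
    from locust import HttpUser, task, between

    class CheckInCheckOutUser(HttpUser):
        wait_time = between(1, 3)  # pacing; the tests ran 5-25 such users

        @task
        def checkout_then_checkin(self):
            # Placeholder barcodes/IDs; real runs also need x-okapi-* headers.
            self.client.post("/circulation/check-out-by-barcode", json={
                "itemBarcode": "10101",
                "userBarcode": "20202",
                "servicePointId": "00000000-0000-0000-0000-000000000000",
            })
            self.client.post("/circulation/check-in-by-barcode", json={
                "itemBarcode": "10101",
                "servicePointId": "00000000-0000-0000-0000-000000000000",
                "checkInDate": "2022-11-01T10:00:00Z",
            })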

Results

Response Times (Average of all tests listed above, in seconds)


Users | Check-in Avg | Check-out Avg | Check-in 50th | Check-out 50th | Check-in 75th | Check-out 75th | Check-in 95th | Check-out 95th
5 users | 0.465 | 0.719 | 0.447 | 0.683 | 0.480 | 0.727 | 0.562 | 0.902
8 users | 0.456 | 0.698 | 0.441 | 0.672 | 0.471 | 0.709 | 0.534 | 0.839
20 users | 0.425 | 0.658 | 0.414 | 0.638 | 0.430 | 0.661 | 0.489 | 0.758
25 users | 0.436 | 0.676 | 0.430 | 0.661 | 0.447 | 0.682 | 0.496 | 0.773
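
For context, the percentile columns are order statistics over the per-request response times collected during a run. A minimal sketch of that computation (the sample values are illustrative, not data from these tests):

    # Sketch: deriving average and percentile response times from raw samples.
    import numpy as np

    samples = np.array([0.41, 0.43, 0.45, 0.48, 0.52, 0.61, 0.44, 0.47])
    print(f"Average: {samples.mean():.3f}s")
    for p in (50, 75, 95):
        print(f"{p}th %tile: {np.percentile(samples, p):.3f}s")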

Response times are consistently good, and there is no response time over 1 second. The 20 users test has the best response times.

Comparisons to the Last Release

The following tables compare Nolana's test results against Morning Glory.

Response Time Comparison

In the tables below, the Delta columns express the differences between Nolana and Morning Glory releases in percentage.
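
For example, the Delta is the percentage change of Nolana relative to Morning Glory; a quick sketch using the 5-users average check-in values from the table below:

    # Delta = percentage change of Nolana relative to Morning Glory.
    def delta(mg: float, nolana: float) -> float:
        return (nolana - mg) / mg * 100

    # 5-users average check-in: MG 0.418s, Nolana 0.465s
    print(f"{delta(0.418, 0.465):.1f}%")  # 11.2%, matching the table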

In general, there are insignificant regressions in performance, with response times slightly worse across the board. Check-in and check-out times regressed by 10%, but are still about 20% better than in the Lotus release. The 25-user response times are very similar to the 5-user times, which means the system is very stable from 5 to 25 users. The 20-users test has the best response times and the smallest difference from the Morning Glory release.

Note: Nolana = Nolana build, MG = Morning Glory build


Average:
Users | Check-in MG | Check-in Nolana | Delta | Check-out MG | Check-out Nolana | Delta
5 users | 0.418 | 0.465 | 11.2% | 0.646 | 0.719 | 11.3%
8 users | 0.411 | 0.456 | 10.9% | 0.627 | 0.698 | 11.3%
20 users | 0.401 | 0.425 | 5.98% | 0.615 | 0.658 | 6.99%
25 users | - | 0.436 | - | - | 0.676 | -

50th percentile:
Users | Check-in MG | Check-in Nolana | Delta | Check-out MG | Check-out Nolana | Delta
5 users | 0.408 | 0.447 | 9.5% | 0.624 | 0.683 | 9.4%
8 users | 0.4 | 0.441 | 10.2% | 0.606 | 0.672 | 10.8%
20 users | 0.393 | 0.414 | 5.3% | 0.6 | 0.638 | 6.3%
25 users | - | 0.430 | - | - | 0.661 | -


75th percentile:
Users | Check-in MG | Check-in Nolana | Delta | Check-out MG | Check-out Nolana | Delta
5 users | 0.433 | 0.480 | 10.8% | 0.672 | 0.727 | 8.1%
8 users | 0.424 | 0.471 | 11% | 0.646 | 0.709 | 9.7%
20 users | 0.413 | 0.430 | 4.1% | 0.632 | 0.661 | 4.5%
25 users | - | 0.447 | - | - | 0.682 | -

95th percentile:
Users | Check-in MG | Check-in Nolana | Delta | Check-out MG | Check-out Nolana | Delta
5 users | 0.493 | 0.562 | 13.9% | 0.766 | 0.902 | 17.7%
8 users | 0.478 | 0.534 | 11.7% | 0.741 | 0.839 | 13.2%
20 users | 0.456 | 0.489 | 7.2% | 0.704 | 0.758 | 7.6%
25 users | - | 0.496 | - | - | 0.773 | -

"Worst" API Comparisons

The APIs in the table below are the ones that took over 100 ms to execute at the 75th percentile in Lotus and Morning Glory. In Nolana, these APIs are slightly worse, especially with 5 concurrent users. The response times of POST checkout-by-barcode and POST checkin-by-barcode are all still under 300 ms. GET inventory/items response times are now well under 47 ms, compared to 60 ms in Morning Glory and borderline in Lotus, so it has been removed from the tables.

Average response time in milliseconds (75th percentile).

API | 5 users MG | 5 users Nolana | 8 users MG | 8 users Nolana | 20 users MG | 20 users Nolana | 25 users MG | 25 users Nolana
POST checkout-by-barcode | 217 | 245 | 211 | 237 | 204 | 226 | - | 228
POST checkin-by-barcode | 199 | 236 | 199 | 228 | 189 | 214 | - | 217
GET circulation/loans | 113 | 127 | 109 | 122 | 107 | 120 | - | 123

Memory Utilization

Services' memory utilization increased slightly during the test runs. This is probably a result of running the tests on a new cluster; memory utilization will likely grow over time until it reaches a steady state. The table below compares Nolana to Morning Glory memory utilization. The min value represents the level at the beginning of all tests, and the max value is the highest memory usage observed during all the baseline tests. For Nolana, memory utilization is 10-30% lower than for Morning Glory (memory limits were increased for some modules). A longevity test is needed to confirm whether a memory leak is present.


Memory Utilization comparison table


Module | Morning Glory Avg | Morning Glory Max | Nolana Min | Nolana Max
mod-users | 36% | 36% | 25% | 26%
mod-pubsub | 45% | 45% | 28% | 31%
okapi | 50% | 50% | 45% | 45%
mod-circulation | 80% | 80% | 50% | 57%
mod-circulation-storage | 55% | 56% | 23% | 25%
mod-inventory | 91% | 91% | 69.0% | 69.8%
mod-inventory-storage | 71% | 74% | 31% | 32%
mod-patron-blocks | 44% | 44% | 33% | 34%
mod-feesfines | 39% | 39% | 28% | 28%
mod-authtoken | 30% | 30% | 15% | 15%

Modules CPUs Utilization

The relevant services overall occupy CPU resources nominally. Only mod-authtoken shows spikes, but the processes did not crash. Average CPU usage of all modules did not exceed 31% (see the CPU utilization table at the top of this report).


Database and network


Database connection averages are similar to those in Morning Glory.

Users Tested | Morning Glory | Nolana
5 | 336 | 337
8 | 346 | 342
20 | 351 | 355
25 | - | 360

Longevity Test

The longevity test shows that Check Out response time increased as time went on. 


Hour | Check-In | Check-Out
1st Hour | 0.442s | 0.850s
12th Hour | 0.484s | 1.086s
24th Hour | 0.568s | 1.485s

In the response time graph below, the Checkout Controller time (which aggregates all check-out API response times) increased over the 24-hour window, from 0.850s to 1.485s.


The DB CPU utilization increased over time by 7% and was about 21% by the end of the test. There are large spikes every 30 minutes; these are due to background tasks that run periodically, and they grew as the number of loans grew.

The number of connections also rose over time, from 360 to 370. It is unclear what caused the DB to use more CPU resources as the test progressed; the additional DB connections are one possibility.


The database's memory dipped a bit, but there were no symptoms of memory leaks. The memory level bounced back up right after the test finished.

Modules CPU Utilization During Longevity Test

Here is a view of the CPU utilization. A couple of observations:

  • mod-authtoken takes up CPU resources in a cyclical way, from 9% to 12%.
  • Okapi uses only about 20% CPU on average, compared to about 450-470% on average in Lotus.
  • mod-users CPU utilization grew from 46% to 70%, rapidly decreased to 38%, then grew again, reaching 80% before decreasing.
  • mod-inventory-storage CPU utilization grew from 10% to 36%, rapidly decreased to 15%, then grew again and reached 25%.
  • mod-configuration CPU utilization grew from 24% to 38%, rapidly decreased to 23%, and stayed at that level until the end of the test.
  • mod-feesfines CPU utilization grew from 15% to 35% and spiked periodically up to 50% every 30 minutes.
  • Other modules used less than 20% CPU on average.

Here is the Service CPU Utilization graph with the main involved mods.




There do not appear to be any memory leak issues in Nolana. There were no spikes and the processes did not crash. 

CICO Tests with R/W Split enabled

PERF-362

Modules that had R/W split enabled

1. mod-inventory
2. mod-inventory-storage
3. mod-feesfines
4. mod-patron-blocks
5. mod-pubsub
6. mod-authtoken
7. mod-circulation-storage
8. mod-circulation
9. mod-configuration
10. mod-users
11. mod-remote-storage
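
Conceptually, with R/W Split enabled each module routes read-only queries to the reader database instance and everything else to the writer. A minimal sketch of the idea (illustrative only; the host names and credentials are placeholders, and the actual routing happens inside the modules' database layer):

    # Sketch of read/write splitting: SELECTs go to the reader endpoint,
    # all other statements to the writer. Hosts/credentials are hypothetical.
    import psycopg2

    writer = psycopg2.connect(host="db-writer.example.com", dbname="folio",
                              user="folio", password="secret")
    reader = psycopg2.connect(host="db-reader.example.com", dbname="folio",
                              user="folio", password="secret")

    def execute(sql: str, params=()):
        conn = reader if sql.lstrip().upper().startswith("SELECT") else writer
        with conn.cursor() as cur:
            cur.execute(sql, params)
            rows = cur.fetchall() if cur.description else None
        conn.commit()
        return rows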

Results

Response Times (Average of all tests listed above, in seconds)


Users | Check-in Avg | Check-out Avg | Check-in 50th | Check-out 50th | Check-in 75th | Check-out 75th | Check-in 95th | Check-out 95th
5 users | 0.424 | 0.772 | 0.406 | 0.737 | 0.444 | 0.800 | 0.531 | 0.967
8 users | 0.391 | 0.694 | 0.372 | 0.666 | 0.404 | 0.711 | 0.526 | 0.860
20 users | 0.372 | 0.667 | 0.355 | 0.639 | 0.376 | 0.670 | 0.474 | 0.845
25 users | 0.382 | 0.686 | 0.365 | 0.661 | 0.386 | 0.695 | 0.526 | 0.877

Response times are consistently good, and there is no response time over 1 second. The 20 users test has the best response times.

Comparisons to the Nolana Baseline

The following tables compare Nolana's test results against Nolana with R/W Split enabled.

Response Time Comparison

In the tables below, the Delta columns express the differences between Nolana and Nolana with R/W Split enabled, in percentage.

In general, Check-In performance improved across the board, by about 15%. Check-Out times regressed by 1-7% in general, mostly due to the increasing time of OPTIONS API calls. The average times for 8 users are better for both Check-In and Check-Out, and the 8-users test shows the best improvement. The 25-user response times are better than the 5-user times, which means the system is very stable from 5 to 25 users.

Note: Nolana = Nolana build, R/W = Nolana with R/W Split enabled


Average:
Users | Check-in R/W | Check-in Nolana | Delta | Check-out R/W | Check-out Nolana | Delta
5 users | 0.424 | 0.465 | -9.7% | 0.772 | 0.719 | 6.8%
8 users | 0.391 | 0.456 | -16.6% | 0.694 | 0.698 | -0.57%
20 users | 0.372 | 0.425 | -14.2% | 0.667 | 0.658 | 1.3%
25 users | 0.382 | 0.436 | -14.1% | 0.686 | 0.676 | 1.5%

50th percentile:
Users | Check-in R/W | Check-in Nolana | Delta | Check-out R/W | Check-out Nolana | Delta
5 users | 0.406 | 0.447 | -10.1% | 0.737 | 0.683 | 7.3%
8 users | 0.372 | 0.441 | -18.5% | 0.666 | 0.672 | -0.9%
20 users | 0.355 | 0.414 | -16.6% | 0.639 | 0.638 | 0.2%
25 users | 0.365 | 0.430 | -17.8% | 0.661 | 0.661 | 0%


75th percentile:
Users | Check-in R/W | Check-in Nolana | Delta | Check-out R/W | Check-out Nolana | Delta
5 users | 0.444 | 0.480 | -8.1% | 0.800 | 0.727 | 9.1%
8 users | 0.404 | 0.471 | -16.6% | 0.711 | 0.709 | 0.28%
20 users | 0.376 | 0.430 | -14.4% | 0.670 | 0.661 | 1.3%
25 users | 0.386 | 0.447 | -15.8% | 0.695 | 0.682 | 1.87%

95th percentile:
Users | Check-in R/W | Check-in Nolana | Delta | Check-out R/W | Check-out Nolana | Delta
5 users | 0.531 | 0.562 | -5.8% | 0.967 | 0.902 | 6.7%
8 users | 0.526 | 0.534 | -1.5% | 0.860 | 0.839 | 2.4%
20 users | 0.474 | 0.489 | -3.2% | 0.845 | 0.758 | 10.3%
25 users | 0.526 | 0.496 | 5.7% | 0.877 | 0.773 | 11.9%


Test 2 with R/W Split enabled: resource usage comparison

The test used 2 instances of the database (reader and writer) working simultaneously while R/W Split is enabled. CPU usage of the writer instance decreases, so it can potentially handle a higher load.

Response times of CICO are almost the same (about 1.5% better in total) with R/W Split enabled.

Service CPU usage is similar whether R/W Split is enabled or disabled. For mod-configuration, CPU usage increases over time (when memory usage reaches its max level). A ticket was created for investigation: PERF-375.


Miscellaneous