Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

IN-REVIEW

Table of Contents
outlinetrue

Overview

This is a report for a series of Check-in-check-out test runs against the Honeysuckle release.

Jira Legacy
serverSystem JIRA
serverId01505d01-b853-3c2e-90f1-ee9b165564fc
keyPERF-135

Backend:

  • mod-circulation-19.2.7
  • mod-circulation-storage-12.1.4
  • mod-inventory-16.1.3
  • mod-inventory-storage-19.4.4
  • mod-authtoken-2.6.0
  • mod-pubsub-1.3.3
  • okapi-4.3.3 (also with 4.2.2)

...

  • Check-out: Honeysuckle is slower by 9%-28% than Goldenrod
  • Check-in: 4%-22% slower than Goldenrod
  • APIs turned slower in Honeysuckle: GET /automated-patron-blocks/{id} (150% slower) and  GET /circulation/loans (60%). These are covered by MODPATBLK-70 and CIRC-1014, respectively
  • Okapi v4.3.3 seem to be using 2x-3x CPU cycles than in v1.3.2 (Goldenrod).  Potential issue found with the logging methods OKAPI-964 
  • mod-pubsub has a memory leak that would drag down performance under high loads (see section on longevity test): MODPUBSUB-136
  • Caching Okapi tokens in Okapi reduced mod-authtoken's CPU usage by over 90%
  • Database's memory usage improved dramatically from Goldenrod's - little memory consumptions observed.

...

The following table shows the slow 75th percentile APIs taking more than 100 ms to return, and also comparing them against Goldenrod's.  Other than the 1-user test, starting with the 5-users test, all listed APIs are slower with GET automated-patron-blocks leading the way at 150% slower, while GET circulation/loans regressed up to 60%

Note: GR = Goldenrod build, HS = Honeysuckle build

API

1 user  GR (75th %tile)

1 user  HS (75th %tile)

5 users GR (75th %tile)

5 users HS (75th %tile)

8 users GR (75th %tile)8 users HS (75th %tile)

20 users GR (75th %tile)

20 users HS (75th %tile)
GET circulation/loans0.3450.3490.3650.4060.0750.1220.6540.784
GET inventory/items0.2080.1860.2080.2220.2250.2440.3120.375

POST checkin-by-barcode 

0.6820.5930.6310.6640.8150.8741.2961.467
POST checkout-by-barcode0.7500.7170.6880.7840.7330.8771.2051.469
GET automated-patron-blocks0.0690.1630.0850.1800.0790.1970.1180.296

...

This table shows that the average response time of the checkout transaction is 9% to 28% slower than in Goldenrod, and check-in in Honeysuckle likewise is about 4-22% slower (data aggregated from 5 to 20 users test runs).

Longevity test

A 24-hour longevity test was performed with 20 concurrent users and running with Okapi 4.2.2. Key observations:

Image Removed

  • Performance degraded after 2 hours.
    • Average response time in the first two hours: 
      • Check-in:  2.28 sec
      • Check-out: 4.76 sec
    • Average response time in the first 12 hours:
      • Check-in: 2.709 sec
      • Check-out: 7.898 sec
    • Average response time in the last 12 hours:
      • Check-in:  4.605 sec
      • Check-out: 17.297 sec
    • Together the average response time for the entire test run:
      • Check-in: 3.414 sec
      • Check-out: 11.419 sec
  • Throughput degrades over time, unsurprisingly
    • Image Removed
  • Modules Services CPU utilizations (The black lines indicate the start and finish time of the test run)
  • Image Removed
    • With the load of 20 users, Okapi started out working hardest, but its CPU utilization drops down over time. This is because mod-pubsub's is leaking HTTP Client and taking up more resources therefore slowing down, causing Okapi to not need to route messages as much. Clearly mod-pubsub's CPU utilization went up. The other modules follow Okapi's pattern.
  • Services Modules Memory utilizations 
  • Image Removed
    • No modules exhibited memory leaks except for mod-pubsub
  • Although there were two instances of mod-pubsub running on two different ec2 instances, mod-pubsub's traffic seemed to have been stickied to one instance. Here are graphs showing mod-pubsub's on one instance using up memory and CPU resources, and on another instance not showing much activities:
    • Image Removed
    • mod-pubsub and Okapi on another node - Okapi's CPU utilization dwindles while mod-pubsub does not seem to be busy at all
    • Image Removed

CPUs and Memories

Okapi was profiled because of the apparent 3x CPU utilization compared to the Goldenrod runs.

Goldenrod test runs:

Image Removed

Honeysuckle test runs:

Image Removed

Clearly Okapi is using more CPU cycles in Honeysuckle than in Goldenrod, 1 user 

Other relevant modules' CPU utilizations in Goldenrod

Image Removed

Same modules in Honeysuckle:

Image Removed

A few things to note:

...


Update 1/14/2011

  • Subsequent investigations (
    Jira Legacy
    serverSystem JIRA
    serverId01505d01-b853-3c2e-90f1-ee9b165564fc
    keyPERF-140
    AND
    Jira Legacy
    serverSystem JIRA
    serverId01505d01-b853-3c2e-90f1-ee9b165564fc
    keyCIRC-1014
    ) on GET /circulation/loans do not show degradations by the API itself. We hypothesize that other API calls that were executed during the test run may have dragged down the response time, particularly if it was trying to read and write to the same rows in the database at the same time. 


Longevity test

A 24-hour longevity test was performed with 20 concurrent users and running with Okapi 4.2.2. Key observations:

Image Added

  • Performance degraded after 2 hours.
    • Average response time in the first two hours: 
      • Check-in:  2.28 sec
      • Check-out: 4.76 sec
    • Average response time in the first 12 hours:
      • Check-in: 2.709 sec
      • Check-out: 7.898 sec
    • Average response time in the last 12 hours:
      • Check-in:  4.605 sec
      • Check-out: 17.297 sec
    • Together the average response time for the entire test run:
      • Check-in: 3.414 sec
      • Check-out: 11.419 sec
  • Throughput degrades over time, unsurprisingly
    • Image Added


  • Modules Services CPU utilizations (The black lines indicate the start and finish time of the test run)
  • Image Added
    • With the load of 20 users, Okapi started out working hardest, but its CPU utilization drops down over time. This is because mod-pubsub's is leaking HTTP Client and taking up more resources therefore slowing down, causing Okapi to not need to route messages as much. Clearly mod-pubsub's CPU utilization went up. The other modules follow Okapi's pattern.
  • Services Modules Memory utilizations 
  • Image Added
    • No modules exhibited memory leaks except for mod-pubsub
  • Although there were two instances of mod-pubsub running on two different ec2 instances, mod-pubsub's traffic seemed to have been stickied to one instance. Here are graphs showing mod-pubsub's on one instance using up memory and CPU resources, and on another instance not showing much activities:
    • Image Added
    • mod-pubsub and Okapi on another node - Okapi's CPU utilization dwindles while mod-pubsub does not seem to be busy at all
    • Image Added

CPUs and Memories

Okapi was profiled because of the apparent 3x CPU utilization compared to the Goldenrod runs.

Goldenrod test runs:

Image Added

Honeysuckle test runs:

Image Added

Clearly Okapi is using more CPU cycles in Honeysuckle than in Goldenrod, 1 user 


Other relevant modules' CPU utilizations in Goldenrod

Image Added


Same modules in Honeysuckle:

Image Added

A few things to note:

  1. mod-authtoken uses much less CPU in Honeysuckle, over 90% reduction across all tests! This is because of the token caching functionality that was added to Okapi 4.x
  2. mod-circulation's CPU utilization in Honeysuckle averages over 20% lower than in Goldenrod.
  3. mod-circulation's CPU utilization in Honeysuckle is about 10-30% higher than in Goldenrod
  4. mod-inventory's CPU utilization in Honeysuckle averages 30% more than in Goldenrod
  5. mod-inventory-storage's CPU utilization in Honeysuckle averages 20% more than in Goldenrod 
  6. mod-pubsub's CPU utilization in Honeysuckle is about 15% less than in Goldenrod
  7. mod-patron-blocks CPU utilization in Honeysuckle is at least 30% less than in Goldenrod

...

Honeysuckle tests revealed the following missing indexes:

mod-circulation-storage missing indexes

Code Block
WARNING: Doing LIKE search without index for jsonb->>'requestId', CQL >>> SQL: requestId == 920e1d64-c221-48a0-a44d-ff50f3ad6cd6 >>> lower(f_unaccent(jsonb->>'requestId')) LIKE lower(f_unaccent('920e1d64-c221-48a0-a44d-ff50f3ad6cd6'))
WARNING: Doing FT search without index for request.jsonb->>'requesterId', CQL >>> SQL: requesterId = ae4c1cf3-0738-4465-8112-e75089e5b5c6 >>> get_tsvector(f_unaccent(request.jsonb->>'requesterId')) @@ tsquery_phrase(f_unaccent('ae4c1cf3-0738-4465-8112-e75089e5b5c6'))
WARNING: Doing FT search without index for request.jsonb->>'pickupServicePointId', CQL >>> SQL: pickupServicePointId = 7068e104-aa14-4f30-a8bf-71f71cc15e07 >>> get_tsvector(f_unaccent(request.jsonb->>'pickupServicePointId')) @@ tsquery_phrase(f_unaccent('7068e104-aa14-4f30-a8bf-71f71cc15e07'))
WARNING: Doing LIKE search without index for patron_action_session.jsonb->>'patronId', CQL >>> SQL: patronId == d7cabcb2-7431-43ea-a2cc-0dfe5bee17c6 >>> lower(f_unaccent(patron_action_session.jsonb->>'patronId')) LIKE lower(f_unaccent('d7cabcb2-7431-43ea-a2cc-0dfe5bee17c6'))
WARNING: Doing LIKE search without index for patron_action_session.jsonb->>'actionType', CQL >>> SQL: actionType == Check-out >>> lower(f_unaccent(patron_action_session.jsonb->>'actionType')) LIKE lower(f_unaccent('Check-out'))

...

Code Block
WARNING: Doing LIKE search without index for accounts.jsonb->>'userId', CQL >>> SQL: userId == e96618a9-04ee-4fea-aa60-306a8f4dd89b >>> lower(f_unaccent(accounts.jsonb->>'userId')) LIKE lower(f_unaccent('e96618a9-04ee-4fea-aa60-306a8f4dd89b'))
WARNING: Doing LIKE search without index for accounts.jsonb->'status'>>'name', CQL >>> SQL: status.name <> Closed >>> lower(f_unaccent(accounts.jsonb>'status'->>'name')) NOT LIKE lower(f_unaccent('Closed'))
WARNING: Doing LIKE search without index for manualblocks.jsonb->>'userId', CQL >>> SQL: userId == a79b533d-8f29-4be1-9415-5f5cd936623b >>> lower(f_unaccent(manualblocks.jsonb->>'userId')) LIKE lower(f_unaccent('a79b533d-8f29-4be1-9415-5f5cd936623b')) without index for accounts.jsonb->'status'>>'name', CQL >>> SQL: status.name <> Closed >>> lower(f_unaccent(accounts.jsonb>'status'->>'name')) NOT LIKE lower(f_unaccent('Closed'))
WARNING: Doing LIKE search without index for manualblocks.jsonb->>'userId', CQL >>> SQL: userId == a79b533d-8f29-4be1-9415-5f5cd936623b >>> lower(f_unaccent(manualblocks.jsonb->>'userId')) LIKE lower(f_unaccent('a79b533d-8f29-4be1-9415-5f5cd936623b'))

Results for okapi-4.5.2

Results for okapi-4.5.2 for 1,5,8,20 users for 30 minute run. From the response times below, the average Check-out for 20 users is slower. On average 60% slower than okapi-4.3.3.

'+' means performance improvement from okapi-4.3.3

'-' means performance degradation from okapi-4.3.3

For 20 users - 4 requests failed out of 113642

Response Times


Average (seconds)50th %tile (seconds)75th %tile (seconds)95th %tile  (seconds)

Check-inCheck-outCheck-inCheck-outCheck-inCheck-outCheck-inCheck-out
1 user0.9712.0720.921.9061.0132.0931.3262.905
5 users1.092 +2.584 -0.978 +2.323 +1.16 +2.746 +1.622 +4.021 -
8 users1.429 -3.057 -1.285 -2.747 -1.62 -3.354 -2.415 -5.079 -
20 users3.073 +7.877 -2.595 +6.307 +3.411 +8.287 +6.409 +14.703 +


CPUs and Memories

Service CPU Utilization:

CPU Utilization gradually increases as the number of users increase to 20 but this behavior is similar to okapi-4.3.3

Image Added


Service Memory Utilization:

Memory Utilization is a little high for mod-circulation 105% but for all other modules, it is relatively stable across all test runs for all users.

Image Added


RDS CPU Utilization

RDS CPU Utilization is around 50% more compared to okapi-4.3.3

Image Added

Comparison okapi-4.5.2 vs okapi-4.6.1

okapi-4.6.1 is slower than okapi-4.5.2. Checkin is 3.66% slower and Checkout is 9.48% slower. See below comparison for 8 Users 30-minute test run.

Image Added


Results for okapi-4.6.1

From the response times below, okapi-4.6.1, checkin-checkout for 1 user is a little slower but for 5, 8, 20 users, checkin-checkout is much faster comparing to okapi-4.3.3.

Response Times Okapi-4.3.3


Average (seconds)50th %tile (seconds)75th %tile (seconds)95th %tile  (seconds)

Check-inCheck-outCheck-inCheck-outCheck-inCheck-outCheck-inCheck-out
1 user0.942.1580.8852.0170.9692.1771.1982.906
5 users1.1262.5741.0252.3391.2112.791.774.007
8 users1.313 2.9481.1772.611.4873.2742.1955.045
20 users3.2527.4922.6816.3553.6058.3137.06115.747

Response Times Okapi-4.6.1

'+' means a performance improvement from okapi-4.3.3

'-' means a performance degradation from okapi-4.3.3


Average (seconds)50th %tile (seconds)75th %tile (seconds)95th %tile  (seconds)

Check-inCheck-in performance with okapi-4.3.3Check-outCheck-out performance with okapi-4.3.3Check-inCheck-outCheck-inCheck-in performance with okapi-4.3.3Check-outCheck-out performance with okapi-4.3.3Check-inCheck-out
1 user1.041 -9.7%2.332 -7%0.9572.1391.06-8.5%2.369-8.10%1.3783.394
5 users1.057 +6.5%2.374+8.4%0.9782.1761.133+6.88%2.532+10.18%1.5243.624
8 users1.277 +2.8%2.814 +4.7%1.1442.5121.44+3.2%3.074+6.50%2.1124.718
20 users2.374 +36.9%5.927 +26.4%2.1375.2462.716+32.76.552+26.874.1889.426


CPUs and Memories

Service CPU Utilization:

Compared to okapi-4.3.3, CPU Utilization for okapi-4.6.1 around the same.

Image Added


Service Memory Utilization:

Compared to okapi-4.3.3, Service Memory Utilization for okapi-4.6.1 around the same.

Image Added


RDS CPU Utilization

RDS CPU Utilization is normal for 1, 5, and 8 Users. For 20 Users, the CPU is higher almost 95% but considering the large load, it is normal as well. 

Image Added

8 Hours Longevity test run for 20 Users

Service CPU Utilization:

Okapi-4.6.1 consumes less CPU gradually from 1st hour into 8th hour. However, at the same time, mod-pubsub consumes more CPU by gradually increasing from 50% to almost 160% 

Image Added


Service Memory Utilization:

Memory usage for okapi-4.6.1  and other modules is relatively constant and stable throughout the run. mod-circulation memory consumption increases to 125% and then stabilizes.

Image Added


Comparison okapi-4.3.3 vs okapi-4.6.1 (okapi metrics enabled)

Below results are for 8 Users, 30 minutes against Checkin-Checkout workflow.

Grafana Performance Dashboard

Okapi-4.6.1 is around 71% faster than Okapi-4.3.3. Okapi-4.6.1 can process more requests and still perform better. In 30 minutes test run, okapi-4.6.1 was able to process 25% more requests with an average request per second(RPS) is 40.

Okapi-4.3.3 Grafana performance dashboard:

Image Added


Okapi-4.6.1 Grafana performance dashboard:

Image Added


Checkin-Checkout API level comparison

For Okapi-4.6.1, Check-in is 71% faster and Checkout is around 65% faster.

Image Added


Log request/response comparison

For Okapi-4.6.1, Log request has improved from 3.16 second to 0.243 seconds. Log request is faster by 1200% faster. Log response has improved from 3.25 seconds to 0.266 seconds which is 1100% faster.

Okapi-4.6.1 can process more log request/response and at the same time be fast. Okapi-4.6.1 processed [211k(logRequest), 220k(logResponse)] vs Okapi-4.3.3 which processed [170k(logRequest), 177k(logResponse)]. 

Okapi-4.3.3 log request/response comparison:

Image Added


Okapi-4.6.1 log request/response comparison:

Image Added


Service CPU Utilization

Okapi-4.6.1 consumes less CPU and hence more efficient. Average CPU Utilization for okapi-4.6.1 is around 380% vs okapi-4.3.3 which is 600%. 

Okapi-4.3.3 Service CPU Utilization:

Image Added


Okapi-4.6.1 Service CPU Utilization:

Image Added

Appendix

CIRC-1014

https://issuesfolio-org.folioatlassian.orgnet/browse/MODPATBLK-70

https://issuesfolio-org.folioatlassian.orgnet/browse/OKAPI-964

https://issuesfolio-org.folioatlassian.orgnet/browse/OKAPI-965

checkout-checkin-4.5.2-test-runs