Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

Jira Legacy
serverSystem JiraJIRA
columnskey,summary,type,created,updated,due,assignee,reporter,priority,status,resolution
serverId01505d01-b853-3c2e-90f1-ee9b165564fc
keyPERF-39

...

In this workflow, we are checking the performance of the renewal of loan items for the Goldenrod release.

We tested it with 1, 2 virtual users for 30 minutes.  

Backend:

  • mod-users-17.1.0
  • mod-users-bl-6.0.0
  • mod-authtoken-2.5.1
  • mod-permissions-5.11.2
  • mod-circulation-19.0.9
  • mod-circulation-storage-12.0.1
  • okapi-3.1.1

Environment:

  • 55 back-end modules deployed in 110 ECS services
  • 3 okapi ECS services
  • 8 m5.large  EC2 instances
  • 2 (1 reader, 1 writer) db.r5.xlarge AWS RDS instance
  • INFO logging level

High Level Summary

  1. On average POST_/circulation/renew-by-barcode API call takes average 671.12 seconds as we gradually increase the number of loans and users to 200 and 2 respectively.
  2. POST_/circulation/renew-by-barcode API barcode API makes 26 different API requests to gather userBarcode and itemBarcode information. 34% of those are calls are to mod-authtoken.
    see  See Recommended Improvements for the JIRAs that were created/ covered by this testing effort.
  3. No missing indexes or no suspicious slow queries found except a number of SELECT count_estimate() calls.

Test Runs

GET rtac/${instanceId}

Test

Virtual Users

Duration

OKAPI log level

Profiled

Holdings Loans per instanceUser

API call

1. 

1

30 min

WARNING

No

greater than 300

10, 50, 100, 200

2. 

52

30 min

WARNING

No

greater than 300

GET rtac/${instanceId}

10, 50, 100, 200

3.130 minWARNINGOffgreater than 300

Mimic above call:

GET inventory/instances/${instanceId} → GET /holdings-storage/holdings → GET /inventory/items → GET /circulation/loans

4.530 minWARNINGOffgreater than 300

Mimic above call:

GET inventory/instances/${instanceId} → GET /holdings-storage/holdings → GET /inventory/items → GET /circulation/loans

Results

...

GET rtac/${instanceId}

...

Because there are no custom metrics inside mod-rtac to expose the wait time of each API call, we needed to mimic the rtac API logic by making subsequent calls to mod-inventory, mod-circulation to get open loans for greater than 300 holdings per instance in the JMeter script.  One caveat is that these calls are synchronous and are end-to-end (over the internet from JMeter to FOLIO) so naturally these API calls will be slower compared to the actual /rtac API calls from mod-rtac which were implemented asynchronously.  Therefore, the response times while are not likely what is seen by the code but should still give us a good idea of the cost of making n^3 API calls even if we take off some percentage for end-to-end over-the-wire latencies.

Calculations:

Total response time to get all holdings, items, open loans per instance(seconds) = time to get current instance + time to get all holdings 

time to get all holdings = get all holdings per instance + time to get all items + time to get all loans

time to get all items = get current_item * total items

time to get all loans = get current_loan * total loans

Get  an instance 

GET inventory/instances/${instanceId}

...

For each instance, get all holdings

GET /holdings-storage/holdings?limit=2147483647&query=instanceId==${instanceId}

...

For each holding, get all items

GET /inventory/items?limit=2147483647&query=holdingsRecordId==${current_holding}

...

For each item, get all open loans

GET /circulation/loans?limit=2147483647&query=(itemId==${current_item} and status.name==Open)

...

JVM Profiling

  • Overall slow methods (between the modules profiled: mod-rtac, mod-circulation, mod-circulation-storage, mod-inventory, mod-inventory-storage)

Image Removed

To see more drill down these methods at: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/b3iP-8hiz/java-profiling-folio-new?orgId=1&from=1592598484852&to=1592600441623&var-service_tag=mod-rtac&var-aggregation=10s

Database

The database CPU usage decreases slightly as the number of users increases 1, 5  users run. At maximum 28% CPU usage for 1 user. We are seeing a slight decrease in CPU utilization for 5 Users because RDS Postgresql is caching the most frequent query results.

Image Removed

...

The slowest query which took the most of execution time was initiated by the mod_inventory_storage service presented in the following table:

...

Percent of total time

...

Average Time, ms

...

Calls

...

Query

...

SELECT COUNT(*) FROM fs09000000_mod_inventory_storage.holdings_record

API response time vs Postgresql query response time

This difference below between API and Postgresql query response time is the sum of network time to/back from calling database, deserialization of JSON string into JSON objects, processing time, and network time to/from API. Below are a single API request and SQL query triggered manually for comparison.

...

CPU Utilization

INFO200


Results

  1. Renew all loans by barcode

    Total Users for 30 minutesLoansRenew all loans (Average)POST_/circulation/renew-by-barcode (Average)
    1 user10

    6.52 seconds

    585.11 ms

    50

    31.81 seconds

    637.48 ms

    1001.06 minutes

    650.15 ms

    2002.09 minutes

    673.40 ms

    2 users10

    7.42 seconds

    683.76 ms

    50

    32.61 seconds

    686.65 ms

    100

    1.10 minutes

    705.37 ms

    200

    2.27 minutes

    747.04 ms



  2. Below is the dependency graph generated from Giraffe for single POST_/circulation/renew-by-barcode API. It shows individual calls made by renew-by-barcode to gather prerequisite data from different APIs such as item-storage, location-units, loan-types, request-storage, notice-policy, locations, and mod-authtoken. In all, this call took 990 ms, and 6% of that is consumed by mod-authtoken and the remaining 94% of the time is taken by other APIs to gather prerequisite data.

Image Added



Database CPU Utilization

The database CPU usage increases gradually for 1, 2  users run. At maximum 9% CPU usage for 1 user.

Image Added


For 2 users, max CPU utilization increases and then stabilizes as the number of loans increases. Therefore, CPU utilization is relatively on the lower side.

For 30 minutesLoansRDS CPU Utilization
1 user10

7%

50

8%

1008%
2009%
2 users10

14%

50

15%

100

15%

200

15%


Database queries 
Anchor
slowQueries
slowQueries

 For 1 run of 30 minutes test, following queries were made. To point out, total calls made and the total time spend on these queries is significant.

QueryTotal timeAverage TimeTotal calls made
SELECT count_estimate('SELECT jsonb,id FROM fs09000000_mod_inventory_storage.item WHERE lower(f_unaccent(item.jsonb->>''barcode'')) LIKE lower(f_unaccent(''54357881''))')71%14 ms9027
SELECT count_estimate('SELECT jsonb FROM fs09000000_mod_patron_blocks.user_summary WHERE (jsonb->>''userId'') = ''4deba809-4fb4-432a-b3df-c54b6ad51d22''')10%2 ms8457


Service CPU Utilization

Service CPU utilization is relatively on a lower side for most modules except for mod-inventory-storagecirculation. For service with the highest CPU utilization is shown by range to get a better picture.

For 30 minutesModulesService CPU Utilization
1 user
5 usersAverage %Average %
mod-inventory

3%

mod-inventory-storage

10%

mod-circulation
3.143.95
70%-95% 
mod-circulation-storage
4.846.38
6%
2 usersmod-inventory
5.386.54

3.25%

mod-inventory-storage
2624

17.75%

mod-circulation

70%-122.75%

mod-circulation-
rtac
storage
2

11.

45

75%

3

Service Memory Utilization

Memory Service memory was stable throughout the runs, only a spike here or there, but in a 30 minutes run they were consistent. 

...

for most modules except mod-circulation which was on the higher side. For service with the highest Memory utilization is shown by range to get a better picture.

For 30 minutesModulesService Memory Utilization
1 usermod-inventory

63%

mod-inventory-storage

56%

mod-circulation113%-118%
mod-circulation-storage
39.4639.55
42%
2 usersmod-inventory
44

65%

46.5
mod-inventory-storage
44.3045mod-rtac55.255

56%

mod-circulation

113%-118%

mod-circulation-storage

49%

Graph for service memory utilization for mod-circulation. Memory usage started at ~113%, rose to ~118% and fall back to ~101%

Image Added


Recommended Improvements 
Anchor
recommendedImprovements
recommendedImprovements

  • The following JIRA JIRAs has been created for mod-rtacauthtoken optimization:

Jira Legacy
serverSystem
Jira
JIRA
columnskey,summary,type,created,updated,due,assignee,reporter,priority,status,resolution
serverId01505d01-b853-3c2e-90f1-ee9b165564fc
key
MODRTAC-38

  • Consider taking advantage of CQL "OR" operator to chain IDs together at each level of getting items, holdings, loans,  instead of making API calls.  This approach could be faster, although need to be proven.
  • Consider doing SQL joins between instance/holdings/items where possible to reduce making API and database calls.
  • In mod-circulation, mod-circulation-storage, mod-inventory-storage, consider using a more efficient JSON package or call using the existing Jackson serialization calls in a different way to address the item: JVM profiling shows JSON de/serialization operations one of the slowest operations.

Appendix

...

MODAT-80

Jira Legacy
serverSystem JIRA
columnskey,summary,type,created,updated,due,assignee,reporter,priority,status,resolution
serverId01505d01-b853-3c2e-90f1-ee9b165564fc
keyMODAT-83

Appendix

See Attached user_loanRenewals-test-report.xlsx for details