...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
Jira Legacy | ||||||||
---|---|---|---|---|---|---|---|---|
|
...
In this workflow, we are checking the performance of the renewal of loan items for the Goldenrod release.
We tested it with 1, 2 virtual users for 30 minutes.
Backend:
- mod-users-17.1.0
- mod-users-bl-6.0.0
- mod-authtoken-2.5.1
- mod-permissions-5.11.2
- mod-circulation-19.0.9
- mod-circulation-storage-12.0.1
- okapi-3.1.1
Environment:
- 55 back-end modules deployed in 110 ECS services
- 3 okapi ECS services
- 8 m5.large EC2 instances
- 1 2 (1 reader, 1 writer) db.r5.xlarge AWS RDS instance
- INFO logging level
...
- On average POST_/circulation/renew-by-barcode API call takes average 671.12 seconds as we gradually increase the number of loans and users to 200 and 2 respectively.
- POST_/circulation/renew-by-barcode API makes 26 different API requests to gather userBarcode and itemBarcode information. 34% of those are calls are to mod-authtoken. S See Recommended Improvements for the JIRAs that were covered by this testing effort.
- No missing indexes or no suspicious slow queries found except a number of SELECT count_estimate() calls.
Test Runs
Test | Virtual Users | Duration | OKAPI log level | Profiled | Loans per User |
1. | 1 | 30 min | WARNINGNo | 10, 50, 100, 200 | |
2. | 2 | 30 min | WARNING | No | 10, 50, 100, 200 |
3. | 1 | 30 min | INFO | No | 10, 50, 100, 200 |
Results
Renew all loans by barcode
For
AverageTotal Users for 30 minutes Loans Average Renew all loans (Average) POST_/circulation/renew-by-barcode (Average) 1 user 10 6.52 seconds
585.11 ms
50 31.81 seconds
637.48 ms
100 1.06 minutes 650.15 ms
200 2.09 minutes 673.40 ms
2 users 10 7.42 seconds
683.76 ms
50 32.61 seconds
686.65 ms
100 1.10 minutes
705.37 ms
200 2.27 minutes
747.04 ms
Because there are no custom metrics inside mod-rtac to expose the wait time of each API call, we needed to mimic the rtac API logic by making subsequent calls to mod-inventory, mod-circulation to get open loans for greater than 300 holdings per instance in the JMeter script. One caveat is that these calls are synchronous and are end-to-end (over the internet from JMeter to FOLIO) so naturally these API calls will be slower compared to the actual /rtac API calls from mod-rtac which were implemented asynchronously. Therefore, the response times while are not likely what is seen by the code but should still give us a good idea of the cost of making n^3 API calls even if we take off some percentage for end-to-end over-the-wire latencies.
Calculations:
Total response time to get all holdings, items, open loans per instance(seconds) = time to get current instance + time to get all holdings
time to get all holdings = get all holdings per instance + time to get all items + time to get all loans
time to get all items = get current_item * total items
time to get all loans = get current_loan * total loans
Get an instance
GET inventory/instances/${instanceId}
...
For each instance, get all holdings
GET /holdings-storage/holdings?limit=2147483647&query=instanceId==${instanceId}
...
For each holding, get all items
GET /inventory/items?limit=2147483647&query=holdingsRecordId==${current_holding}
...
For each item, get all open loans
GET /circulation/loans?limit=2147483647&query=(itemId==${current_item} and status.name==Open)
...
JVM Profiling
- Overall slow methods (between the modules profiled: mod-rtac, mod-circulation, mod-circulation-storage, mod-inventory, mod-inventory-storage)
To see more drill down these methods at: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/b3iP-8hiz/java-profiling-folio-new?orgId=1&from=1592598484852&to=1592600441623&var-service_tag=mod-rtac&var-aggregation=10s
Database
The database CPU usage decreases slightly as the number of users increases 1, 5 users run. At maximum 28% CPU usage for 1 user. We are seeing a slight decrease in CPU utilization for 5 Users because RDS Postgresql is caching the most frequent query results.
...
The slowest query which took the most of execution time was initiated by the mod_inventory_storage service presented in the following table:
...
Percent of total time
...
Average Time, ms
...
Calls
...
Query
...
- Below is the dependency graph generated from Giraffe for single POST_/circulation/renew-by-barcode API. It shows individual calls made by renew-by-barcode to gather prerequisite data from different APIs such as item-storage, location-units, loan-types, request-storage, notice-policy, locations, and mod-authtoken. In all, this call took 990 ms, and 6% of that is consumed by mod-authtoken and the remaining 94% of the time is taken by other APIs to gather prerequisite data.
Database CPU Utilization
The database CPU usage increases gradually for 1, 2 users run. At maximum 9% CPU usage for 1 user.
For 2 users, max CPU utilization increases and then stabilizes as the number of loans increases. Therefore, CPU utilization is relatively on the lower side.
For 30 minutes | Loans | RDS CPU Utilization |
1 user | 10 | 7% |
50 | 8% | |
100 | 8% | |
200 | 9% | |
2 users | 10 | 14% |
50 | 15% | |
100 | 15% | |
200 | 15% |
Database queries
Anchor | ||||
---|---|---|---|---|
|
For 1 run of 30 minutes test, following queries were made. To point out, total calls made and the total time spend on these queries is significant.
Query | Total time | Average Time | Total calls made |
---|---|---|---|
SELECT count_estimate('SELECT jsonb,id FROM fs09000000_mod_inventory_storage. |
API response time vs Postgresql query response time
This difference below between API and Postgresql query response time is the sum of network time to/back from calling database, deserialization of JSON string into JSON objects, processing time, and network time to/from API. Below are a single API request and SQL query triggered manually for comparison.
...
CPU Utilization
item WHERE lower(f_unaccent(item.jsonb->>''barcode'')) LIKE lower(f_unaccent(''54357881''))') | 71% | 14 ms | 9027 |
SELECT count_estimate('SELECT jsonb FROM fs09000000_mod_patron_blocks.user_summary WHERE (jsonb->>''userId'') = ''4deba809-4fb4-432a-b3df-c54b6ad51d22''') | 10% | 2 ms | 8457 |
Service CPU Utilization
Service CPU utilization is relatively on a lower side for most modules except for mod-inventory-storagecirculation. For service with the highest CPU utilization is shown by range to get a better picture.
For 30 minutes | Modules | Service CPU Utilization |
1 user |
mod-inventory | 3% |
mod-inventory-storage | 10% |
mod-circulation |
70%-95% |
mod-circulation-storage |
6% | |
2 users | mod-inventory |
3. |
25% |
mod-inventory-storage |
17.75% | |
mod-circulation | 70%-122.75% |
mod- |
circulation-storage |
11. |
75% |
Service Memory Utilization
Memory Service memory was stable throughout the runs, only a spike here or there, but in a 30 minutes run they were consistent.
...
for most modules except mod-circulation which was on the higher side. For service with the highest Memory utilization is shown by range to get a better picture.
For 30 minutes | Modules | Service Memory Utilization |
1 user | mod-inventory | 63% |
mod-inventory-storage | 56% | |
mod-circulation |
113%-118% |
mod-circulation-storage |
42% | |
2 users | mod-inventory |
65% |
mod-inventory-storage |
56% | |
mod-circulation | 113%-118% |
mod-circulation- |
storage | 49% |
Graph for service memory utilization for mod-circulation. Memory usage started at ~113%, rose to ~118% and fall back to ~101%
Recommended Improvements
Anchor | ||||
---|---|---|---|---|
|
- The following JIRA JIRAs has been created for mod-authtoken optimization:
Jira Jira Legacy server System JIRA columns key,summary,type,created,updated,due,assignee,reporter,priority,status,resolution serverId 01505d01-b853-3c2e-90f1-ee9b165564fc key MODAT-80
Jira Legacy server System JIRA columns key,summary,type,created,updated,due,assignee,reporter,priority,status,resolution serverId 01505d01-b853-3c2e-90f1-ee9b165564fc key MODAT-83
Appendix
See Attached rtacuser_loanRenewals-test-report.xlsx for details