WORK IN PROGRESS

PERF-39 - Getting issue details... STATUS

Overview

In this workflow, we are checking the performance of the renewal of loan items for the Goldenrod release.

We tested it with 1, 2 virtual users for 30 minutes.

Backend:

mod-users-17.1.0
mod-users-bl-6.0.0
mod-authtoken-2.5.1
mod-permissions-5.11.2
mod-circulation-19.0.9
mod-circulation-storage-12.0.1
okapi-3.1.1

Environment:

55 back-end modules deployed in 110 ECS services
3 okapi ECS services
8 m5.large EC2 instances
1 db.r5.xlarge AWS RDS instance
INFO logging level

High Level Summary

On average POST_/circulation/renew-by-barcode API call takes average 671.12 seconds as we gradually increase the number of loans and users to 200 and 2 respectively.
POST_/circulation/renew-by-barcode API makes 26 different API requests to gather userBarcode and itemBarcode information. 34% of those are calls are to mod-authtoken. See Recommended Improvements for the JIRAs that were covered by this testing effort.

Test Runs

Test	Virtual Users	Duration	OKAPI log level	Profiled	Loans per User
1.	1	30 min	WARNING	No	10, 50, 100, 200
2.	2	30 min	WARNING	No	10, 50, 100, 200
3.	1	30 min	INFO	No	10, 50, 100, 200

Results

Renew all loans by barcode

For 30 minutes	Loans	Average	Average POST_/circulation/renew-by-barcode
1 user	10	6.52 seconds	585.11 ms
	50	31.81 seconds	637.48 ms
	100	1.06 minutes	650.15 ms
	200	2.09 minutes	673.40 ms
2 users	10	7.42 seconds	683.76 ms
	50	32.61 seconds	686.65 ms
	100	1.10 minutes	705.37 ms
	200	2.27 minutes	747.04 ms

Because there are no custom metrics inside mod-rtac to expose the wait time of each API call, we needed to mimic the rtac API logic by making subsequent calls to mod-inventory, mod-circulation to get open loans for greater than 300 holdings per instance in the JMeter script. One caveat is that these calls are synchronous and are end-to-end (over the internet from JMeter to FOLIO) so naturally these API calls will be slower compared to the actual /rtac API calls from mod-rtac which were implemented asynchronously. Therefore, the response times while are not likely what is seen by the code but should still give us a good idea of the cost of making n^3 API calls even if we take off some percentage for end-to-end over-the-wire latencies.

Calculations:

Total response time to get all holdings, items, open loans per instance(seconds) = time to get current instance + time to get all holdings

time to get all holdings = get all holdings per instance + time to get all items + time to get all loans

time to get all items = get current_item * total items

time to get all loans = get current_loan * total loans

Get an instance

GET inventory/instances/${instanceId}

	Profiler	current_instance + avg(get all holdings) = Total average	current_instance + avg(get all holdings) = Total average 75th %tile
1 user	ON	143 ms + 54.5 sec = 54.7 sec	131 ms + 54.7 sec = 54.9 sec
1 user	OFF	118 ms + 48.4 sec = 48.6 sec	108 ms + 52.6 sec = 52.7 sec
5 users	ON	129 ms + 67.2 sec = 67.3 sec	137 ms + 70.1 sec = 70.2 sec
5 users	OFF	102 ms + 57.8 sec = 57.9 sec	112 ms + 58.2 sec = 58.3 sec

For each instance, get all holdings

GET /holdings-storage/holdings?limit=2147483647&query=instanceId==${instanceId}

	Profiler	avg(get all holdings per instance) + avg(get all items) + avg(get all loans) = Total average	avg(get all holdings per instance) + avg(get all items) + avg(get all loans) = Total average 75th %tile
1 user	ON	174 ms + 39 sec + 15.3 sec = 54.5 sec	213 ms + 40.2 sec + 14.3 sec = 54.7 sec
1 user	OFF	73 ms + 34 sec + 14.3 sec = 48.4 sec	86 ms + 40.2 sec + 12.3 sec = 52.6 sec
5 users	ON	179 ms + 49.3 sec + 17.7 sec = 67.2 sec	223 ms + 51.6 sec + 18 sec = 70.1 sec
5 users	OFF	71 ms + 42.9 sec + 14.9 sec = 57.8	40 ms + 42.6 sec + 15.6 sec = 58.2 sec

For each holding, get all items

GET /inventory/items?limit=2147483647&query=holdingsRecordId==${current_holding}

	Profiler	*avg(get current_item) total items = Total average**	*avg(get current_item) total items = Total average 75th %tile**
1 user	ON	119 ms * 333 = 39 sec	121 ms * 333 = 40.2 sec
1 user	OFF	104 ms * 333 = 34 sec	106 ms * 333 = 35.2 sec
5 users	ON	148 ms * 333 = 49.3 sec	155 ms * 333 = 51.6 sec
5 users	OFF	129 ms * 333 = 42.9 sec	128 ms * 333 = 42.6 sec

For each item, get all open loans

GET /circulation/loans?limit=2147483647&query=(itemId==${current_item} and status.name==Open)

	Profiler	*avg(get current_loan) total loans = Total average**	*avg(get current_loan) total loans = Total average 75th %tile**
1 user	ON	46 ms * 333 = 15.3 sec	43 ms * 333 = 14.3 sec
1 user	OFF	43 ms * 333 = 14.3 sec	37 ms * 333 = 12.3 sec
5 users	ON	53 ms * 333 = 17.7 sec	54 ms * 333 = 18 sec
5 users	OFF	45 ms * 333 = 14.9 sec	47 ms * 333 = 15.6 sec

JVM Profiling

Overall slow methods (between the modules profiled: mod-rtac, mod-circulation, mod-circulation-storage, mod-inventory, mod-inventory-storage)

To see more drill down these methods at: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/b3iP-8hiz/java-profiling-folio-new?orgId=1&from=1592598484852&to=1592600441623&var-service_tag=mod-rtac&var-aggregation=10s

Database

The database CPU usage decreases slightly as the number of users increases 1, 5 users run. At maximum 28% CPU usage for 1 user. We are seeing a slight decrease in CPU utilization for 5 Users because RDS Postgresql is caching the most frequent query results.

Database Slow queries

The slowest query which took the most of execution time was initiated by the mod_inventory_storage service presented in the following table:

Percent of total time	Average Time, ms	Calls	Module	Query
77%	14,632 ms	21	mod_inventory_storage	SELECT COUNT(*) FROM fs09000000_mod_inventory_storage.holdings_record

API response time vs Postgresql query response time

This difference below between API and Postgresql query response time is the sum of network time to/back from calling database, deserialization of JSON string into JSON objects, processing time, and network time to/from API. Below are a single API request and SQL query triggered manually for comparison.

Action	Type of Request	Actual request	Time, ms
get Instance	SQL query	select * from fs09000000_mod_inventory_storage.instance where id = '00009a0b-e973-443b-9796-c3373a352f86';	305
	API request	https:/{okapi-url}/inventory/instances/00009a0b-e973-443b-9796-c3373a352f86	368
get holdings	SQL query	select * from fs09000000_mod_inventory_storage.holdings_record hr1 where hr1.instanceid = '121063ba-43cc-48f4-a49f-aaf5488127e0';	350
	API request	https:/{okapi-url}/holdings-storage/holdings?limit=2147483647&query=instanceId==121063ba-43cc-48f4-a49f-aaf5488127e	503
get Item	SQL query	select * from fs09000000_mod_inventory_storage.item where holdingsrecordid='95a897e5-2505-4803-a7f1-7da39c9f67cb';	353
	API request	https://{okapi-url}/inventory/items?limit=2147483647&query=holdingsRecordId%3D%3D95a897e5-2505-4803-a7f1-7da39c9f67cb	662
get Open loan	SQL query	select * from fs09000000_mod_circulation_storage.loan where jsonb->>'itemId'='3faad467-fca1-41c8-ad7d-af1dcd3fdc3c' AND jsonb->'status'->>'name'='Open';	372
	API request	https://{okapi-url}/circulation/loans?limit=2147483647&query=(itemId==3faad467-fca1-41c8-ad7d-af1dcd3fdc3c and status.name==Open)	400

CPU Utilization

CPU utilization is relatively on a lower side for most modules except for mod-inventory-storage

	1 user	5 users
	Average %	Average %
mod-circulation	3.14	3.95
mod-circulation-storage	4.84	6.38
mod-inventory	5.38	6.54
mod-inventory-storage	26	24
mod-rtac	2.45	3

Memory

Memory was stable throughout the runs, only a spike here or there, but in a 30 minutes run they were consistent.

	1 user	5 users
	Average %	Average %
mod-circulation	72.1	72.50
mod-circulation-storage	39.46	39.55
mod-inventory	44	46.5
mod-inventory-storage	44.30	45
mod-rtac	55.2	55

Recommended Improvements

The following JIRA has been created for mod-authtoken optimization:

MODAT-80 - Getting issue details... STATUS

Appendix

See Attached rtac-test-report.xlsx for details

PTF - Renew loan items Workflow Test Report (Goldenrod)