Placing Holds Test Report (Fameflower)

JIRA: PERF-27

Overview

This report examines the performance of placing a hold through a discovery service interface (EDS) against FOLIO running the Fameflower release.

 We tested it with 1, 5, 8, and 20 virtual users for 30 minutes.  
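
For reference, below is a minimal client sketch of the request under test (POST /patron/account/{id}/instance/{instanceId}/hold). The Okapi URL, tenant, token, and patron id are placeholders rather than values from the test environment; the body mirrors the example payload shown in the Results section.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class PlaceHoldSketch {
        public static void main(String[] args) throws Exception {
            // Placeholders: point these at a real Okapi gateway and tenant.
            String okapi = "https://okapi.example.org";
            String patronId = "00000000-0000-0000-0000-000000000000";
            String instanceId = "c2063f7c-245a-4c24-b643-67fea831ebcc";

            // Same shape as the mod-patron payload in the Results section.
            String body = """
                {
                  "item": { "instanceId": "%s" },
                  "requestDate": "2020-06-01T18:51:24.000+0000",
                  "pickupLocationId": "7068e104-aa14-4f30-a8bf-71f71cc15e07"
                }
                """.formatted(instanceId);

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(okapi + "/patron/account/" + patronId
                            + "/instance/" + instanceId + "/hold"))
                    .header("Content-Type", "application/json")
                    .header("X-Okapi-Tenant", "diku")          // placeholder tenant
                    .header("X-Okapi-Token", "<auth token>")   // placeholder token
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }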

Backend:

  • mod-patron-4.1.0
  • mod-circulation-18.0.9
  • mod-circulation-storage-11.0.0
  • mod-inventory-storage-19.1.2
  • mod-inventory-14.1.3
  • mod-feesfines-15.7.2
  • okapi-2.38.0
  • mod-authtoken-2.4.0
  • mod-permissions-5.9.0

Environment:

  • 55 back-end modules deployed in 110 ECS services
  • 3 okapi ECS services
  • 8 m5.large EC2 instances
  • db.r5.xlarge AWS RDS instance
  • INFO logging level

High Level Summary

  1. Placing a hold takes about 1 second: POST /patron/account/{id}/instance/{instanceId}/hold
  2. POST /patron/account/{id}/instance/{instanceId}/hold and POST /circulation/requests/instances show almost the same response time, and mod-circulation's native item-level request code is 24.5% faster than its instance-level request code. The main bottleneck therefore appears to be the mod-circulation instance-level request API, POST /circulation/requests/instances, which mod-patron calls internally.
  3. JVM profiling shows the following to be the slowest operations, consuming more CPU time than other calls:
    1. Logging at INFO level. The logging level could be reduced to WARNING or ERROR, at the cost of having less data to work with should there be a need to troubleshoot (see the logging sketch after this list). The most CPU-consuming methods are org.folio.circulation.support.Result.next and org.folio.circulation.support.Result.after, called from org.folio.circulation.support.SingleRecordFetcher, and the org.folio.circulation.support.SingleRecordFetcher.fetch method.
    2. JSON de/serialization. The com.fasterxml.jackson.databind.ObjectMapper.readValue method in mod-circulation-storage and mod-inventory-storage consumed high CPU because of the large amount of JSON decoding; this implementation could be reviewed for optimisation possibilities. The most CPU-consuming methods are org.folio.rest.persist.PgUtil.streamGet, called from org.folio.rest.impl.InstanceStorageAPI.getInstanceStorageInstancesByInstanceId and org.folio.rest.impl.HoldingsStorageAPI.getHoldingsStorageHoldings (mod-inventory-storage), and org.folio.rest.persist.PostgresClient.doGet, called from org.folio.rest.impl.LoansAPI.getLoanStorageLoans and org.folio.rest.impl.RequestsAPI.getRequestStorageRequests (mod-circulation-storage).
    3. Request validation. Consider using a more efficient request-validation approach than the org.hibernate.validator.internal.engine.ValidatorImpl.validate method, because it is one of the slowest operations. The slowest methods are org.folio.rest.RestVerticle.route and org.folio.rest.RestVerticle.parseParams.
  4. mod-circulation-storage and mod-feesfines generated missing-index warnings; see Recommended Improvements for the JIRAs created or covered by this testing effort.
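
As a hedged illustration for the logging finding above (not code from the FOLIO modules): besides lowering the level, Log4j2 supports lambda-based log parameters, so a detailed message is only built when the level is actually enabled. A minimal sketch:

    import org.apache.logging.log4j.LogManager;
    import org.apache.logging.log4j.Logger;

    public class LazyLoggingSketch {
        private static final Logger log = LogManager.getLogger(LazyLoggingSketch.class);

        void fetchRecord(String id) {
            // The Supplier is evaluated only if INFO is enabled, so an expensive
            // message (for example, serializing a record) costs nothing at WARN/ERROR.
            log.info("Fetched record: {}", () -> expensiveDescription(id));
        }

        private String expensiveDescription(String id) {
            return "record " + id; // placeholder for costly formatting work
        }
    }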

Test Runs

Test | Virtual Users | Duration | OKAPI log level | Profiled | Ramp up (total time in seconds) | API call
1    | 1             | 30 min   | WARNING         | No       | 1                               | POST /patron/account/{id}/instance/{instanceId}/hold
2    | 5             | 30 min   | WARNING         | No       | 50                              | POST /patron/account/{id}/instance/{instanceId}/hold
3    | 8             | 30 min   | WARNING         | No       | 80                              | POST /patron/account/{id}/instance/{instanceId}/hold
4    | 20            | 30 min   | WARNING         | No       | 200                             | POST /patron/account/{id}/instance/{instanceId}/hold
5    | 20            | 30 min   | WARNING         | Yes      | 200                             | POST /patron/account/{id}/instance/{instanceId}/hold
6    | 1             | 30 min   | WARNING         | No       | 1                               | POST /circulation/requests/instances
7    | 5             | 30 min   | WARNING         | No       | 50                              | POST /circulation/requests/instances
8    | 8             | 30 min   | WARNING         | No       | 80                              | POST /circulation/requests/instances
9    | 20            | 30 min   | WARNING         | No       | 200                             | POST /circulation/requests/instances
10   | 20            | 30 min   | WARNING         | Yes      | 200                             | POST /circulation/requests/instances
11   | 1             | 30 min   | WARNING         | Yes      | 1                               | POST /circulation/requests
12   | 8             | 30 min   | WARNING         | Yes      | 200                             | POST /circulation/requests


Results

  1. An instance-level request to mod-patron 

    POST /patron/account/{id}/instance/{instanceId}/hold

    {
        "item": {
            "instanceId": "c2063f7c-245a-4c24-b643-67fea831ebcc"
        },
        "requestDate": "2020-06-01T18:51:24.000+0000",
        "pickupLocationId": "7068e104-aa14-4f30-a8bf-71f71cc15e07"
    }

    Overall response times for the placing-holds API:

             | Average   | 75th %tile
    1 user   | 933.01 ms | 1.10 s
    5 users  | 1.01 s    | 1.12 s
    8 users  | 1.03 s    | 1.18 s
    20 users | 1.33 s    | 1.47 s
  2. mod-patron subsequently calls mod-circulation to place instance-level requests

     POST /circulation/requests/instances

    {
      "requestDate": "2017-07-29T22:25:37Z",
      "requesterId": "21932a85-bd00-446b-9565-46e0c1a5490b",
      "instanceId": "195efae1-588f-47bd-a181-13a2eb437701",
      "requestExpirationDate": "2017-07-25T22:25:37Z",
      "pickupServicePointId": "8359f2bc-b83e-48e1-8a5b-ca1a74e840de"
    }

    Overall response times:

             | Average   | 75th %tile
    1 user   | 864.35 ms | 1.01 s
    5 users  | 1.09 s    | 1.14 s
    8 users  | 1.15 s    | 1.18 s
    20 users | 1.58 s    | 1.53 s
  3. An item-level request

    POST /circulation/requests

    {
      "requestDate": "2020-06-18T09:23:11.957Z",
      "requesterId": "e96618a9-04ee-4fea-aa60-306a8f4dd89b",
      "requestType": "Hold",
      "fulfilmentPreference": "Hold Shelf",
      "itemId": "0000023d-a95f-4396-8d37-42dda48cdf3d",
      "requestExpirationDate": "2020-06-18T09:23:11.957Z",
      "pickupServicePointId": "d80a641e-50b9-4bd2-99a2-a4a12c8515cb"
    }

    Overall response times:

             | Average   | 75th %tile
    1 user   | 807.90 ms | 819.00 ms
    8 users  | 855.92 ms | 855.00 ms

    Comparison between an instance-level request and an item-level request to mod-circulation (measured 23.07.2020). The item-level request is 24.5% faster than the instance-level request. Instance-level: POST /circulation/requests/instances; item-level: POST /circulation/requests.

             | Instance-level Average | Instance-level 75th %tile | Item-level Average | Item-level 75th %tile
    1 user   | 1.07 s                 | 1.09 s                    | 807.90 ms          | 819.00 ms
    8 users  | 1.09 s                 | 1.09 s                    | 855.92 ms          | 855.00 ms

JVM Profiling

  • Overall slow methods (across the profiled modules: mod-patron, mod-circulation, mod-circulation-storage, mod-inventory, mod-inventory-storage)

To see more, drill down into these methods at: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/b3iP-8hiz/java-profiling-folio-new?orgId=1&from=1591958944552&to=1591961253729&var-service_tag=mod-patron&var-service_tag=mod-inventory-storage&var-service_tag=mod-inventory&var-service_tag=mod-circulation-storage&var-service_tag=mod-circulation&var-aggregation=10s


  • Slow mod-circulation methods only:

To see more, drill down into these methods at: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/b3iP-8hiz/java-profiling-folio-new?orgId=1&from=1591958944552&to=1591961253729&var-service_tag=mod-circulation&var-aggregation=10s

Drilling down into org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled produces a call tree; see: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/U8JtDPLWy/stacktrace-new?orgId=1&class=org.apache.logging.log4j.spi.AbstractLogger&method=logIfEnabled&from=1591958944552&to=1591961253729

Drilling down into com.fasterxml.jackson.databind.ObjectMapper.readValue produces a call tree; see: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/U8JtDPLWy/stacktrace-new?orgId=1&class=com.fasterxml.jackson.databind.ObjectMapper&method=readValue&from=1591958944552&to=1591961253729

Drilling down into org.hibernate.validator.internal.engine.ValidatorImpl.validate produces a call tree; see: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/U8JtDPLWy/stacktrace-new?orgId=1&class=org.hibernate.validator.internal.engine.ValidatorImpl&method=validate&from=1591958944552&to=1591961253729


  • Slow mod-circulation-storage methods:



Drill down into these methods for more information at: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/b3iP-8hiz/java-profiling-folio-new?orgId=1&from=1591959015664&to=1591960971673&var-service_tag=mod-circulation-storage&var-aggregation=10s


Database

The database does not show much CPU usage for the 1-, 5-, 8-, and 20-user runs: at most 5%-6% CPU usage in the 20-user case.

There are no missing-index WARNING statements from the mod-inventory-storage or mod-patron modules.

The following missing-index WARNING statements were generated during a test run and logged by mod-circulation-storage:

WARNING: Doing SQL query without index for scheduled_notice.jsonb->>'nextRunTime', CQL >>> SQL: nextRunTime < 2020-06-15T00:00:00.000Z >>> scheduled_notice.jsonb->>'nextRunTime' <'2020-06-15T00:00:00.000Z'
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->'noticeConfig'->>'sendInRealTime', CQL >>> SQL: noticeConfig.sendInRealTime == false >>> lower(f_unaccent(scheduled_notice.jsonb->'noticeConfig'->>'sendInRealTime')) LIKE lower(f_unaccent('false'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Request expiration" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Request expiration'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Hold expiration" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Hold expiration'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Due date" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Due date'))
WARNING: Doing LIKE search without index for patron_action_session.jsonb->>'patronId', CQL >>> SQL: patronId == a4dfdbd1-a8f6-4a20-9d44-f74a556eb850 >>> lower(f_unaccent(patron_action_session.jsonb->>'patronId')) LIKE lower(f_unaccent('a4dfdbd1-a8f6-4a20-9d44-f74a556eb850'))

The following missing-index WARNING statements were generated during a test run and logged by mod-feesfines:

WARNING: Doing LIKE search without index for accounts.jsonb->>'loanId', CQL >>> SQL: loanId == 6977632d-9ad5-4c37-b067-1e92343c18b7 >>> lower(f_unaccent(accounts.jsonb->>'loanId')) LIKE lower(f_unaccent('6977632d-9ad5-4c37-b067-1e92343c18b7'))


Database Slow Queries

The slowest query, which took the most execution time, was initiated by the mod_circulation_storage service and is presented in the following table:


Percent of total time | Average time | Calls | Module                  | Query
7%                    | 25 ms        | 426   | mod_circulation_storage | SELECT jsonb,id FROM fs09000000_mod_circulation_storage.loan WHERE (lower(f_unaccent(loan.jsonb->'status'->>'name')) LIKE lower(f_unaccent('Open'))) AND (loan.jsonb->>'userId' ~ '') LIMIT 5000 OFFSET 0


CPU Utilization


                        | 1 user              | 5 users                | 8 users             | 20 users
                        | Average % | Range % | Average % | Range %    | Average % | Range % | Average % | Range %
mod-circulation         | 1.23      | 0-3     | 1.98      | 2-20       | 3         | 0-14    | 6         | 0-88
mod-circulation-storage | 0.61      | 0.25-6.60 | 1.73    | 0.23-26.77 | 2         | 0-6     | 3         | 0-9
mod-patron              | 1         | 0-24    | 2         | 2-66       | 1         | 0-12    | 0         | 0-11
mod-feesfines           | 100       | 101-102 | 100       | 101-103    | 100       | 101-103 | 100       | 101-103

Memory

Memory was stable throughout the runs; apart from an occasional spike, usage was consistent over each 30-minute run.


                        | 1 user  | 5 users | 8 users | 20 users
                        | Average | Average | Average | Average
mod-circulation         | 68%     | 69%     | 70%     | 76%
mod-circulation-storage | 31%     | 32%     | 32%     | 32%
mod-patron              | 49%     | 52%     | 52%     | 52%
mod-inventory           | 33%     | 33%     | 34%     | 34%
mod-inventory-storage   | 38%     | 38%     | 39%     | 39%
mod-feesfines           | 97%     | 97%     | 97%     | 97%

Logging

With the INFO log level, a 30-minute run produced more than 50K log lines from mod-feesfines, 50K of which were the following warnings. This corroborates the JVM profiling, which shows logging among the top three slowest operations.

WARNING: Doing LIKE search without index for accounts.jsonb->>'loanId', CQL >>> SQL: loanId == 01e1bcb8-56b6-466b-a677-bd8c3189b036 >>> lower(f_unaccent(accounts.jsonb->>'loanId')) LIKE lower(f_unaccent('01e1bcb8-56b6-466b-a677-bd8c3189b036'))

WARNING: Doing LIKE search without index for accounts.jsonb->>'loanId', CQL >>> SQL: loanId == a22b1b9e-d807-43b4-8d6c-455be0e66f8a >>> lower(f_unaccent(accounts.jsonb->>'loanId')) LIKE lower(f_unaccent('a22b1b9e-d807-43b4-8d6c-455be0e66f8a'))

WARNING: Doing LIKE search without index for accounts.jsonb->>'loanId', CQL >>> SQL: loanId == 88b1e963-f49a-4b6b-8641-a66ac7375891 >>> lower(f_unaccent(accounts.jsonb->>'loanId')) LIKE lower(f_unaccent('88b1e963-f49a-4b6b-8641-a66ac7375891'))

There were more than 3K log lines from mod-circulation-storage, 3K of which were the following warnings.

WARNING: Doing SQL query without index for scheduled_notice.jsonb->>'nextRunTime', CQL >>> SQL: nextRunTime < 2020-06-15T00:00:00.000Z >>> scheduled_notice.jsonb->>'nextRunTime' <'2020-06-15T00:00:00.000Z'
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->'noticeConfig'->>'sendInRealTime', CQL >>> SQL: noticeConfig.sendInRealTime == false >>> lower(f_unaccent(scheduled_notice.jsonb->'noticeConfig'->>'sendInRealTime')) LIKE lower(f_unaccent('false'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Request expiration" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Request expiration'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Hold expiration" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Hold expiration'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Due date" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Due date'))
WARNING: Doing LIKE search without index for patron_action_session.jsonb->>'patronId', CQL >>> SQL: patronId == a4dfdbd1-a8f6-4a20-9d44-f74a556eb850 >>> lower(f_unaccent(patron_action_session.jsonb->>'patronId')) LIKE lower(f_unaccent('a4dfdbd1-a8f6-4a20-9d44-f74a556eb850'))


Recommended Improvements 

  • In mod-circulation, mod-circulation-storage, and mod-inventory-storage, consider using a more efficient JSON package, or calling the existing Jackson serialization API in a different way, to address the finding that JVM profiling shows JSON de/serialization among the slowest operations (see the Jackson sketch after this list).
  • In mod-circulation, consider using a more efficient request-validation approach than the org.hibernate.validator.internal.engine.ValidatorImpl.validate method, because it is one of the slowest operations.
  • The following JIRAs have been created for mod-circulation-storage and mod-feesfines:
  • CIRCSTORE-216
  • MODFEE-67
  • Some missing-index warnings are covered by the story CIRCSTORE-215.
  • If the JIRAs above are not addressed, consider logging at ERROR level to reduce the excess logging by mod-circulation-storage and mod-feesfines.
  • It would be great for the modules to have custom metrics, such as rates and response times of outgoing API and database calls, so that we can get more insight into how an API behaves.
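
One commonly cited adjustment for the Jackson item above (a sketch only; whether the modules already do this was not verified): reuse a single pre-configured ObjectMapper, or a pre-built ObjectReader per target type, so that mapper construction and type introspection are not repeated per call. The Loan type below is a hypothetical placeholder.

    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.fasterxml.jackson.databind.ObjectReader;

    public class JsonDecodingSketch {
        // ObjectMapper is thread-safe once configured and expensive to build, so a
        // single shared instance (or one ObjectReader per type) is reused here.
        private static final ObjectMapper MAPPER = new ObjectMapper();
        private static final ObjectReader LOAN_READER = MAPPER.readerFor(Loan.class);

        static Loan decodeLoan(String json) throws java.io.IOException {
            return LOAN_READER.readValue(json);
        }

        // Hypothetical minimal type; the real modules bind to their own models.
        public static class Loan {
            public String id;
            public String userId;
        }
    }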

Discussions

We could not get to the bottom of the instance-level request slowness, which is why we needed to look into item-level requests. mod-circulation's native item-level request code is 24.5% faster than its instance-level request code.

Appendix

See the attached place-holds-FF.xlsx for details.