Overview

In this testing effort we measured the performance of the check-in-check-out workflow in the Fameflower release (PERF-9). We tested it with 1, 5, 8, and 20 virtual users for 30 minutes. A longevity test will also be executed to check for memory issues.

Backend:

mod-circulation-18.0.9
mod-circulation-storage-11.0.8
mod-inventory-storage-19.1.2
mod-inventory-14.1.3
okapi-2.38.0

Frontend:

folio_circulation-2.0.0
Item Check-in (folio_checkin-2.0.1)
Item Check-out (folio_checkout-3.0.2)

Environment:

55 back-end modules deployed in 110 ECS services
3 okapi ECS services
8 m5.large EC2 instances
1 db.r5.xlarge AWS RDS instance
INFO logging level

High Level Summary

Overall check-in and check-out times
1. Average check-in time is 1.51 seconds for the typical use case of 8 users and 1.65 seconds for 20 users.
2. Average check-out time is 1.75 seconds for the typical use case of 8 users and 1.90 seconds for 20 users.
Slow APIs taking more than 100 ms to return
1. POST checkout-by-barcode
2. POST checkin-by-barcode
3. GET circulation/loans
4. GET inventory/items
mod-circulation-storage logs warnings about missing indexes: 64K lines in a 45-minute run. Raising the logging level to ERROR would suppress them, at the cost of having less data to work with should troubleshooting be needed. Adding the missing indexes (CIRCSTORE-215) could improve performance and would also stop these warnings.
JVM profiling shows JSON de/serialization as one of the slowest operations, consuming more CPU time than other calls. Because FOLIO modules retrieve and store JSON objects, efficient JSON serialization and deserialization is essential; see Recommended Improvements.

Test Runs

Test	Virtual Users	Duration	OKAPI log level	OKAPI Version	Profiled
1.	1	30 min	INFO	2.38.0	No
2.	5	30 min	INFO	2.38.0	No
3.	8	30 min	INFO	2.38.0	No
4.	20	20 min	INFO	2.38.0	No
5.	8	45 min	INFO	2.38.0	Yes

Results

Response times

Users	Average check-in (s)	Average check-out (s)	50th %tile check-in (s)	50th %tile check-out (s)	75th %tile check-in (s)	75th %tile check-out (s)	95th %tile check-in (s)	95th %tile check-out (s)
1 user	1.015	1.234	0.96	1.277	1.071	1.409	1.322	1.653
5 users	1.236	1.488	1.156	1.393	1.464	1.869	1.704	2.219
8 users	1.512	1.751	1.403	1.852	1.741	2.031	2.02	2.274
20 users	1.649	1.898	1.535	1.996	1.896	2.211	2.252	2.539
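The report does not state how the load-testing tool computed these averages and percentiles. The sketch below shows one common approach: it computes the average and nearest-rank percentiles from raw response-time samples. The nearest-rank definition is an assumption. Some tools interpolate between neighbouring samples, so their figures can differ slightly. The class name ResponseTimeSummary and the sample values are hypothetical.

```java
import java.util.Arrays;

/** Summarizes raw response-time samples (seconds) into an average and percentiles. */
final class ResponseTimeSummary {

    /** Arithmetic mean of the samples, or NaN when there are none. */
    static double average(double[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    /**
     * Nearest-rank percentile: the smallest sample such that at least p percent
     * of all samples are less than or equal to it. p must be in (0, 100].
     */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        // 1-based rank = ceil(p * n / 100); multiplying first keeps whole-number ranks exact.
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Hypothetical check-in samples in seconds, not data from this report.
        double[] checkIn = {0.9, 1.1, 1.0, 1.4, 2.2, 1.2, 0.95, 1.3};
        System.out.printf("avg=%.3f p50=%.3f p75=%.3f p95=%.3f%n",
                average(checkIn), percentile(checkIn, 50),
                percentile(checkIn, 75), percentile(checkIn, 95));
    }
}
```

With the eight sample values above, the 50th, 75th and 95th percentiles are 1.1, 1.3 and 2.2 seconds. This shows how a single slow request can dominate the 95th percentile of a small sample.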

Slow APIs taking more than 100 ms to return

API	1 user (75th %tile)	5 users (75th %tile)	8 users (75th %tile)	20 users (75th %tile)
POST checkout-by-barcode	615 ms	905 ms	906 ms	988 ms
POST checkin-by-barcode	548 ms	830 ms	1053 ms	1137 ms
GET circulation/loans	283 ms	346 ms	449 ms	479 ms
GET inventory/items	217 ms	232 ms	237 ms	281 ms

JVM Profiling

Overall slow methods across the profiled modules (okapi, mod-circulation, mod-circulation-storage, mod-inventory, mod-inventory-storage):
- To drill down into these methods, see: http://ec2-3-93-19-104.compute-1.amazonaws.com/grafana/d/b2iP-9hiz/java-profiling-folio?orgId=1&from=1589940345995&to=1589943180494&var-service_tag=okapi&var-service_tag=mod-inventory-storage&var-service_tag=mod-inventory&var-service_tag=mod-circulation-storage&var-service_tag=mod-circulation&var-aggregation=10s

Slow Okapi methods:

Drilling down into org.folio.okapi.managers.ModuleManager.getEnabledModules produces a call tree. For more detail, see: http://ec2-3-93-19-104.compute-1.amazonaws.com/grafana/d/U9JtDPLWz/stacktrace?orgId=1&class=org.folio.okapi.managers.ModuleManager&method=getEnabledModules&from=1589940304610&to=1589943246772

Slow mod-circulation methods:

- These methods can be drilled down for more information at: http://ec2-3-93-19-104.compute-1.amazonaws.com/grafana/d/b2iP-9hiz/java-profiling-folio?orgId=1&from=1590418466176&to=1590420349447&var-service_tag=mod-inventory&var-aggregation=10s
  - Areas of code that serialize and deserialize JSON objects
    - org.folio.inventory.storage.external.ExternalStorageModuleCollection
    - org.folio.inventory.resources.Items$$Lambdas...apply()
    - org.folio.okapi.util.LockedStringMap.lambda$getString()
    - org.folio.okapi.util.LockedTypedMap.lambda...handle()
    - org.folio.okapi.managers..proxy
    - org.folio.cql2pgjson.CQL2PgJSON.loadDbSchema
    - org.folio.rest.persist.PostgresClient.deserializeRow
  - Logging
    - org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled
    - org.apache.logging.slf4j.Log4jLogger.info
  - org.joda.time.DateTime.parse
    - org.folio.circulation.domain.Loan.getReturnDate
    - org.folio.circulation.domain.Loan.getDueDate
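The tree above shows Loan.getDueDate and Loan.getReturnDate calling Joda-Time's DateTime.parse. This suggests the date strings may be re-parsed on every getter call, although we have not confirmed this in the mod-circulation source.

The sketch below shows one alternative. It assumes ISO-8601 date strings like the 2020-05-20T03:07:08.090Z values that appear in the logs, parses them once per loan with java.time (part of the JDK since Java 8), and reuses the result. LoanDates is a hypothetical class, not the real mod-circulation Loan. We have not measured whether java.time parsing is faster than Joda-Time's, so the saving shown here comes mainly from not parsing repeatedly.

```java
import java.time.OffsetDateTime;

/**
 * Hypothetical loan wrapper: parses the ISO-8601 due date once with java.time
 * and reuses the result instead of parsing on every getter call.
 */
final class LoanDates {

    private final String dueDateText;
    // Lazily parsed value; this sketch is not thread-safe.
    private OffsetDateTime dueDate;

    LoanDates(String dueDateText) {
        this.dueDateText = dueDateText;
    }

    /** Returns the parsed due date, or null when the loan has none. */
    OffsetDateTime getDueDate() {
        if (dueDate == null && dueDateText != null) {
            // OffsetDateTime.parse uses ISO_OFFSET_DATE_TIME, which accepts "Z" and "+00:00".
            dueDate = OffsetDateTime.parse(dueDateText);
        }
        return dueDate;
    }

    /** True when the loan has a due date that is before the given instant. */
    boolean isOverdue(OffsetDateTime now) {
        OffsetDateTime due = getDueDate();
        return due != null && now.isAfter(due);
    }
}
```

Parsing lazily keeps the cost at zero for loans whose due date is never read. Once parsed, every later call returns the cached object.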

Database

The database showed low CPU usage in the 1, 5, 8, and 20 user runs, peaking at only 25% in the 20-user case.

During a test run, mod-circulation-storage logged the following WARNING statements about missing indexes:

WARNING: Doing SQL query without index for scheduled_notice.jsonb->>'nextRunTime', CQL >>> SQL: nextRunTime < >>> scheduled_notice.jsonb->>'nextRunTime' <'2020-05-20T03:07:08.090Z'
WARNING: Doing FT search without index for request.jsonb->>'pickupServicePointId', CQL >>> SQL: pickupServicePointId = 130d8bff-bdbd-4dc5-a4ac-6d970f4918ff >>> to_tsvector('simple', f_unaccent(request.jsonb->>'pickupServicePointId')) @@ replace((to_tsquery('simple', f_unaccent('''130d8bff-bdbd-4dc5-a4ac-6d970f4918ff''')))::text, '&', '<->')::tsquery
WARNING: Doing FT search without index for request.jsonb->>'requesterId', CQL >>> SQL: requesterId = 001164c0-5466-4822-86f2-dcd2393a7ef7 >>> to_tsvector('simple', f_unaccent(request.jsonb->>'requesterId')) @@ replace((to_tsquery('simple', f_unaccent('''001164c0-5466-4822-86f2-dcd2393a7ef7''')))::text, '&', '<->')::tsquery
WARNING: Doing FT search without index for request.jsonb->>'status', CQL >>> SQL: status = Open >>> to_tsvector('simple', f_unaccent(request.jsonb->>'status')) @@ replace((to_tsquery('simple', f_unaccent('''Open''')))::text, '&', '<->')::tsquery
WARNING: Doing LIKE search without index for jsonb->>'requestId', CQL >>> SQL: requestId == 01819cdb-de38-4562-9835-14071dcaf53d >>> lower(f_unaccent(jsonb->>'requestId')) LIKE lower(f_unaccent('01819cdb-de38-4562-9835-14071dcaf53d'))
WARNING: Doing LIKE search without index for request.jsonb->>'requesterId', CQL >>> SQL: requesterId == 005de89f-bfe1-4bf7-a1e3-f34a707ace21 >>> lower(f_unaccent(request.jsonb->>'requesterId')) LIKE lower(f_unaccent('005de89f-bfe1-4bf7-a1e3-f34a707ace21'))
WARNING: Doing LIKE search without index for patron_action_session.jsonb->>'actionType', CQL >>> SQL: actionType == Check-out >>> lower(f_unaccent(patron_action_session.jsonb->>'actionType')) LIKE lower(f_unaccent('Check-out'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Due date" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Due date'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->'noticeConfig'->>'sendInRealTime', CQL >>> SQL: noticeConfig.sendInRealTime == false >>> lower(f_unaccent(scheduled_notice.jsonb->'noticeConfig'->>'sendInRealTime')) LIKE lower(f_unaccent('false'))
WARNING: Doing SQL query without index for scheduled_notice.jsonb->>'nextRunTime', CQL >>> SQL: nextRunTime < 2020-05-20T00:00:00.000Z >>> scheduled_notice.jsonb->>'nextRunTime' <'2020-05-20T00:00:00.000Z'
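The same few fields recur in these warnings. One way to turn the raw log into a list of candidate indexes for CIRCSTORE-215 is to group the warnings by search kind (SQL query, FT search, LIKE search) and JSON path. The sketch below assumes every warning is a single line with exactly the shape shown above. The class name MissingIndexReport is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Groups "Doing ... without index for ..." warnings by search kind and JSON path. */
final class MissingIndexReport {

    // Assumes the warning text format quoted in this report.
    private static final Pattern WARNING = Pattern.compile(
            "Doing (SQL query|FT search|LIKE search) without index for ([^,]+), CQL");

    /** Maps "kind -> field" to the number of log lines that mention it, sorted by key. */
    static Map<String, Integer> countByField(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = WARNING.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1) + " -> " + m.group(2), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "WARNING: Doing SQL query without index for scheduled_notice.jsonb->>'nextRunTime', CQL >>> SQL: ...",
                "WARNING: Doing FT search without index for request.jsonb->>'status', CQL >>> SQL: ...",
                "WARNING: Doing SQL query without index for scheduled_notice.jsonb->>'nextRunTime', CQL >>> SQL: ...",
                "INFO: unrelated line");
        countByField(lines).forEach((field, n) -> System.out.println(n + "  " + field));
    }
}
```

Run over a full 45-minute log, the counts would show which missing indexes are hit most often. Those are the first candidates to add.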

CPU Utilization

Module	1 user average %	1 user range %	5 users average %	5 users range %	8 users average %	8 users range %	20 users average %	20 users range %
Okapi	1.86	0.32-6.85	5.95	1.16-16.55	7.92	1.41-16.07	17.52	2.36-29.85
mod-inventory	0.40	0.28-3.01	0.76	0.32-8.36	0.62	0.8-2.17	1	0.25-3
mod-inventory-storage	0.94	0.328-4.09	2.88	0.36-9.11	3.33	1.84-19.72	7.96	1.88-14.67

Memory

Memory usage was stable throughout the 30-minute runs, with only occasional spikes.

Module	1 user average	5 users average	8 users average	20 users average
Okapi	50%	50%	46%	46%
mod-circulation	70%	62%	70%	71%
mod-circulation-storage	30%	30%	31%	31%
mod-inventory	38%	38%	38%	38%
mod-inventory-storage	41%	41%	41%	41%

Logging

At the INFO log level, a 45-minute run produced more than 66K lines of mod-circulation-storage logs. 64K of these lines were the missing-index warnings listed in the Database section above. This is consistent with the JVM profiling, which shows logging among the top three slowest methods.
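As a worked figure, 64K warning lines out of roughly 66K total lines is 64 / 66 ≈ 0.97, so about 97% of the mod-circulation-storage log volume is missing-index warnings. The sketch below reproduces this measurement for a log file. It assumes each warning fits on one line and contains the text "without index for". The class name LogNoise is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Measures how much of a log consists of missing-index warnings. */
final class LogNoise {

    // Substring shared by the "Doing ... without index for ..." warnings quoted in this report.
    static final String MARKER = "without index for";

    /** Fraction (0.0 to 1.0) of lines that are missing-index warnings. */
    static double warningShare(List<String> lines) {
        if (lines.isEmpty()) {
            return 0.0;
        }
        long warnings = lines.stream().filter(line -> line.contains(MARKER)).count();
        return (double) warnings / lines.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LogNoise <log-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.printf("%d lines, %.1f%% missing-index warnings%n",
                lines.size(), 100 * warningShare(lines));
    }
}
```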

Recommended Improvements

In mod-circulation and Okapi, consider using a more efficient JSON library, or calling the existing Jackson serialization methods differently, because JVM profiling shows JSON de/serialization to be one of the slowest operations.
In mod-circulation, consider replacing Joda-Time with a more efficient date-time library, because Joda-Time's DateTime.parse is one of the slowest operations.
If the missing indexes tracked in CIRCSTORE-215 are not added, consider setting the mod-circulation-storage log level to ERROR to reduce its excess logging.

CIRCSTORE-215

Create follow-up stories to study the four APIs that still take more than 100 ms to return and to identify where their performance could be improved.
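The first recommendation above leaves open what calling the existing Jackson methods "differently" would mean. One possible option, offered here as a sketch only, is to avoid decoding the same JSON text repeatedly. The profile shows decoding under Okapi's LockedStringMap.getString and LockedTypedMap, but we have not confirmed that the same strings are decoded repeatedly. This sketch caches the decoded object next to the JSON text it came from and re-decodes only when that text changes.

DecodedCache and its decoder function are hypothetical. In a real module the decoder would wrap an existing deserialization call. Cached objects are shared between callers, so they must be treated as read-only.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical decode-once cache: keeps the decoded object next to the JSON
 * text it came from, so repeated lookups of unchanged JSON skip deserialization.
 */
final class DecodedCache<T> {

    private static final class Entry<V> {
        final String json;
        final V value;

        Entry(String json, V value) {
            this.json = json;
            this.value = value;
        }
    }

    // Assumed to wrap an existing deserialization call, such as a Jackson readValue.
    private final Function<String, T> decoder;
    private final ConcurrentHashMap<String, Entry<T>> byKey = new ConcurrentHashMap<>();

    DecodedCache(Function<String, T> decoder) {
        this.decoder = decoder;
    }

    /**
     * Returns the decoded form of json stored under key. The decoder runs only
     * when the key is new or its JSON text has changed. Returned objects are
     * shared between callers and must be treated as read-only.
     */
    T get(String key, String json) {
        Entry<T> entry = byKey.compute(key, (k, old) ->
                old != null && old.json.equals(json) ? old : new Entry<T>(json, decoder.apply(json)));
        return entry.value;
    }
}
```

Comparing the stored text costs a string comparison per lookup. This is usually much cheaper than a full parse, but it should be confirmed with the same JVM profiling before and after the change.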

Appendix

For more raw data from the test runs, see the attached check-in-check-out-FF-UChicago.xlsx.


Check-in-check-out Test Report (Fameflower)