Placing Holds Test Report (Fameflower)

JIRA: PERF-27

Overview

This report examines the performance of placing a hold through a discovery service interface (EDS) against FOLIO running the Fameflower release.

 We tested it with 1, 5, 8, and 20 virtual users for 30 minutes.  
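
For reference, below is a minimal client sketch of the request under test (POST /patron/account/{id}/instance/{instanceId}/hold). The Okapi URL, tenant, token, and patron id are placeholders rather than values from the test environment; the body mirrors the example payload shown in the Results section.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class PlaceHoldSketch {
        public static void main(String[] args) throws Exception {
            // Placeholders: point these at a real Okapi gateway and tenant.
            String okapi = "https://okapi.example.org";
            String patronId = "00000000-0000-0000-0000-000000000000";
            String instanceId = "c2063f7c-245a-4c24-b643-67fea831ebcc";

            // Same shape as the mod-patron payload in the Results section.
            String body = """
                {
                  "item": { "instanceId": "%s" },
                  "requestDate": "2020-06-01T18:51:24.000+0000",
                  "pickupLocationId": "7068e104-aa14-4f30-a8bf-71f71cc15e07"
                }
                """.formatted(instanceId);

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(okapi + "/patron/account/" + patronId
                            + "/instance/" + instanceId + "/hold"))
                    .header("Content-Type", "application/json")
                    .header("X-Okapi-Tenant", "diku")          // placeholder tenant
                    .header("X-Okapi-Token", "<auth token>")   // placeholder token
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }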

Backend:

  • mod-patron-4.1.0
  • mod-circulation-18.0.9
  • mod-circulation-storage-11.0.0
  • mod-inventory-storage-19.1.2
  • mod-inventory-14.1.3
  • mod-feesfines-15.7.2
  • okapi-2.38.0
  • mod-authtoken-2.4.0
  • mod-permissions-5.9.0

Environment:

  • 55 back-end modules deployed in 110 ECS services
  • 3 okapi ECS services
  • 8 m5.large EC2 instances
  • db.r5.xlarge AWS RDS instance
  • INFO logging level

High Level Summary

  1. Placing a hold takes about 1 second: POST /patron/account/{id}/instance/{instanceId}/hold
  2. POST /patron/account/{id}/instance/{instanceId}/hold and POST /circulation/requests/instances show almost the same response time, and mod-circulation's native item-level request code is 24.5% faster than its instance-level request code. The main bottleneck therefore appears to be the mod-circulation instance-level request API, POST /circulation/requests/instances, which mod-patron calls internally.
  3. JVM profiling shows the following to be the slowest operations, consuming more CPU time than other calls:
    1. Logging at INFO level. The logging level could be reduced to WARNING or ERROR, at the cost of having less data to work with should there be a need to troubleshoot (see the logging sketch after this list). The most CPU-consuming methods are org.folio.circulation.support.Result.next and org.folio.circulation.support.Result.after, called from org.folio.circulation.support.SingleRecordFetcher, and the org.folio.circulation.support.SingleRecordFetcher.fetch method.
    2. JSON de/serialization. The com.fasterxml.jackson.databind.ObjectMapper.readValue method in mod-circulation-storage and mod-inventory-storage consumed high CPU because of the large amount of JSON decoding; this implementation could be reviewed for optimisation possibilities. The most CPU-consuming methods are org.folio.rest.persist.PgUtil.streamGet, called from org.folio.rest.impl.InstanceStorageAPI.getInstanceStorageInstancesByInstanceId and org.folio.rest.impl.HoldingsStorageAPI.getHoldingsStorageHoldings (mod-inventory-storage), and org.folio.rest.persist.PostgresClient.doGet, called from org.folio.rest.impl.LoansAPI.getLoanStorageLoans and org.folio.rest.impl.RequestsAPI.getRequestStorageRequests (mod-circulation-storage).
    3. Request validation. Consider using a more efficient request-validation approach than the org.hibernate.validator.internal.engine.ValidatorImpl.validate method, because it is one of the slowest operations. The slowest methods are org.folio.rest.RestVerticle.route and org.folio.rest.RestVerticle.parseParams.
  4. mod-circulation-storage and mod-feesfines generated missing-index warnings; see Recommended Improvements for the JIRAs created or covered by this testing effort.
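
As a hedged illustration for the logging finding above (not code from the FOLIO modules): besides lowering the level, Log4j2 supports lambda-based log parameters, so a detailed message is only built when the level is actually enabled. A minimal sketch:

    import org.apache.logging.log4j.LogManager;
    import org.apache.logging.log4j.Logger;

    public class LazyLoggingSketch {
        private static final Logger log = LogManager.getLogger(LazyLoggingSketch.class);

        void fetchRecord(String id) {
            // The Supplier is evaluated only if INFO is enabled, so an expensive
            // message (for example, serializing a record) costs nothing at WARN/ERROR.
            log.info("Fetched record: {}", () -> expensiveDescription(id));
        }

        private String expensiveDescription(String id) {
            return "record " + id; // placeholder for costly formatting work
        }
    }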

Test Runs

Test | Virtual Users | Duration | OKAPI log level | Profiled | Ramp up (total time in seconds) | API call
1    | 1             | 30 min   | WARNING         | No       | 1                               | POST /patron/account/{id}/instance/{instanceId}/hold
2    | 5             | 30 min   | WARNING         | No       | 50                              | POST /patron/account/{id}/instance/{instanceId}/hold
3    | 8             | 30 min   | WARNING         | No       | 80                              | POST /patron/account/{id}/instance/{instanceId}/hold
4    | 20            | 30 min   | WARNING         | No       | 200                             | POST /patron/account/{id}/instance/{instanceId}/hold
5    | 20            | 30 min   | WARNING         | Yes      | 200                             | POST /patron/account/{id}/instance/{instanceId}/hold
6    | 1             | 30 min   | WARNING         | No       | 1                               | POST /circulation/requests/instances
7    | 5             | 30 min   | WARNING         | No       | 50                              | POST /circulation/requests/instances
8    | 8             | 30 min   | WARNING         | No       | 80                              | POST /circulation/requests/instances
9    | 20            | 30 min   | WARNING         | No       | 200                             | POST /circulation/requests/instances
10   | 20            | 30 min   | WARNING         | Yes      | 200                             | POST /circulation/requests/instances
11   | 1             | 30 min   | WARNING         | Yes      | 1                               | POST /circulation/requests
12   | 8             | 30 min   | WARNING         | Yes      | 200                             | POST /circulation/requests


Results

  1. An instance-level request to mod-patron 

    POST /patron/account/{id}/instance/{instanceId}/hold

    {
        "item": {
            "instanceId": "c2063f7c-245a-4c24-b643-67fea831ebcc"
        },
        "requestDate": "2020-06-01T18:51:24.000+0000",
        "pickupLocationId": "7068e104-aa14-4f30-a8bf-71f71cc15e07"
    }

    Overall response times for the placing-holds API:

             | Average   | 75th %tile
    1 user   | 933.01 ms | 1.10 s
    5 users  | 1.01 s    | 1.12 s
    8 users  | 1.03 s    | 1.18 s
    20 users | 1.33 s    | 1.47 s
  2. mod-patron subsequently calls mod-circulation to place instance-level requests

     POST /circulation/requests/instances

    {
      "requestDate": "2017-07-29T22:25:37Z",
      "requesterId": "21932a85-bd00-446b-9565-46e0c1a5490b",
      "instanceId": "195efae1-588f-47bd-a181-13a2eb437701",
      "requestExpirationDate": "2017-07-25T22:25:37Z",
      "pickupServicePointId": "8359f2bc-b83e-48e1-8a5b-ca1a74e840de"
    }

    Overall response times:

             | Average   | 75th %tile
    1 user   | 864.35 ms | 1.01 s
    5 users  | 1.09 s    | 1.14 s
    8 users  | 1.15 s    | 1.18 s
    20 users | 1.58 s    | 1.53 s
  3. An item-level request

    POST /circulation/requests

    {
      "requestDate": "2020-06-18T09:23:11.957Z",
      "requesterId": "e96618a9-04ee-4fea-aa60-306a8f4dd89b",
      "requestType": "Hold",
      "fulfilmentPreference": "Hold Shelf",
      "itemId": "0000023d-a95f-4396-8d37-42dda48cdf3d",
      "requestExpirationDate": "2020-06-18T09:23:11.957Z",
      "pickupServicePointId": "d80a641e-50b9-4bd2-99a2-a4a12c8515cb"
    }

    Overall response times:

             | Average   | 75th %tile
    1 user   | 807.90 ms | 819.00 ms
    8 users  | 855.92 ms | 855.00 ms

    Comparison between an instance-level request and an item-level request to mod-circulation (measured 23.07.2020). The item-level request is 24.5% faster than the instance-level request. Instance-level: POST /circulation/requests/instances; item-level: POST /circulation/requests.

             | Instance-level Average | Instance-level 75th %tile | Item-level Average | Item-level 75th %tile
    1 user   | 1.07 s                 | 1.09 s                    | 807.90 ms          | 819.00 ms
    8 users  | 1.09 s                 | 1.09 s                    | 855.92 ms          | 855.00 ms

JVM Profiling

  • Overall slow methods (across the profiled modules: mod-patron, mod-circulation, mod-circulation-storage, mod-inventory, mod-inventory-storage)

To see more, drill down into these methods at: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/b3iP-8hiz/java-profiling-folio-new?orgId=1&from=1591958944552&to=1591961253729&var-service_tag=mod-patron&var-service_tag=mod-inventory-storage&var-service_tag=mod-inventory&var-service_tag=mod-circulation-storage&var-service_tag=mod-circulation&var-aggregation=10s


  • Slow mod-circulation methods only:

To see more, drill down into these methods at: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/b3iP-8hiz/java-profiling-folio-new?orgId=1&from=1591958944552&to=1591961253729&var-service_tag=mod-circulation&var-aggregation=10s

Drilling down into org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled produces a call tree; see: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/U8JtDPLWy/stacktrace-new?orgId=1&class=org.apache.logging.log4j.spi.AbstractLogger&method=logIfEnabled&from=1591958944552&to=1591961253729

Drilling down into com.fasterxml.jackson.databind.ObjectMapper.readValue produces a call tree; see: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/U8JtDPLWy/stacktrace-new?orgId=1&class=com.fasterxml.jackson.databind.ObjectMapper&method=readValue&from=1591958944552&to=1591961253729

Drilling down into org.hibernate.validator.internal.engine.ValidatorImpl.validate produces a call tree; see: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/U8JtDPLWy/stacktrace-new?orgId=1&class=org.hibernate.validator.internal.engine.ValidatorImpl&method=validate&from=1591958944552&to=1591961253729


  • Slow mod-circulation-storage methods:



Drill down into these methods for more information at: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/b3iP-8hiz/java-profiling-folio-new?orgId=1&from=1591959015664&to=1591960971673&var-service_tag=mod-circulation-storage&var-aggregation=10s


Database

The database does not show much CPU usage for the 1-, 5-, 8-, and 20-user runs: at most 5%-6% CPU usage in the 20-user case.

There are no missing-index WARNING statements from the mod-inventory-storage or mod-patron modules.

The following missing-index WARNING statements were generated during a test run and logged by mod-circulation-storage:

WARNING: Doing SQL query without index for scheduled_notice.jsonb->>'nextRunTime', CQL >>> SQL: nextRunTime < 2020-06-15T00:00:00.000Z >>> scheduled_notice.jsonb->>'nextRunTime' <'2020-06-15T00:00:00.000Z'
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->'noticeConfig'->>'sendInRealTime', CQL >>> SQL: noticeConfig.sendInRealTime == false >>> lower(f_unaccent(scheduled_notice.jsonb->'noticeConfig'->>'sendInRealTime')) LIKE lower(f_unaccent('false'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Request expiration" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Request expiration'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Hold expiration" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Hold expiration'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Due date" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Due date'))
WARNING: Doing LIKE search without index for patron_action_session.jsonb->>'patronId', CQL >>> SQL: patronId == a4dfdbd1-a8f6-4a20-9d44-f74a556eb850 >>> lower(f_unaccent(patron_action_session.jsonb->>'patronId')) LIKE lower(f_unaccent('a4dfdbd1-a8f6-4a20-9d44-f74a556eb850'))

The following missing-index WARNING statements were generated during a test run and logged by mod-feesfines:

WARNING: Doing LIKE search without index for accounts.jsonb->>'loanId', CQL >>> SQL: loanId == 6977632d-9ad5-4c37-b067-1e92343c18b7 >>> lower(f_unaccent(accounts.jsonb->>'loanId')) LIKE lower(f_unaccent('6977632d-9ad5-4c37-b067-1e92343c18b7'))


Database Slow Queries

The slowest query, which took the most execution time, was initiated by the mod_circulation_storage service and is presented in the following table:


Percent of total time | Average time | Calls | Module                  | Query
7%                    | 25 ms        | 426   | mod_circulation_storage | SELECT jsonb,id FROM fs09000000_mod_circulation_storage.loan WHERE (lower(f_unaccent(loan.jsonb->'status'->>'name')) LIKE lower(f_unaccent('Open'))) AND (loan.jsonb->>'userId' ~ '') LIMIT 5000 OFFSET 0


CPU Utilization


                        | 1 user              | 5 users                | 8 users             | 20 users
                        | Average % | Range % | Average % | Range %    | Average % | Range % | Average % | Range %
mod-circulation         | 1.23      | 0-3     | 1.98      | 2-20       | 3         | 0-14    | 6         | 0-88
mod-circulation-storage | 0.61      | 0.25-6.60 | 1.73    | 0.23-26.77 | 2         | 0-6     | 3         | 0-9
mod-patron              | 1         | 0-24    | 2         | 2-66       | 1         | 0-12    | 0         | 0-11
mod-feesfines           | 100       | 101-102 | 100       | 101-103    | 100       | 101-103 | 100       | 101-103

Memory

Memory was stable throughout the runs; apart from an occasional spike, usage was consistent over each 30-minute run.


                        | 1 user  | 5 users | 8 users | 20 users
                        | Average | Average | Average | Average
mod-circulation         | 68%     | 69%     | 70%     | 76%
mod-circulation-storage | 31%     | 32%     | 32%     | 32%
mod-patron              | 49%     | 52%     | 52%     | 52%
mod-inventory           | 33%     | 33%     | 34%     | 34%
mod-inventory-storage   | 38%     | 38%     | 39%     | 39%
mod-feesfines           | 97%     | 97%     | 97%     | 97%

Logging

With the INFO log level, a 30-minute run produced more than 50K log lines from mod-feesfines, 50K of which were the following warnings. This corroborates the JVM profiling, which shows logging among the top three slowest operations.

WARNING: Doing LIKE search without index for accounts.jsonb->>'loanId', CQL >>> SQL: loanId == 01e1bcb8-56b6-466b-a677-bd8c3189b036 >>> lower(f_unaccent(accounts.jsonb->>'loanId')) LIKE lower(f_unaccent('01e1bcb8-56b6-466b-a677-bd8c3189b036'))

WARNING: Doing LIKE search without index for accounts.jsonb->>'loanId', CQL >>> SQL: loanId == a22b1b9e-d807-43b4-8d6c-455be0e66f8a >>> lower(f_unaccent(accounts.jsonb->>'loanId')) LIKE lower(f_unaccent('a22b1b9e-d807-43b4-8d6c-455be0e66f8a'))

WARNING: Doing LIKE search without index for accounts.jsonb->>'loanId', CQL >>> SQL: loanId == 88b1e963-f49a-4b6b-8641-a66ac7375891 >>> lower(f_unaccent(accounts.jsonb->>'loanId')) LIKE lower(f_unaccent('88b1e963-f49a-4b6b-8641-a66ac7375891'))

There were more than 3K log lines from mod-circulation-storage, 3K of which were the following warnings.

WARNING: Doing SQL query without index for scheduled_notice.jsonb->>'nextRunTime', CQL >>> SQL: nextRunTime < 2020-06-15T00:00:00.000Z >>> scheduled_notice.jsonb->>'nextRunTime' <'2020-06-15T00:00:00.000Z'
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->'noticeConfig'->>'sendInRealTime', CQL >>> SQL: noticeConfig.sendInRealTime == false >>> lower(f_unaccent(scheduled_notice.jsonb->'noticeConfig'->>'sendInRealTime')) LIKE lower(f_unaccent('false'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Request expiration" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Request expiration'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Hold expiration" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Hold expiration'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Due date" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Due date'))
WARNING: Doing LIKE search without index for patron_action_session.jsonb->>'patronId', CQL >>> SQL: patronId == a4dfdbd1-a8f6-4a20-9d44-f74a556eb850 >>> lower(f_unaccent(patron_action_session.jsonb->>'patronId')) LIKE lower(f_unaccent('a4dfdbd1-a8f6-4a20-9d44-f74a556eb850'))


Recommended Improvements 

  • In mod-circulation, mod-circulation-storage, and mod-inventory-storage, consider using a more efficient JSON package, or calling the existing Jackson serialization API in a different way, to address the finding that JVM profiling shows JSON de/serialization among the slowest operations (see the Jackson sketch after this list).
  • In mod-circulation, consider using a more efficient request-validation approach than the org.hibernate.validator.internal.engine.ValidatorImpl.validate method, because it is one of the slowest operations.
  • The following JIRAs have been created for mod-circulation-storage and mod-feesfines:
  • CIRCSTORE-216
  • MODFEE-67
  • Some missing-index warnings are covered by the story CIRCSTORE-215.
  • If the JIRAs above are not addressed, consider logging at ERROR level to reduce the excess logging by mod-circulation-storage and mod-feesfines.
  • It would be great for the modules to have custom metrics, such as rates and response times of outgoing API and database calls, so that we can get more insight into how an API behaves.
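
One commonly cited adjustment for the Jackson item above (a sketch only; whether the modules already do this was not verified): reuse a single pre-configured ObjectMapper, or a pre-built ObjectReader per target type, so that mapper construction and type introspection are not repeated per call. The Loan type below is a hypothetical placeholder.

    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.fasterxml.jackson.databind.ObjectReader;

    public class JsonDecodingSketch {
        // ObjectMapper is thread-safe once configured and expensive to build, so a
        // single shared instance (or one ObjectReader per type) is reused here.
        private static final ObjectMapper MAPPER = new ObjectMapper();
        private static final ObjectReader LOAN_READER = MAPPER.readerFor(Loan.class);

        static Loan decodeLoan(String json) throws java.io.IOException {
            return LOAN_READER.readValue(json);
        }

        // Hypothetical minimal type; the real modules bind to their own models.
        public static class Loan {
            public String id;
            public String userId;
        }
    }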

Discussions

We could not get to the bottom of the instance-level request slowness, which is why we needed to look into item-level requests. mod-circulation's native item-level request code is 24.5% faster than its instance-level request code.

Appendix

See the attached place-holds-FF.xlsx for details.