Placing Holds Test Report (Fameflower)

PERF-27: Investigation | Placing Holds algorithm/design (Closed)

Overview

This report examines the performance of placing a hold through a discovery service interface (EDS) running against the Fameflower release.

We tested the workflow with 1, 5, 8, and 20 virtual users for 30 minutes per run.

Backend:

  • mod-patron-4.1.0

  • mod-circulation-18.0.9

  • mod-circulation-storage-11.0.0

  • mod-inventory-storage-19.1.2

  • mod-inventory-14.1.3

  • mod-feesfines-15.7.2

  • okapi-2.38.0

  • mod-authtoken-2.4.0

  • mod-permissions-5.9.0

Environment:

  • 55 back-end modules deployed in 110 ECS services

  • 3 okapi ECS services

  • 8 m5.large EC2 instances

  • db.r5.xlarge AWS RDS instance

  • INFO logging level

High Level Summary

  1. Placing a hold takes about 1 second: POST /patron/account/{id}/instance/{instanceId}/hold (a request sketch follows this list).

  2. POST /patron/account/{id}/instance/{instanceId}/hold and POST /circulation/requests/instances show almost the same response time, and mod-circulation's native item-level request code is 24.5% faster than its instance-level request code. The main bottleneck therefore appears to be the mod-circulation instance-level request API, POST /circulation/requests/instances, which mod-patron calls internally.

  3. JVM profiling identified the slowest operations, which together consumed more CPU time than any other calls:

    1. Logging at INFO level - the logging level could be reduced to WARNING or ERROR, but at the cost of having less data to work with should there be a need to troubleshoot. The most CPU-consuming methods are org.folio.circulation.support.Result.next and org.folio.circulation.support.Result.after (called from org.folio.circulation.support.SingleRecordFetcher) and the org.folio.circulation.support.SingleRecordFetcher.fetch method.

    2. JSON de/serialization - the com.fasterxml.jackson.databind.ObjectMapper.readValue method in mod-circulation-storage and mod-inventory-storage consumed high CPU because of heavy JSON decoding; this implementation could be reviewed for optimization opportunities. The most CPU-consuming methods are org.folio.rest.persist.PgUtil.streamGet (called from org.folio.rest.impl.InstanceStorageAPI.getInstanceStorageInstancesByInstanceId and org.folio.rest.impl.HoldingsStorageAPI.getHoldingsStorageHoldings in mod-inventory-storage) and org.folio.rest.persist.PostgresClient.doGet (called from org.folio.rest.impl.LoansAPI.getLoanStorageLoans and org.folio.rest.impl.RequestsAPI.getRequestStorageRequests in mod-circulation-storage).

    3. Request validation - consider a more efficient validation approach than org.hibernate.validator.internal.engine.ValidatorImpl.validate, as it is one of the slowest operations. The slowest methods are org.folio.rest.RestVerticle.route and org.folio.rest.RestVerticle.parseParams.

  4. mod-circulation-storage and mod-feesfines generated missing-index warnings; see Recommended Improvements for the JIRAs created or covered by this testing effort.
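For reference, a minimal sketch of the call under test, using Python's requests library. This is an illustration only: the Okapi URL, tenant, token, and record IDs are placeholders, and the payload mirrors the example shown in the Results section below.

```python
import requests

# Placeholder values - the Okapi gateway URL, tenant, token, and record IDs
# below are illustrative, not the ones used in this test environment.
OKAPI_URL = "https://okapi.example.org"
HEADERS = {
    "X-Okapi-Tenant": "fs09000000",
    "X-Okapi-Token": "<token obtained via POST /authn/login>",
    "Content-Type": "application/json",
}

def place_hold(patron_id: str, instance_id: str, pickup_location_id: str) -> dict:
    """Place an instance-level hold: POST /patron/account/{id}/instance/{instanceId}/hold."""
    payload = {
        "item": {"instanceId": instance_id},
        "requestDate": "2020-06-01T18:51:24.000+0000",
        "pickupLocationId": pickup_location_id,
    }
    resp = requests.post(
        f"{OKAPI_URL}/patron/account/{patron_id}/instance/{instance_id}/hold",
        json=payload,
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```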

Test Runs

| Test | Virtual Users | Duration | OKAPI log level | Profiled | Ramp up (total time in seconds) | API call |
| ---- | ------------- | -------- | --------------- | -------- | ------------------------------- | -------- |
| 1 | 1 | 30 min | WARNING | No | 1 | POST /patron/account/{id}/instance/{instanceId}/hold |
| 2 | 5 | 30 min | WARNING | No | 50 | POST /patron/account/{id}/instance/{instanceId}/hold |
| 3 | 8 | 30 min | WARNING | No | 80 | POST /patron/account/{id}/instance/{instanceId}/hold |
| 4 | 20 | 30 min | WARNING | No | 200 | POST /patron/account/{id}/instance/{instanceId}/hold |
| 5 | 20 | 30 min | WARNING | Yes | 200 | POST /patron/account/{id}/instance/{instanceId}/hold |
| 6 | 1 | 30 min | WARNING | No | 1 | POST /circulation/requests/instances |
| 7 | 5 | 30 min | WARNING | No | 50 | POST /circulation/requests/instances |
| 8 | 8 | 30 min | WARNING | No | 80 | POST /circulation/requests/instances |
| 9 | 20 | 30 min | WARNING | No | 200 | POST /circulation/requests/instances |
| 10 | 20 | 30 min | WARNING | Yes | 200 | POST /circulation/requests/instances |
| 11 | 1 | 30 min | WARNING | Yes | 1 | POST /circulation/requests |
| 12 | 8 | 30 min | WARNING | Yes | 200 | POST /circulation/requests |

 

Results

  1. An instance-level request to mod-patron:

    POST /patron/account/{id}/instance/{instanceId}/hold

    {
      "item": { "instanceId": "c2063f7c-245a-4c24-b643-67fea831ebcc" },
      "requestDate": "2020-06-01T18:51:24.000+0000",
      "pickupLocationId": "7068e104-aa14-4f30-a8bf-71f71cc15e07"
    }

    [Chart: overall time for the placing-holds API, in seconds]

  2. mod-patron then calls mod-circulation to place an instance-level request:

    POST /circulation/requests/instances

    {
      "requestDate": "2017-07-29T22:25:37Z",
      "requesterId": "21932a85-bd00-446b-9565-46e0c1a5490b",
      "instanceId": "195efae1-588f-47bd-a181-13a2eb437701",
      "requestExpirationDate": "2017-07-25T22:25:37Z",
      "pickupServicePointId": "8359f2bc-b83e-48e1-8a5b-ca1a74e840de"
    }

    [Chart: overall time in seconds]

  3. An item-level request:

    POST /circulation/requests

    {
      "requestDate": "2020-06-18T09:23:11.957Z",
      "requesterId": "e96618a9-04ee-4fea-aa60-306a8f4dd89b",
      "requestType": "Hold",
      "fulfilmentPreference": "Hold Shelf",
      "itemId": "0000023d-a95f-4396-8d37-42dda48cdf3d",
      "requestExpirationDate": "2020-06-18T09:23:11.957Z",
      "pickupServicePointId": "d80a641e-50b9-4bd2-99a2-a4a12c8515cb"
    }

    [Chart: overall time in seconds]

The table below compares an instance-level request with an item-level request to mod-circulation (measured 23.07.2020); the item-level request is 24.5% faster than the instance-level request. A timing sketch follows the table.

 

Instance-level = POST /circulation/requests/instances; item-level = POST /circulation/requests.

| Users | Instance-level Average | Instance-level 75th %tile | Item-level Average | Item-level 75th %tile |
| ----- | ---------------------- | ------------------------- | ------------------ | --------------------- |
| 1 user | 1.07 s | 1.09 s | 807.90 ms | 819.00 ms |
| 8 users | 1.09 s | 1.09 s | 855.92 ms | 855.00 ms |
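This comparison can be reproduced with a simple timing loop; a minimal sketch follows, reusing OKAPI_URL and HEADERS from the earlier sketch, with instance_payload and item_payload standing in for the example bodies from Results items 2 and 3. Note that the report's figures come from 30-minute virtual-user runs, not a loop like this, and every POST creates a real request.

```python
import statistics
import time

import requests

def avg_response_time(path: str, payload: dict, n: int = 50) -> float:
    """Mean wall-clock seconds over n POSTs to `path`."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        resp = requests.post(f"{OKAPI_URL}{path}", json=payload,
                             headers=HEADERS, timeout=30)
        resp.raise_for_status()
        samples.append(time.perf_counter() - t0)
    return statistics.mean(samples)

# instance_payload / item_payload: the example bodies from Results items 2 and 3.
instance_avg = avg_response_time("/circulation/requests/instances", instance_payload)
item_avg = avg_response_time("/circulation/requests", item_payload)
print(f"instance-level: {instance_avg:.3f} s, item-level: {item_avg:.3f} s, "
      f"item-level is {100 * (1 - item_avg / instance_avg):.1f}% faster")
```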

JVM Profiling

  • Overall slow methods (across the profiled modules: mod-patron, mod-circulation, mod-circulation-storage, mod-inventory, mod-inventory-storage)

For more detail, drill down into these methods at: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/b3iP-8hiz/java-profiling-folio-new?orgId=1&from=1591958944552&to=1591961253729&var-service_tag=mod-patron&var-service_tag=mod-inventory-storage&var-service_tag=mod-inventory&var-service_tag=mod-circulation-storage&var-service_tag=mod-circulation&var-aggregation=10s

 

  • Slow mod-circulation methods only:

For more detail, drill down into these methods at: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/b3iP-8hiz/java-profiling-folio-new?orgId=1&from=1591958944552&to=1591961253729&var-service_tag=mod-circulation&var-aggregation=10s

Drilling down into org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled yields the following tree; see: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/U8JtDPLWy/stacktrace-new?orgId=1&class=org.apache.logging.log4j.spi.AbstractLogger&method=logIfEnabled&from=1591958944552&to=1591961253729

Drilling down into com.fasterxml.jackson.databind.ObjectMapper.readValue yields the following tree; see: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/U8JtDPLWy/stacktrace-new?orgId=1&class=com.fasterxml.jackson.databind.ObjectMapper&method=readValue&from=1591958944552&to=1591961253729

Drilling down into org.hibernate.validator.internal.engine.ValidatorImpl.validate yields the following tree; see: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/U8JtDPLWy/stacktrace-new?orgId=1&class=org.hibernate.validator.internal.engine.ValidatorImpl&method=validate&from=1591958944552&to=1591961253729

 

  • Slow mod-circulation-storage methods:

For more detail, drill down into these methods at: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/b3iP-8hiz/java-profiling-folio-new?orgId=1&from=1591959015664&to=1591960971673&var-service_tag=mod-circulation-storage&var-aggregation=10s

 

Database

The database shows little CPU usage across the 1-, 5-, 8-, and 20-user runs, peaking at 5%-6% in the 20-user case.

No WARNING statements about missing indexes were logged by the mod-inventory-storage or mod-patron modules.

The following WARNING statements about missing indexes were generated during a test run and logged by mod-circulation-storage:

WARNING: Doing SQL query without index for scheduled_notice.jsonb->>'nextRunTime', CQL >>> SQL: nextRunTime < 2020-06-15T00:00:00.000Z >>> scheduled_notice.jsonb->>'nextRunTime' <'2020-06-15T00:00:00.000Z'
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->'noticeConfig'->>'sendInRealTime', CQL >>> SQL: noticeConfig.sendInRealTime == false >>> lower(f_unaccent(scheduled_notice.jsonb->'noticeConfig'->>'sendInRealTime')) LIKE lower(f_unaccent('false'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Request expiration" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Request expiration'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Hold expiration" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Hold expiration'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Due date" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Due date'))
WARNING: Doing LIKE search without index for patron_action_session.jsonb->>'patronId', CQL >>> SQL: patronId == a4dfdbd1-a8f6-4a20-9d44-f74a556eb850 >>> lower(f_unaccent(patron_action_session.jsonb->>'patronId')) LIKE lower(f_unaccent('a4dfdbd1-a8f6-4a20-9d44-f74a556eb850'))

The following WARNING statement about a missing index was generated during a test run and logged by mod-feesfines:

WARNING: Doing LIKE search without index for accounts.jsonb->>'loanId', CQL >>> SQL: loanId == 6977632d-9ad5-4c37-b067-1e92343c18b7 >>> lower(f_unaccent(accounts.jsonb->>'loanId')) LIKE lower(f_unaccent('6977632d-9ad5-4c37-b067-1e92343c18b7'))
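Warnings like these usually point to a missing expression index matching the generated SQL. As a hedged illustration only (in RMB-based FOLIO modules the proper fix is to declare the index in the module's schema.json so it is created at tenant upgrade; the schema name, index name, and connection string below are assumptions):

```python
import psycopg2  # assumes direct access to the FOLIO PostgreSQL instance

# B-tree expression index matching the predicate from the mod-feesfines warning:
#   lower(f_unaccent(accounts.jsonb->>'loanId')) LIKE lower(f_unaccent('<uuid>'))
# f_unaccent is schema-qualified because FOLIO defines it per tenant schema;
# text_pattern_ops lets the planner use the index for LIKE regardless of collation.
CREATE_LOAN_ID_INDEX = """
CREATE INDEX IF NOT EXISTS accounts_loanid_idx
  ON fs09000000_mod_feesfines.accounts
  ((lower(fs09000000_mod_feesfines.f_unaccent(jsonb->>'loanId'))) text_pattern_ops);
"""

conn = psycopg2.connect("dbname=folio user=folio_admin")  # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute(CREATE_LOAN_ID_INDEX)
conn.close()
```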

 

Database Slow queries 

The slowest query, accounting for the largest share of execution time, was initiated by the mod_circulation_storage service and is shown in the following table. A sketch for extracting such statistics follows the table.

 

| Percent of total time | Average time (ms) | Calls | Module | Query |
| --------------------- | ----------------- | ----- | ------ | ----- |
| 7% | 25 ms | 426 | mod_circulation_storage | SELECT jsonb,id FROM fs09000000_mod_circulation_storage.loan WHERE (lower(f_unaccent(loan.jsonb->'status'->>'name')) LIKE lower(f_unaccent('Open'))) AND (loan.jsonb->>'userId' ~ '') LIMIT 5000 OFFSET 0 |
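Figures like those in the table can be pulled from PostgreSQL's pg_stat_statements view, assuming the extension is enabled on the RDS instance. A minimal sketch (total_time and mean_time are the column names in PostgreSQL 12 and earlier; PostgreSQL 13 renamed them to total_exec_time and mean_exec_time):

```python
import psycopg2

# Top 10 statements by total execution time, with their share of overall time.
TOP_QUERIES = """
SELECT round((100 * total_time / sum(total_time) OVER ())::numeric, 1) AS pct,
       round(mean_time::numeric, 1) AS avg_ms,
       calls,
       query
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
"""

conn = psycopg2.connect("dbname=folio user=folio_admin")  # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute(TOP_QUERIES)
    for pct, avg_ms, calls, query in cur.fetchall():
        print(f"{pct}%  {avg_ms} ms  {calls} calls  {query[:80]}")
conn.close()
```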

 

CPU Utilization

 

| Module | 1 user Avg % (Range %) | 5 users Avg % (Range %) | 8 users Avg % (Range %) | 20 users Avg % (Range %) |
| ------ | ---------------------- | ----------------------- | ----------------------- | ------------------------ |
| mod-circulation | 1.23 (0-31.98) | 3 (2-20) | 3 (0-14) | 6 (0-88) |
| mod-circulation-storage | 0.61 (0.25-6.60) | 1.73 (0.23-26.77) | 2 (0-6) | 3 (0-9) |
| mod-patron | 1 (0-24) | 2 (2-66) | 1 (0-12) | 0 (0-11) |
| mod-feesfines | 100 (101-102) | 100 (101-103) | 100 (101-103) | 100 (101-103) |

Memory

Memory usage was stable throughout the runs; apart from occasional spikes, it remained consistent over each 30-minute run.

 

| Module | 1 user Avg | 5 users Avg | 8 users Avg | 20 users Avg |
| ------ | ---------- | ----------- | ----------- | ------------ |
| mod-circulation | 68% | 69% | 70% | 76% |
| mod-circulation-storage | 31% | 32% | 32% | 32% |
| mod-patron | 49% | 52% | 52% | 52% |
| mod-inventory | 33% | 33% | 34% | 34% |
| mod-inventory-storage | 38% | 38% | 39% | 39% |
| mod-feesfines | 97% | 97% | 97% | 97% |

Logging

With the INFO log level, a 30-minute run produced more than 50K log lines from mod-feesfines, 50K of which were the following warnings. This corroborates the JVM profiling finding that logging is among the top three slowest operations.

WARNING: Doing LIKE search without index for accounts.jsonb->>'loanId', CQL >>> SQL: loanId == 01e1bcb8-56b6-466b-a677-bd8c3189b036 >>> lower(f_unaccent(accounts.jsonb->>'loanId')) LIKE lower(f_unaccent('01e1bcb8-56b6-466b-a677-bd8c3189b036'))
WARNING: Doing LIKE search without index for accounts.jsonb->>'loanId', CQL >>> SQL: loanId == a22b1b9e-d807-43b4-8d6c-455be0e66f8a >>> lower(f_unaccent(accounts.jsonb->>'loanId')) LIKE lower(f_unaccent('a22b1b9e-d807-43b4-8d6c-455be0e66f8a'))
WARNING: Doing LIKE search without index for accounts.jsonb->>'loanId', CQL >>> SQL: loanId == 88b1e963-f49a-4b6b-8641-a66ac7375891 >>> lower(f_unaccent(accounts.jsonb->>'loanId')) LIKE lower(f_unaccent('88b1e963-f49a-4b6b-8641-a66ac7375891'))

mod-circulation-storage produced more than 3K log lines, 3K of which were the following warnings.

WARNING: Doing SQL query without index for scheduled_notice.jsonb->>'nextRunTime', CQL >>> SQL: nextRunTime < 2020-06-15T00:00:00.000Z >>> scheduled_notice.jsonb->>'nextRunTime' <'2020-06-15T00:00:00.000Z'
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->'noticeConfig'->>'sendInRealTime', CQL >>> SQL: noticeConfig.sendInRealTime == false >>> lower(f_unaccent(scheduled_notice.jsonb->'noticeConfig'->>'sendInRealTime')) LIKE lower(f_unaccent('false'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Request expiration" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Request expiration'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Hold expiration" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Hold expiration'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Due date" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Due date'))
WARNING: Doing LIKE search without index for patron_action_session.jsonb->>'patronId', CQL >>> SQL: patronId == a4dfdbd1-a8f6-4a20-9d44-f74a556eb850 >>> lower(f_unaccent(patron_action_session.jsonb->>'patronId')) LIKE lower(f_unaccent('a4dfdbd1-a8f6-4a20-9d44-f74a556eb850'))
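The per-module warning volumes quoted above can be rechecked by counting matching lines in the exported logs; a minimal sketch follows (the log file name is a placeholder for however the ECS/CloudWatch logs are downloaded):

```python
from collections import Counter

# Tally "missing index" warnings per table/field in an exported module log.
counts = Counter()
with open("mod-circulation-storage.log", encoding="utf-8") as log:
    for line in log:
        if "without index for" in line:
            # e.g. "... without index for scheduled_notice.jsonb->>'nextRunTime', CQL >>> ..."
            field = line.split("without index for", 1)[1].split(",", 1)[0].strip()
            counts[field] += 1

for field, n in counts.most_common():
    print(f"{n:>7}  {field}")
```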


Recommended Improvements