- PERF-27
Overview
This test checks the performance of placing a hold through a discovery service interface (EDS) against the Fameflower release.
We tested it with 1, 5, 8, and 20 virtual users for 30 minutes each.
Backend:
- mod-patron-4.1.0
- mod-circulation-18.0.9
- mod-circulation-storage-11.0.0
- mod-inventory-storage-19.1.2
- mod-inventory-14.1.3
- mod-feesfines-15.7.2
- okapi-2.38.0
- mod-authtoken-2.4.0
- mod-permissions-5.9.0
Environment:
- 55 back-end modules deployed in 110 ECS services
- 3 okapi ECS services
- 8 m5.large EC2 instances
- 1 db.r5.xlarge AWS RDS instance
- INFO logging level
High Level Summary
- Placing a hold takes about 1 second: POST /patron/account/{id}/instance/{instanceId}/hold.
- POST /patron/account/{id}/instance/{instanceId}/hold and POST /circulation/requests/instances show almost identical response times. In addition, mod-circulation's native item-level request code is 24.5% faster than its instance-level request code. This points to the instance-level request API, POST /circulation/requests/instances, which mod-patron calls subsequently, as the main bottleneck.
- JVM profiling shows that the following operations consumed more CPU time than other calls:
- logging at INFO level - The logging level could be reduced to WARNING or ERROR, but at the cost of having less data to work with should there be a need to troubleshoot. The most CPU-consuming methods are org.folio.circulation.support.Result.next and org.folio.circulation.support.Result.after, called from org.folio.circulation.support.SingleRecordFetcher, and the org.folio.circulation.support.SingleRecordFetcher.fetch method.
- JSON de/serialization - The com.fasterxml.jackson.databind.ObjectMapper.readValue method in mod-circulation-storage and mod-inventory-storage consumed a lot of CPU because of heavy JSON decoding; this implementation could be reviewed for optimisation opportunities. The most CPU-consuming call paths are org.folio.rest.persist.PgUtil.streamGet from org.folio.rest.impl.InstanceStorageAPI.getInstanceStorageInstancesByInstanceId and org.folio.rest.impl.HoldingsStorageAPI.getHoldingsStorageHoldings (mod-inventory-storage), and org.folio.rest.persist.PostgresClient.doGet from org.folio.rest.impl.LoansAPI.getLoanStorageLoans and org.folio.rest.impl.RequestsAPI.getRequestStorageRequests (mod-circulation-storage).
- request validation - Consider using a more efficient validation approach than org.hibernate.validator.internal.engine.ValidatorImpl.validate, which is one of the slowest operations. The slowest methods are org.folio.rest.RestVerticle.route and org.folio.rest.RestVerticle.parseParams.
- mod-circulation-storage and mod-feesfines generated missing-index warnings; see Recommended Improvements for the JIRAs that were created or covered by this testing effort.
Test Runs
Test | Virtual Users | Duration | OKAPI log level | Profiled | Ramp up (total time in seconds) | API call |
1. | 1 | 30 min | WARNING | No | 1 | POST /patron/account/{id}/instance/{instanceId}/hold |
2. | 5 | 30 min | WARNING | No | 50 | POST /patron/account/{id}/instance/{instanceId}/hold |
3. | 8 | 30 min | WARNING | No | 80 | POST /patron/account/{id}/instance/{instanceId}/hold |
4. | 20 | 30 min | WARNING | No | 200 | POST /patron/account/{id}/instance/{instanceId}/hold |
5. | 20 | 30 min | WARNING | Yes | 200 | POST /patron/account/{id}/instance/{instanceId}/hold |
6. | 1 | 30 min | WARNING | No | 1 | POST /circulation/requests/instances |
7. | 5 | 30 min | WARNING | No | 50 | POST /circulation/requests/instances |
8. | 8 | 30 min | WARNING | No | 80 | POST /circulation/requests/instances |
9. | 20 | 30 min | WARNING | No | 200 | POST /circulation/requests/instances |
10. | 20 | 30 min | WARNING | Yes | 200 | POST /circulation/requests/instances |
11. | 1 | 30 min | WARNING | Yes | 1 | POST /circulation/requests |
12. | 8 | 30 min | WARNING | Yes | 200 | POST /circulation/requests |
Results
- An instance-level request to mod-patron
POST /patron/account/{id}/instance/{instanceId}/hold
{
  "item": {
    "instanceId": "c2063f7c-245a-4c24-b643-67fea831ebcc"
  },
  "requestDate": "2020-06-01T18:51:24.000+0000",
  "pickupLocationId": "7068e104-aa14-4f30-a8bf-71f71cc15e07"
}
Overall time for the place-holds API

 | Average | 75th %tile |
1 user | 933.01 ms | 1.10 s |
5 users | 1.01 s | 1.12 s |
8 users | 1.03 s | 1.18 s |
20 users | 1.33 s | 1.47 s |

mod-patron subsequently calls mod-circulation to place instance-level requests:
POST /circulation/requests/instances
{
  "requestDate": "2017-07-29T22:25:37Z",
  "requesterId": "21932a85-bd00-446b-9565-46e0c1a5490b",
  "instanceId": "195efae1-588f-47bd-a181-13a2eb437701",
  "requestExpirationDate": "2017-07-25T22:25:37Z",
  "pickupServicePointId": "8359f2bc-b83e-48e1-8a5b-ca1a74e840de"
}
Overall time

 | Average | 75th %tile |
1 user | 864.35 ms | 1.01 s |
5 users | 1.09 s | 1.14 s |
8 users | 1.15 s | 1.18 s |
20 users | 1.58 s | 1.53 s |

An item-level request to mod-circulation
POST /circulation/requests
{
  "requestDate": "2020-06-18T09:23:11.957Z",
  "requesterId": "e96618a9-04ee-4fea-aa60-306a8f4dd89b",
  "requestType": "Hold",
  "fulfilmentPreference": "Hold Shelf",
  "itemId": "0000023d-a95f-4396-8d37-42dda48cdf3d",
  "requestExpirationDate": "2020-06-18T09:23:11.957Z",
  "pickupServicePointId": "d80a641e-50b9-4bd2-99a2-a4a12c8515cb"
}
Overall time

 | Average | 75th %tile |
1 user | 807.90 ms | 819.00 ms |
8 users | 855.92 ms | 855.00 ms |
Comparison between an instance-level request and an item-level request to mod-circulation (measured 23.07.2020). The item-level request is 24.5% faster than the instance-level request.

 | An instance-level request to mod-circulation POST /circulation/requests/instances | | An item-level request to mod-circulation POST /circulation/requests | |
 | Average | 75th %tile | Average | 75th %tile |
1 user | 1.07 s | 1.09 s | 807.90 ms | 819.00 ms |
8 users | 1.09 s | 1.09 s | 855.92 ms | 855.00 ms |
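The 24.5% figure quoted for this comparison can be reproduced from the 1-user averages in the table above:

```python
# Relative speedup of the item-level request over the instance-level one,
# using the 1-user averages from the comparison table (1.07 s vs 807.90 ms).
instance_level_ms = 1070.0
item_level_ms = 807.90

speedup_pct = (instance_level_ms - item_level_ms) / instance_level_ms * 100
print(f"{speedup_pct:.1f}%")  # -> 24.5%
```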
JVM Profiling
- Overall slow methods (between the modules profiled: mod-patron, mod-circulation, mod-circulation-storage, mod-inventory, mod-inventory-storage)
To see more, drill down into these methods at: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/b3iP-8hiz/java-profiling-folio-new?orgId=1&from=1591958944552&to=1591961253729&var-service_tag=mod-patron&var-service_tag=mod-inventory-storage&var-service_tag=mod-inventory&var-service_tag=mod-circulation-storage&var-service_tag=mod-circulation&var-aggregation=10s
- Only slow mod-circulation methods:
When drilling down org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled, we get the following tree. To see more, click here: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/U8JtDPLWy/stacktrace-new?orgId=1&class=org.apache.logging.log4j.spi.AbstractLogger&method=logIfEnabled&from=1591958944552&to=1591961253729
When drilling down com.fasterxml.jackson.databind.ObjectMapper.readValue, we get the following tree. To see more, click here: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/U8JtDPLWy/stacktrace-new?orgId=1&class=com.fasterxml.jackson.databind.ObjectMapper&method=readValue&from=1591958944552&to=1591961253729
When drilling down org.hibernate.validator.internal.engine.ValidatorImpl.validate, we get the following tree. To see more, click here: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/U8JtDPLWy/stacktrace-new?orgId=1&class=org.hibernate.validator.internal.engine.ValidatorImpl&method=validate&from=1591958944552&to=1591961253729
- Slow mod-circulation-storage methods:
These methods can be drilled down for more information at: http://ec2-54-165-58-65.compute-1.amazonaws.com/grafana/d/b3iP-8hiz/java-profiling-folio-new?orgId=1&from=1591959015664&to=1591960971673&var-service_tag=mod-circulation-storage&var-aggregation=10s
Database
The database did not show much CPU usage for the 1-, 5-, 8-, and 20-user runs: at most 5%-6% CPU usage in the highest case of 20 users.
There were no missing-index WARNING statements in the mod-inventory-storage and mod-patron modules.
The following missing-index WARNING statements were generated during a test run and logged by mod-circulation-storage:
WARNING: Doing SQL query without index for scheduled_notice.jsonb->>'nextRunTime', CQL >>> SQL: nextRunTime < 2020-06-15T00:00:00.000Z >>> scheduled_notice.jsonb->>'nextRunTime' <'2020-06-15T00:00:00.000Z'
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->'noticeConfig'->>'sendInRealTime', CQL >>> SQL: noticeConfig.sendInRealTime == false >>> lower(f_unaccent(scheduled_notice.jsonb->'noticeConfig'->>'sendInRealTime')) LIKE lower(f_unaccent('false'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Request expiration" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Request expiration'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Hold expiration" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Hold expiration'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Due date" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Due date'))
WARNING: Doing LIKE search without index for patron_action_session.jsonb->>'patronId', CQL >>> SQL: patronId == a4dfdbd1-a8f6-4a20-9d44-f74a556eb850 >>> lower(f_unaccent(patron_action_session.jsonb->>'patronId')) LIKE lower(f_unaccent('a4dfdbd1-a8f6-4a20-9d44-f74a556eb850'))
The following missing-index WARNING statements were generated during a test run and logged by mod-feesfines:
WARNING: Doing LIKE search without index for accounts.jsonb->>'loanId', CQL >>> SQL: loanId == 6977632d-9ad5-4c37-b067-1e92343c18b7 >>> lower(f_unaccent(accounts.jsonb->>'loanId')) LIKE lower(f_unaccent('6977632d-9ad5-4c37-b067-1e92343c18b7'))
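The remedy for these warnings is to add indexes on the queried JSONB fields. As an illustration only: FOLIO's RAML-Module-Builder-based modules normally declare indexes in the module's schema.json rather than via hand-written DDL, and the real index expression must match the generated query exactly. The sketch below merely builds DDL for the lower(f_unaccent(...)) LIKE pattern seen in the warnings; the index naming scheme and the text_pattern_ops operator class are assumptions for this example.

```python
# Illustrative only: generate btree-index DDL matching the
# lower(f_unaccent(jsonb->>'field')) LIKE pattern from the warnings above.
# Real FOLIO modules declare these indexes in schema.json.
def like_index_ddl(table: str, field: str) -> str:
    expr = f"lower(f_unaccent({table}.jsonb->>'{field}'))"
    return (
        f"CREATE INDEX IF NOT EXISTS {table}_{field.lower()}_idx_like "
        f"ON {table} ({expr} text_pattern_ops);"
    )

print(like_index_ddl("scheduled_notice", "triggeringEvent"))
print(like_index_ddl("accounts", "loanId"))
```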
Database Slow queries
The slowest query, which accounted for most of the execution time, was initiated by the mod_circulation_storage service and is presented in the following table:
Percent of total time | Average time, ms | Calls | Module | Query |
7% | 25 ms | 426 | mod_circulation_storage | SELECT jsonb,id FROM fs09000000_mod_circulation_storage.loan WHERE (lower(f_unaccent(loan.jsonb->'status'->>'name')) LIKE lower(f_unaccent('Open'))) AND (loan.jsonb->>'userId' ~ '') LIMIT 5000 OFFSET 0 |
CPU Utilization
 | 1 user | | 5 users | | 8 users | | 20 users | |
---|---|---|---|---|---|---|---|---|
 | Average % | Range % | Average % | Range % | Average % | Range % | Average % | Range % |
mod-circulation | 1.23 | 0-31.98 | 3 | 2-20 | 3 | 0-14 | 6 | 0-88 |
mod-circulation-storage | 0.61 | 0.25-6.60 | 1.73 | 0.23-26.77 | 2 | 0-6 | 3 | 0-9 |
mod-patron | 1 | 0-24 | 2 | 2-66 | 1 | 0-12 | 0 | 0-11 |
mod-feesfines | 100 | 101-102 | 100 | 101-103 | 100 | 101-103 | 100 | 101-103 |
Memory
Memory was stable throughout the runs; apart from an occasional spike, usage was consistent across each 30-minute run.
 | 1 user | 5 users | 8 users | 20 users |
---|---|---|---|---|
 | Average | Average | Average | Average |
mod-circulation | 68% | 69% | 70% | 76% |
mod-circulation-storage | 31% | 32% | 32% | 32% |
mod-patron | 49% | 52% | 52% | 52% |
mod-inventory | 33% | 33% | 34% | 34% |
mod-inventory-storage | 38% | 38% | 39% | 39% |
mod-feesfines | 97% | 97% | 97% | 97% |
Logging
With the INFO log level, a 30-minute run produced more than 50K log lines in mod-feesfines, 50K of which were the following warnings. This corroborates the JVM profiling, which shows that one of the top three slowest operations is logging.
WARNING: Doing LIKE search without index for accounts.jsonb->>'loanId', CQL >>> SQL: loanId == 01e1bcb8-56b6-466b-a677-bd8c3189b036 >>> lower(f_unaccent(accounts.jsonb->>'loanId')) LIKE lower(f_unaccent('01e1bcb8-56b6-466b-a677-bd8c3189b036'))
WARNING: Doing LIKE search without index for accounts.jsonb->>'loanId', CQL >>> SQL: loanId == a22b1b9e-d807-43b4-8d6c-455be0e66f8a >>> lower(f_unaccent(accounts.jsonb->>'loanId')) LIKE lower(f_unaccent('a22b1b9e-d807-43b4-8d6c-455be0e66f8a'))
WARNING: Doing LIKE search without index for accounts.jsonb->>'loanId', CQL >>> SQL: loanId == 88b1e963-f49a-4b6b-8641-a66ac7375891 >>> lower(f_unaccent(accounts.jsonb->>'loanId')) LIKE lower(f_unaccent('88b1e963-f49a-4b6b-8641-a66ac7375891'))
mod-circulation-storage logged more than 3K lines, 3K of which were the following warnings.
WARNING: Doing SQL query without index for scheduled_notice.jsonb->>'nextRunTime', CQL >>> SQL: nextRunTime < 2020-06-15T00:00:00.000Z >>> scheduled_notice.jsonb->>'nextRunTime' <'2020-06-15T00:00:00.000Z'
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->'noticeConfig'->>'sendInRealTime', CQL >>> SQL: noticeConfig.sendInRealTime == false >>> lower(f_unaccent(scheduled_notice.jsonb->'noticeConfig'->>'sendInRealTime')) LIKE lower(f_unaccent('false'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Request expiration" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Request expiration'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Hold expiration" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Hold expiration'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Due date" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Due date'))
WARNING: Doing LIKE search without index for patron_action_session.jsonb->>'patronId', CQL >>> SQL: patronId == a4dfdbd1-a8f6-4a20-9d44-f74a556eb850 >>> lower(f_unaccent(patron_action_session.jsonb->>'patronId')) LIKE lower(f_unaccent('a4dfdbd1-a8f6-4a20-9d44-f74a556eb850'))
Recommended Improvements
- In mod-circulation, mod-circulation-storage, and mod-inventory-storage, consider using a more efficient JSON package, or invoking the existing Jackson serialization calls differently, since JVM profiling shows JSON de/serialization among the slowest operations.
- In mod-circulation, consider using a more efficient request-validation approach than org.hibernate.validator.internal.engine.ValidatorImpl.validate, which is one of the slowest operations.
- The following JIRAs have been created for mod-circulation-storage and mod-feesfines:
- - CIRCSTORE-216
- - MODFEE-67
- Some missing-index warnings are covered by the story CIRCSTORE-215.
- Consider logging at the ERROR level, if the JIRAs above are not addressed, to reduce the excess logging by mod-circulation-storage and mod-feesfines.
- It would be great for the modules to have custom metrics, such as rates and response times of outgoing API and database calls, so that we can get more insight into how an API behaves.
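On the last point, the sketch below shows the kind of per-call statistics (count, average, 95th percentile) a module could collect for outgoing API or database calls. The class name and sample values are invented for illustration; a real Java module would more likely use an existing metrics library such as Micrometer or Vert.x metrics.

```python
import math

# Hypothetical metrics recorder for outgoing API/database calls;
# names and sample durations are illustrative, not from any FOLIO module.
class CallMetrics:
    def __init__(self):
        self.durations_ms = {}

    def record(self, name, duration_ms):
        self.durations_ms.setdefault(name, []).append(duration_ms)

    def summary(self, name):
        samples = sorted(self.durations_ms[name])
        # nearest-rank 95th percentile
        rank = max(0, math.ceil(0.95 * len(samples)) - 1)
        return {
            "count": len(samples),
            "avg_ms": sum(samples) / len(samples),
            "p95_ms": samples[rank],
        }

metrics = CallMetrics()
for duration_ms in (10, 12, 11, 50, 13):
    metrics.record("POST /circulation/requests/instances", duration_ms)

print(metrics.summary("POST /circulation/requests/instances"))
```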
Discussions
We could not get to the bottom of the instance-level request slowness; that is why we also looked into item-level requests. mod-circulation's native item-level request code is 24.5% faster than its instance-level request code.
Appendix
See the attached place-holds-FF.xlsx for details.