Overview
This report examines the performance of the check-in/check-out workflow running in the Fameflower release (module versions listed below).
Backend:
- mod-circulation-18.0.9
- mod-circulation-storage-11.0.8
- mod-inventory-storage-19.1.2
- mod-inventory-14.1.3
- okapi-2.38.0
Frontend:
- folio_circulation-2.0.0
- Item Check-in (folio_checkin-2.0.1)
- Item Check-out (folio_checkout-3.0.2)
Environment:
- 55 back-end modules deployed in 110 ECS services
- 3 okapi ECS services
- 8 m5.large EC2 instances
- 1 db.r5.xlarge AWS RDS instance
- INFO logging level
High Level Summary
Overall check-in and check-out times, in seconds:
- Average check-in time is 1.51 seconds for a typical use case of 8 users, and 1.65 seconds for 20 users
- Average check-out time is 1.75 seconds for a typical use case of 8 users, and 1.90 seconds for 20 users
- Slow APIs taking more than 100 ms to return:
- POST checkout-by-barcode
- POST checkin-by-barcode
- GET circulation/loans
- GET inventory/items
- mod-circulation-storage logs warnings for missing indexes: about 64K lines in a 45-minute run. The logging level could be restricted (e.g., to ERROR) to suppress these warnings, but at the cost of having less data to work with should there be a need to troubleshoot. Adding the missing indexes could improve performance while also eliminating these warnings
- JVM profiling shows JSON de/serialization to be among the slowest operations, accounting for more CPU time than other calls. Since FOLIO modules retrieve and store JSON objects, making serialization and deserialization efficient is essential; see Recommended Improvements
Test Runs
Test | Virtual Users | Duration | OKAPI log level | OKAPI Version | Profiled
---|---|---|---|---|---
1 | 1 | 30 min | INFO | 2.38.0 | No
2 | 5 | 30 min | INFO | 2.38.0 | No
3 | 8 | 30 min | INFO | 2.38.0 | No
4 | 20 | 20 min | INFO | 2.38.0 | No
5 | 8 | 42 min | INFO | 2.38.0 | Yes
Results
Response times
Users | Average Check-in (s) | Average Check-out (s) | 50th %tile Check-in (s) | 50th %tile Check-out (s) | 75th %tile Check-in (s) | 75th %tile Check-out (s) | 95th %tile Check-in (s) | 95th %tile Check-out (s)
---|---|---|---|---|---|---|---|---
1 user | 1.015 | 1.234 | 0.96 | 1.277 | 1.071 | 1.409 | 1.322 | 1.653
5 users | 1.236 | 1.488 | 1.156 | 1.393 | 1.464 | 1.869 | 1.704 | 2.219
8 users | 1.512 | 1.751 | 1.403 | 1.852 | 1.741 | 2.031 | 2.02 | 2.274
20 users | 1.649 | 1.898 | 1.535 | 1.996 | 1.896 | 2.211 | 2.252 | 2.539
Slow APIs taking more than 100 ms to return
API | 1 user (75th %tile) | 5 users (75th %tile) | 8 users (75th %tile) | 20 users (75th %tile)
---|---|---|---|---
POST checkout-by-barcode | 615 ms | 905 ms | 906 ms | 988 ms
POST checkin-by-barcode | 548 ms | 830 ms | 1053 ms | 1137 ms
GET circulation/loans | 283 ms | 346 ms | 449 ms | 479 ms
GET inventory/items | 217 ms | 232 ms | 237 ms | 281 ms
JVM Profiling
- Overall slow methods (among the modules profiled: okapi, mod-circulation, mod-circulation-storage, mod-inventory, mod-inventory-storage)
- To drill down further into these methods, see: http://ec2-3-93-19-104.compute-1.amazonaws.com/grafana/d/b2iP-9hiz/java-profiling-folio?orgId=1&from=1589940345995&to=1589943180494&var-service_tag=okapi&var-service_tag=mod-inventory-storage&var-service_tag=mod-inventory&var-service_tag=mod-circulation-storage&var-service_tag=mod-circulation&var-aggregation=10s
- Slow Okapi methods only:
When drilling down into org.folio.okapi.managers.ModuleManager.getEnabledModules, we get the following tree. For more detail, see: http://ec2-3-93-19-104.compute-1.amazonaws.com/grafana/d/U9JtDPLWz/stacktrace?orgId=1&class=org.folio.okapi.managers.ModuleManager&method=getEnabledModules&from=1589940304610&to=1589943246772
- Slow mod-circulation methods:
- These methods can be drilled down into for more detail at: http://ec2-3-93-19-104.compute-1.amazonaws.com/grafana/d/b2iP-9hiz/java-profiling-folio?orgId=1&from=1590418466176&to=1590420349447&var-service_tag=mod-inventory&var-aggregation=10s
- Areas of code that serializes and deserializes JSON objects
- org.folio.inventory.storage.external.ExternalStorageModuleCollection
- org.folio.inventory.resources.Items$$Lambdas...apply()
- org.folio.okapi.util.LockedStringMap.lambda$getString()
- org.folio.okapi.util.LockedTypedMap.lambda...handle()
- org.folio.okapi.managers..proxy
- org.folio.cql2pgjson.CQL2PgJSON.loadDbSchema
- org.folio.rest.persist.PostgresClient.deserializeRow
- Logging
- org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled
- org.apache.logging.slf4j.Log4jLogger.info
- org.joda.time.DateTime.parse
- org.folio.circulation.domain.Loan.getReturnDate
- org.folio.circulation.domain.Loan.getDueDate
Database
The database did not show much CPU usage during the 1-, 5-, 8-, and 20-user runs; at most, CPU usage reached 25% in the 20-user case.
The following WARNING statements about missing indexes were generated during a test run and logged by mod-circulation-storage:
WARNING: Doing SQL query without index for scheduled_notice.jsonb->>'nextRunTime', CQL >>> SQL: nextRunTime < >>> scheduled_notice.jsonb->>'nextRunTime' <'2020-05-20T03:07:08.090Z'
WARNING: Doing FT search without index for request.jsonb->>'pickupServicePointId', CQL >>> SQL: pickupServicePointId = 130d8bff-bdbd-4dc5-a4ac-6d970f4918ff >>> to_tsvector('simple', f_unaccent(request.jsonb->>'pickupServicePointId')) @@ replace((to_tsquery('simple', f_unaccent('''130d8bff-bdbd-4dc5-a4ac-6d970f4918ff''')))::text, '&', '<->')::tsquery
WARNING: Doing FT search without index for request.jsonb->>'requesterId', CQL >>> SQL: requesterId = 001164c0-5466-4822-86f2-dcd2393a7ef7 >>> to_tsvector('simple', f_unaccent(request.jsonb->>'requesterId')) @@ replace((to_tsquery('simple', f_unaccent('''001164c0-5466-4822-86f2-dcd2393a7ef7''')))::text, '&', '<->')::tsquery
WARNING: Doing FT search without index for request.jsonb->>'status', CQL >>> SQL: status = Open >>> to_tsvector('simple', f_unaccent(request.jsonb->>'status')) @@ replace((to_tsquery('simple', f_unaccent('''Open''')))::text, '&', '<->')::tsquery
WARNING: Doing LIKE search without index for jsonb->>'requestId', CQL >>> SQL: requestId == 01819cdb-de38-4562-9835-14071dcaf53d >>> lower(f_unaccent(jsonb->>'requestId')) LIKE lower(f_unaccent('01819cdb-de38-4562-9835-14071dcaf53d'))
WARNING: Doing LIKE search without index for request.jsonb->>'requesterId', CQL >>> SQL: requesterId == 005de89f-bfe1-4bf7-a1e3-f34a707ace21 >>> lower(f_unaccent(request.jsonb->>'requesterId')) LIKE lower(f_unaccent('005de89f-bfe1-4bf7-a1e3-f34a707ace21'))
WARNING: Doing LIKE search without index for patron_action_session.jsonb->>'actionType', CQL >>> SQL: actionType == Check-out >>> lower(f_unaccent(patron_action_session.jsonb->>'actionType')) LIKE lower(f_unaccent('Check-out'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Due date" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Due date'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->'noticeConfig'->>'sendInRealTime', CQL >>> SQL: noticeConfig.sendInRealTime == false >>> lower(f_unaccent(scheduled_notice.jsonb->'noticeConfig'->>'sendInRealTime')) LIKE lower(f_unaccent('false'))
WARNING: Doing SQL query without index for scheduled_notice.jsonb->>'nextRunTime', CQL >>> SQL: nextRunTime < >>> scheduled_notice.jsonb->>'nextRunTime' <'2020-05-20T03:07:08.090Z'
WARNING: Doing SQL query without index for scheduled_notice.jsonb->>'nextRunTime', CQL >>> SQL: nextRunTime < 2020-05-20T00:00:00.000Z >>> scheduled_notice.jsonb->>'nextRunTime' <'2020-05-20T00:00:00.000Z'
CPU Utilization
Module | 1 user Avg % | 1 user Range % | 5 users Avg % | 5 users Range % | 8 users Avg % | 8 users Range % | 20 users Avg % | 20 users Range %
---|---|---|---|---|---|---|---|---
Okapi | 1.86 | 0.32-6.85 | 5.95 | 1.16-16.55 | 7.92 | 1.41-16.07 | 17.52 | 2.36-29.85
mod-inventory | 0.40 | 0.28-3.01 | 0.76 | 0.32-8.36 | 0.62 | 0.8-2.17 | 1 | 0.25-3
mod-inventory-storage | 0.94 | 0.328-4.09 | 2.88 | 0.36-9.11 | 3.33 | 1.84-19.72 | 7.96 | 1.88-14.67
Memory
Memory usage was stable throughout the runs, with only occasional spikes; over a 30-minute run the levels stayed consistent.
Module | 1 user Average | 5 users Average | 8 users Average | 20 users Average
---|---|---|---|---
Okapi | 50% | 50% | 46% | 46%
mod-circulation | 70% | 62% | 70% | 71%
mod-circulation-storage | 30% | 30% | 31% | 31%
mod-inventory | 38% | 38% | 38% | 38%
mod-inventory-storage | 41% | 41% | 41% | 41%
Logging
With the INFO log level, a 45-minute run produced more than 66K log lines in mod-circulation-storage, 64K of which were the following warnings. This corroborates the JVM profiling result that one of the top three slowest methods is a logging call. (A sketch of one possible per-logger override follows the log excerpt below.)
WARNING: Doing SQL query without index for scheduled_notice.jsonb->>'nextRunTime', CQL >>> SQL: nextRunTime < >>> scheduled_notice.jsonb->>'nextRunTime' <'2020-05-20T03:07:08.090Z'
WARNING: Doing FT search without index for request.jsonb->>'pickupServicePointId', CQL >>> SQL: pickupServicePointId = 130d8bff-bdbd-4dc5-a4ac-6d970f4918ff >>> to_tsvector('simple', f_unaccent(request.jsonb->>'pickupServicePointId')) @@ replace((to_tsquery('simple', f_unaccent('''130d8bff-bdbd-4dc5-a4ac-6d970f4918ff''')))::text, '&', '<->')::tsquery
WARNING: Doing FT search without index for request.jsonb->>'requesterId', CQL >>> SQL: requesterId = 001164c0-5466-4822-86f2-dcd2393a7ef7 >>> to_tsvector('simple', f_unaccent(request.jsonb->>'requesterId')) @@ replace((to_tsquery('simple', f_unaccent('''001164c0-5466-4822-86f2-dcd2393a7ef7''')))::text, '&', '<->')::tsquery
WARNING: Doing FT search without index for request.jsonb->>'status', CQL >>> SQL: status = Open >>> to_tsvector('simple', f_unaccent(request.jsonb->>'status')) @@ replace((to_tsquery('simple', f_unaccent('''Open''')))::text, '&', '<->')::tsquery
WARNING: Doing LIKE search without index for jsonb->>'requestId', CQL >>> SQL: requestId == 01819cdb-de38-4562-9835-14071dcaf53d >>> lower(f_unaccent(jsonb->>'requestId')) LIKE lower(f_unaccent('01819cdb-de38-4562-9835-14071dcaf53d'))
WARNING: Doing LIKE search without index for request.jsonb->>'requesterId', CQL >>> SQL: requesterId == 005de89f-bfe1-4bf7-a1e3-f34a707ace21 >>> lower(f_unaccent(request.jsonb->>'requesterId')) LIKE lower(f_unaccent('005de89f-bfe1-4bf7-a1e3-f34a707ace21'))
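One way to quiet these specific warnings without lowering the whole module's log output to ERROR would be to raise the threshold for just the logger that emits them. The sketch below is a minimal illustration, assuming the module uses Log4j 2 with log4j-core on the classpath (the profiling above lists org.apache.logging.log4j classes) and assuming the warnings originate from the org.folio.cql2pgjson package; that logger name is an unverified guess, and this is not existing FOLIO code.

```java
import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.core.config.Configurator;

final class LogTuning {

  private LogTuning() { }

  // Raise the threshold for the logger believed to emit the "without index"
  // warnings, while leaving the rest of the module at INFO. The logger name
  // "org.folio.cql2pgjson" is an assumption, not confirmed from the source.
  static void suppressMissingIndexWarnings() {
    Configurator.setLevel("org.folio.cql2pgjson", Level.ERROR);
  }
}
```

Adding the missing indexes remains the preferable fix, since it removes both the warnings and the underlying slow queries.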
Recommended Improvements
- In mod-circulation and Okapi, consider using a more efficient JSON package or invoking the existing Jackson serialization calls differently, to address the finding that JSON de/serialization is among the slowest operations (see the Jackson sketch after this list).
- In mod-circulation, consider using a more efficient date-time package instead of Joda-Time, since date-time parsing is among the slowest operations (see the date-parsing sketch after this list).
- If the JIRA below is not fixed, consider logging at the ERROR level to reduce the excess logging by mod-circulation-storage.
- Create follow-up stories to study the performance of the four APIs that still take over 100 ms to return, to see where performance could be improved.
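For the Jackson recommendation above, one common low-risk pattern is to share a single ObjectMapper, plus pre-built ObjectReader/ObjectWriter instances, on hot paths instead of constructing mappers or resolving types per call. The sketch below is illustrative only, not code from mod-circulation or Okapi; the LoanRecord and JsonSupport names are hypothetical.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectReader;
import com.fasterxml.jackson.databind.ObjectWriter;
import java.io.IOException;

// Hypothetical payload class used only for illustration; not a FOLIO type.
class LoanRecord {
  public String id;
  public String itemId;
  public String dueDate;
}

final class JsonSupport {

  // ObjectMapper is thread-safe once configured; constructing one per request
  // is a common source of avoidable CPU cost, so a single instance is shared.
  private static final ObjectMapper MAPPER = new ObjectMapper();

  // ObjectReader and ObjectWriter are immutable and cheaper on hot paths than
  // repeated readValue/writeValueAsString calls that pass a Class token each time.
  private static final ObjectReader LOAN_READER = MAPPER.readerFor(LoanRecord.class);
  private static final ObjectWriter LOAN_WRITER = MAPPER.writerFor(LoanRecord.class);

  private JsonSupport() { }

  static LoanRecord readLoan(String json) throws IOException {
    return LOAN_READER.readValue(json);
  }

  static String writeLoan(LoanRecord loan) throws IOException {
    return LOAN_WRITER.writeValueAsString(loan);
  }
}
```

Whether this is where the time actually goes would need to be confirmed against the profiling data above; Vert.x-based modules may already route some of this work through a shared codec.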
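For the Joda-Time recommendation, the JDK's java.time package is the usual replacement; the profiling above shows org.joda.time.DateTime.parse in the Loan.getDueDate and Loan.getReturnDate call paths. A minimal sketch, assuming loan dates arrive as ISO-8601 strings with an offset (e.g. 2020-05-20T03:07:08.090Z, as in the log excerpts); the Dates holder and parseDateTime helper are illustrative names, not existing FOLIO code.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

final class Dates {

  // DateTimeFormatter is immutable and thread-safe, so it can be shared
  // instead of being re-resolved on every parse call.
  private static final DateTimeFormatter ISO = DateTimeFormatter.ISO_OFFSET_DATE_TIME;

  private Dates() { }

  // java.time counterpart of org.joda.time.DateTime.parse for the same input,
  // e.g. parseDateTime("2020-05-20T03:07:08.090Z").
  static OffsetDateTime parseDateTime(String value) {
    return OffsetDateTime.parse(value, ISO);
  }
}
```

Migrating off Joda-Time is a broader change than a single helper; this only shows the shape of the replacement API.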
Appendix
For the raw data from the test runs, please see the attached check-in-check-out-FF-UChicago.xlsx.