Overview
In this workflow we check the performance of the check-in/check-out workflow running in the Fameflower release. We test it with 1, 5, 8, and 20 virtual users for 30 minutes. A longevity test was also executed to check for memory issues.
Backend:
- mod-circulation-18.0.9
- mod-circulation-storage-11.0.8
- mod-inventory-storage-19.1.2
- mod-inventory-14.1.3
- okapi-2.38.0
Frontend:
- folio_circulation-2.0.0
- Item Check-in (folio_checkin-2.0.1)
- Item Check-out (folio_checkout-3.0.2)
Environment:
- 55 back-end modules deployed in 110 ECS services
- 3 okapi ECS services
- 8 m5.large EC2 instances
- 1 db.r5.xlarge AWS RDS instance
- INFO logging level
High Level Summary
Overall check-in and check-out times, in seconds:

Virtual users | Average Check-in | Average Check-out | 50th %tile Check-in | 50th %tile Check-out | 75th %tile Check-in | 75th %tile Check-out | 95th %tile Check-in | 95th %tile Check-out |
---|---|---|---|---|---|---|---|---|
1 user | 1.015 | 1.234 | 0.96 | 1.277 | 1.071 | 1.409 | 1.322 | 1.653 |
5 users | 1.236 | 1.488 | 1.156 | 1.393 | 1.464 | 1.869 | 1.704 | 2.219 |
8 users | 1.512 | 1.751 | 1.403 | 1.852 | 1.741 | 2.031 | 2.02 | 2.274 |
20 users | 1.649 | 1.898 | 1.535 | 1.996 | 1.896 | 2.211 | 2.252 | 2.539 |

Slow APIs taking more than 100 ms to return:
API | 1 user (75th %tile) | 5 users (75th %tile) | 8 users (75th %tile) | 20 users (75th %tile) |
---|---|---|---|---|
POST checkout-by-barcode | 615 ms | 905 ms | 906 ms | 988 ms |
POST checkin-by-barcode | 548 ms | 830 ms | 1053 ms | 1137 ms |
GET circulation/loans | 283 ms | 346 ms | 449 ms | 479 ms |
GET inventory/items | 217 ms | 232 ms | 237 ms | 281 ms |

- Excess logging of missing indexes: 64K lines in a 45-minute run. The logging level could be restricted (e.g., to ERROR) to suppress these warnings, but at the cost of having less data to work with should there be a need to troubleshoot.
- JVM profiling shows that JSON serialization and deserialization are among the slowest operations (see the sketch below).
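As a rough illustration of why this shows up in the profile, here is a minimal sketch assuming Vert.x's io.vertx.core.json.JsonObject, which these modules use; the payload and iteration count are invented:

```java
import io.vertx.core.json.JsonObject;

public class JsonRoundTrip {
    public static void main(String[] args) {
        JsonObject item = new JsonObject()
            .put("barcode", "10101")
            .put("status", new JsonObject().put("name", "Available"));

        long start = System.nanoTime();
        for (int i = 0; i < 100_000; i++) {
            // Encode to a String and parse it back -- the round trip the
            // profiler attributes to the storage-module wrapper classes.
            JsonObject copy = new JsonObject(item.encode());
            copy.getJsonObject("status").getString("name");
        }
        System.out.printf("100k round trips took %d ms%n",
            (System.nanoTime() - start) / 1_000_000);
    }
}
```

Passing the already-parsed object between layers, rather than re-encoding it at each module boundary, avoids most of this cost.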
Test Runs
Test | Virtual Users | Duration | OKAPI log level | OKAPI Version | Profiled |
---|---|---|---|---|---|
1. | 1 | 30 min | INFO | 2.38.0 | No |
2. | 5 | 30 min | INFO | 2.38.0 | No |
3. | 8 | 30 min | INFO | 2.38.0 | No |
4. | 20 | 20 min | INFO | 2.38.0 | No |
5. | 8 | 42 min | INFO | 2.38.0 | Yes |
Results
JVM Profiling
- Overall slow methods (among the modules profiled: okapi, mod-circulation, mod-circulation-storage, mod-inventory, mod-inventory-storage)
- To drill down into these methods, see: http://ec2-3-93-19-104.compute-1.amazonaws.com/grafana/d/b2iP-9hiz/java-profiling-folio?orgId=1&from=1589940345995&to=1589943180494&var-service_tag=okapi&var-service_tag=mod-inventory-storage&var-service_tag=mod-inventory&var-service_tag=mod-circulation-storage&var-service_tag=mod-circulation&var-aggregation=10s
- Slow methods in Okapi only:
Drilling down into org.folio.okapi.managers.ModuleManager.getEnabledModules yields the call tree shown at: http://ec2-3-93-19-104.compute-1.amazonaws.com/grafana/d/U9JtDPLWz/stacktrace?orgId=1&class=org.folio.okapi.managers.ModuleManager&method=getEnabledModules&from=1589940304610&to=1589943246772
- Slow mod-circulation methods:
- Drill down into these methods for more information at: http://ec2-3-93-19-104.compute-1.amazonaws.com/grafana/d/b2iP-9hiz/java-profiling-folio?orgId=1&from=1589940195166&to=1589943457693&var-service_tag=mod-circulation&var-aggregation=10s
- Areas of code that serialize and deserialize JSON objects
- org.folio.inventory.storage.external.ExternalStorageModuleCollection
- org.folio.inventory.resources.Items$$Lambdas...apply()
- org.folio.okapi.util.LockedStringMap.lambda$getString()
- org.folio.okapi.util.LockedTypedMap.lambda...handle()
- org.folio.okapi.managers..proxy
- org.folio.cql2pgjson.CQL2PgJSON.loadDbSchema
- org.folio.rest.persist.PostgresClient.deserializeRow
- Logging (see the sketch after this list)
- org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled
- org.apache.logging.slf4j.Log4jLogger.info
- org.joda.time.DateTime.parse
- org.folio.circulation.domain.Loan.getReturnDate
- org.folio.circulation.domain.Loan.getDueDate
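For the logging entries (AbstractLogger.logIfEnabled, Log4jLogger.info), one cheap mitigation is deferring message construction so the formatting work is skipped whenever the level is filtered out. A minimal sketch using the log4j2 API; expensiveDump is a hypothetical stand-in for JSON-encoding a record:

```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class DeferredLogging {
    private static final Logger log = LogManager.getLogger(DeferredLogging.class);

    // Hypothetical helper standing in for JSON-encoding a loan record.
    private static String expensiveDump(Object loan) {
        return loan.toString();
    }

    public static void main(String[] args) {
        Object loan = new Object();

        // Eager concatenation always pays for expensiveDump(),
        // even when the DEBUG level is filtered out:
        log.debug("loan = " + expensiveDump(loan));

        // The Supplier form defers the call until after the level check
        // (the AbstractLogger.logIfEnabled frames seen in the profile):
        log.debug("loan = {}", () -> expensiveDump(loan));
    }
}
```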
Database
The database does not show much CPU usage for the 1-, 5-, 8-, and 20-user runs: at most 25% CPU usage in the 20-user case.
The following WARNING statements about missing indexes were generated during a test run and logged by mod-circulation-storage:
WARNING: Doing SQL query without index for scheduled_notice.jsonb->>'nextRunTime', CQL >>> SQL: nextRunTime < >>> scheduled_notice.jsonb->>'nextRunTime' <'2020-05-20T03:07:08.090Z'
WARNING: Doing FT search without index for request.jsonb->>'pickupServicePointId', CQL >>> SQL: pickupServicePointId = 130d8bff-bdbd-4dc5-a4ac-6d970f4918ff >>> to_tsvector('simple', f_unaccent(request.jsonb->>'pickupServicePointId')) @@ replace((to_tsquery('simple', f_unaccent('''130d8bff-bdbd-4dc5-a4ac-6d970f4918ff''')))::text, '&', '<->')::tsquery
WARNING: Doing FT search without index for request.jsonb->>'requesterId', CQL >>> SQL: requesterId = 001164c0-5466-4822-86f2-dcd2393a7ef7 >>> to_tsvector('simple', f_unaccent(request.jsonb->>'requesterId')) @@ replace((to_tsquery('simple', f_unaccent('''001164c0-5466-4822-86f2-dcd2393a7ef7''')))::text, '&', '<->')::tsquery
WARNING: Doing FT search without index for request.jsonb->>'status', CQL >>> SQL: status = Open >>> to_tsvector('simple', f_unaccent(request.jsonb->>'status')) @@ replace((to_tsquery('simple', f_unaccent('''Open''')))::text, '&', '<->')::tsquery
WARNING: Doing LIKE search without index for jsonb->>'requestId', CQL >>> SQL: requestId == 01819cdb-de38-4562-9835-14071dcaf53d >>> lower(f_unaccent(jsonb->>'requestId')) LIKE lower(f_unaccent('01819cdb-de38-4562-9835-14071dcaf53d'))
WARNING: Doing LIKE search without index for request.jsonb->>'requesterId', CQL >>> SQL: requesterId == 005de89f-bfe1-4bf7-a1e3-f34a707ace21 >>> lower(f_unaccent(request.jsonb->>'requesterId')) LIKE lower(f_unaccent('005de89f-bfe1-4bf7-a1e3-f34a707ace21'))
WARNING: Doing LIKE search without index for patron_action_session.jsonb->>'actionType', CQL >>> SQL: actionType == Check-out >>> lower(f_unaccent(patron_action_session.jsonb->>'actionType')) LIKE lower(f_unaccent('Check-out'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Due date" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Due date'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->'noticeConfig'->>'sendInRealTime', CQL >>> SQL: noticeConfig.sendInRealTime == false >>> lower(f_unaccent(scheduled_notice.jsonb->'noticeConfig'->>'sendInRealTime')) LIKE lower(f_unaccent('false'))
WARNING: Doing SQL query without index for scheduled_notice.jsonb->>'nextRunTime', CQL >>> SQL: nextRunTime < 2020-05-20T00:00:00.000Z >>> scheduled_notice.jsonb->>'nextRunTime' <'2020-05-20T00:00:00.000Z'
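Each warning corresponds to a missing PostgreSQL expression index. Below is a minimal JDBC sketch of the kind of DDL that would satisfy two of them; the index names, credentials, and unqualified table names are illustrative:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AddMissingIndexes {
    public static void main(String[] args) throws Exception {
        // Illustrative connection settings; real FOLIO tables live in a
        // tenant schema such as <tenant>_mod_circulation_storage.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/okapi_modules", "folio_admin", "secret");
             Statement st = conn.createStatement()) {

            // B-tree expression index covering the range query on nextRunTime.
            st.execute("CREATE INDEX IF NOT EXISTS scheduled_notice_next_run_time_idx "
                + "ON scheduled_notice ((jsonb->>'nextRunTime'))");

            // GIN index matching the generated full-text expression for
            // request.status; the expression must match the query exactly
            // for the planner to use it.
            st.execute("CREATE INDEX IF NOT EXISTS request_status_idx_ft "
                + "ON request USING GIN "
                + "(to_tsvector('simple', f_unaccent(jsonb->>'status')))");
        }
    }
}
```

RMB-based modules normally declare such indexes in the module's schema.json so they are created during tenant provisioning; the raw DDL above only shows the expressions the warnings call for.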
CPU Utilization
Module | 1 user Average % | 1 user Range % | 5 users Average % | 5 users Range % | 8 users Average % | 8 users Range % | 20 users Average % | 20 users Range % |
---|---|---|---|---|---|---|---|---|
Okapi | 1.86 | 0.32-6.85 | 5.95 | 1.16-16.55 | 7.92 | 1.41-16.07 | 17.52 | 2.36-29.85 |
mod-circulation | 0.87 | 0.24-6.17 | 2.40 | 0.36-9.73 | 2.56 | 0.26-7.58 | 5.57 | 1.51-11.96 |
mod-circulation-storage | 0.71 | 0.3-2.70 | 1.52 | 0.24-4.16 | 1.75 | 0.5-3.96 | 3.91 | 0.48-7.22 |
mod-inventory | 0.40 | 0.28-3.01 | 0.76 | 0.32-8.36 | 0.62 | 0.8-2.17 | 1 | 0.25-3 |
mod-inventory-storage | 0.94 | 0.328-4.09 | 2.88 | 0.36-9.11 | 3.33 | 1.84-19.72 | 7.96 | 1.88-14.67 |
Memory
Memory was stable throughout the runs, with only an occasional spike; over a 30-minute run usage was consistent.
Module | 1 user Average | 5 users Average | 8 users Average | 20 users Average |
---|---|---|---|---|
Okapi | 50% | 50% | 46% | 46% |
mod-circulation | 70% | 62% | 70% | 71% |
mod-circulation-storage | 30% | 30% | 31% | 31% |
mod-inventory | 38% | 38% | 38% | 38% |
mod-inventory-storage | 41% | 41% | 41% | 41% |
Logging
With INFO log level, a 45-minute run produced more than 66K lines of logs in mod-circulation-storage; 64K of them were the missing-index warnings reproduced in the Database section above. This corroborates the JVM profiling, which shows that one of the top three slowest methods is logging.
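Until the indexes are added, the noise could be cut per logger instead of lowering the whole module's verbosity. A sketch assuming the warnings originate from RMB's org.folio.cql2pgjson.CQL2PgJSON (named in the profiling list above) and that log4j2-core is on the classpath:

```java
import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.core.config.Configurator;

public class QuietMissingIndexWarnings {
    public static void main(String[] args) {
        // Raise only the suspected noisy logger to ERROR, keeping INFO
        // detail everywhere else for troubleshooting.
        Configurator.setLevel("org.folio.cql2pgjson.CQL2PgJSON", Level.ERROR);
    }
}
```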
Appendix
For raw data from the test runs, please see the attached check-in-check-out-FF-UChicago.xlsx.