Overview
This report examines the performance of the check-in/check-out workflow in the Fameflower release. The workflow was tested with 1, 5, 8, and 20 virtual users in runs of about 30 minutes (see Test Runs for the exact durations). A longevity test was also executed to check for memory issues.
Backend:
- mod-circulation-18.0.9
- mod-circulation-storage-11.0.8
- mod-inventory-storage-19.1.2
- mod-inventory-14.1.3
- okapi-2.38.0
Frontend:
- folio_circulation-2.0.0
- Item Check-in (folio_checkin-2.0.1)
- Item Check-out (folio_checkout-3.0.2)
The FOLIO build was deployed with 50+ ECS services distributed randomly across four m5.large EC2 instances, and the database was created on a db.r5.xlarge AWS RDS instance. The logging level was left at the default, INFO.
High Level Summary
Overall check in, check out time in seconds
Virtual users | Check-in average (s) | Check-out average (s) | Check-in 50th %tile (s) | Check-out 50th %tile (s) | Check-in 75th %tile (s) | Check-out 75th %tile (s) | Check-in 95th %tile (s) | Check-out 95th %tile (s) |
---|---|---|---|---|---|---|---|---|
1 user | 1.015 | 1.234 | 0.96 | 1.277 | 1.071 | 1.409 | 1.322 | 1.653 |
5 users | 1.236 | 1.488 | 1.156 | 1.393 | 1.464 | 1.869 | 1.704 | 2.219 |
8 users | 1.512 | 1.751 | 1.403 | 1.852 | 1.741 | 2.031 | 2.02 | 2.274 |
20 users | 1.649 | 1.898 | 1.535 | 1.996 | 1.896 | 2.211 | 2.252 | 2.539 |
Slow APIs taking more than 100 ms to return
API | 1 user (75th %tile) | 5 users (75th %tile) | 8 users (75th %tile) | 20 users (75th %tile) |
---|---|---|---|---|
POST checkout-by-barcode | 615 ms | 905 ms | 906 ms | 988 ms |
POST checkin-by-barcode | 548 ms | 830 ms | 1053 ms | 1137 ms |
GET circulation/loans | 283 ms | 346 ms | 449 ms | 479 ms |
GET inventory/items | 217 ms | 232 ms | 237 ms | 281 ms |
- Excess logging of missing indexes: 64K lines in a 45-minute run. Logging verbosity could be reduced below the current INFO level, but at the cost of having less data to work with should there be a need to troubleshoot.
- JVM profiling shows that JSON serialization and deserialization are among the slowest operations.
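To make the scaling trend in the average response times easier to read, the sketch below computes the percentage increase of each average relative to the 1-user run. The figures are copied from the summary in this section; the function and variable names are illustrative and not part of any FOLIO tooling.

```python
# Average response times in seconds, copied from the high-level summary.
AVERAGE_SECONDS = {
    1: {"check-in": 1.015, "check-out": 1.234},
    5: {"check-in": 1.236, "check-out": 1.488},
    8: {"check-in": 1.512, "check-out": 1.751},
    20: {"check-in": 1.649, "check-out": 1.898},
}


def percent_increase(baseline: float, value: float) -> float:
    """Return how much larger value is than baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def slowdown_vs_single_user(averages: dict) -> dict:
    """Map each user count to the % increase over the 1-user average."""
    baseline = averages[1]
    return {
        users: {
            op: round(percent_increase(baseline[op], seconds), 1)
            for op, seconds in timings.items()
        }
        for users, timings in averages.items()
    }


if __name__ == "__main__":
    for users, ops in slowdown_vs_single_user(AVERAGE_SECONDS).items():
        print(users, ops)
```

With these inputs, going from 1 to 20 virtual users raises the average check-in time by about 62% and the average check-out time by about 54%.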
Test Runs
Test | Virtual Users | Duration | OKAPI log level | OKAPI Version | Profiled |
1. | 1 | 30 min | INFO | 2.38.0 | No |
2. | 5 | 30 min | INFO | 2.38.0 | No |
3. | 8 | 30 min | INFO | 2.38.0 | No |
4. | 20 | 20 min | INFO | 2.38.0 | No |
5. | 8 | 42 min | INFO | 2.38.0 | Yes |
Results
JVM Profiling
- Overall slow methods (between the modules profiled: okapi, mod-circulation, mod-circulation-storage, mod-inventory, mod-inventory-storage)
- To drill down into these methods, see: http://ec2-3-93-19-104.compute-1.amazonaws.com/grafana/d/b2iP-9hiz/java-profiling-folio?orgId=1&from=1589940345995&to=1589943180494&var-service_tag=okapi&var-service_tag=mod-inventory-storage&var-service_tag=mod-inventory&var-service_tag=mod-circulation-storage&var-service_tag=mod-circulation&var-aggregation=10s
- Only slow Okapi methods:
Drilling down into org.folio.okapi.managers.ModuleManager.getEnabledModules gives the following tree. For more detail, see: http://ec2-3-93-19-104.compute-1.amazonaws.com/grafana/d/U9JtDPLWz/stacktrace?orgId=1&class=org.folio.okapi.managers.ModuleManager&method=getEnabledModules&from=1589940304610&to=1589943246772
- Slow mod-circulation methods:
- These methods can be drilled into for more information at: http://ec2-3-93-19-104.compute-1.amazonaws.com/grafana/d/b2iP-9hiz/java-profiling-folio?orgId=1&from=1589940195166&to=1589943457693&var-service_tag=mod-circulation&var-aggregation=10s
- Areas of code that serialize and deserialize JSON objects
- org.folio.inventory.storage.external.ExternalStorageModuleCollection
- org.folio.inventory.resources.Items$$Lambdas...apply()
- org.folio.okapi.util.LockedStringMap.lambda$getString()
- org.folio.okapi.util.LockedTypedMap.lambda...handle()
- org.folio.okapi.managers..proxy
- org.folio.cql2pgjson.CQL2PgJSON.loadDbSchema
- org.folio.rest.persist.PostgresClient.deserializeRow
- Logging
- org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled
- org.apache.logging.slf4j.Log4jLogger.info
- org.joda.time.DateTime.parse
- org.folio.circulation.domain.Loan.getReturnDate
- org.folio.circulation.domain.Loan.getDueDate
Database
The database shows little CPU usage for the 1-, 5-, 8-, and 20-user runs: at most 25% in the heaviest case of 20 users.
The following WARNING statements of missing indexes were generated during a test run:
WARNING: Doing SQL query without index for scheduled_notice.jsonb->>'nextRunTime', CQL >>> SQL: nextRunTime < >>> scheduled_notice.jsonb->>'nextRunTime' <'2020-05-20T03:07:08.090Z'
WARNING: Doing FT search without index for request.jsonb->>'pickupServicePointId', CQL >>> SQL: pickupServicePointId = 130d8bff-bdbd-4dc5-a4ac-6d970f4918ff >>> to_tsvector('simple', f_unaccent(request.jsonb->>'pickupServicePointId')) @@ replace((to_tsquery('simple', f_unaccent('''130d8bff-bdbd-4dc5-a4ac-6d970f4918ff''')))::text, '&', '<->')::tsquery
WARNING: Doing FT search without index for request.jsonb->>'requesterId', CQL >>> SQL: requesterId = 001164c0-5466-4822-86f2-dcd2393a7ef7 >>> to_tsvector('simple', f_unaccent(request.jsonb->>'requesterId')) @@ replace((to_tsquery('simple', f_unaccent('''001164c0-5466-4822-86f2-dcd2393a7ef7''')))::text, '&', '<->')::tsquery
WARNING: Doing FT search without index for request.jsonb->>'status', CQL >>> SQL: status = Open >>> to_tsvector('simple', f_unaccent(request.jsonb->>'status')) @@ replace((to_tsquery('simple', f_unaccent('''Open''')))::text, '&', '<->')::tsquery
WARNING: Doing LIKE search without index for jsonb->>'requestId', CQL >>> SQL: requestId == 01819cdb-de38-4562-9835-14071dcaf53d >>> lower(f_unaccent(jsonb->>'requestId')) LIKE lower(f_unaccent('01819cdb-de38-4562-9835-14071dcaf53d'))
WARNING: Doing LIKE search without index for request.jsonb->>'requesterId', CQL >>> SQL: requesterId == 005de89f-bfe1-4bf7-a1e3-f34a707ace21 >>> lower(f_unaccent(request.jsonb->>'requesterId')) LIKE lower(f_unaccent('005de89f-bfe1-4bf7-a1e3-f34a707ace21'))
WARNING: Doing LIKE search without index for patron_action_session.jsonb->>'actionType', CQL >>> SQL: actionType == Check-out >>> lower(f_unaccent(patron_action_session.jsonb->>'actionType')) LIKE lower(f_unaccent('Check-out'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->>'triggeringEvent', CQL >>> SQL: triggeringEvent == "Due date" >>> lower(f_unaccent(scheduled_notice.jsonb->>'triggeringEvent')) LIKE lower(f_unaccent('Due date'))
WARNING: Doing LIKE search without index for scheduled_notice.jsonb->'noticeConfig'->>'sendInRealTime', CQL >>> SQL: noticeConfig.sendInRealTime == false >>> lower(f_unaccent(scheduled_notice.jsonb->'noticeConfig'->>'sendInRealTime')) LIKE lower(f_unaccent('false'))
WARNING: Doing SQL query without index for scheduled_notice.jsonb->>'nextRunTime', CQL >>> SQL: nextRunTime < >>> scheduled_notice.jsonb->>'nextRunTime' <'2020-05-20T03:07:08.090Z'
WARNING: Doing SQL query without index for scheduled_notice.jsonb->>'nextRunTime', CQL >>> SQL: nextRunTime < 2020-05-20T00:00:00.000Z >>> scheduled_notice.jsonb->>'nextRunTime' <'2020-05-20T00:00:00.000Z'
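When deciding which indexes to add first, it can help to collapse warnings like the ones above into a list of distinct fields and search types. The following is a minimal sketch, assuming the log keeps the "WARNING: Doing ... without index for ..., CQL" prefix seen in this excerpt; the function name and the sample list are illustrative only.

```python
import re
from collections import Counter

# Matches the warning prefix seen in the excerpt, for example:
# "WARNING: Doing FT search without index for request.jsonb->>'status', CQL >>> ..."
WARNING_PATTERN = re.compile(
    r"WARNING: Doing (?P<kind>.+?) without index for (?P<field>.+?), CQL"
)


def missing_index_counts(lines):
    """Count (search kind, JSONB field) pairs reported as lacking an index."""
    counts = Counter()
    for line in lines:
        # finditer also copes with several warnings joined on one line.
        for match in WARNING_PATTERN.finditer(line):
            counts[(match.group("kind"), match.group("field"))] += 1
    return counts


# Three warnings taken from the excerpt above.
SAMPLE = [
    "WARNING: Doing SQL query without index for scheduled_notice.jsonb->>'nextRunTime', CQL >>> SQL: nextRunTime < >>> scheduled_notice.jsonb->>'nextRunTime' <'2020-05-20T03:07:08.090Z'",
    "WARNING: Doing SQL query without index for scheduled_notice.jsonb->>'nextRunTime', CQL >>> SQL: nextRunTime < 2020-05-20T00:00:00.000Z >>> scheduled_notice.jsonb->>'nextRunTime' <'2020-05-20T00:00:00.000Z'",
    "WARNING: Doing LIKE search without index for patron_action_session.jsonb->>'actionType', CQL >>> SQL: actionType == Check-out >>> lower(f_unaccent(patron_action_session.jsonb->>'actionType')) LIKE lower(f_unaccent('Check-out'))",
]

if __name__ == "__main__":
    for (kind, field), count in missing_index_counts(SAMPLE).most_common():
        print(f"{count:>6}  {kind:<12} {field}")
```

Run against a full module log, the most frequent pairs point to the fields where a missing index is reported most often.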
CPU Utilization
Module | 1 user average % | 1 user range % | 5 users average % | 5 users range % | 8 users average % | 8 users range % | 20 users average % | 20 users range % |
---|---|---|---|---|---|---|---|---|
Okapi | 1.86 | 0.32-6.85 | 5.95 | 1.16-16.55 | 7.92 | 1.41-16.07 | 17.52 | 2.36-29.85 |
mod-circulation | 0.87 | 0.24-6.17 | 2.40 | 0.36-9.73 | 2.56 | 0.26-7.58 | 5.57 | 1.51-11.96 |
mod-circulation-storage | 0.71 | 0.3-2.70 | 1.52 | 0.24-4.16 | 1.75 | 0.5-3.96 | 3.91 | 0.48-7.22 |
mod-inventory | 0.40 | 0.28-3.01 | 0.76 | 0.32-8.36 | 0.62 | 0.8-2.17 | 1 | 0.25-3 |
mod-inventory-storage | 0.94 | 0.328-4.09 | 2.88 | 0.36-9.11 | 3.33 | 1.84-19.72 | 7.96 | 1.88-14.67 |
Memory
Memory usage was stable throughout the runs, with only occasional spikes; over a 30-minute run it remained consistent.
Module | 1 user (average) | 5 users (average) | 8 users (average) | 20 users (average) |
---|---|---|---|---|
Okapi | 50% | 50% | 46% | 46% |
mod-circulation | 70% | 62% | 70% | 71% |
mod-circulation-storage | 30% | 30% | 31% | 31% |
mod-inventory | 38% | 38% | 38% | 38% |
mod-inventory-storage | 41% | 41% | 41% | 41% |
Logging
With the INFO log level, a 45-minute run produced more than 66K log lines, 64K of which were the following warnings. This is consistent with the JVM profiling, which shows that one of the top three slowest methods is a logging method.
WARNING: Doing SQL query without index for scheduled_notice.jsonb->>'nextRunTime', CQL >>> SQL: nextRunTime < >>> scheduled_notice.jsonb->>'nextRunTime' <'2020-05-20T03:07:08.090Z'
WARNING: Doing FT search without index for request.jsonb->>'pickupServicePointId', CQL >>> SQL: pickupServicePointId = 130d8bff-bdbd-4dc5-a4ac-6d970f4918ff >>> to_tsvector('simple', f_unaccent(request.jsonb->>'pickupServicePointId')) @@ replace((to_tsquery('simple', f_unaccent('''130d8bff-bdbd-4dc5-a4ac-6d970f4918ff''')))::text, '&', '<->')::tsquery
WARNING: Doing FT search without index for request.jsonb->>'requesterId', CQL >>> SQL: requesterId = 001164c0-5466-4822-86f2-dcd2393a7ef7 >>> to_tsvector('simple', f_unaccent(request.jsonb->>'requesterId')) @@ replace((to_tsquery('simple', f_unaccent('''001164c0-5466-4822-86f2-dcd2393a7ef7''')))::text, '&', '<->')::tsquery
WARNING: Doing FT search without index for request.jsonb->>'status', CQL >>> SQL: status = Open >>> to_tsvector('simple', f_unaccent(request.jsonb->>'status')) @@ replace((to_tsquery('simple', f_unaccent('''Open''')))::text, '&', '<->')::tsquery
WARNING: Doing LIKE search without index for jsonb->>'requestId', CQL >>> SQL: requestId == 01819cdb-de38-4562-9835-14071dcaf53d >>> lower(f_unaccent(jsonb->>'requestId')) LIKE lower(f_unaccent('01819cdb-de38-4562-9835-14071dcaf53d'))
WARNING: Doing LIKE search without index for request.jsonb->>'requesterId', CQL >>> SQL: requesterId == 005de89f-bfe1-4bf7-a1e3-f34a707ace21 >>> lower(f_unaccent(request.jsonb->>'requesterId')) LIKE lower(f_unaccent('005de89f-bfe1-4bf7-a1e3-f34a707ace21'))
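The share of log volume taken up by these warnings can be checked directly against a captured log. The sketch below is one way to do it, assuming each log entry sits on its own line and the warnings contain the "WARNING: Doing" and "without index for" text shown above; the function name is hypothetical.

```python
def missing_index_share(lines):
    """Return (matching, total, share) for missing-index warnings in lines.

    Blank lines are ignored; share is matching / total, or 0.0 for an
    empty log.
    """
    total = 0
    matching = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines so they do not dilute the share
        total += 1
        if "WARNING: Doing" in line and "without index for" in line:
            matching += 1
    share = matching / total if total else 0.0
    return matching, total, share


if __name__ == "__main__":
    import sys

    # Usage (the file name is an example): python share.py < module.log
    matching, total, share = missing_index_share(sys.stdin)
    print(f"{matching} of {total} lines ({share:.1%}) are missing-index warnings")
```

For the figures reported in this section (roughly 64K warnings out of 66K lines), the share works out to about 97%.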
Appendix
For more raw data from the test runs, see the attached check-in-check-out-FF-UChicago.xlsx.