Check-in-check-out Test Report (Ramsons) [ECS]
Test status: PASSED
Overview
- Regression testing of Check-In/Check-Out (CI/CO) fixed-load tests on an Okapi-based environment in the Ramsons ECS release.
- The purposes of CI/CO testing:
- To define response times of transaction controllers for Check-In and Check-Out
- To find any trends in resource utilization and recommend improvements
- To check how the system behaves over an extended period during the longevity test
- To compare current results with previous ones
Summary
- Common results:
- Check-In/Check-Out flows degraded by 25% on average in all tests except the longevity test.
- Tests #1, #2, #3, #4
- Check-in - minimum 556 ms with 30 vUsers and maximum 659 ms with 75 vUsers.
- Check-out - minimum 1120 ms with 30 vUsers and maximum 1483 ms with 75 vUsers.
- Test #5
- Check-in - 493 ms with 30 vUsers
- Check-out - 1095 ms with 30 vUsers.
- Comparison with Quesnelia results:
- CI/CO response times degradation (Tests #1, #2, #3, #4):
Test # | vUsers | Check-Out Controller (CO) | Check-In Controller (CI) |
---|---|---|---|
1 | 8 | 27% | 24% |
2 | 20 | 26% | 27% |
3 | 30 | 25% | 28% |
4 | 75 | 26% | 20% |
- CI/CO response times (test #5 - longevity test):
- 30 vUsers - the CO flow improved by 13% and the CI flow degraded by 8%.
Resources
- CPU utilization
- CPU utilization depends on the number of concurrent virtual users and increases accordingly. Top used modules: okapi, mod-pubsub, mod-inventory, mod-inventory-storage, mod-circulation, mod-users, mod-authtoken, nginx-okapi.
- During the 75 vUsers test, the nginx-okapi module spiked from 1% to 12% during the ramp-up period, after the 40 vUsers point.
- Memory consumption
- Memory consumption of the mod-pubsub module grew during test #4 (75 vUsers). This may be connected to reaching a saturation point under the increased load. The highest usage was 65%.
- mod-pubsub memory grew during the longevity test. The highest usage was 81%.
Top 5 modules by memory consumption (%) in the 75 vUsers test:
Module | Consumption, % |
---|---|
mod-oa-b | 75.86 |
mod-pubsub-b | 69.12 |
mod-dcb-b | 62.23 |
mod-inventory-b | 61.29 |
mod-data-import-b | 54.28 |
- RDS CPU utilization average
- 8 vUsers - 15%, 20 vUsers - 26%, 30 vUsers - 32%, 75 vUsers - 54%. During the longevity test, CPU was 30% on average.
- Compared with the Quesnelia results, the database utilized 20% more resources during tests #1, #2, #3, #4 and showed the same consumption during test #5 (longevity test).
- CPU (User) usage by broker
- 8 vUsers - 12%, 20 vUsers - 13%, 30 vUsers - 14%, 75 vUsers - 17%. During the longevity test, two brokers were at 17% and the other two brokers at 28%.
Recommendations & Jiras
- The previous results report:
- The current ticket: PERF-983 - [Ramsons] [ECS] CI/CO
- mod-serials-management-b contributes to DB connection growth, about 200 connections on average. Disabling this module does not affect response times or the error rate but significantly decreases the number of DB connections.
- Investigate the mod-pubsub module memory growth trend. Check the errors from the successful 30 vUsers test in the Errors block.
Test Runs
The following table contains test configuration information.
Test # | vUsers | Ramp-up, sec | Duration, sec |
1 | 8 | 80 | 2700 |
2 | 20 | 200 | 2700 |
3 | 30 | 300 | 2700 |
4 | 75 | 750 | 2700 |
5 | 30 | 300 | 86400 |
Results
Errors:
Error messages:
All error messages during the successful CI/CO test with 30 vUsers:

/mod-inn-reach/
org.springframework.dao.InvalidDataAccessResourceUsageException: JDBC exception executing SQL [select ti1_0.id,ti1_0.created_by_userid,ti1_0.created_by_username,ti1_0.created_date,ti1_0.tenant_id,ti1_0.updated_by_userid,ti1_0.updated_by_username,ti1_0.updated_date from tenant_info ti1_0] [ERROR: relation "tenant_info" does not exist]
11:39:54 [] [] [] [] ERROR $LoggingErrorHandler Unexpected error occurred in scheduled task

/mod-pubsub
11:38:54 [] [] [] [] INFO RestUtil Response received with statusCode 403
11:38:54 [] [] [] [] INFO aConsumerServiceImpl Delivering for event with ID 1622f66a-16eb-4720-801e-b12f5c83f812 was complete. Checking for response...
11:38:54 [] [] [] [] ERROR aConsumerServiceImpl Error delivering ITEM_CHECKED_OUT event with id '1622f66a-16eb-4720-801e-b12f5c83f812' to /automated-patron-blocks/handlers/item-checked-out, response status code is 403, Forbidden
11:38:54 [] [] [] [] INFO Cache Cache:: Access token removed for tenant [tenant]

mod-authtoken/
11:38:54 [198107/automated-patron-blocks] [[tenant]] [] [mod-authtoken] ERROR FilterApi Permission missing in ["audit.pub-sub-handlers.log-record-event.post","remote-storage.pub-sub-handlers.log-record-event.post","pubsub.events.post"] (user permissions) and [] (module permissions). Access for user 'pub-sub' (94e1e5ca-64d1-4cfb-8a3b-6d5269500f48) requires permission: patron-blocks.handlers.item-checked-out.post
12:35:46 [414235/metadata-provider] [[tenant]] [] [mod-authtoken] ERROR Api Token missing, access requires permission: metadata-provider.jobExecutions.collection.get
Response time
The tables below contain the results of the Check-In/Check-Out tests in the Ramsons release.
Test #1, #2, #3, #4
The errors here were connected to a short connectivity problem lasting about 30 seconds. An additional run with 30 vUsers did not reproduce this connectivity problem.
8 vUsers (test #1) | 20 vUsers (test #2) | 30 vUsers (test #3) | 75 vUsers (test #4) | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Requests | Response Times (ms) | Response Times (ms) | Response Times (ms) | Response Times (ms) | ||||||||
Label | #Samples | 95th pct | Average | #Samples | 95th pct | Average | #Samples | 95th pct | Average | #Samples | 95th pct | Average |
Check-Out Controller | 1713 | 1450 | 1153 | 2605 | 1327 | 1123 | 2380 | 1278 | 1120 | 14058 | 1823 | 1483 |
Check-In Controller | 1295 | 838 | 590 | 1943 | 706 | 559 | 1804 | 737 | 556 | 10538 | 1044 | 659 |
POST_circulation/check-out-by-barcode (Submit_barcode_checkout) | 1712 | 503 | 340 | 2604 | 450 | 324 | 2381 | 411 | 320 | 14060 | 643 | 393 |
POST_circulation/check-in-by-barcode (Submit_barcode_checkin) | 1297 | 452 | 298 | 1939 | 386 | 274 | 1796 | 383 | 269 | 10563 | 588 | 304 |
GET_circulation/loans (Submit_patron_barcode) | 1715 | 256 | 192 | 2603 | 238 | 188 | 2380 | 232 | 188 | 14069 | 412 | 237 |
GET_circulation/loans (Submit_barcode_checkout) | 1712 | 263 | 191 | 2604 | 235 | 189 | 2386 | 247 | 190 | 14060 | 369 | 233 |
GET_inventory/items (Submit_barcode_checkout) | | | | | | | | | | 14060 | 136 | 175 |
Test #5
30 vUsers Longevity test | |||
---|---|---|---|
Requests | Samples, Response Times (ms) | ||
Label | #Samples | 95th pct | Average |
Check-Out Controller | 55329 | 1311 | 1095 |
Check-In Controller | 41677 | 621 | 493 |
POST_circulation/check-out-by-barcode (Submit_barcode_checkout) | 55327 | 394 | 312 |
POST_circulation/check-in-by-barcode (Submit_barcode_checkin) | 41678 | 310 | 216 |
GET_circulation/loans (Submit_patron_barcode) | 55329 | 226 | 184 |
GET_circulation/loans (Submit_barcode_checkout) | 55327 | 227 | 185 |
GET_inventory/items (Submit_barcode_checkout) | 55329 | 74 | 54 |
Comparisons
This table compares the average response times of the Ramsons and Quesnelia releases.
8 vUsers (test #1) | 20 vUsers (test #2) | 30 vUsers (test #3) | 75 vUsers (test #4) | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Requests | Response Times, milliseconds | |||||||||||||||
Label | Quesnelia Average | Ramsons Average | Delta, ms | Difference, % | Quesnelia Average | Ramsons Average | Delta, ms | Difference, % | Quesnelia Average | Ramsons Average | Delta, ms | Difference, % | Quesnelia Average | Ramsons Average | Delta, ms | Difference, % |
Check-Out Controller | 910 | 1153 | 243 | 26.70% | 893 | 1123 | 230 | 25.76% | 899 | 1120 | 221 | 24.58% | 1176 | 1483 | 307 | 26.11% |
Check-In Controller | 476 | 590 | 114 | 23.95% | 441 | 559 | 118 | 26.76% | 435 | 556 | 121 | 27.82% | 546 | 659 | 113 | 20.70% |
Comparison of longevity test
30 vUsers Longevity (test #5) |||||
---|---|---|---|---|
Response Times, milliseconds |||||
Label | Quesnelia Average | Ramsons Average | Delta, ms | Difference, % |
Check-Out Controller | 1258 | 1095 | -163 | -12.96% |
Check-In Controller | 458 | 493 | 35 | 7.64% |
API requests where response times >= 100 milliseconds
API | 75 vUsers Ramsons Average, ms |
---|---|
POST_circulation/check-out-by-barcode (Submit_barcode_checkout) | 400 |
POST_circulation/check-in-by-barcode (Submit_barcode_checkin) | 285 |
GET_circulation/loans (Submit_patron_barcode) | 243 |
GET_circulation/loans (Submit_barcode_checkout) | 242 |
GET_inventory/items (Submit_barcode_checkout) | 136 |
Resources Utilization
CPU Utilization
CPU utilization depends on the number of concurrent virtual users and increases accordingly. Top used modules: okapi, mod-pubsub, mod-inventory, mod-inventory-storage, mod-circulation, mod-users, mod-authtoken, nginx-okapi.
For exact numbers, see the chart below.
Tests #1, #2, #3, #4
After applying the CPU parameter = 0 in the container revisions, the charts show only relative resource utilization by modules.
Test #5
Memory Consumption
Tests #1, #2, #3, #4
Memory consumption of the mod-pubsub module grew during test #4 (75 vUsers). This may be connected to reaching a saturation point under the increased load. The highest usage was 65%.
Test #5
mod-pubsub memory grew during the longevity test. The highest usage was 81%.
RDS CPU Utilization
RDS CPU utilization:
8 vUsers - 15%, 20 vUsers - 26%, 30 vUsers - 32%, 75 vUsers - 54%. During the longevity test, CPU was 30% on average. Compared with the Quesnelia results, the database utilized 20% more resources during tests #1, #2, #3, #4 and showed the same consumption during test #5 (longevity test).
Tests #1, #2, #3, #4
Test #5
RDS Database Connections
For the 45-minute and longevity tests, RDS used a maximum of 1250 connections.
Tests #1, #2, #3, #4
Test #5
CPU (User) usage by broker
8 vUsers - 12%, 20 vUsers - 13%, 30 vUsers - 14%, 75 vUsers - 17%. During the longevity test, two brokers were at 17% and the other two brokers at 28%.
Tests #1, #2, #3, #4
Test #5
Database load
Tests #1, #2, #3, #4

INSERT INTO [tenant]_mod_pubsub.audit_message (id, event_id, event_type, tenant_id, audit_date, state, published_by, correlation_id, created_by, error_message) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10);

UPDATE [tenant]_mod_login.auth_attempts SET jsonb = $1::jsonb WHERE id='d2c9e2ce-3bd2-4cc8-a7f4-e0015de548d0'

SELECT [tenant]_mod_inventory_storage.count_estimate('SELECT jsonb,id FROM [tenant]_mod_inventory_storage.service_point WHERE ((true) AND ( (get_tsvector(f_unaccent(service_point.jsonb->>''ecsRequestRouting'')) @@ tsquery_phrase(f_unaccent(''true''))) IS NOT TRUE)) AND (id=''650b2d28-dc05-4177-9dc3-c45316fe5aa6'')')

INSERT INTO [tenant]_mod_authtoken.refresh_tokens (id, user_id, is_revoked, expires_at) VALUES ($1, $2, $3, $4)

UPDATE [tenant]_mod_inventory_storage.item SET jsonb=$1 WHERE id=$2 RETURNING jsonb::text

Test #5

INSERT INTO [tenant]_mod_pubsub.audit_message (id, event_id, event_type, tenant_id, audit_date, state, published_by, correlation_id, created_by, error_message) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10);

SELECT [tenant]_mod_users.count_estimate('SELECT jsonb,id FROM [tenant]_mod_users.users WHERE lower(f_unaccent(users.jsonb->>''username'')) LIKE lower(f_unaccent(''pub-sub''))')
During the 45-minute tests (#1, #2, #3, #4), the top SQL statement was INSERT INTO [tenant]_mod_pubsub.audit_message.
During the longevity test (#5), INSERT INTO [tenant]_mod_pubsub.audit_message took 41 ms and SELECT [tenant]_mod_inventory_storage.count_estimate took 107 ms.
Tests #1, #2, #3, #4
Test #5
Appendix
Infrastructure
PTF environment: rcon

DB table records size:
Modules
Methodology/Approach
Description
Testing includes a data preparation step and the test run itself.
- Data preparation for each test takes up to 20 minutes and consists of truncating the tables involved in testing, populating data, and updating item statuses.
- The test itself depends on the duration and the number of virtual users creating the necessary load.
In Ramsons, token expiration is set to 10 minutes by default, so to run any tests use the new login implementation from the script. Pay attention to the Backend Listener: replace the value of the application parameter to make the results visible in the Grafana dashboard.
Module configuration recommended setup
Update the revision of the source-record-storage module to exclude the SQL statement that runs every 30 minutes and deletes rows from marc_indexers (the WITH deleted_rows ... DELETE FROM marc_indexers mi ... statement) by setting:
{ "name": "srs.marcIndexers.delete.interval.seconds", "value": "86400" },
Update the mod-serials module: set the number of tasks to 0 to exclude significant database connection growth.
DB trigger setup in Ramsons
The usual PTF CI/CO data preparation script won't work in Ramsons. To solve that, disable the trigger updatecompleteupdateddate_item_insert_update before data preparation for the tenant and enable it again before the test starts.
The SQL file was updated so that this step is performed from the script.
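For reference, a minimal sketch of the trigger toggle, assuming the trigger is defined on the item table in the tenant's mod_inventory_storage schema and that psql access to the database is available ($DB_HOST, $DB_USER, $DB_NAME and [tenant] are placeholders, not values from this report):

# Disable the trigger before data preparation for the tenant
psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -c "ALTER TABLE [tenant]_mod_inventory_storage.item DISABLE TRIGGER updatecompleteupdateddate_item_insert_update;"
# Re-enable the trigger before the test starts
psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -c "ALTER TABLE [tenant]_mod_inventory_storage.item ENABLE TRIGGER updatecompleteupdateddate_item_insert_update;"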
Data preparation
- To prepare data, establish a connection using the AWS key.
- Run the CICO_db_preparation.sh script located in the /scripts folder. Before running it, edit the list of tenants to restore the database in the tenants.csv file (see the sketch after this list).
- Files location: Buckets/fse-ptf/Scripts/CICO/Ramsons/
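A minimal sketch of these steps, assuming an SSH connection to the load generator and that the script has been copied into the /scripts folder there; the key path and host name are placeholders, not values from this report:

# Connect to the load generator with the AWS key (placeholder key path and host)
ssh -i ~/.ssh/ptf-aws-key.pem ubuntu@<load-generator-host>
# Edit the tenant list used for the database restore, then run the preparation script
cd /scripts
vi tenants.csv
./CICO_db_preparation.sh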
To start a test from the AWS instance (load generator), use the following command templates. Test locally before starting.
8 vUsers: nohup jmeter -n -t /home/ptf/testdata/RCON/PERF-983_CICO/circulation_checkInCheckOut_rcon.jmx -l rcon_8vusers.jtl -e -o /home/ptf/testdata/RCON/PERF-983_CICO/results/8vusers -JGlobal_duration=2700 -JVUSERS=8 -JRAMP_UP=80
20 vUsers: nohup jmeter -n -t /home/ptf/testdata/RCON/PERF-983_CICO/circulation_checkInCheckOut_rcon.jmx -l rcon_20vusers.jtl -e -o /home/ptf/testdata/RCON/PERF-983_CICO/results/20vusers -JGlobal_duration=2700 -JVUSERS=20 -JRAMP_UP=200
30 vUsers: nohup jmeter -n -t /home/ptf/testdata/RCON/PERF-983_CICO/circulation_checkInCheckOut_rcon.jmx -l rcon_30vusers.jtl -e -o /home/ptf/testdata/RCON/PERF-983_CICO/results/30vusers -JGlobal_duration=2700 -JVUSERS=30 -JRAMP_UP=300
75 vUsers: nohup jmeter -n -t /home/ptf/testdata/RCON/PERF-983_CICO/circulation_checkInCheckOut_rcon.jmx -l rcon_75vusers.jtl -e -o /home/ptf/testdata/RCON/PERF-983_CICO/results/75vusers -JGlobal_duration=2700 -JVUSERS=75 -JRAMP_UP=750
30 vUsers (longevity): nohup jmeter -n -t /home/ptf/testdata/RCON/PERF-983_CICO/circulation_checkInCheckOut_rcon.jmx -l rcon_30vusers_long.jtl -e -o /home/ptf/testdata/RCON/PERF-983_CICO/results/30vusers_long -JGlobal_duration=86400 -JVUSERS=30 -JRAMP_UP=300
Test CI/CO with 8, 20, 30, 75 concurrent users for 45 minutes each.
Test CI/CO with 30 users for 24 hours to detect any trends in memory.
To create widgets in the AWS dashboard to monitor and collect CI/CO-related module parameters (service CPU and memory), use these JSON definitions:
Define the response times for requests that take longer than 100 milliseconds.
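One possible way to pull these slow requests out of a JMeter results file, assuming the default CSV JTL layout where the second column is the elapsed time in milliseconds and the third is the request label (the file name is an example from the commands above):

# List requests with elapsed time >= 100 ms, slowest first
awk -F',' 'NR > 1 && $2 >= 100 { print $2, $3 }' rcon_75vusers.jtl | sort -rn | head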