Check-in-check-out Test Report (Ramsons) [non-ECS]
Test status: PASSED
Overview
- Regression testing of Check-In/Check-Out (CI/CO) fixed load tests on an Okapi-based environment in the Ramsons non-ECS release.
- The purposes of CI/CO testing:
- To define response times of transaction controllers for Check-In and Check-Out
- To find any trends in resource utilization and recommend improvements
- To check how the system behaves over an extended period during the longevity test
- To compare current results with the previous release
Summary
- Common results:
- CI/CO tests showed stable response times under low load (8 and 20 vUsers), a ~50% increase under high load (75 vUsers), moderate degradation during the longevity test, and no memory leaks. Compared to the Quesnelia release, response times degraded by up to 14% (~100 ms) in the CO flow and up to 20% (~80 ms) in the CI flow at 20 vUsers.
- Tests #1, #2, #3, #4
- Average response times in tests with 8, 20, and 30 virtual users (vUsers) were roughly the same. Averages in the 20 vUsers test: CI - 482 ms, CO - 835 ms.
- In the test with 75 vUsers, average response times grew by about 50% compared to 20 vUsers: CI - 606 ms, CO - 1100 ms.
- Test #5
- Average response times in test #5 with 30 vUsers: CI - 519 ms, CO - 1130 ms. There is expected degradation during the 24-hour test compared with the 30 vUsers test #3: CO - 37%, CI - 14%.
- No memory leaks were detected during the longevity test. Two tests were performed to get the results, and both began erroring after 19 hours of running. The root cause is under investigation.
- Comparison with Quesnelia results:
- CI/CO response times degradation (Tests #1, #2, #3, #4):
Test # | vUsers | Check-Out Controller (CO) | Check-In Controller (CI) |
---|---|---|---|
1 | 8 | 10% | 15% |
2 | 20 | 14% | 20% |
3 | 30 | 7% | 7% |
4 | 75 | 6% | 4% |
- CI/CO response times degradation (test #5 - longevity test):
- 30 vUsers - 6% in CO and 14% in CI flow.
Resources
- CPU utilization
- Tests #1, #2, #3, #4 and the longevity test used CPU in proportion to the number of vUsers, but some modules spiked during the 75 vUsers test: mod-users-b - 132%, mod-authtoken-b - 98%, nginx-okapi - 84%.
- Memory consumption
- Tests #1, #2, #3, #4 and the longevity test did not reveal any problems with memory usage by modules, so no memory leaks were detected.
- RDS CPU utilization average
- 8 vUsers - 13%, 20 vUsers - 22%, 30 vUsers - 30%, 75 vUsers - 63%. During the longevity test CPU grew from 30% to 45%, showing a growing trend. CPU utilization is the same as it was in Quesnelia.
- CPU (User) usage by broker
- As the MSK cluster is shared by all PTF clusters, the only time range that reflects CI/CO alone during the longevity test (test #5) is from midnight till 7 a.m.; the max consumption rate there is 10%. We can also observe the impact of the other CI/CO tests (tests #1, #2, #3, #4): the max consumption rate across all clusters is 40%.
Recommendations & Jiras
- The previous results report:
- The current ticket: PERF-970 - [Ramsons] [non-ECS] CI/CO In Review
- mod-serials-management-b causes DB connection growth of about 200 connections on average. Disabling this module does not affect response times or error rate but significantly decreases the DB connection count.
- Module revisions should have CPU=0 as a default value, so this should be changed at the module deployment level.
Test Runs
The following table contains test configuration information:
Test # | vUsers | Ramp-up, sec | Duration, sec |
1 | 8 | 80 | 2700 |
2 | 20 | 200 | 2700 |
3 | 30 | 300 | 2700 |
4 | 75 | 750 | 2700 |
5 | 30 | 300 | 86400 |
Results
Errors:
- Error messages: POST_circulation/check-out-by-barcode (Submit_barcode_checkout)_POST_422, 422/Unprocessable Entity. This occurs expectedly when an item was already checked out. Error rate - 0.002%, which is acceptable.
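The error-rate figure can be reproduced from the JMeter results file. A minimal sketch, using a synthetic four-line JTL; the reduced CSV layout and the responseCode column position (field 4) are assumptions for illustration, not taken from the report:

```shell
# Build a tiny synthetic JTL sample and compute the 422 error rate with awk.
jtl=$(mktemp)
cat > "$jtl" <<'EOF'
timeStamp,elapsed,label,responseCode
1,300,Submit_barcode_checkout,200
2,310,Submit_barcode_checkout,422
3,290,Submit_barcode_checkout,200
4,305,Submit_barcode_checkout,200
EOF
# Skip the header row, count total samples and 422 responses, print the rate.
rate=$(awk -F',' 'NR > 1 { total++; if ($4 == "422") err++ }
                  END { printf "%.3f", 100 * err / total }' "$jtl")
echo "422 error rate: ${rate}%"
rm -f "$jtl"
```

Against a real .jtl the same awk filter runs over the full file, with the field index adjusted to the configured listener columns.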
Response time
The tables contain results of the Check-In and Check-Out tests in the Ramsons release.
Test #1, #2, #3, #4
8 vUsers (test #1) | 20 vUsers (test #2) | 30 vUsers (test #3) | 75 vUsers (test #4) | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Requests | Response Times (ms) | Response Times (ms) | Response Times (ms) | Response Times (ms) | ||||||||
Label | #Samples | 95th pct | Average | #Samples | 95th pct | Average | #Samples | 95th pct | Average | #Samples | 95th pct | Average |
Check-Out Controller | 1888 | 1008.75 | 811 | 4496 | 1006.3 | 835.63 | 6588 | 954.55 | 822.01 | 14592 | 1422 | 1100.28 |
Check-In Controller | 1322 | 568.85 | 470 | 3386 | 559 | 482.65 | 4973 | 506 | 455.46 | 10971 | 775 | 606.04 |
POST_circulation/check-out-by-barcode (Submit_barcode_checkout) | 1888 | 386 | 289.98 | 4498 | 368 | 302.89 | 6589 | 335 | 288.99 | 14595 | 519 | 384.38 |
POST_circulation/check-in-by-barcode (Submit_barcode_checkin) | 1322 | 279.7 | 227.94 | 3394 | 273 | 234.69 | 4982 | 240 | 209.28 | 10991 | 369 | 277.67 |
GET_circulation/loans (Submit_barcode_checkout) | 1888 | 235.55 | 164.45 | 4496 | 216 | 162.68 | 6588 | 195 | 162.48 | 14592 | 319 | 218.14 |
Test #5
30 vUsers Longevity test | |||
---|---|---|---|
Requests | Samples, Response Times (ms) | ||
Label | #Samples | 95th pct | Average |
Check-Out Controller | 1888 | 1008.75 | 1130 |
Check-In Controller | 1322 | 568.85 | 519 |
POST_circulation/check-out-by-barcode (Submit_barcode_checkout) | 1888 | 386 | 453.5 |
POST_circulation/check-in-by-barcode (Submit_barcode_checkin) | 1322 | 279.7 | 331.15 |
Comparisons
This table compares average response times between the Ramsons and Quesnelia releases.
8 vUsers (test #1) | 20 vUsers (test #2) | 30 vUsers (test #3) | 75 vUsers (test #4) | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Requests | Response Times, milliseconds | |||||||||||||||
Quesnelia | Ramsons | Quesnelia | Ramsons | Quesnelia | Ramsons | Quesnelia | Ramsons | |||||||||
Label | Average | Delta,ms | Difference,% | Average | Delta,ms | Difference,% | Average | Delta,ms | Difference,% | Average | Delta,ms | Difference,% | ||||
Check-Out Controller | 741 | 811 | 70 | 9.45% | 729 | 835.63 | 106.63 | 14.63% | 767 | 822.01 | 55.01 | 7.17% | 1039 | 1100.28 | 61.28 | 5.90% |
Check-In Controller | 408 | 470 | 62 | 15.20% | 404 | 482.65 | 78.65 | 19.47% | 427 | 455.46 | 28.46 | 6.67% | 580 | 606.04 | 26.04 | 4.49% |
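The Delta and Difference columns follow directly from the averages: delta = Ramsons - Quesnelia, and difference = delta / Quesnelia x 100. A small sketch of that arithmetic, using the 20 vUsers values from the table above:

```shell
# Percentage difference between a Quesnelia (q) and Ramsons (r) average.
diff_pct() { awk -v q="$1" -v r="$2" 'BEGIN { printf "%.2f", (r - q) / q * 100 }'; }

co_20=$(diff_pct 729 835.63)   # Check-Out Controller at 20 vUsers
ci_20=$(diff_pct 404 482.65)   # Check-In Controller at 20 vUsers
echo "CO: ${co_20}%  CI: ${ci_20}%"
```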
Comparison of longevity test
30 vUsers Longevity (test #5) | | | | |
---|---|---|---|---|
Response Times, milliseconds | | | | |
Label | Quesnelia Average | Ramsons Average | Delta,ms | Difference,% |
Check-Out Controller | 1065 | 1130 | 65 | 6.10% |
Check-In Controller | 454 | 519 | 65 | 14.32% |
API requests where response times >= 100 milliseconds
API | 30 vUsers Ramsons Average, ms |
---|---|
POST checkout-by-barcode | 288 |
POST checkin-by-barcode | 209 |
GET circulation/loans | 162 |
Resources Utilization
CPU Utilization
Tests #1, #2, #3, #4
During 45 minute tests CPU utilized mostly during high load (75 vUsers) by okapi - 84%, mod-authtoken spiked every 3 minutes from 5 to 30%, mod-inventory-storage - 23%, mod-inventory - 17%, mod-pubsub - 17%, nginx-okapi - 10%, mod-circulation - 10%, mod-circulation-storage - 3%
Test #5
During longevity CPU utilized mostly by okapi - 37%, mod-authtoken spiked every 3 minutes from 5 to 20%, mod-inventory - 12%, mod-pubsub - 11%, mod-circulation - 5%, mod-circulation-storage - 3%
Memory Consumption
Tests #1, #2, #3, #4 and test #5 did not reveal any problems with memory usage by modules, so no memory leaks were detected. Modules that consumed the most memory: mod-search - 94%, mod-oa - 78%, mod-inventory - 72%, mod-dcb - 71%
Tests #1, #2, #3, #4
Memory usage during the 75 vUsers CI/CO test: mod-inventory - 78%, mod-oa - 78%, mod-dcb - 60%, mod-data-import - 48%, okapi - 46%, mod-pubsub - 45%, mod-users - 39%, mod-search - 38%. The memory trend does not reveal any problems.
Test #5
Memory consumption trends during longevity show steady growth for the mod-pubsub module; the growth stopped after the test completed, ending at 68%.
RDS CPU Utilization
RDS CPU utilized:
8 vUsers - 12%, 20 vUsers - 20%, 30 vUsers - 25%, 75 vUsers - 56%. During the longevity test CPU grew from 25% to 40%, showing a growing trend. The RDS CPU utilization is the same as in the Quesnelia release.
Tests #1, #2, #3, #4
Test #5
RDS Database Connections
During the 45-minute and longevity tests RDS used a maximum of 885-920 connections. Without a test running it was 860 connections.
Tests #1, #2, #3, #4
Test #5
CPU (User) usage by broker
As the MSK cluster is shared by all PTF clusters, the only time range that reflects CI/CO alone during the longevity test (test #5) is from midnight till 7 a.m.; the max consumption rate there is 10%. We can also observe the impact of the other CI/CO tests (tests #1, #2, #3, #4): the max consumption rate across all clusters is 40%.
Tests #1, #2, #3, #4
Test #5
Database load
Tests #1, #2, #3, #4 - top queries:
- UPDATE fs09000000_mod_inventory_storage.item SET jsonb=$1 WHERE id=$2 RETURNING jsonb::text
- INSERT INTO fs09000000_mod_pubsub.audit_message (id, event_id, event_type, tenant_id, audit_date, state, published_by, correlation_id, created_by, error_message) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10);
- WITH deleted_rows AS ( delete from marc_indexers mi where exists( select ? from marc_records_tracking mrt where mrt.is_dirty = ? and mrt.marc_id = mi.marc_id and mrt.version > mi.version ) returning mi.marc_id), deleted_rows2 AS ( delete from marc_indexers mi where exists( select ? from records_lb where records_lb.id = mi.marc_id and records_lb.state = ? ) returning mi.marc_id) INSERT IN
- SELECT fs09000000_mod_inventory_storage.count_estimate('SELECT * FROM fs09000000_mod_inventory_storage.material_type WHERE id=''025ba2c5-5e96-4667-a677-8186463aee69''')
- UPDATE fs09000000_mod_login.auth_attempts SET jsonb = $1::jsonb WHERE id='9883ca16-ef27-41f7-81d7-6693b79cddad'
- INSERT INTO fs09000000_mod_authtoken.refresh_tokens (id, user_id, is_revoked, expires_at) VALUES ($1, $2, $3, $4)
- SELECT upsert('circulation_logs', $1::uuid, $2::jsonb)
Test #5 - top queries:
- INSERT INTO fs09000000_mod_pubsub.audit_message (id, event_id, event_type, tenant_id, audit_date, state, published_by, correlation_id, created_by, error_message) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10);
- SELECT fs09000000_mod_patron_blocks.count_estimate('SELECT jsonb FROM fs09000000_mod_patron_blocks.patron_block_limits WHERE (jsonb->>''patronGroupId'') = ''5fc96cbd-a860-42a7-8d2b-72af30206712''')
- UPDATE fs09000000_mod_inventory_storage.item SET jsonb=$1 WHERE id=$2 RETURNING jsonb::text
- SELECT jsonb FROM fs09000000_mod_patron_blocks.user_summary WHERE (jsonb->>'userId') = '4cd01954-62da-46c5-8558-ebd222bc48eb'
- SELECT fs09000000_mod_inventory_storage.count_estimate('SELECT jsonb,id FROM fs09000000_mod_inventory_storage.service_point WHERE id=''7068e104-aa14-4f30-a8bf-71f71cc15e07''')
- UPDATE fs09000000_mod_login.auth_attempts SET jsonb = $1::jsonb WHERE id='9883ca16-ef27-41f7-81d7-6693b79cddad'
- INSERT INTO fs09000000_mod_authtoken.refresh_tokens (id, user_id, is_revoked, expires_at) VALUES ($1, $2, $3, $4)
- SELECT upsert('circulation_logs', $1::uuid, $2::jsonb)
- SELECT COUNT(*) FROM fs09000000_mod_users.users
- SELECT fs09000000_mod_circulation_storage.count_estimate('SELECT jsonb,id FROM fs09000000_mod_circulation_storage.loan_policy WHERE id=''2be97fb5-eb89-46b3-a8b4-776cea57a99e''')
During the 45-minute tests (#1, #2, #3, #4) the longest request is UPDATE fs09000000_mod_inventory_storage.item SET at 38 ms/request.
During the longevity test (#5) INSERT INTO fs09000000_mod_pubsub.audit_message took 41 ms/request and SELECT fs09000000_mod_inventory_storage.count_estimate - 107 ms/request.
Another observation: there are a lot of UPDATE fs09000000_mod_login.auth_attempts and INSERT INTO fs09000000_mod_authtoken.refresh_tokens statements, which is new. This may be connected to the token refresh every 10 minutes.
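One way to confirm this observation would be to rank these statements by call count on the database side. A hedged sketch, assuming the pg_stat_statements extension is enabled on the RDS instance (the report does not state this); the snippet only builds the query text, which can then be piped to psql:

```shell
# Assemble (but do not execute) a pg_stat_statements query ranking the
# auth_attempts / refresh_tokens statements by how often they are called.
sql="SELECT calls, round(mean_exec_time::numeric, 1) AS mean_ms, left(query, 60) AS query
FROM pg_stat_statements
WHERE query ILIKE '%auth_attempts%' OR query ILIKE '%refresh_tokens%'
ORDER BY calls DESC LIMIT 10;"
echo "$sql"
```

If the token-refresh theory holds, the call counts should scale with test duration rather than with vUsers load.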
Tests #1, #2, #3, #4
Test #5
Appendix
Infrastructure
PTF environment - rcp1
DB table records size:
Modules
Methodology/Approach
Description
Testing includes a data preparation step and the testing itself:
- Data preparation for each test takes up to 20 minutes and consists of truncating the tables involved in testing, populating data, and updating item statuses.
- The test itself depends on the duration and the number of virtual users creating the necessary load.
In Ramsons the token expiration is set to 10 minutes by default, so to run any tests use the new login implementation from the script. Pay attention to the Backend Listener: replace the value of the application parameter to make the results visible in the Grafana dashboard.
Module configuration recommended setup
Update the revision of the source-record-storage module to exclude the SQL statements that delete rows in marc_indexers (the WITH deleted_rows ... delete from marc_indexers mi query) every 30 minutes:
{ "name": "srs.marcIndexers.delete.interval.seconds", "value": "86400" },
Update the mod-serials module: set the number of tasks to 0 to exclude significant database connection growth.
DB trigger setup in Ramsons
The usual PTF CI/CO data preparation script won't work in Ramsons. To solve that, disable the trigger updatecompleteupdateddate_item_insert_update before data preparation for the tenant and re-enable it before the test starts.
The sql file was updated to do that step from the script.
Data preparation
First step
- To prepare data, establish a connection using AWS keys, then run the .sql script below from bash (take the file name from the title of the code block and replace [PASSWORD] with the correct password).
-- Disable trigger
ALTER TABLE fs09000000_mod_inventory_storage.item DISABLE TRIGGER updatecompleteupdateddate_item_insert_update;
TRUNCATE TABLE fs09000000_mod_patron_blocks.user_summary;
TRUNCATE TABLE fs09000000_mod_circulation_storage.loan;
TRUNCATE TABLE fs09000000_mod_circulation_storage.audit_loan;
TRUNCATE TABLE fs09000000_mod_circulation_storage.request;
TRUNCATE TABLE fs09000000_mod_circulation_storage.patron_action_session;
TRUNCATE TABLE fs09000000_mod_circulation_storage.scheduled_notice;
TRUNCATE TABLE fs09000000_mod_notify.notify_data;
UPDATE fs09000000_mod_inventory_storage.item SET jsonb = jsonb_set(jsonb, '{status, name}', '"Available"') WHERE jsonb->'status'->>'name' != 'Available';
UPDATE fs09000000_mod_users.users SET jsonb = jsonb_set(jsonb, '{active}', '"true"') WHERE jsonb->'active' != 'true';
-- Enable trigger
ALTER TABLE fs09000000_mod_inventory_storage.item ENABLE TRIGGER updatecompleteupdateddate_item_insert_update;
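A hypothetical wrapper for running the preparation script above from bash; the host, user, database, and file names are placeholders, not values from the report. It only assembles the psql invocation so the command can be reviewed before execution:

```shell
# Build the psql command line for the data-preparation script.
# ON_ERROR_STOP=1 makes psql abort on the first failing statement.
build_psql_cmd() {
  local host="$1" user="$2" db="$3" script="$4"
  printf 'psql -h %s -U %s -d %s -v ON_ERROR_STOP=1 -f %s\n' \
    "$host" "$user" "$db" "$script"
}

# Placeholder arguments; substitute real connection details before running.
cmd=$(build_psql_cmd rcp1-db.example.com folio folio prepare_cico.sql)
echo "$cmd"
```

The password itself is best supplied via the PGPASSWORD environment variable or a .pgpass file rather than embedded in the command.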
Second step
- Run the command from the scripts folder uploaded to the S3 bucket: ./circ-data-load.sh psql_rcp1.conf [tenant] - replace [tenant] with the tenant Id and change the parameters in the psql_rcp1.conf file to valid data.
- Troubleshooting:
- If the command is executed from a local machine you may encounter a "query too long" error message. To solve it, use PGAdmin to run the 2 long queries UPDATE ${TENANT}_mod_inventory_storage.item SET jsonb = jsonb_set(jsonb, '{status, name}', '\"Checked out\"') where id IN.
- Another possible issue is incorrect encoding (on a Windows machine). To solve it, add ENCODING 'UTF8'.
- Use pattern: copy ${TENANT}_mod_circulation_storage.loan(id, jsonb) FROM '${LOANS}' DELIMITER E'\t' ENCODING 'UTF8'
Use the .jmx script file for the Ramsons release. If any changes were made, upload the artefacts to the S3 bucket and the AWS load generator instance.
To start a test from the AWS instance (load generator) use the command template. Test locally before starting.
8 vUsers - nohup jmeter -n -t /home/ptf/testdata/RCP1/CICO/circulation_checkInCheckOut_rcp1.jmx -l rcp1_8vUsers.jtl -e -o /home/ptf/testdata/RCP1/CICO/results/8vUsers -JGlobal_duration=2700 -JCICO_vusers=8 -JCICO_rampup=80
20 vUsers - nohup jmeter -n -t /home/ptf/testdata/RCP1/CICO/circulation_checkInCheckOut_rcp1.jmx -l rcp1_20vUsers.jtl -e -o /home/ptf/testdata/RCP1/CICO/results/20vUsers -JGlobal_duration=2700 -JCICO_vusers=20 -JCICO_rampup=200
30 vUsers - nohup jmeter -n -t /home/ptf/testdata/RCP1/CICO/circulation_checkInCheckOut_rcp1.jmx -l rcp1_30vUsers.jtl -e -o /home/ptf/testdata/RCP1/CICO/results/30vUsers -JGlobal_duration=2700 -JCICO_vusers=30 -JCICO_rampup=300
75 vUsers - nohup jmeter -n -t /home/ptf/testdata/RCP1/CICO/circulation_checkInCheckOut_rcp1.jmx -l rcp1_75vUsers.jtl -e -o /home/ptf/testdata/RCP1/CICO/results/75vUsers -JGlobal_duration=2700 -JCICO_vusers=75 -JCICO_rampup=750
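The four 45-minute commands differ only in the vUsers count: duration is fixed at 2700 s and the ramp-up is always 10 x vUsers. A sketch that generates each command from a loop instead of launching it, so the set can be verified before a real run (paths are the ones from the commands above):

```shell
# Print the jmeter command for each load level without executing anything.
JMX=/home/ptf/testdata/RCP1/CICO/circulation_checkInCheckOut_rcp1.jmx
OUT=/home/ptf/testdata/RCP1/CICO/results

cmds=$(for v in 8 20 30 75; do
  # Ramp-up follows the 10-seconds-per-user pattern from the test runs table.
  printf 'nohup jmeter -n -t %s -l rcp1_%svUsers.jtl -e -o %s/%svUsers -JGlobal_duration=2700 -JCICO_vusers=%s -JCICO_rampup=%s\n' \
    "$JMX" "$v" "$OUT" "$v" "$v" "$((v * 10))"
done)
echo "$cmds"
```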
Test CI/CO with 8, 20, 30, 75 concurrent users for 45 minutes each.
Test CI/CO with 30 users for 24 hours to detect any trends in memory.
To create widgets in the AWS dashboard to monitor and collect CI/CO related module parameters (service CPU and Memory), use these JSON files:
File with raw data
Use the file to get raw data and comparison tables, and to identify the API requests with response times of 100 milliseconds or more.