Test status: PASSED
Overview
- Regression testing of Check-In/Check-Out (CI/CO) fixed load tests on an Okapi-based environment in the Ramsons release (compared against Quesnelia).
- The flows were triggered by a command from the AWS instance load generator. All artefacts needed to run the tests were verified on a local machine first.
- There are minor changes in the script to process CI/CO and handle token expiration.
- Testing includes a data preparation step and the test runs themselves.
- Data preparation for each test takes up to 20 minutes and consists of truncating the tables involved in testing, populating data, and updating item statuses.
- The test itself depends on the configured duration and the number of virtual users creating the necessary load.
- The purposes of CI/CO testing:
- To define response times of transaction controllers such as Check-In Controller and Check-Out Controller
- To define response times of requests that last more than 100 ms
- To find any trends for resource utilization and recommend improvements
- To compare results (current and previous)
Jiras/ links:
- The previous results report:
- The current ticket: PERF-970 - [Ramsons] [non-ECS] CI/CO In Review
Summary
- Common results:
- CI/CO tests showed stable response times under low loads, a ~50% increase under high load (75 vUsers), moderate degradation during the longevity test, and no memory leaks. Compared to the Quesnelia release, the 20 vUsers flow degraded up to 14% (80 ms) in CO and up to 20% (100 ms) in CI.
- 45 minute tests
- Response times in the tests with 8, 20, and 30 virtual users (vUsers) were about the same on average. Averages in the 20 vUsers test: CI - 482 ms, CO - 835 ms.
- In the test with 75 vUsers, response times grew about 50% compared to 20 vUsers: CI - 606 ms, CO - 1100 ms.
- Longevity test
- Response times in the 30 vUsers longevity test: CI - 519 ms, CO - 1130 ms. There is expected degradation during the 24-hour test compared with the 30 vUsers 45-minute test: CO - 37%, CI - 14%.
- No memory leaks during the longevity test. Two test runs were performed to get the results, and both began erroring after 19 hours of running.
- Comparison with Quesnelia results:
- CI/CO response times degradation (45 minute tests):
vUsers | Check-Out Controller (CO) | Check-In Controller (CI) |
---|---|---|
8 | 10% | 15% |
20 | 14% | 20% |
30 | 7% | 7% |
75 | 6% | 4% |
- CI/CO response times degraded (longevity test):
- 30 vUsers - 6% in CO and 14% in CI flow.
Resources
- CPU utilization
- The 45-minute and longevity tests used CPU in proportion to the number of vUsers, but some modules spiked during the 75 vUsers test: mod-users-b - 132%, mod-authtoken-b - 98%, nginx-okapi - 84%.
- Memory consumption
- The 45-minute and longevity tests didn't reveal any problems with memory usage by modules, so no memory leaks were detected.
- RDS CPU utilization average
- 8 vUsers - 13%, 20 vUsers - 22%, 30 vUsers - 30%, 75 vUsers - 63%. During the longevity test, CPU grew from 30% to 45%. This growing trend during the longevity test can be explained by the absent dcb-system-user in the mod-dcb module.
- CPU (User) usage by broker
- Overall CPU utilization by brokers during all tests was about 15%, with equal distribution between brokers.
Common notes
Recommendations
Test Runs
The following table contains test configuration information:
Test # | vUsers | Ramp-up, sec | Duration, sec |
---|---|---|---|
1 | 8 | 80 | 2700 |
2 | 20 | 200 | 2700 |
3 | 30 | 300 | 2700 |
4 | 75 | 750 | 2700 |
5 | 30 | 300 | 86400 |
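The ramp-up values in the table follow a simple pattern of 10 seconds per virtual user. A minimal sketch of that derivation (the helper name is illustrative, not part of the actual JMeter scripts):

```python
# Illustrative helper: ramp-up used in the table above is
# 10 seconds per virtual user. Not part of the real test scripts.
def ramp_up_seconds(vusers: int, seconds_per_user: int = 10) -> int:
    return vusers * seconds_per_user

for vusers in (8, 20, 30, 75):
    print(vusers, ramp_up_seconds(vusers))
```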
Results
Errors:
- Error messages: POST_circulation/check-out-by-barcode (Submit_barcode_checkout)_POST_422. 422/Unprocessable Entity. This happens expectedly if the item was already checked out. Error rate - 0.002%, which is acceptable.
Response time
The tables below contain results of the Check-In/Check-Out tests in the Ramsons release.
45 minute tests
All response times are in ms.

Label | 8 vUsers #Samples | 8 vUsers 95th pct | 8 vUsers Average | 20 vUsers #Samples | 20 vUsers 95th pct | 20 vUsers Average | 30 vUsers #Samples | 30 vUsers 95th pct | 30 vUsers Average | 75 vUsers #Samples | 75 vUsers 95th pct | 75 vUsers Average |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Check-Out Controller | 1888 | 1008.75 | 811 | 4496 | 1006.3 | 835.63 | 6588 | 954.55 | 822.01 | 14592 | 1422 | 1100.28 |
Check-In Controller | 1322 | 568.85 | 470 | 3386 | 559 | 482.65 | 4973 | 506 | 455.46 | 10971 | 775 | 606.04 |
POST_circulation/check-out-by-barcode (Submit_barcode_checkout) | 1888 | 386 | 453.5 | 4498 | 368 | 302.89 | 6589 | 335 | 288.99 | 14595 | 519 | 384.38 |
POST_circulation/check-in-by-barcode (Submit_barcode_checkin) | 1322 | 279.7 | 331.15 | 3394 | 273 | 234.69 | 4982 | 240 | 209.28 | 10991 | 369 | 277.67 |
GET_circulation/loans (Submit_barcode_checkout) | 1888 | 164 | 235 | 4496 | 162 | 216 | 6588 | 162 | 195 | 14592 | 218 | 319 |
Longevity test
30 vUsers Longevity (response times in ms)

Label | #Samples | 95th pct | Average |
---|---|---|---|
Check-Out Controller | 1888 | 1008.75 | 1130 |
Check-In Controller | 1322 | 568.85 | 519 |
POST_circulation/check-out-by-barcode (Submit_barcode_checkout) | 1888 | 386 | 453.5 |
POST_circulation/check-in-by-barcode (Submit_barcode_checkin) | 1322 | 279.7 | 331.15 |
Comparisons
This table compares average response times between the Ramsons and Quesnelia releases.
Label | 8 vUsers Quesnelia Avg | 8 vUsers Ramsons Avg | 8 vUsers Delta, ms | 8 vUsers Difference, % | 20 vUsers Quesnelia Avg | 20 vUsers Ramsons Avg | 20 vUsers Delta, ms | 20 vUsers Difference, % | 30 vUsers Quesnelia Avg | 30 vUsers Ramsons Avg | 30 vUsers Delta, ms | 30 vUsers Difference, % | 75 vUsers Quesnelia Avg | 75 vUsers Ramsons Avg | 75 vUsers Delta, ms | 75 vUsers Difference, % |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Check-Out Controller | 741 | 811 | 70 | 9.45% | 729 | 835.63 | 106.63 | 14.63% | 767 | 822.01 | 55.01 | 7.17% | 1039 | 1100.28 | 61.28 | 5.90% |
Check-In Controller | 408 | 470 | 62 | 15.20% | 404 | 482.65 | 78.65 | 19.47% | 427 | 455.46 | 28.46 | 6.67% | 580 | 606.04 | 26.04 | 4.49% |
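The Delta and Difference columns in the comparison tables are derived directly from the two averages. A quick sketch of the arithmetic (the function name is illustrative):

```python
# Illustrative: how Delta (ms) and Difference (%) are computed
# from the Quesnelia and Ramsons average response times.
def degradation(quesnelia_ms: float, ramsons_ms: float) -> tuple[float, float]:
    delta = ramsons_ms - quesnelia_ms
    return round(delta, 2), round(100 * delta / quesnelia_ms, 2)

# Check-Out Controller at 20 vUsers: 729 ms -> 835.63 ms
print(degradation(729, 835.63))  # -> (106.63, 14.63)
```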
Comparison of longevity test
30 vUsers Longevity

Label | Quesnelia Average (ms) | Ramsons Average (ms) | Delta, ms | Difference, % |
---|---|---|---|---|
Check-Out Controller | 1065 | 1130 | 65 | 6.10% |
Check-In Controller | 454 | 519 | 65 | 14.32% |
API requests where response times >= 100 ms
API | 30 vUsers Ramsons Average, ms |
---|---|
POST checkout-by-barcode | 288 |
POST checkin-by-barcode | 209 |
GET circulation/loans | 162 |
Resources Utilization
CPU Utilization
During the 45-minute tests, CPU was utilized mostly under high load (75 vUsers): okapi - 84%, mod-authtoken spiked every 3 minutes from 5% to 30%, mod-inventory-storage - 23%, mod-inventory - 17%, mod-pubsub - 17%, nginx-okapi - 10%, mod-circulation - 10%, mod-circulation-storage - 3%.
During the longevity test, CPU was utilized mostly by: okapi - 37%, mod-authtoken spiked every 3 minutes from 5% to 20%, mod-inventory - 12%, mod-pubsub - 11%, mod-circulation - 5%, mod-circulation-storage - 3%.
45 minute tests
Longevity test
Memory Consumption
The 45-minute and longevity tests didn't reveal any problems with memory usage by modules, so no memory leaks were detected. Modules that consumed the most memory: mod-search - 94%, mod-oa - 78%, mod-inventory - 72%, mod-dcb - 71%.
45 minute tests
Memory usage during the 75 vUsers CI/CO test: mod-inventory - 78%, mod-oa - 78%, mod-dcb - 60%, mod-data-import - 48%, okapi - 46%, mod-pubsub - 45%, mod-users - 39%, mod-search - 38%. The memory trend doesn't reveal any problems.
Longevity test
Memory consumption trends during the longevity test show steady growth for the mod-pubsub module, which ended at 68% after the test completed.
RDS CPU Utilization
RDS CPU utilized:
8 vUsers - 12%, 20 vUsers - 20%, 30 vUsers - 25%, 75 vUsers - 56%. During the longevity test, CPU grew from 25% to 40%, so there is a growing trend during the longevity test. The RDS CPU utilization is the same as in the Quesnelia release.
45 minute tests
Longevity test
RDS Database Connections
For the 45-minute and longevity tests, RDS used at most 885-920 connections. Without a test running, it was 860 connections.
45 minute tests
Longevity test
CPU (User) usage by broker
Since the MSK cluster is shared by all PTF clusters, the only time range that reflects CI/CO alone is midnight till 7 a.m. Max consumption rate for the 30 vUsers test - 10%. We may also observe the impact of other CI/CO tests: the max consumption rate was 40% across all clusters.
45 minute tests
Longevity test
Database load
45 minute tests - most frequent queries:
- UPDATE fs09000000_mod_inventory_storage.item SET jsonb=$1 WHERE id=$2 RETURNING jsonb::text
- INSERT INTO fs09000000_mod_pubsub.audit_message (id, event_id, event_type, tenant_id, audit_date, state, published_by, correlation_id, created_by, error_message) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10);
- WITH deleted_rows AS (delete from marc_indexers mi where exists (select ? from marc_records_tracking mrt where mrt.is_dirty = ? and mrt.marc_id = mi.marc_id and mrt.version > mi.version) returning mi.marc_id), deleted_rows2 AS (delete from marc_indexers mi where exists (select ? from records_lb where records_lb.id = mi.marc_id and records_lb.state = ?) returning mi.marc_id) INSERT IN
- SELECT fs09000000_mod_inventory_storage.count_estimate('SELECT * FROM fs09000000_mod_inventory_storage.material_type WHERE id=''025ba2c5-5e96-4667-a677-8186463aee69''')
- UPDATE fs09000000_mod_login.auth_attempts SET jsonb = $1::jsonb WHERE id='9883ca16-ef27-41f7-81d7-6693b79cddad'
- INSERT INTO fs09000000_mod_authtoken.refresh_tokens (id, user_id, is_revoked, expires_at) VALUES ($1, $2, $3, $4)
- SELECT upsert('circulation_logs', $1::uuid, $2::jsonb)

Longevity test - most frequent queries:
- INSERT INTO fs09000000_mod_pubsub.audit_message (id, event_id, event_type, tenant_id, audit_date, state, published_by, correlation_id, created_by, error_message) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10);
- SELECT fs09000000_mod_patron_blocks.count_estimate('SELECT jsonb FROM fs09000000_mod_patron_blocks.patron_block_limits WHERE (jsonb->>''patronGroupId'') = ''5fc96cbd-a860-42a7-8d2b-72af30206712''')
- UPDATE fs09000000_mod_inventory_storage.item SET jsonb=$1 WHERE id=$2 RETURNING jsonb::text
- SELECT jsonb FROM fs09000000_mod_patron_blocks.user_summary WHERE (jsonb->>'userId') = '4cd01954-62da-46c5-8558-ebd222bc48eb'
- SELECT fs09000000_mod_inventory_storage.count_estimate('SELECT jsonb,id FROM fs09000000_mod_inventory_storage.service_point WHERE id=''7068e104-aa14-4f30-a8bf-71f71cc15e07''')
- UPDATE fs09000000_mod_login.auth_attempts SET jsonb = $1::jsonb WHERE id='9883ca16-ef27-41f7-81d7-6693b79cddad'
- INSERT INTO fs09000000_mod_authtoken.refresh_tokens (id, user_id, is_revoked, expires_at) VALUES ($1, $2, $3, $4)
- SELECT upsert('circulation_logs', $1::uuid, $2::jsonb)
- SELECT COUNT(*) FROM fs09000000_mod_users.users
- SELECT fs09000000_mod_circulation_storage.count_estimate('SELECT jsonb,id FROM fs09000000_mod_circulation_storage.loan_policy WHERE id=''2be97fb5-eb89-46b3-a8b4-776cea57a99e''')
During the 45-minute tests, the longest request is UPDATE fs09000000_mod_inventory_storage.item SET at 38 ms/request.
During the longevity test: INSERT INTO fs09000000_mod_pubsub.audit_message - 41 ms and SELECT fs09000000_mod_inventory_storage.count_estimate - 107 ms.
Another observation is that we see many UPDATE fs09000000_mod_login.auth_attempts and INSERT INTO fs09000000_mod_authtoken.refresh_tokens statements, which is new. It may be connected to the token refresh every 10 minutes.
45 minute tests
Longevity test
Appendix
Infrastructure
PTF environment - rcp1
DB table records size:
Modules
Methodology/Approach
Update the configuration of the source-record-storage module to exclude the SQL statements that run every 30 minutes and delete rows from marc_indexers (the WITH deleted_rows queries):
{ "name": "srs.marcIndexers.delete.interval.seconds", "value": "86400" }
Update the mod-serials module: set the number of tasks to 0 to exclude significant database connection growth.
The usual PTF CI/CO data preparation script won't work in Ramsons. To work around that, disable the trigger updatecompleteupdateddate_item_insert_update before data preparation for the tenant and enable it back before the test starts.
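A sketch of the trigger toggle, assuming the trigger lives on the tenant's item table (this is an assumption — verify the owning table before running; shown for the fs09000000 tenant used elsewhere in this report):

```sql
-- Before data preparation: disable the trigger
-- (assumed to be defined on the tenant's item table)
ALTER TABLE fs09000000_mod_inventory_storage.item
  DISABLE TRIGGER updatecompleteupdateddate_item_insert_update;

-- After data preparation, before the test start: enable it back
ALTER TABLE fs09000000_mod_inventory_storage.item
  ENABLE TRIGGER updatecompleteupdateddate_item_insert_update;
```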
To prepare data, establish a connection using AWS keys, then run the .sql script below from bash first (take it from the title of the code block and replace [PASSWORD] with the correct password).
TRUNCATE TABLE fs09000000_mod_patron_blocks.user_summary;
TRUNCATE TABLE fs09000000_mod_circulation_storage.loan;
TRUNCATE TABLE fs09000000_mod_circulation_storage.audit_loan;
TRUNCATE TABLE fs09000000_mod_circulation_storage.request;
TRUNCATE TABLE fs09000000_mod_circulation_storage.patron_action_session;
TRUNCATE TABLE fs09000000_mod_circulation_storage.scheduled_notice;
TRUNCATE TABLE fs09000000_mod_notify.notify_data;
UPDATE fs09000000_mod_inventory_storage.item SET jsonb = jsonb_set(jsonb, '{status, name}', '"Available"') WHERE jsonb->'status'->>'name' != 'Available';
UPDATE fs09000000_mod_users.users SET jsonb = jsonb_set(jsonb, '{active}', '"true"') WHERE jsonb->'active' != 'true';
Second part of data preparation: run the command ./circ-data-load.sh psql_rcp1.conf [tenant], replacing [tenant] with the tenant Id, and update the parameters in the psql_rcp1.conf file with valid data.
Troubleshooting:
- If the command is executed from a local machine, you may encounter a "query too long" error message. To solve it, use pgAdmin to run the 2 long queries UPDATE ${TENANT}_mod_inventory_storage.item SET jsonb = jsonb_set(jsonb, '{status, name}', '\"Checked out\"') WHERE id IN.
- Another possible issue is incorrect encoding (on a Windows machine). To solve it, just add ENCODING 'UTF8'.
- Use pattern: copy ${TENANT}_mod_circulation_storage.loan(id, jsonb) FROM '${LOANS}' DELIMITER E'\t' ENCODING 'UTF8'
In Ramsons, token expiration is set to 10 minutes by default, so to run any tests use the new login implementation from the script. Pay attention to the Backend Listener: replace the value of application to make the results visible in the Grafana dashboard.
Update the .jmx script for the Ramsons release and upload the artefacts to the S3 bucket and the AWS instance load generator.
Test CI/CO with 8, 20, 30, 75 concurrent users for 45 minutes each.
Test CI/CO with 30 users for 24 hours to detect any trends in memory usage.
Create widgets in AWS dashboard to monitor and collect CI/CO related modules parameters (service CPU and Memory):
Use the file to get raw data and comparison tables