Table of Contents |
---|
...
Ticket:
Jira Legacy | ||||||
---|---|---|---|---|---|---|
|
Summary
- Load tests showed that ListApp refreshing duration isTests showed the Lists App refresh of concurrent lists on 3 tenants are:
- 1.5 min mins for 3 users concurrent lists refresh test (1 user list refresh on each of three tenantstenant);
- 2.3 min mins for 10 users concurrent lists refresh test (3-4 user lists refresh on each of three tenantstenant).
- Load During load test for 30 users lists (10 users lists per tenant) failed due to DB overload (100% of refresh transactions failed)some of the list refreshes failed. After the test end "isRefreshing" status remained "true" for each list. It was reset manually directly through the database. Details on the issue can be found here: Failed list refresh investigation.
- During the 10 users lists test CPU utilization reached 200% for mod-fqm-manager and 111% for mod-lists. Also, mod-permissions' CPU utilization exceeded 100% during 30 users lists test.
- Maximum DB CPU utilization reached 83% (writer instance) and 99% (reader instance) during the 30 user test. lists test. In comparison with testing with R/W split disabled, RDS CPU utilization didn't decrease when DB R/W split was enabled.
- Memory utilization for mod-permissions increased from 48% to 76% during the tests. No memory leak is suspected for all the modules.
Recommendations & Jiras
Jira Legacy | ||||||
---|---|---|---|---|---|---|
|
Test runs
Query used in lists - "Item status != Available". List refresh result is about 200K records.
Scenario | Data quantity |
---|---|
List App refresh multiple tenants | tenant 1 - 1 userlist tenant 2 - 1 userlist tenant 3 - 1 userlist |
tenant 1 - 3 userslists tenant 2 - 3 userslists tenant 3 - 4 userslists | |
tenant 1 - 10 userslists tenant 2 - 10 userslists tenant 3 - 10 userslists |
Results
Transaction | Duration, avg | Release | Tenants | Number of userslists | R/W split | Other conditions |
---|---|---|---|---|---|---|
ListApp Lists App refresh | 10 min 40 sec | [Orchid] | 1 tenant | 10 users10 | disabled | |
8.5 min | [Poppy] | 1 tenant | 10 users10 | disabled | ||
17.7 min | [Poppy] | 1 tenant | 10 users10 | disabled | Testing in parallel with DI and CICO | |
ListApp Lists App refresh current test results** | 1.5 min | [Poppy] | 3 tenants3 users | 3 | enabled | |
2.3 min | [Poppy] | 3 tenants | 10 users10 | enabled | ||
error | [Poppy] | 3 tenants | 30 users30 | enabled100% | of refresh transactions failed 1 list refresh failed for the 1st tenant 8 list refreshes failed for the 3rd tenant*** |
* Query used in lists - "Item status == Checked out". List refresh result is 200K records. Results are taken from previous test reports: [Poppy] List App + with multiple workflows test report and R/W split disabled test report, [Orchid] List App test report
**Query used in lists - "Item status != Available". List refresh result is about 200K records.
***Details on the issue can be found at Failed list refresh investigation part
Instance CPU Utilization
Service CPU Utilization
During the 10 users -lists test CPU utilization reached 200% for mod-fqm-manager and 111% for mod-lists. Also, mod-permissions CPU utilization exceeded 100% during 30 users lists test.
Memory Utilization
Memory utilization for mod-permissions increased from 48% to 76% during the tests. No memory leak is suspected for all the modules.
...
Maximum DB CPU utilization reached 83% (writer instance) and 99% (reader instance) during 30 user lists test.
In comparison with testing with R/W split disabled, RDS CPU utilization for the writer node decreased from 70% to 29% for 3 users test, from 95% to 68% for 10 users test. At the same time CPU load on the reader node is higher than on the writer.
Results for multiple tenants and R/W split disabled testing (for comparison):
Details can be found here: [Poppy] List App with multiple tenants and R/W split disabled
DB Connections
DB Load
Writer DB node
...
Code Block | ||
---|---|---|
| ||
delete from list_contents where list_id=$1 and refresh_id=$2 |
Failed list refresh investigation
In order to investigate the issue there were conducted addtional tests for single tenant (30 lists in parallel) and multiple tenants (10 lists on each of three tenants in parallel).
Test | Failed list refresh | |
---|---|---|
1 tenant (30 lists in parallel) | 3 tenants (10 lists on each in parallel) | |
1 | 1 list failed | 2 lists failed for the 1st tenant 10 lists failed for the 3rd tenant |
2 | 1 list failed | 1 lists failed for the 1st tenant 8 lists failed for the 3rd tenant |
3 | 2 lists failed | 2 lists failed for the 1st tenant 8 lists failed for the 3rd tenant |
! For the second tenant there were no failed list refreshes during all the tests.
! After the test end "isRefreshing" status remained "true" for each of the failed lists. It was reset manually directly through the database.
Log messages
"Reason for failure: org.springframework.dao.DataIntegrityViolationException: could not execute batch..."
"ERROR: insert or update on table "list_contents_11" violates foreign key constraint "fk_refresh_id""
Code Block | ||
---|---|---|
| ||
07:24:29 [] [] [] [] ERROR efreshFailedCallback Refresh failed for list 275ffe0d-a7aa-4971-8649-05dfd5fabcb8, refreshId 2604f768-8365-4cd1-8d44-42ac2eb4b0cb. Reason for failure: org.springframework.dao.DataIntegrityViolationException: could not execute batch [Batch entry 0 insert into list_contents (sort_seq,content_id,list_id,refresh_id) values (0,'00003218-d1c0-4e28-a1f9-7ff1b876adab'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(1,'00006fb1-e36f-428e-a0d6-783a3cad91d8'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(2,'0000b30e-b794-4fc8-af62-dd769232f958'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(3,'0000c889-ad4f-42b2-8f29-70ed0ab09904'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(4,'00011866-fe22-409c-bbf6-f0fd6a8d4580'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(5,'00012a5c-28c5-40c1-9728-ce66e867bc35'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(6,'00012af0-33fc-4323-aa75-91e0c2d4174b'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(7,'0002346f-e359-42ce-b69c-df35ad8b9b76'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(8,'000251b9-c33d-4e3a-9a36-33a994a9f128'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(9,'00035926-2c06-4616-975a-b3908bb24346'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(10,'0003c0a2-432d-41ec-8847-e757f50f6d4d'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(11,'000410ba-83fb-43ae-bf2e-20878e27ef23'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(12,'0004a65d-fb57-4754-9d36-23072986b9ed'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(13,'00066938-046f-4f50-8bfb-522a0aa4d0b1'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(14,'0006939e-ce6f-4876-8db9-0032e4cb41c2'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(15,'00069719-c84e-4436-b45b-6999b8459742'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(16,'0006f3f3-2fd2-4be0-80c5-8a6998b9dd78'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(17,'00070833-4f16-4aeb-824b-2a8ef4527bc9'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(18,'00076065-3bb3-462c-81b4-a28ae075f31d'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(19,'00080f89-b119-47b1-ba9c-73fdbca5873b'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(20,'0008c6d6-a51a-48ed-88f2-6063e7f621e2'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(21,'0009200f-056e-42cc-bec7-8d49dbf95e1c'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(22,'00097224-3228-4dc5-bce1-4a0cda755ed1'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(23,'000977de-102b-4430-ab53-5f256654da93'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(24,'000a05bc-cd51-49ab-a07c-e7d896c44404'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(25,'000ac6a6-ef1f-4ea9-9faf-d1a03fafe66f'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(26,'000ae987-f93b-41d7-9125-d9ce7c51cb4f'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(27,'000b149d-4600-4839-ad27-44d012136a05'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(28,'000b5e84-c840-42e5-885a-e26185f5866c'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(29,'000bb99a-2cb2-4d18-9441-d98b0e505346'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(30,'000c7f25-659a-4584-9333-4c9543bb6ac9'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid),(31,'000cad02-dbf1-4cee-ab9b-62cdc3242a9e'::uuid,'275ffe0d-a7aa-4971-8649-05dfd5fabcb8'::uuid,'2604f768-8365-4cd1-8d44-42ac2eb4b0cb'::uuid) was aborted: ERROR: insert or update on table "list_contents_11" violates foreign key constraint "fk_refresh_id"
|
"Key (refresh_id)=(2604f768-8365-4cd1-8d44-42ac2eb4b0cb) is not present in table "list_refresh_details""
Code Block | ||
---|---|---|
| ||
Detail: Key (refresh_id)=(2604f768-8365-4cd1-8d44-42ac2eb4b0cb) is not present in table "list_refresh_details". Call getNextException to see other errors in the batch.] [insert into list_contents (sort_seq,content_id,list_id,refresh_id) values (?,?,?,?)]; SQL [insert into list_contents (sort_seq,content_id,list_id,refresh_id) values (?,?,?,?)]; constraint [fk_refresh_id] |
Other log messages can be found in the faile attached:
View file | ||||
---|---|---|---|---|
|
RDS CPU
Slow queries
Reader node:
Code Block | ||
---|---|---|
| ||
select id from fs09000000_mod_fqm_manager.drv_item_callnumber_location where lower(cast(item_status as varchar)) <> lower($1)
parameters: $1 = 'Available' |
Code Block | ||
---|---|---|
| ||
select result_id from fs07000001_mod_fqm_manager.query_results where query_id = cast($1 as uuid) order by result_id offset $2 rows fetch next $3 rows only
parameters: $1 = '53b7d3be-df3c-4545-abf8-aded0a0b0a87', $2 = '92000', $3 = '1000' |
Writer node:
Code Block | ||
---|---|---|
| ||
SELECT jsonb FROM fs07000001_mod_permissions.permissions |
Code Block | ||
---|---|---|
| ||
delete from list_contents where list_id=$1 and refresh_id=$2
parameters: $1 = '06831116-06d0-49d1-8002-f030e9764020', $2 = 'ba886eaa-67c2-4e5f-8f06-8d383b497967' |
Code Block |
---|
insert into list_contents (sort_seq,content_id,list_id,refresh_id) values ($1,$2,$3,$4),($5,$6,$7,$8),($9,$10,$11,$12),($13,$14,$15,$16),($17,$18,$19,$20),($21,$22,$23,$24),($25,$26,$27,$28),($29,$30,$31,$32),($33,$34,$35,$36),($37,$38,$39,$40),($41,$42,$43,$44),($45,$46,$47,$48),($49,$50,$51,$52),($53,$54,$55,$56),($57,$58,$59,$60),($61,$62,$63,$64) |
Appendix
Infrastructure
PTF -environment pcp1
- 10 m6i.2xlarge EC2 instances located in US East (N. Virginia)us-east-1
1 database instance, writer
Name API Name Memory GIB vCPUs max_connections R6G Extra Large db.r6g.xlarge 32 GiB 4 vCPUs 2731 - MSK tenant
- 4 m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
...
- Enable R/W split for mod-fqm-manager.
- Create 10 Create 10 lists with the query "Item status != Available" on each of three tenants to be able to run a test for up to 30 concurrent lists (users).
- Prepare 200K item records for the query to return. Details can be found at the link: Steps for testing process#ListApp
- Conduct tests with JMeter script for multiple tenants.
...