FameFlower Test Results

Table of Contents

...

Jira Legacy

server	System Jira
columns	key,summary,type,created,updated,due,assignee,reporter,priority,status,resolution
serverId	01505d01-b853-3c2e-90f1-ee9b165564fc
key	PERF-40

Using the Carrier-io framework for capturing and analyzing performance test results, the following tests for the export instance UUIDs workflows were executed. The slowness depends on the number of records retrieved from the database, so we modeled the JMeter test scripts to make queries for a batch of keywords that return anywhere from 10K to 50K instances, and another batch of keywords that results in between 50K and 100K hits. Since these are resource-intensive operations, we did not test these scenarios with more than 5 concurrent users.

The tests were ran with 1 and 5 user with queries which return 10K~50K instances and 50K~100K instancesIn this workflow we are checking the performance of the loading items on instance record (in holdings accordion) workflow running in the Fameflower release. We will test it with 1, 5, 8, and 20 virtual users for 30 minutes. A longevity test will also be executed to see if there were memory issues.

Backend:
- mod-inventory-storage-19.1.2
- mod-inventory-14.1.3
- mod-authtoken-2.4.0
- mod-permissions-5.9.0
- okapi-2.38.0

Frontend:
- folio_inventory-2.0.2

Environment:

55 back-end modules deployed in 110 ECS services
3 okapi ECS services
8 m5.large EC2 instances
1 db.r5.xlarge AWS RDS instance
INFO logging level

High Level Summary

Overall check in, check out time in seconds

	Average (seconds)		50th %tile (seconds)		75th %tile (seconds)		95th %tile (seconds)
	Check-in	Check-out	Check-in	Check-out	Check-in	Check-out	Check-in	Check-out
1 user	1.015	1.234	0.96	1.277	1.071	1.409	1.322	1.653
5 users	1.236	1.488	1.156	1.393	1.464	1.869	1.704	2.219
8 users	1.512	1.751	1.403	1.852	1.741	2.031	2.02	2.274
20 users	1.649	1.898	1.535	1.996	1.896	2.211	2.252	2.539

Slow APIs taking more than 100 ms to return

615 ms

API	1 user (75th %tile)	5 users (75th %tile)	8 users (75th %tile)	20 Users (75th %tile)
POST checkout-by-barcode	GET /source-storage/formattedRecords/{id}	5.24 s	905 ms	906 ms	988 msPOST checkin
GET_/inventory/instances keyword all "aba"
Get circulation/loans	POST_/circulation/check-out-by-barcode	548 ms	830 ms	1053 ms	1137 ms	barcode	283 ms	346 ms	449 ms	479 ms
Get inventory/items	217 ms	232 ms	237 ms	281 ms
GET_/inventory/instances/{id}
GET_/circulation/loans

Excess logging of missing indexes - 64K lines in 45 minutes run. Logging level could be reduced to WARNING or INFO, but at the cost of having less data to work with should there be a need to troubleshoot.
JVM profiling shows JSON de/serialization operations one of the slowest operations.

...

Test	Virtual Users	Duration	OKAPI log level	Profiled	Ramp up (total time in seconds)
1.	1	30 min	INFO	No	1
2.	5	30 min	INFO	No	50
3.	8	30 min	INFO	No	80
4.	20	30 min	INFO	No	200
5.	1	30 min	INFO	Yes	1
6.	5	30 min	INFO	Yes	50
7.	8	30 min	INFO	Yes	80
8.	20	30 min	INFO	Yes	200

...

Can drill down these methods for more information at : http://ec2-3-93-19-104.compute-1.amazonaws.com/grafana/d/b2iP-9hiz/java-profiling-folio?orgId=1&from=1589940195166&to=1589943457693&var-service_tag=mod-circulation&var-aggregation=10s

1. High level FameFlower results data

1 and 5 users tests runs, 10K~50K instances

Image Removed

Fig 1.1: The chart shows the overall high-level API stats obtained by JMeter calling various APIs in the save instance UUIDs worfklow. It breaks down average response times for 1 and 5 users tests per API call.

Image Removed

Fig 1.2: The charts below offer a clearer side-by-side comparison for the 1 and 5 users tests runs, 50K~100K instances

A few things to note:

GET_/inventory/instances and GET_/instance-bulk/ids have the slowest response time. Have failed responses even for 1 user
The workflow with more than 100K records became non-responsive even with 1 user
The workflow with more than 5 users became unresponsive at times during the test runs, more on that below.

...

Image Removed

Fig 1.3 A side-by-side comparison for the 1 and 5 users tests runs, 10K~50K instances

A side-by-side comparison for the 1 and 5 users tests runs, 50K~100K instances

Image Removed

Fig 1.4 side-by-side comparison for the 1 and 5 users tests runs, 50K~100K instances

2. CPU Utilization

Image RemovedFig 2.1 CPU utilization percentage chart for 1 and 5 users test runs retrieving ~10K-50K instances

These services for the selected modules were chosen for their activity in the workflow and prominent values compared to other modules.

The test run with 5 users and 10K~50K instances

Image Removed

Fig 2.2: CPU utilization of various modules running on an EC2 instance, including mod-inventory-storage

Based on this CPU usage chart (Fig 2.2) we can make see that the service consumed the most CPU resources was mod-inventory-storage.

3. Memory Usage

The following data shows memory consumption of ECS services, most notably of mod-inventory-storage during the 5 users and 10K-50K instances retrieved test run.

Image Removed

Fig 3.1: mod-inventory-storage service consumed 99% of allocated RAM memory in the 5 users and 10K-50K instances retrieved test run.

Image Removed

Fig 3.1: mod-inventory-storage service consumed 99% of allocated RAM memory in the 5 users and 50K-100K instances retrieved test run.

Note that when turning on the profiler on mod-inventory-storage, the performance took a nose dive and mod-inventory-storage crashed a few times in a 30 minutes test run. Obviously this put more stress on the module (and no one would run the profiler in production) but it's consistent with observations in bug fest environment that when returning more records, mod-inventory-storage would experience OOM exception and crashed. What is seen here is mod-inventory-storage being at the edge of getting an OOM.

4. Disk IO

Image Removed

5. Database CPU Utilization

For 1 user - 30 min run
Image Removed

For 5 users - 30 min run

Image Removed
Database CPU usage was not high, but the queries took a long time to return, see below.

6. Database Slow queries

Slowest queries which took the most of execution time were initiated by the mod-inventory-storage service presented in the following table:

...

Percent of total time

...

Average Time,ms

...

Calls

...

Query

...

32%

...

10,796

...

15

SELECT jsonb,id FROM fs09000000_mod_inventory_storage.instance WHERE to_tsvector($1, f_unaccent(concat_space_sql(instance.jsonb->>$2 , concat_array_object_values(instance.jsonb->$3,$4) , concat_array_object_values(instance.jsonb->$5,$6)))) @@ (to_tsquery($7, f_unaccent($8))) ORDER BY left(lower(f_unaccent(instance.jsonb->>$9)),$10), lower(f_unaccent(instance.jsonb->>$11))

...

23%

...

22,250

...

5

SELECT jsonb,id FROM fs09000000_mod_inventory_storage.instance WHERE (to_tsvector($1, f_unaccent(concat_space_sql(instance.jsonb->>$2 , concat_array_object_values(instance.jsonb->$3,$4) , concat_array_object_values(instance.jsonb->$5,$6)))) @@ (to_tsquery($7, f_unaccent($8)))) AND (to_tsvector($9, f_unaccent(instance.jsonb->>$10)) @@ replace((to_tsquery($11, f_unaccent($12)))::text, $13, $14)::tsquery) ORDER BY left(lower(f_unaccent(instance.jsonb->>$15)),$16), lower(f_unaccent(instance.jsonb->>$17))

...

13%

...

1,709

...

37

...

SELECT COUNT(*) FROM (SELECT jsonb,id FROM fs09000000_mod_inventory_storage.instance WHERE to_tsvector($1, f_unaccent(concat_space_sql(instance.jsonb->>$2 , concat_array_object_values(instance.jsonb->$3,$4) , concat_array_object_values(instance.jsonb->$5,$6)))) @@ (to_tsquery($7, f_unaccent($8))) ORDER BY left(lower(f_unaccent(instance.jsonb->>$9)),$10), lower(f_unaccent(instance.jsonb->>$11)) LIMIT $12) x

...

12%

...

1,818

...

34

...

WITH headrecords AS ( SELECT jsonb, lower(f_unaccent(jsonb->>$1)) AS title FROM fs09000000_mod_inventory_storage.instance WHERE (to_tsvector($2, f_unaccent(concat_space_sql(instance.jsonb->>$3 , concat_array_object_values(instance.jsonb->$4,$5) , concat_array_object_values(instance.jsonb->$6,$7)))) @@ (to_tsquery($8, f_unaccent($9)))) AND left(lower(f_unaccent(jsonb->>$10)),$11) < ( SELECT left(lower(f_unaccent(jsonb->>$12)),$13) FROM fs09000000_mod_inventory_storage.instance ORDER BY left(lower(f_unaccent(jsonb->>'title')),600) OFFSET $14 LIMIT $15 ) ORDER BY left(lower(f_unaccent(jsonb->>$16)),$17) LIMIT $18 OFFSET $19 ), allrecords AS ( SELECT jsonb, lower(f_unaccent(jsonb->>$20)) AS title FROM fs09000000_mod_inventory_storage.instance WHERE (to_tsvector($21, f_unaccent(concat_space_sql(instance.jsonb->>$22 , concat_array_object_values(instance.jsonb->$23,$24) , concat_array_object_values(instance.jsonb->$25,$26)))) @@ (to_tsquery($27, f_unaccent($28)))) AND (SELECT COUNT(*) FROM headrecords) < $29 ) SELECT jsonb, title, $30 AS count FROM headrecords WHERE (SELECT COUNT(*) FROM headrecords) >= $31 UNION (SELECT jsonb, title, (SELECT COUNT(*) FROM allrecords) AS count FROM allrecords ORDER BY title LIMIT $32 OFFSET $33 ) ORDER BY title

...

4%

...

2,804

...

7

...

SELECT COUNT(*) FROM (SELECT jsonb,id FROM fs09000000_mod_inventory_storage.instance WHERE (to_tsvector($1, f_unaccent(concat_space_sql(instance.jsonb->>$2 , concat_array_object_values(instance.jsonb->$3,$4) , concat_array_object_values(instance.jsonb->$5,$6)))) @@ (to_tsquery($7, f_unaccent($8)))) AND (to_tsvector($9, f_unaccent(instance.jsonb->>$10)) @@ replace((to_tsquery($11, f_unaccent($12)))::text, $13, $14)::tsquery) ORDER BY left(lower(f_unaccent(instance.jsonb->>$15)),$16), lower(f_unaccent(instance.jsonb->>$17)) LIMIT $18) x

...

3%

...

1,865

...

9

...

EXPLAIN ANALYZE WITH headrecords AS ( SELECT jsonb, lower(f_unaccent(jsonb->>'title')) AS title FROM fs09000000_mod_inventory_storage.instance WHERE (to_tsvector('simple', f_unaccent(concat_space_sql(instance.jsonb->>'title' , concat_array_object_values(instance.jsonb->'contributors','name') , concat_array_object_values(instance.jsonb->'identifiers','value')))) @@ (to_tsquery('simple', f_unaccent('''english''')))) AND left(lower(f_unaccent(jsonb->>'title')),600) < ( SELECT left(lower(f_unaccent(jsonb->>'title')),600) FROM fs09000000_mod_inventory_storage.instance ORDER BY left(lower(f_unaccent(jsonb->>'title')),600) OFFSET 10000 LIMIT 1 ) ORDER BY left(lower(f_unaccent(jsonb->>'title')),600) LIMIT 100 OFFSET 0 ), allrecords AS ( SELECT jsonb, lower(f_unaccent(jsonb->>'title')) AS title FROM fs09000000_mod_inventory_storage.instance WHERE (to_tsvector('simple', f_unaccent(concat_space_sql(instance.jsonb->>'title' , concat_array_object_values(instance.jsonb->'contributors','name') , concat_array_object_values(instance.jsonb->'identifiers','value')))) @@ (to_tsquery('simple', f_unaccent('''english''')))) AND (SELECT COUNT(*) FROM headrecords) < 100 ) SELECT jsonb, title, 0 AS count FROM headrecords WHERE (SELECT COUNT(*) FROM headrecords) >= 100 UNION (SELECT jsonb, title, (SELECT COUNT(*) FROM allrecords) AS count FROM allrecords ORDER BY title LIMIT 100 OFFSET 0 ) ORDER BY title

7. Database Missing indexes

Image Removed

8. JVM Profiling result

JVM profiling of the most resources consuming okapi, mod-inventory, mod-inventory-storage service showed 3 methods which had a high CPU usage and impact on the overall service performance.

Image Removed

org.folio.okapi.managers.ProxyService.getModulesForRequest method uses most of CPU capacity which leads to performance degradation

Image Removed

org.apache.logging.log4j.spi.AbstractLogger.info method has high CPU usage with default INFO logging level

Image Removed

...

Appendix

See Attached FameFlower Performance Test Runs.xlsx for details

Version	Old Version 3	New Version 4
Changes made by	Svitlana Zmiivska	Svitlana Zmiivska
Saved on	May 28, 2020	May 28, 2020

Versions Compared

Key