Jira: PERF-28


Backend:
- mod-inventory-storage-19.1.2
- mod-inventory-14.1.3
- mod-authtoken-2.4.0
- mod-permissions-5.9.0
- okapi-2.38.0

Frontend:
- folio_inventory-2.0.2


The FOLIO build was deployed with 50+ ECS services distributed randomly across four m5.large EC2 instances, and the database was created on a db.r5.xlarge AWS RDS instance. The logging level was set to the default, INFO.

Test Runs

30-min runs for the export instance UUIDs workflow:

Test          | Virtual Users | Duration | OKAPI log level | Profiled | Ramp up (total time in seconds) | Size of response (instances returned)
1. FameFlower | 1 | 30 min | INFO | No  | 5  | 10K~50K instances
2. FameFlower | 1 | 30 min | INFO | No  | 1  | 50K~100K instances
3. FameFlower | 1 | 30 min | INFO | Yes | 10 | 10K~50K instances
4. FameFlower | 1 | 30 min | INFO | Yes | 10 | 50K~100K instances
5. FameFlower | 5 | 30 min | INFO | No  | 50 | 10K~50K instances
6. FameFlower | 5 | 30 min | INFO | No  | 10 | 50K~100K instances
7. FameFlower | 5 | 30 min | INFO | Yes | 50 | 10K~50K instances
8. FameFlower | 5 | 30 min | INFO | Yes | 50 | 50K~100K instances
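For reference, the workflow under test saves instance UUIDs by calling the inventory APIs measured in this report. The following is a minimal, hypothetical sketch of one such request in plain Java: the endpoint path GET /instance-bulk/ids comes from this report, while the host, CQL query, tenant id, and token are placeholders (the tenant id is inferred from the fs09000000_mod_inventory_storage schema name seen in the slow-query log).

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class ExportInstanceUuids {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Placeholder CQL query; the real runs matched 10K~50K or 50K~100K instances.
        String query = URLEncoder.encode("title=\"english\"", StandardCharsets.UTF_8);

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://okapi.example.org:9130/instance-bulk/ids?query=" + query))
                .header("X-Okapi-Tenant", "fs09000000")                       // placeholder tenant id
                .header("X-Okapi-Token", "<token from mod-authtoken login>")  // placeholder token
                .header("Accept", "application/json")
                .GET()
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode()
                + ", body length " + response.body().length());
    }
}

In the actual runs, JMeter issued these requests with the virtual-user counts and ramp-up times listed above; the ramp-up value is the total time over which JMeter starts all virtual users.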

High Level Summary

  • Most failed requests were related to GET_/inventory/instances and GET_/instance-bulk/ids, which are handled by the mod-inventory-storage service, even in a 1-user 30-minute test run.
  • mod-inventory-storage used significant memory on a machine with 8 GB RAM to return 50K to 100K records.
  • mod-inventory-storage crashed a few times with an OutOfMemory exception during the test runs.
  • The workflow that retrieves more than 100K records became unresponsive even with 1 user.
  • The workflow with more than 5 users became unresponsive.
  • The com.fasterxml.jackson.databind.ObjectMapper.readValue method in the mod-inventory-storage service consumed high CPU resources because of heavy JSON decoding; this implementation could be reviewed for optimization opportunities.
  • FOLIO performs better when it is not being profiled while the tests are running.


Results

*All numbers are in milliseconds, except those in the Delta % column, which shows the percentage difference going from 1 to 5 users with 10K~50K instances retrieved.
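As a worked example (with made-up numbers): if an API call averages 2,000 ms with 1 user and 3,000 ms with 5 users, its Delta % is (3,000 - 2,000) / 2,000 = +50%.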

1. High level FameFlower results data

1 and 5 user test runs, 10K~50K instances

Fig 1.1: The chart shows the overall high-level API stats obtained by JMeter calling various APIs in the save instance UUIDs workflow. It breaks down average response times per API call for the 1 and 5 user tests.

Fig 1.2: The charts below offer a clearer side-by-side comparison of the 1 and 5 user test runs, 50K~100K instances

A few things to note:

  • GET_/inventory/instances and GET_/instance-bulk/ids have the slowest response times and returned failed responses even with 1 user.
  • The workflow with more than 100K records became unresponsive even with 1 user.
  • The workflow with more than 5 users became unresponsive at times during the test runs; more on that below.



Fig 1.3: A side-by-side comparison for the 1 and 5 user test runs, 10K~50K instances

Fig 1.4: A side-by-side comparison for the 1 and 5 user test runs, 50K~100K instances

2. CPU Utilization

Fig 2.1: CPU utilization percentage chart for the 1 and 5 user test runs retrieving 10K~50K instances

These modules were selected because of their activity in the workflow and their prominent values compared to other modules.

The test run with 5 users and 10K~50K instances:

Fig 2.2: CPU utilization of various modules running on an EC2 instance, including mod-inventory-storage

Based on this CPU usage chart (Fig 2.2), we can see that the service that consumed the most CPU resources was mod-inventory-storage.

3. Memory Usage

The following data shows the memory consumption of the ECS services, most notably mod-inventory-storage, during the test run with 5 users and 10K~50K instances retrieved.

Fig 3.1: The mod-inventory-storage service consumed 99% of its allocated RAM in the test run with 5 users and 10K~50K instances retrieved.

The diagram below shows that mod-inventory-storage crashed a few times due to OOM. There were 4 instances of mod-inventory-storage active in this test run, meaning it crashed 3 times and new mod-inventory-storage instances were spun up each time.

Fig 3.2: mod-inventory-storage instances during a 30-minute test run

4. Disk IO

[Disk I/O charts]

5. Database CPU Utilization

For 1 user - 30 min run

For 5 users - 30 min run

6. Database Slow queries

The slowest queries, which took most of the execution time, were initiated by the mod-inventory-storage service and are listed below:

Percent of total time: 32% | Average time: 10,796 ms | Calls: 15

SELECT jsonb,id FROM fs09000000_mod_inventory_storage.instance WHERE to_tsvector($1, f_unaccent(concat_space_sql(instance.jsonb->>$2 , concat_array_object_values(instance.jsonb->$3,$4) , concat_array_object_values(instance.jsonb->$5,$6)))) @@ (to_tsquery($7, f_unaccent($8))) ORDER BY left(lower(f_unaccent(instance.jsonb->>$9)),$10), lower(f_unaccent(instance.jsonb->>$11))


Percent of total time: 23% | Average time: 22,250 ms | Calls: 5

SELECT jsonb,id FROM fs09000000_mod_inventory_storage.instance WHERE (to_tsvector($1, f_unaccent(concat_space_sql(instance.jsonb->>$2 , concat_array_object_values(instance.jsonb->$3,$4) , concat_array_object_values(instance.jsonb->$5,$6)))) @@ (to_tsquery($7, f_unaccent($8)))) AND (to_tsvector($9, f_unaccent(instance.jsonb->>$10)) @@ replace((to_tsquery($11, f_unaccent($12)))::text, $13, $14)::tsquery) ORDER BY left(lower(f_unaccent(instance.jsonb->>$15)),$16), lower(f_unaccent(instance.jsonb->>$17))


Percent of total time: 13% | Average time: 1,709 ms | Calls: 37

SELECT COUNT(*) FROM (SELECT jsonb,id FROM fs09000000_mod_inventory_storage.instance WHERE to_tsvector($1, f_unaccent(concat_space_sql(instance.jsonb->>$2 , concat_array_object_values(instance.jsonb->$3,$4) , concat_array_object_values(instance.jsonb->$5,$6)))) @@ (to_tsquery($7, f_unaccent($8))) ORDER BY left(lower(f_unaccent(instance.jsonb->>$9)),$10), lower(f_unaccent(instance.jsonb->>$11)) LIMIT $12) x

Percent of total time: 12% | Average time: 1,818 ms | Calls: 34

WITH headrecords AS ( SELECT jsonb, lower(f_unaccent(jsonb->>$1)) AS title FROM fs09000000_mod_inventory_storage.instance WHERE (to_tsvector($2, f_unaccent(concat_space_sql(instance.jsonb->>$3 , concat_array_object_values(instance.jsonb->$4,$5) , concat_array_object_values(instance.jsonb->$6,$7)))) @@ (to_tsquery($8, f_unaccent($9)))) AND left(lower(f_unaccent(jsonb->>$10)),$11) < ( SELECT left(lower(f_unaccent(jsonb->>$12)),$13) FROM fs09000000_mod_inventory_storage.instance ORDER BY left(lower(f_unaccent(jsonb->>'title')),600) OFFSET $14 LIMIT $15 ) ORDER BY left(lower(f_unaccent(jsonb->>$16)),$17) LIMIT $18 OFFSET $19 ), allrecords AS ( SELECT jsonb, lower(f_unaccent(jsonb->>$20)) AS title FROM fs09000000_mod_inventory_storage.instance WHERE (to_tsvector($21, f_unaccent(concat_space_sql(instance.jsonb->>$22 , concat_array_object_values(instance.jsonb->$23,$24) , concat_array_object_values(instance.jsonb->$25,$26)))) @@ (to_tsquery($27, f_unaccent($28)))) AND (SELECT COUNT(*) FROM headrecords) < $29 ) SELECT jsonb, title, $30 AS count FROM headrecords WHERE (SELECT COUNT(*) FROM headrecords) >= $31 UNION (SELECT jsonb, title, (SELECT COUNT(*) FROM allrecords) AS count FROM allrecords ORDER BY title LIMIT $32 OFFSET $33 ) ORDER BY title

Percent of total time: 4% | Average time: 2,804 ms | Calls: 7

SELECT COUNT(*) FROM (SELECT jsonb,id FROM fs09000000_mod_inventory_storage.instance WHERE (to_tsvector($1, f_unaccent(concat_space_sql(instance.jsonb->>$2 , concat_array_object_values(instance.jsonb->$3,$4) , concat_array_object_values(instance.jsonb->$5,$6)))) @@ (to_tsquery($7, f_unaccent($8)))) AND (to_tsvector($9, f_unaccent(instance.jsonb->>$10)) @@ replace((to_tsquery($11, f_unaccent($12)))::text, $13, $14)::tsquery) ORDER BY left(lower(f_unaccent(instance.jsonb->>$15)),$16), lower(f_unaccent(instance.jsonb->>$17)) LIMIT $18) x

Percent of total time: 3% | Average time: 1,865 ms | Calls: 9

EXPLAIN ANALYZE WITH headrecords AS ( SELECT jsonb, lower(f_unaccent(jsonb->>'title')) AS title FROM fs09000000_mod_inventory_storage.instance WHERE (to_tsvector('simple', f_unaccent(concat_space_sql(instance.jsonb->>'title' , concat_array_object_values(instance.jsonb->'contributors','name') , concat_array_object_values(instance.jsonb->'identifiers','value')))) @@ (to_tsquery('simple', f_unaccent('''english''')))) AND left(lower(f_unaccent(jsonb->>'title')),600) < ( SELECT left(lower(f_unaccent(jsonb->>'title')),600) FROM fs09000000_mod_inventory_storage.instance ORDER BY left(lower(f_unaccent(jsonb->>'title')),600) OFFSET 10000 LIMIT 1 ) ORDER BY left(lower(f_unaccent(jsonb->>'title')),600) LIMIT 100 OFFSET 0 ), allrecords AS ( SELECT jsonb, lower(f_unaccent(jsonb->>'title')) AS title FROM fs09000000_mod_inventory_storage.instance WHERE (to_tsvector('simple', f_unaccent(concat_space_sql(instance.jsonb->>'title' , concat_array_object_values(instance.jsonb->'contributors','name') , concat_array_object_values(instance.jsonb->'identifiers','value')))) @@ (to_tsquery('simple', f_unaccent('''english''')))) AND (SELECT COUNT(*) FROM headrecords) < 100 ) SELECT jsonb, title, 0 AS count FROM headrecords WHERE (SELECT COUNT(*) FROM headrecords) >= 100 UNION (SELECT jsonb, title, (SELECT COUNT(*) FROM allrecords) AS count FROM allrecords ORDER BY title LIMIT 100 OFFSET 0 ) ORDER BY title

7. Database Missing indexes

8. JVM Profiling result

JVM profiling of the most resource-consuming service, mod-inventory-storage, showed 6 methods with high CPU usage and an impact on overall service performance.

The com.fasterxml.jackson.databind.ObjectMapper.readValue method uses most of the CPU capacity, which leads to performance degradation.
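To illustrate the pattern (this is not the actual mod-inventory-storage code, which is not shown in this report): each row's jsonb payload is deserialized with Jackson, so readValue runs tens of thousands of times when 50K~100K records are returned. A minimal sketch, assuming a shared ObjectMapper and an illustrative InstanceRecord POJO:

import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.ArrayList;
import java.util.List;

public class JsonDecodeHotPath {

    // ObjectMapper is thread-safe once configured; sharing a single instance
    // avoids per-call construction overhead.
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Illustrative stand-in for an inventory instance record.
    public static class InstanceRecord {
        public String id;
        public String title;
    }

    // Decodes every row's JSON payload; with 50K~100K rows per request,
    // this loop is where ObjectMapper.readValue dominates the CPU profile.
    public static List<InstanceRecord> decodeAll(List<String> jsonRows) throws Exception {
        List<InstanceRecord> out = new ArrayList<>(jsonRows.size());
        for (String json : jsonRows) {
            out.add(MAPPER.readValue(json, InstanceRecord.class));
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        List<String> rows = List.of(
                "{\"id\":\"1\",\"title\":\"A\"}",
                "{\"id\":\"2\",\"title\":\"B\"}");
        System.out.println(decodeAll(rows).size() + " records decoded");
    }
}

Possible optimization directions include reusing a single configured ObjectMapper, moving hot paths to Jackson's streaming JsonParser API, or returning only the UUIDs from the database instead of full jsonb documents that must be decoded.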

Summary

The workflow that retrieves more than 100K records became unresponsive even with 1 user.


Appendix

See the attached FameFlower Performance Test Runs.xlsx for details.