Overview

This document contains the results of testing Check-in/Check-out and Data Import with file splitting feature for MARC Bibliographic records in the Poppy release.

Ticket: PERF-756

Summary

There is a significant improvement in data import performance in Poppy with the file splitting feature compared to Orchid (40% for DI Create, 25% for DI Update). However, there is a small degradation (up to 5%) compared to Poppy without the file splitting feature when running with CICO. CO response times are almost identical to Poppy without the file splitting feature. The CI response time is 20% slower both with and without Data Import.
Average CPU utilization did not exceed 150% for any module. The highest consumption was observed in mod-inventory: it grew from 110% up to 250% by the end of the test. Because its memory grows as well, we suspect the issue MODINV-944, which is fixed in version 20.1.9, whereas this test was run on version 20.1.7 of mod-inventory. Spikes in mod-data-import CPU utilization were observed: up to 130% for Data Import jobs with 50k files and a 250% spike for the 100k job. For all other modules, CPU utilization during Data Import jobs didn't exceed 110%.
The increase in memory utilization is a result of the modules having been restarted beforehand (daily cluster shutdown process). Memory consumption before the tests was 45% for mod-search and 55% for mod-inventory. During the test with the 100k file, memory utilization grew to 90% for mod-search and to 100% for mod-inventory.
Average DB CPU usage during data import is about 95%, which is consistent with the same tests in Orchid.
The average connection count during data import is approximately 600 for Create jobs, which is twice as high as when the file splitting feature is disabled. For Update jobs, the connection count is about 560.

Test Runs

Test #	Scenario	Load level
1	DI MARC Bib Create	5K, 10K, 25K, 50K, 100K consecutively (with 5 min pause)
1	CICO	8 users
2	DI MARC Bib Update	5K, 10K, 25K, 50K, 100K consecutively (with 5 min pause)
2	CICO	8 users

Test Results

Data import

Total time for all Data Export jobs - 1 hour 16 minutes 47 seconds.

Profile	MARC File	DI Duration, Poppy with file splitting feature (hh:mm:ss)	CI Average, sec (8 users)	CO Average, sec (8 users)
DI MARC Bib Create (PTF - Create 2)	5K.mrc	00:02:47	1.111	1.432
	10K.mrc	00:05:26	1.261	1.556
	25K.mrc	00:14:31	1.441	1.532
	50K.mrc	00:24:13	1.432	1.478
	100K.mrc	00:49:35	1.358	1.621
DI MARC Bib Update (PTF - Updates Success - 1)	5K.mrc	00:03:39	0.870	1.201
	10K.mrc	00:06:46	0.885	1.216
	25K.mrc	00:17:04	0.949	1.266
	50K.mrc	00:34:23	1.083	1.264
	100K.mrc	01:14:30	1.024	1.383

Check-in/Check-out without DI

Response time, sec, Poppy with file splitting feature

Scenario	Load level	Request	95 perc	Average
Circulation Check-in/Check-out (without Data Import)	8 users	Check-in	0.724	0.610
		Check-out	0.999	0.872

Comparison

CICO with DI comparison

The following tables compare the test results of the current release (Poppy) to the previous release (Orchid) and to the baseline Poppy results (CICO without DI and DI without CICO).

DI Duration (hh:mm:ss)

Profile	MARC File	Without CI/CO: Orchid	Without CI/CO: Poppy	Without CI/CO: Poppy with file splitting feature	With CI/CO: Orchid*	With CI/CO: Poppy	With CI/CO: Poppy with file splitting feature	Deviation (hh:mm:ss), Poppy with file splitting feature: DI with CICO compared to DI without CICO	Deviation (hh:mm:ss), Poppy with file splitting feature: DI with CICO compared to Poppy without file splitting feature
DI MARC Bib Create (PTF - Create 2)	5K.mrc	00:04:30	00:02:39	00:02:26	00:05:01	00:02:53	00:02:47	+00:00:21	-00:00:06
	10K.mrc	00:09:25	00:05:00	00:04:56	00:09:06	00:04:32	00:05:26	+00:00:30	+00:00:46
	25K.mrc	00:22:16	00:11:15	00:12:14	00:24:28	00:11:14	00:14:31	+00:02:16	+00:03:17
	50K.mrc	00:39:27	00:22:16	00:22:49	00:43:03	00:21:55	00:24:13	+00:01:24	+00:02:18
	100K.mrc	01:38:00	00:49:58	00:47:52	01:35:50	00:47:02	00:49:35	+00:01:47	+00:02:33
DI MARC Bib Update (PTF - Updates Success - 1)	5K.mrc	00:04:02	00:02:28	00:03:17	00:04:52	00:03:19	00:03:39	+00:00:22	+00:00:20
	10K.mrc	00:08:10	00:05:31	00:06:32	00:09:22	00:06:20	00:06:46	+00:00:14	+00:00:26
	25K.mrc	00:19:39	00:14:50	00:16:05	00:24:02	00:14:04	00:17:04	+00:00:59	+00:03:00
	50K.mrc	00:38:30	00:32:53	00:32:43	00:47:13	00:29:59	00:34:23	+00:01:40	+00:04:24
	100K.mrc	01:33:00	01:14:39	01:10:04	01:40:25	01:03:03	01:14:30	+00:04:26	+00:11:27

Check In, Check Out Response time (8 users), average, sec

Profile	MARC File	Orchid* CI	Orchid* CO	Poppy CI	Poppy CO	Poppy with file splitting feature CI	Poppy with file splitting feature CO	Delta, % (Poppy/Poppy with file splitting feature) CI	Delta, % (Poppy/Poppy with file splitting feature) CO
DI MARC Bib Create (PTF - Create 2)	5K.mrc	0.961	1.442	0.901	1.375	1.111	1.432	18.90%	3.98%
	10K.mrc	1.058	1.624	0.902	1.47	1.261	1.556	28.47%	5.53%
	25K.mrc	1.056	1.621	1	1.571	1.441	1.532	30.60%	-2.55%
	50K.mrc	0.936	1.519	0.981	1.46	1.432	1.478	31.49%	1.22%
	100K.mrc	0.868	1.468	1.018	1.491	1.358	1.621	25.04%	8.02%
DI MARC Bib Update (PTF - Updates Success - 1)	5K.mrc	0.855	1.339	0.755	1.169	0.870	1.201	13.22%	2.66%
	10K.mrc	0.916	1.398	0.75	1.307	0.885	1.216	15.25%	-7.48%
	25K.mrc	0.922	1.425	0.822	1.403	0.949	1.266	13.38%	-10.82%
	50K.mrc	0.904	1.456	0.893	1.424	1.083	1.264	17.54%	-12.66%
	100K.mrc	0.838	1.415	0.908	1.51	1.024	1.383	11.33%	-9.18%

* Orchid and Poppy DI and CICO results are taken from Data Import with Check-ins Check-outs Orchid.
*** Completed with errors (Poppy).
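
The derived columns in the comparison above can be reproduced from the raw values. The sketch below is an illustration, not the script used for the report: the helper names are hypothetical, and the Delta formula (difference divided by the Poppy with file splitting feature value) is an assumption inferred from the published numbers rather than stated in the report. It uses the 5K.mrc row of DI MARC Bib Create.

```python
from datetime import timedelta


def parse_hms(text: str) -> timedelta:
    """Parse an hh:mm:ss duration as used in the DI Duration columns."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_deviation(delta: timedelta) -> str:
    """Render a signed duration difference as +hh:mm:ss or -hh:mm:ss."""
    sign = "-" if delta < timedelta(0) else "+"
    total = abs(int(delta.total_seconds()))
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{sign}{hours:02d}:{minutes:02d}:{seconds:02d}"


def delta_percent(split_value: float, baseline: float) -> float:
    """Assumed formula: difference relative to the file splitting value."""
    return round((split_value - baseline) / split_value * 100, 2)


# 5K.mrc, DI MARC Bib Create: file splitting with CICO vs without CICO
di_deviation = format_deviation(parse_hms("00:02:47") - parse_hms("00:02:26"))
# 5K.mrc CI and CO averages: file splitting vs Poppy without file splitting
ci_delta = delta_percent(1.111, 0.901)
co_delta = delta_percent(1.432, 1.375)

print(di_deviation, ci_delta, co_delta)
```

Under this assumption the 5K.mrc values match the table; rows whose published deviation differs by a few seconds are left as reported.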

Detailed CICO response time comparison

Response time, sec

Scenario	Load level	Request	Orchid 95 perc	Orchid average	Poppy 95 perc	Poppy average	Poppy with file splitting feature 95 perc	Poppy with file splitting feature average
Circulation Check-in/Check-out (without Data import)	8 users	Check-in	0.489	0.394	0.489	0.431	0.724	0.610
		Check-out	0.793	0.724	0.969	0.828	0.999	0.872

...

DI MARC Bib Create + CICO

[Image]

DI MARC Bib Update + CICO

[Image]

Service CPU Utilization

Average CPU utilization did not exceed 150% for any module. The highest consumption was observed in mod-inventory: it grew from 110% up to 250% by the end of the test. Because its memory grows as well, we suspect the issue MODINV-944.

Spikes in mod-data-import CPU utilization were observed: up to 130% for Data Import jobs with 50k files and a 250% spike for the 100k job. For Data Import jobs with 5k, 10k, and 25k files, CPU utilization didn't exceed 110%.

[Image]

Service Memory Utilization

The observed increase in memory utilization is a result of the modules having been restarted beforehand (daily cluster shutdown process).

Memory consumption before the tests was 45% for mod-search and 55% for mod-inventory. During the test with the 100k file, mod-search grew to 90% and mod-inventory to 100%.

[Image]

DB CPU Utilization

Average DB CPU usage during data import is about 95%, the same as in the equivalent tests in Orchid.

[Image]

DB Connections

The average connection count during data import is about 600 for Create jobs, which is twice as high as without the file splitting feature. For Update jobs it is about 560.

[Image]

DB load

Top SQL-queries:

INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id) VALUES ($1, $2)

UPDATE fs09000000_mod_source_record_manager.job_execution_progress SET succeeded_records_count = succeeded_records_count + $2, error_records_count = error_records_count + $3 WHERE job_execution_id = $1 Returning *

INSERT INTO fs09000000_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)

MARC BIB Update + CICO

Top SQL-queries:

INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id) VALUES ($1, $2)

INSERT INTO fs09000000_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)

insert into "marc_records_lb" ("id", "content") values (cast($1 as uuid), cast($2 as jsonb)) on conflict ("id") do update set "content" = cast($3 as jsonb)


Appendix

Infrastructure

PTF -environment pcp1

10 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
1 database instance, writer

Name	Memory GiB	vCPUs	max_connections
db.r6g.xlarge	32 GiB	4 vCPUs	2731
MSK cluster - tenant
- 4 m5.2xlarge brokers in 2 zones
- Apache Kafka version 2.8.0
- EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3

Module (pcp1-pvt)	Task Def. Revision	Module Version	Task Count	Mem Hard Limit	Mem Soft Limit	CPU units	Xmx	MetaspaceSize	MaxMetaspaceSize	R/W split enabled
mod-remote-storage	10(11)*	3.0.0	2	4920	4472	1024	3960	512	512	FALSE
mod-data-import	18(20)*	3.0.7	1	2048	1844	256	1292	384	512	FALSE
mod-authtoken	13(16)*	2.14.1	2	1440	1152	512	922	88	128	FALSE
mod-configuration	9(10)*	5.9.2	2	1024	896	128	768	88	128	FALSE
mod-users-bl	9(10)*	7.6.0	2	1440	1152	512	922	88	128	FALSE
mod-inventory-storage	12(15)*	27.0.3(27.0.4)*	2	4096	3690	2048	3076	384	512	FALSE
mod-circulation-storage	12(14)*	17.1.3(17.1.7)*	2	2880	2592	1536	1814	384	512	FALSE
mod-source-record-storage	15(18)*	5.7.3(5.7.5)*	2	5600	5000	2048	3500	384	512	FALSE
mod-inventory	11(14)*	20.1.3(20.1.7)*	2	2880	2592	1024	1814	384	512	FALSE
mod-di-converter-storage	15(18)*	2.1.2(2.1.5)*	2	1024	896	128	768	88	128	FALSE
mod-circulation	12(14)*	24.0.8(24.0.11)*	2	2880	2592	1536	1814	384	512	FALSE
mod-pubsub	11(13)*	2.11.2(2.11.3)*	2	1536	1440	1024	922	384	512	FALSE
mod-patron-blocks	9(10)*	1.9.0	2	1024	896	1024	768	88	128	FALSE
mod-source-record-manager	14(17)*	3.7.4(3.7.8)*	2	5600	5000	2048	3500	384	512	FALSE
mod-quick-marc	9(11)*	5.0.0(5.0.1)*	1	2288	2176	128	1664	384	512	FALSE
nginx-okapi	9	2023.06.14	2	1024	896	128	0	0	0	FALSE
okapi-b	11	5.1.2	3	1684	1440	1024	922	384	512	FALSE
mod-feesfines	10(11)*	19.0.0	2	1024	896	128	768	88	128	FALSE
pub-okapi	9	2023.06.14	2	1024	896	128	768	0	0	FALSE

* - The newest version was used in this test to compare with the previous test

Methodology/Approach

DI tests were started from UI with 5-minute pauses between the tests.

Additional links

Grafana dashboard:

MARC Bib Create + CICO

http://carrier-io.int.folio.ebsco.com/grafana/d/SqzWB26nk/jmeter-performance-check-in-check-out?orgId=1&from=1700738030629&to=1700749313428&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_Poppy_3&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All

MARC Bib Update + CICO

...

Version	Old Version 7	New Version Current
Changes made by	Olga Kondratenko	Olga Kondratenko
Saved on	Mar 19, 2024	Apr 11, 2024
