Table of Contents

Overview

This document contains the results of testing Check-in/Check-out and Data Import with file splitting feature for MARC Bibliographic records in the Poppy release.

Ticket: PERF-756

Summary

  • Data import performance in Poppy with the file splitting feature improved significantly compared to Orchid (about 40% for DI Create and 25% for DI Update). However, there is a small degradation (up to 5%) compared to Poppy without the file splitting feature when running with CICO. CO response times are almost identical to Poppy without the file splitting feature. CI response time is about 20% slower, both with and without Data Import.
  • Average CPU utilization did not exceed 150% for any module. The highest consumption was observed for mod-inventory: it grew from 110% to 250% by the end of the test. Because its memory grows as well, issue MODINV-944 is suspected. That issue is fixed in version 20.1.9, but this test was run on mod-inventory version 20.1.7. mod-data-import spiked up to 130% in Data Import jobs with the 50K file and up to 250% with the 100K file. CPU utilization of all other modules did not exceed 110% during Data Import jobs.
  • The observed increase in memory utilization is attributed to the modules having been restarted beforehand by the daily cluster shutdown process. Memory consumption before the tests was 45% for mod-search and 55% for mod-inventory. During the test with the 100K file, memory utilization grew to 90% for mod-search and to 100% for mod-inventory.
  • Average DB CPU usage during data import is about 95%, which is consistent with the same tests in Orchid.
  • The average connection count during data import is approximately 600 for Create jobs, which is twice as high as without the file splitting feature. For Update jobs, the average connection count is 560.
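The improvement figures quoted against Orchid can be cross-checked from the DI durations reported later in this document. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the improvement is measured as the reduction in duration relative to the Orchid run with CICO.

```python
def to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss duration string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def improvement_pct(baseline: str, current: str) -> float:
    """Reduction in duration relative to the baseline, in percent (assumed definition)."""
    base, cur = to_seconds(baseline), to_seconds(current)
    return (base - cur) / base * 100


# Durations with CICO, copied from the comparison table in this report.
create_10k = improvement_pct("00:09:06", "00:05:26")  # Orchid vs Poppy with file splitting
update_5k = improvement_pct("00:04:52", "00:03:39")
print(f"DI Create 10K: {create_10k:.1f}%  DI Update 5K: {update_5k:.1f}%")
```

With these two rows the results land close to the 40% and 25% stated in the summary; other file sizes give somewhat different values.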

Test Runs 

Test # | Scenario | Load level
1 | DI MARC Bib Create | 5K, 10K, 25K, 50K, 100K consecutively (with 5 min pause); CICO 8 users
2 | DI MARC Bib Update | 5K, 10K, 25K, 50K, 100K consecutively (with 5 min pause); CICO 8 users

Test Results

Data import

Total time for all Data Export jobs - 1 hour 16 minutes 47 seconds.


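DI duration is for Poppy with the file splitting feature. Check-in/Check-out response times are averages for 8 users in Poppy.

Profile | MARC File | DI Duration (hh:mm:ss) | CI Average, sec | CO Average, sec
DI MARC Bib Create (PTF - Create 2) | 5K.mrc | 00:02:47 | 1.111 | 1.432
DI MARC Bib Create (PTF - Create 2) | 10K.mrc | 00:05:26 | 1.261 | 1.556
DI MARC Bib Create (PTF - Create 2) | 25K.mrc | 00:14:31 | 1.441 | 1.532
DI MARC Bib Create (PTF - Create 2) | 50K.mrc | 00:24:13 | 1.432 | 1.478
DI MARC Bib Create (PTF - Create 2) | 100K.mrc | 00:49:35 | 1.358 | 1.621
DI MARC Bib Update (PTF - Updates Success - 1) | 5K.mrc | 00:03:39 | 0.870 | 1.201
DI MARC Bib Update (PTF - Updates Success - 1) | 10K.mrc | 00:06:46 | 0.885 | 1.216
DI MARC Bib Update (PTF - Updates Success - 1) | 25K.mrc | 00:17:04 | 0.949 | 1.266
DI MARC Bib Update (PTF - Updates Success - 1) | 50K.mrc | 00:34:23 | 1.083 | 1.264
DI MARC Bib Update (PTF - Updates Success - 1) | 100K.mrc | 01:14:30 | 1.024 | 1.383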
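To compare jobs of different sizes, the durations above can be converted to an approximate throughput. This is a rough sketch rather than part of the test tooling: the function names are hypothetical, and it assumes that each file name reflects its record count (for example, 100K.mrc holds 100,000 records).

```python
from datetime import timedelta


def parse_duration(hms: str) -> timedelta:
    """Parse an hh:mm:ss string into a timedelta."""
    hours, minutes, seconds = map(int, hms.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def records_per_second(records: int, hms: str) -> float:
    """Average import throughput over the whole job."""
    return records / parse_duration(hms).total_seconds()


# Record counts are assumed from the file names; durations are from the table above.
create_100k = records_per_second(100_000, "00:49:35")
update_100k = records_per_second(100_000, "01:14:30")
print(f"Create: {create_100k:.1f} rec/s, Update: {update_100k:.1f} rec/s")
```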

Check-in/Check-out without DI


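Response time is for Poppy with the file splitting feature.

Scenario | Load level | Request | 95 perc, sec | Average, sec
Circulation Check-in/Check-out (without Data Import) | 8 users | Check-in | 0.724 | 0.610
Circulation Check-in/Check-out (without Data Import) | 8 users | Check-out | 0.999 | 0.872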

Comparison

CICO with DI comparison

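The following tables compare the results of the current test (Poppy with the file splitting feature) to the previous release (Orchid) and to the baseline Poppy results (CICO without DI and DI without CICO). "Poppy FS" stands for Poppy with the file splitting feature.

DI Duration (hh:mm:ss). Deviation A compares Poppy FS with CICO to Poppy FS without CICO. Deviation B compares Poppy FS with CICO to Poppy without the file splitting feature with CICO.

Profile | MARC File | Without CI/CO: Orchid* | Without CI/CO: Poppy | Without CI/CO: Poppy FS | With CI/CO: Orchid* | With CI/CO: Poppy | With CI/CO: Poppy FS | Deviation A | Deviation B
DI MARC Bib Create (PTF - Create 2) | 5K.mrc | 00:04:30 | 00:02:39 | 00:02:26 | 00:05:01 | 00:02:53 | 00:02:47 | + 00:00:21 | - 00:00:06
DI MARC Bib Create (PTF - Create 2) | 10K.mrc | 00:09:25 | 00:05:00 | 00:04:56 | 00:09:06 | 00:04:32 | 00:05:26 | + 00:00:30 | + 00:00:46
DI MARC Bib Create (PTF - Create 2) | 25K.mrc | 00:22:16 | 00:11:15 | 00:12:14 | 00:24:28 | 00:11:14 | 00:14:31 | + 00:02:16 | + 00:03:17
DI MARC Bib Create (PTF - Create 2) | 50K.mrc | 00:39:27 | 00:22:16 | 00:22:49 | 00:43:03 | 00:21:55 | 00:24:13 | + 00:01:24 | + 00:02:18
DI MARC Bib Create (PTF - Create 2) | 100K.mrc | 01:38:00 | 00:49:58 | 00:47:52 | 01:35:50 | 00:47:02 | 00:49:35 | + 00:01:47 | + 00:02:33
DI MARC Bib Update (PTF - Updates Success - 1) | 5K.mrc | 00:04:02 | 00:02:28 | 00:03:17 | 00:04:52 | 00:03:19 | 00:03:39 | + 00:00:22 | + 00:00:20
DI MARC Bib Update (PTF - Updates Success - 1) | 10K.mrc | 00:08:10 | 00:05:31 | 00:06:32 | 00:09:22 | 00:06:20 | 00:06:46 | + 00:00:14 | + 00:00:26
DI MARC Bib Update (PTF - Updates Success - 1) | 25K.mrc | 00:19:39 | 00:14:50 | 00:16:05 | 00:24:02 | 00:14:04 | 00:17:04 | + 00:00:59 | + 00:03:00
DI MARC Bib Update (PTF - Updates Success - 1) | 50K.mrc | 00:38:30 | 00:32:53 | 00:32:43 | 00:47:13 | 00:29:59 | 00:34:23 | + 00:01:40 | + 00:04:24
DI MARC Bib Update (PTF - Updates Success - 1) | 100K.mrc | 01:33:00 | 01:14:39 | 01:10:04 | 01:40:25 | 01:03:03 | 01:14:30 | + 00:04:26 | + 00:11:27

Check In, Check Out Response time (8 users), average, sec. Delta, % compares Poppy to Poppy FS.

Profile | MARC File | Orchid CI | Orchid CO | Poppy CI | Poppy CO | Poppy FS CI | Poppy FS CO | Delta CI, % | Delta CO, %
DI MARC Bib Create (PTF - Create 2) | 5K.mrc | 0.961 | 1.442 | 0.901 | 1.375 | 1.111 | 1.432 | 18.90% | 3.98%
DI MARC Bib Create (PTF - Create 2) | 10K.mrc | 1.058 | 1.624 | 0.902 | 1.47 | 1.261 | 1.556 | 28.47% | 5.53%
DI MARC Bib Create (PTF - Create 2) | 25K.mrc | 1.056 | 1.621 | 1 | 1.571 | 1.441 | 1.532 | 30.60% | -2.55%
DI MARC Bib Create (PTF - Create 2) | 50K.mrc | 0.936 | 1.519 | 0.981 | 1.46 | 1.432 | 1.478 | 31.49% | 1.22%
DI MARC Bib Create (PTF - Create 2) | 100K.mrc | 0.868 | 1.468 | 1.018 | 1.491 | 1.358 | 1.621 | 25.04% | 8.02%
DI MARC Bib Update (PTF - Updates Success - 1) | 5K.mrc | 0.855 | 1.339 | 0.755 | 1.169 | 0.870 | 1.201 | 13.22% | 2.66%
DI MARC Bib Update (PTF - Updates Success - 1) | 10K.mrc | 0.916 | 1.398 | 0.75 | 1.307 | 0.885 | 1.216 | 15.25% | -7.48%
DI MARC Bib Update (PTF - Updates Success - 1) | 25K.mrc | 0.922 | 1.425 | 0.822 | 1.403 | 0.949 | 1.266 | 13.38% | -10.82%
DI MARC Bib Update (PTF - Updates Success - 1) | 50K.mrc | 0.904 | 1.456 | 0.893 | 1.424 | 1.083 | 1.264 | 17.54% | -12.66%
DI MARC Bib Update (PTF - Updates Success - 1) | 100K.mrc | 0.838 | 1.415 | 0.908 | 1.51 | 1.024 | 1.383 | 11.33% | -9.18%

* Orchid and Poppy DI and CICO results are taken from Data Import with Check-ins Check-outs Orchid.
*** Completed with errors (Poppy).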
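The report does not state how the Delta, % values are calculated. The values in the table are reproduced by dividing the difference by the Poppy with file splitting value, not by the Poppy baseline, so the sketch below uses that inferred formula. The function name is hypothetical.

```python
def delta_pct(poppy: float, poppy_fs: float) -> float:
    """Delta in percent, relative to the Poppy with file splitting value.

    This formula is inferred from the table values; it is not documented in the report.
    """
    return (poppy_fs - poppy) / poppy_fs * 100


# DI MARC Bib Create, 5K.mrc: CI 0.901 -> 1.111, CO 1.375 -> 1.432
print(f"CI delta: {delta_pct(0.901, 1.111):.2f}%")
print(f"CO delta: {delta_pct(1.375, 1.432):.2f}%")

# The more common percent change relative to the baseline gives a larger number.
baseline_relative = (1.111 - 0.901) / 0.901 * 100
print(f"CI change relative to Poppy baseline: {baseline_relative:.2f}%")
```

Readers comparing these deltas with other reports may want to check which denominator those reports use.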


Detailed CICO response time comparison


Response time, sec. "Poppy FS" stands for Poppy with the file splitting feature.

Scenario | Load level | Request | Orchid 95 perc | Orchid average | Poppy 95 perc | Poppy average | Poppy FS 95 perc | Poppy FS average
Circulation Check-in/Check-out (without Data Import) | 8 users | Check-in | 0.489 | 0.394 | 0.489 | 0.431 | 0.724 | 0.610
Circulation Check-in/Check-out (without Data Import) | 8 users | Check-out | 0.793 | 0.724 | 0.969 | 0.828 | 0.999 | 0.872

DB load

Top SQL-queries:

INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id) VALUES ($1, $2)

UPDATE fs09000000_mod_source_record_manager.job_execution_progress SET succeeded_records_count = succeeded_records_count + $2, error_records_count = error_records_count + $3 WHERE job_execution_id = $1 Returning *

INSERT INTO fs09000000_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)

MARC BIB Update + CICO

Top SQL-queries:

INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id) VALUES ($1, $2)

INSERT INTO fs09000000_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)

insert into "marc_records_lb" ("id", "content") values (cast($1 as uuid), cast($2 as jsonb)) on conflict ("id") do update set "content" = cast($3 as jsonb)


Service CPU Utilization

Average CPU utilization did not exceed 150% for any module. The highest consumption was observed for mod-inventory: it grew from 110% to 250% by the end of the test. Because its memory grows as well, issue MODINV-944 is suspected.

mod-data-import spiked up to 130% in Data Import jobs with the 50K file and up to 250% with the 100K file. For Data Import jobs with 5K, 10K and 25K files, CPU utilization did not exceed 110%.

Service Memory Utilization

The observed increase in memory utilization is attributed to the modules having been restarted beforehand by the daily cluster shutdown process.

Memory consumption before the tests was 45% for mod-search and 55% for mod-inventory. During the test with the 100K file, mod-search grew to 90% and mod-inventory to 100%.



DB CPU Utilization

Average DB CPU usage during data import is about 95%, which matches the results of the same tests in Orchid.



DB Connections

The average connection count during data import is about 600 for Create jobs, which is two times higher than without the file splitting feature. For Update jobs it is about 560.
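For context, the connection counts can be related to the max_connections value of the database instance listed in the Appendix (2731). The sketch below is a simple illustration with a hypothetical function name; it treats the reported averages as instantaneous counts, which is a simplification.

```python
def connection_usage_pct(active: int, max_connections: int) -> float:
    """Share of the configured connection limit in use, in percent."""
    return active / max_connections * 100


MAX_CONNECTIONS = 2731  # db.r6g.xlarge, from the Infrastructure section

create_usage = connection_usage_pct(600, MAX_CONNECTIONS)
update_usage = connection_usage_pct(560, MAX_CONNECTIONS)
print(f"Create jobs: {create_usage:.1f}% of limit, Update jobs: {update_usage:.1f}% of limit")
```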

Appendix

Infrastructure

PTF environment pcp1

  • 10 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
  • 1 database instance, writer

    Name | Memory GiB | vCPUs | max_connections
    db.r6g.xlarge | 32 GiB | 4 vCPUs | 2731


  • MSK cluster - tenant
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3


Cluster: pcp1-pvt

Module | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize | R/W split enabled
mod-remote-storage | 10(11)* | 3.0.0 | 2 | 4920 | 4472 | 1024 | 3960 | 512 | 512 | FALSE
mod-data-import | 18(20)* | 3.0.7 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512 | FALSE
mod-authtoken | 13(16)* | 2.14.1 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128 | FALSE
mod-configuration | 9(10)* | 5.9.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE
mod-users-bl | 9(10)* | 7.6.0 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128 | FALSE
mod-inventory-storage | 12(15)* | 27.0.3(27.0.4)* | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512 | FALSE
mod-circulation-storage | 12(14)* | 17.1.3(17.1.7)* | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | FALSE
mod-source-record-storage | 15(18)* | 5.7.3(5.7.5)* | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | FALSE
mod-inventory | 11(14)* | 20.1.3(20.1.7)* | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512 | FALSE
mod-di-converter-storage | 15(18)* | 2.1.2(2.1.5)* | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE
mod-circulation | 12(14)* | 24.0.8(24.0.11)* | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | FALSE
mod-pubsub | 11(13)* | 2.11.2(2.11.3)* | 2 | 1536 | 1440 | 1024 | 922 | 384 | 512 | FALSE
mod-patron-blocks | 9(10)* | 1.9.0 | 2 | 1024 | 896 | 1024 | 768 | 88 | 128 | FALSE
mod-source-record-manager | 14(17)* | 3.7.4(3.7.8)* | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | FALSE
mod-quick-marc | 9(11)* | 5.0.0(5.0.1)* | 1 | 2288 | 2176 | 128 | 1664 | 384 | 512 | FALSE
nginx-okapi | 9 | 2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0 | FALSE
okapi-b | 1 | 5.1.2 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512 | FALSE
mod-feesfines | 10(11)* | 19.0.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE
pub-okapi | 9 | 2023.06.14 | 2 | 1024 | 896 | 128 | 768 | 0 | 0 | FALSE

 * - The newest version was used in this test to compare with the previous test

Methodology/Approach

DI tests were started from UI with 5-minute pauses between the tests.

Additional links

Grafana dashboard:

MARC Bib Create + CICO

http://carrier-io.int.folio.ebsco.com/grafana/d/SqzWB26nk/jmeter-performance-check-in-check-out?orgId=1&from=1700738030629&to=1700749313428&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_Poppy_3&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All

MARC Bib Update + CICO