...
...
...
...
...
...
...
...
...
...
...
...
Overview
This document contains the results of testing Check-in/Check-out and Data Import with file splitting feature for MARC Bibliographic records in the Poppy release.
Ticket:
Summary
- Data import performance in Poppy with the file splitting feature improved significantly compared to Orchid (40% for DI Create, 25% for DI Update). However, there is a small degradation (up to 5%) compared to Poppy without the file splitting feature when running with CICO. CO response times are almost identical to Poppy without the file splitting feature. The CI response time is 20% slower, both with and without Data Import.
- Average CPU utilization did not exceed 150% for any module. The highest consumption was observed for mod-inventory: it grew from 110% to 250% by the end of the test. Because memory grows as well, we suspect issue MODINV-944. It is fixed in version 20.1.9, but this test was run on version 20.1.7 of mod-inventory. mod-data-import CPU spiked up to 130% in Data Import jobs with 50k files and up to 250% in the 100k job. For all other modules, CPU utilization during Data Import jobs didn't exceed 110%.
- The increase in memory utilization is attributed to the modules having been restarted before the test (daily cluster shutdown process). Memory consumption before the tests was 45% for mod-search and 55% for mod-inventory. During the test with the 100k file, mod-search memory utilization increased to 90% and mod-inventory to 100%.
- Average DB CPU usage during data import is about 95%, which is consistent with the same tests in Orchid.
- The average connection count during data import is approximately 600 for Create jobs, which is twice as high as when the file splitting feature is disabled. For Update jobs, the connection count is about 560.
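The Orchid comparison in the first bullet can be checked against the "with CI/CO" durations in the comparison table. The sketch below assumes that "improvement" means the relative reduction in job duration versus Orchid; the report does not state its formula, and the function names are illustrative only.

```python
def to_seconds(hhmmss: str) -> int:
    """Convert an hh:mm:ss duration string to seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def improvement_pct(baseline: str, current: str) -> float:
    """Relative reduction in duration versus the baseline, in percent (assumed definition)."""
    base = to_seconds(baseline)
    return (base - to_seconds(current)) / base * 100


# DI with CI/CO durations: (Orchid, Poppy with file splitting feature)
di_create = {
    "5K": ("00:05:01", "00:02:47"),
    "10K": ("00:09:06", "00:05:26"),
    "25K": ("00:24:28", "00:14:31"),
    "50K": ("00:43:03", "00:24:13"),
    "100K": ("01:35:50", "00:49:35"),
}
di_update = {
    "5K": ("00:04:52", "00:03:39"),
    "10K": ("00:09:22", "00:06:46"),
    "25K": ("00:24:02", "00:17:04"),
    "50K": ("00:47:13", "00:34:23"),
    "100K": ("01:40:25", "01:14:30"),
}

for name, jobs in (("DI Create", di_create), ("DI Update", di_update)):
    for size, (orchid, poppy_split) in jobs.items():
        print(f"{name} {size}: {improvement_pct(orchid, poppy_split):.1f}% faster than Orchid")
```

Under this definition the per-file results land in the same range as the rounded 40% and 25% figures quoted above.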
Test Runs
Test # | Scenario | Load level |
---|---|---|
1 | DI MARC Bib Create | 5K, 10K, 25K, 50K, 100K consecutively (with 5 min pause) |
1 | CICO | 8 users |
2 | DI MARC Bib Update | 5K, 10K, 25K, 50K, 100K consecutively (with 5 min pause) |
2 | CICO | 8 users |
Test Results
Data import
Total time for all Data Export jobs - 1 hour 16 minutes 47 seconds.
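Profile | MARC File | DI duration, Poppy with file splitting feature (hh:mm:ss) | CI Average, sec (Poppy) | CO Average, sec (Poppy) |
---|---|---|---|---|
DI MARC Bib Create (PTF - Create 2) | 5K.mrc | 00:02:47 | 1.111 | 1.432 |
DI MARC Bib Create (PTF - Create 2) | 10K.mrc | 00:05:26 | 1.261 | 1.556 |
DI MARC Bib Create (PTF - Create 2) | 25K.mrc | 00:14:31 | 1.441 | 1.532 |
DI MARC Bib Create (PTF - Create 2) | 50K.mrc | 00:24:13 | 1.432 | 1.478 |
DI MARC Bib Create (PTF - Create 2) | 100K.mrc | 00:49:35 | 1.358 | 1.621 |
DI MARC Bib Update (PTF - Updates Success - 1) | 5K.mrc | 00:03:39 | 0.870 | 1.201 |
DI MARC Bib Update (PTF - Updates Success - 1) | 10K.mrc | 00:06:46 | 0.885 | 1.216 |
DI MARC Bib Update (PTF - Updates Success - 1) | 25K.mrc | 00:17:04 | 0.949 | 1.266 |
DI MARC Bib Update (PTF - Updates Success - 1) | 50K.mrc | 00:34:23 | 1.083 | 1.264 |
DI MARC Bib Update (PTF - Updates Success - 1) | 100K.mrc | 01:14:30 | 1.024 | 1.383 |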
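To compare jobs of different sizes, the durations above can be converted to an approximate throughput. This is a minimal sketch that assumes each file name denotes its record count (for example, 5K.mrc holds 5,000 records), which the report does not state explicitly; the helper names are hypothetical.

```python
def to_seconds(hhmmss: str) -> int:
    """Convert an hh:mm:ss duration string to seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def records_per_second(record_count: int, duration: str) -> float:
    """Average import throughput over the whole job."""
    return record_count / to_seconds(duration)


# (assumed record count, DI Create duration, DI Update duration) from the table above
jobs = [
    (5_000, "00:02:47", "00:03:39"),
    (10_000, "00:05:26", "00:06:46"),
    (25_000, "00:14:31", "00:17:04"),
    (50_000, "00:24:13", "00:34:23"),
    (100_000, "00:49:35", "01:14:30"),
]

for count, create, update in jobs:
    print(
        f"{count:>7} records: "
        f"create {records_per_second(count, create):.1f} rec/s, "
        f"update {records_per_second(count, update):.1f} rec/s"
    )
```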
Check-in/Check-out without DI
Scenario | Load level | Request | Response time 95 perc, sec (Poppy with file splitting feature) | Response time average, sec (Poppy with file splitting feature) |
---|---|---|---|---|
Circulation Check-in/Check-out (without Data Import) | 8 users | Check-in | 0.724 | 0.610 |
Circulation Check-in/Check-out (without Data Import) | 8 users | Check-out | 0.999 | 0.872 |
Comparison
CICO with DI comparison
The following table compares test results of the current release (Poppy) with the previous release (Orchid) and with the baseline Poppy results (CICO without DI and DI without CICO).
Profile | MARC File | DI duration without CI/CO, Orchid* | DI duration without CI/CO, Poppy | DI duration without CI/CO, Poppy with file splitting feature | DI duration with CI/CO, Orchid* | DI duration with CI/CO, Poppy | DI duration with CI/CO, Poppy with file splitting feature | Deviation (hh:mm:ss), DI with CICO compared to DI without CICO | Deviation (hh:mm:ss), DI with CICO compared to Poppy without file splitting feature | Orchid CI Average, sec | Orchid CO Average, sec | Poppy CI Average, sec | Poppy CO Average, sec | Poppy with file splitting feature CI Average, sec | Poppy with file splitting feature CO Average, sec | Delta CI, % (Poppy/Poppy with file splitting feature) | Delta CO, % (Poppy/Poppy with file splitting feature) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
DI MARC Bib Create (PTF - Create 2) | 5K.mrc | 00:04:30 | 00:02:39 | 00:02:26 | 00:05:01 | 00:02:53 | 00:02:47 | + 00:00:21 | - 00:00:06 | 0.961 | 1.442 | 0.901 | 1.375 | 1.111 | 1.432 | 18.90% | 3.98% |
DI MARC Bib Create (PTF - Create 2) | 10K.mrc | 00:09:25 | 00:05:00 | 00:04:56 | 00:09:06 | 00:04:32 | 00:05:26 | + 00:00:30 | + 00:00:46 | 1.058 | 1.624 | 0.902 | 1.47 | 1.261 | 1.556 | 28.47% | 5.53% |
DI MARC Bib Create (PTF - Create 2) | 25K.mrc | 00:22:16 | 00:11:15 | 00:12:14 | 00:24:28 | 00:11:14 | 00:14:31 | + 00:02:16 | + 00:03:17 | 1.056 | 1.621 | 1 | 1.571 | 1.441 | 1.532 | 30.60% | -2.55% |
DI MARC Bib Create (PTF - Create 2) | 50K.mrc | 00:39:27 | 00:22:16 | 00:22:49 | 00:43:03 | 00:21:55 | 00:24:13 | + 00:01:24 | + 00:02:18 | 0.936 | 1.519 | 0.981 | 1.46 | 1.432 | 1.478 | 31.49% | 1.22% |
DI MARC Bib Create (PTF - Create 2) | 100K.mrc | 01:38:00 | 00:49:58 | 00:47:52 | 01:35:50 | 00:47:02 | 00:49:35 | + 00:01:47 | + 00:02:33 | 0.868 | 1.468 | 1.018 | 1.491 | 1.358 | 1.621 | 25.04% | 8.02% |
DI MARC Bib Update (PTF - Updates Success - 1) | 5K.mrc | 00:04:02 | 00:02:28 | 00:03:17 | 00:04:52 | 00:03:19 | 00:03:39 | + 00:00:22 | + 00:00:20 | 0.855 | 1.339 | 0.755 | 1.169 | 0.870 | 1.201 | 13.22% | 2.66% |
DI MARC Bib Update (PTF - Updates Success - 1) | 10K.mrc | 00:08:10 | 00:05:31 | 00:06:32 | 00:09:22 | 00:06:20 | 00:06:46 | + 00:00:14 | + 00:00:26 | 0.916 | 1.398 | 0.75 | 1.307 | 0.885 | 1.216 | 15.25% | -7.48% |
DI MARC Bib Update (PTF - Updates Success - 1) | 25K.mrc | 00:19:39 | 00:14:50 | 00:16:05 | 00:24:02 | 00:14:04 | 00:17:04 | + 00:00:59 | + 00:03:00 | 0.922 | 1.425 | 0.822 | 1.403 | 0.949 | 1.266 | 13.38% | -10.82% |
DI MARC Bib Update (PTF - Updates Success - 1) | 50K.mrc | 00:38:30 | 00:32:53 | 00:32:43 | 00:47:13 | 00:29:59 | 00:34:23 | + 00:01:40 | + 00:04:24 | 0.904 | 1.456 | 0.893 | 1.424 | 1.083 | 1.264 | 17.54% | -12.66% |
DI MARC Bib Update (PTF - Updates Success - 1) | 100K.mrc | 01:33:00 | 01:14:39 | 01:10:04 | 01:40:25 | 01:03:03 | 01:14:30 | + 00:04:26 | + 00:11:27 | 0.838 | 1.415 | 0.908 | 1.51 | 1.024 | 1.383 | 11.33% | -9.18% |
* Orchid and Poppy DI and CICO results are taken from Data Import with Check-ins Check-outs Orchid.
*** Completed with errors (Poppy).
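The report does not spell out how the CI and CO delta columns are calculated. The published percentages appear to be reproduced by dividing the difference by the value measured with the file splitting feature (rather than by the Poppy baseline). The sketch below illustrates that inferred formula; the function name is hypothetical.

```python
def delta_pct(poppy: float, poppy_split: float) -> float:
    """Delta in percent, relative to the value measured with file splitting.

    Inferred from the table values; not a formula stated in the report.
    """
    return (poppy_split - poppy) / poppy_split * 100


# (Poppy, Poppy with file splitting feature) average response times, sec
samples = {
    "Create 5K, CI": (0.901, 1.111),
    "Create 25K, CO": (1.571, 1.532),
    "Update 100K, CI": (0.908, 1.024),
    "Update 100K, CO": (1.51, 1.383),
}

for label, (poppy, poppy_split) in samples.items():
    print(f"{label}: {delta_pct(poppy, poppy_split):.2f}%")
```

A positive delta means the response time is slower with the file splitting feature; a negative delta means it is faster.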
Detailed CICO response time comparison
Scenario | Load level | Request | Orchid 95 perc, sec | Orchid average, sec | Poppy 95 perc, sec | Poppy average, sec | Poppy with file splitting feature 95 perc, sec | Poppy with file splitting feature average, sec |
---|---|---|---|---|---|---|---|---|
Circulation Check-in/Check-out (without Data import) | 8 users | Check-in | 0.489 | 0.394 | 0.489 | 0.431 | 0.724 | 0.610 |
Circulation Check-in/Check-out (without Data import) | 8 users | Check-out | 0.793 | 0.724 | 0.969 | 0.828 | 0.999 | 0.872 |
...
DI MARC BIB Create + CICO
DI Bib Update + CICO
Service CPU Utilization
Average CPU utilization did not exceed 150% for any module. The highest consumption was observed for mod-inventory: it grew from 110% to 250% by the end of the test. Because memory grows as well, we suspect issue MODINV-944.
mod-data-import CPU spiked up to 130% in Data Import jobs with 50k files and up to 250% in the 100k job. For Data Import jobs with 5k, 10k, and 25k files, CPU utilization didn't exceed 110%.
Service Memory Utilization
An increase in memory utilization was observed; it is attributed to the modules having been restarted before the test (daily cluster shutdown process).
Memory consumption before the tests was 45% for mod-search and 55% for mod-inventory. During the test with the 100k file, mod-search grew to 90% and mod-inventory to 100%.
...
DB CPU Utilization
Average DB CPU usage during data import is about 95%, the same as in the corresponding tests in Orchid.
DB Connections
The average connection count during data import is about 600 for Create jobs, which is twice as high as without the file splitting feature. For Update jobs, it is about 560.
DB load
Top SQL-queries:
INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id) VALUES ($1, $2)
UPDATE fs09000000_mod_source_record_manager.job_execution_progress SET succeeded_records_count = succeeded_records_count + $2, error_records_count = error_records_count + $3 WHERE job_execution_id = $1 Returning *
INSERT INTO fs09000000_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)
MARC BIB Update + CICO
Top SQL-queries:
INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id) VALUES ($1, $2)
INSERT INTO fs09000000_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id, permanent_location_id, tenant_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)
insert into "marc_records_lb" ("id", "content") values (cast($1 as uuid), cast($2 as jsonb)) on conflict ("id") do update set "content" = cast($3 as jsonb)
Appendix
Infrastructure
PTF environment pcp1
...
2 database instances, writer/reader
...
db.r6g.xlarge
...
- 4 m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
...
Appendix
Infrastructure
PTF environment pcp1
- 10 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
- 1 database instance, writer
Name | Memory GiB | vCPUs | max_connections |
---|---|---|---|
db.r6g.xlarge | 32 GiB | 4 vCPUs | 2731 |
- MSK cluster - tenant
  - 4 m5.2xlarge brokers in 2 zones
  - Apache Kafka version 2.8.0
  - EBS storage volume per broker 300 GiB
  - auto.create.topics.enable=true
  - log.retention.minutes=480
  - default.replication.factor=3
Module | Task Def. Revision | Module Version | Task Count | Mem Hard Limit | Mem Soft limit | CPU units | Xmx | MetaspaceSize | MaxMetaspaceSize | R/W split enabled |
---|---|---|---|---|---|---|---|---|---|---|
pcp1-pvt | | | | | | | | | | |
mod-remote-storage | 10(11)* | 3.0.0 | 2 | 4920 | 4472 | 1024 | 3960 | 512 | 512 | FALSE |
mod-data-import | 18(20)* | 3.0.7 | 1 | 2048 | 1844 | 256 | 1292 | 384 | 512 | FALSE |
mod-authtoken | 13(16)* | 2.14.1 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128 | FALSE |
mod-configuration | 9(10)* | 5.9.2 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE |
mod-users-bl | 9(10)* | 7.6.0 | 2 | 1440 | 1152 | 512 | 922 | 88 | 128 | FALSE |
mod-inventory-storage | 12(15)* | 27.0.3(27.0.4)* | 2 | 4096 | 3690 | 2048 | 3076 | 384 | 512 | FALSE |
mod-circulation-storage | 12(14)* | 17.1.3(17.1.7)* | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | FALSE |
mod-source-record-storage | 15(18)* | 5.7.3(5.7.5)* | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | FALSE |
mod-inventory | 11(14)* | 20.1.3(20.1.7)* | 2 | 2880 | 2592 | 1024 | 1814 | 384 | 512 | FALSE |
mod-di-converter-storage | 15(18)* | 2.1.2(2.1.5)* | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE |
mod-circulation | 12(14)* | 24.0.8(24.0.11)* | 2 | 2880 | 2592 | 1536 | 1814 | 384 | 512 | FALSE |
mod-pubsub | 11(13)* | 2.11.2(2.11.3)* | 2 | 1536 | 1440 | 1024 | 922 | 384 | 512 | FALSE |
mod-patron-blocks | 9(10)* | 1.9.0 | 2 | 1024 | 896 | 1024 | 768 | 88 | 128 | FALSE |
mod-source-record-manager | 14(17)* | 3.7.4(3.7.8)* | 2 | 5600 | 5000 | 2048 | 3500 | 384 | 512 | FALSE |
mod-quick-marc | 9(11)* | 5.0.0(5.0.1)* | 1 | 2288 | 2176 | 128 | 1664 | 384 | 512 | FALSE |
nginx-okapi | 9 | 2023.06.14 | 2 | 1024 | 896 | 128 | 0 | 0 | 0 | FALSE |
okapi-b | 11 | 5.1.2 | 3 | 1684 | 1440 | 1024 | 922 | 384 | 512 | FALSE |
mod-feesfines | 10(11)* | 19.0.0 | 2 | 1024 | 896 | 128 | 768 | 88 | 128 | FALSE |
pub-okapi | 9 | 2023.06.14 | 2 | 1024 | 896 | 128 | 768 | 0 | 0 | FALSE |
* The newest version (shown in parentheses) was used in this test, for comparison with the previous test.
Methodology/Approach
DI tests were started from UI with 5-minute pauses between the tests.
Additional links
Grafana dashboard:
MARC Bib Create + CICO
MARC Bib Update + CICO