Data Import with Check-ins Check-outs (Orchid)
Overview
This document contains the results of testing Check-in/Check-out and Data Import for MARC Bibliographic records in the Orchid release to detect performance trends.
Ticket: PERF-472
Summary
- Data import performance in Orchid is significantly degraded compared to the Nolana results. Response times are about 2 times higher. This might be due to fixing differences in the database schemas.
- Data import response times are up to 22% higher with parallel Check-in/Check-out than in the pure Data import results.
- Check-in/Check-out response times are up to 168% higher with parallel Data import than in the pure Check-in/Check-out results.
- A memory utilization increase was observed; it is caused by the preceding module restarts (the daily cluster shutdown process).
- Average CPU usage did not exceed 130% for any module. Spikes of up to 170% can be observed in mod-data-import CPU usage at the beginning of the Data Import jobs. CPU usage is about 2 times lower than in the same test for Nolana.
- Average DB CPU usage during data import is about 95%.
Test Runs
Test # | Scenario | Load level | Comment |
---|---|---|---|
1 | DI MARC Bib Create | 5K, 10K, 25K, 50K, 100K consecutively (with 5 min pause) | |
1 | CICO | 8 users | |
2 | DI MARC Bib Update | 5K, 10K, 25K, 50K, 100K consecutively (with 5 min pause) | 100K file was completed with errors (1 item discarded) |
2 | CICO | 8 users | |
Test Results
Data import
Profile | MARC File | DI Duration | Check-in average, sec (8 users) | Check-out average, sec (8 users) |
---|---|---|---|---|
DI MARC Bib Create (PTF - Create 2) | 5K.mrc | 00:05:01.733 | 0.961 | 1.442 |
DI MARC Bib Create (PTF - Create 2) | 10K.mrc | 00:09:06.752 | 1.058 | 1.624 |
DI MARC Bib Create (PTF - Create 2) | 25K.mrc | 00:24:28.167 | 1.056 | 1.621 |
DI MARC Bib Create (PTF - Create 2) | 50K.mrc | 00:43:03.785 | 0.936 | 1.519 |
DI MARC Bib Create (PTF - Create 2) | 100K.mrc | 01:35:50.749 | 0.868 | 1.468 |
DI MARC Bib Update (PTF - Updates Success - 1) | 5K.mrc | 00:04:52.496 | 0.855 | 1.339 |
DI MARC Bib Update (PTF - Updates Success - 1) | 10K.mrc | 00:09:22.765 | 0.916 | 1.398 |
DI MARC Bib Update (PTF - Updates Success - 1) | 25K.mrc | 00:24:02.238 | 0.922 | 1.425 |
DI MARC Bib Update (PTF - Updates Success - 1) | 50K.mrc | 00:47:13.876 | 0.904 | 1.456 |
DI MARC Bib Update (PTF - Updates Success - 1) | 100K.mrc | 01:40:25.533 | 0.838 | 1.415 |
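The durations in the table above can also be read as throughput, which makes runs of different file sizes easier to compare. The following is a minimal sketch rather than part of the test tooling: the helper names `duration_to_seconds` and `throughput` are hypothetical, and the input values are copied from the DI MARC Bib Create rows.

```python
def duration_to_seconds(hms: str) -> float:
    """Convert an 'HH:MM:SS.mmm' duration string to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def throughput(records: int, hms: str) -> float:
    """Average number of records processed per second over the whole job."""
    return records / duration_to_seconds(hms)


# Durations copied from the DI MARC Bib Create rows (with CICO, 8 users).
create_runs = {
    5_000: "00:05:01.733",
    10_000: "00:09:06.752",
    25_000: "00:24:28.167",
    50_000: "00:43:03.785",
    100_000: "01:35:50.749",
}

for records, hms in create_runs.items():
    print(f"{records:>7} records: {throughput(records, hms):.1f} records/sec")
```

A roughly constant records/sec figure across file sizes would suggest that duration scales linearly with file size in these runs.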
Check-in/Check-out
Scenario | Load level | Request | Response time 95th percentile, sec | Response time average, sec |
---|---|---|---|---|
Circulation Check-in/Check-out (without Data import) | 8 users | Check-in | 0.489 | 0.394 |
Circulation Check-in/Check-out (without Data import) | 8 users | Check-out | 0.793 | 0.724 |
Comparison
The following table compares the test results of the current release (Orchid) to the results of the previous release (Nolana) and to the Orchid baseline results (CICO without DI and DI without CICO).
File size | DI duration without CICO, Nolana* | DI duration without CICO, Orchid** | DI duration with CICO (8 users), Nolana* | DI duration with CICO (8 users), Orchid | DI deviation (Orchid without CICO vs. with CICO) | Check-in with DI (avg, sec), Nolana* | Check-in with DI (avg, sec), Orchid | Check-out with DI (avg, sec), Nolana* | Check-out with DI (avg, sec), Orchid | Check-in deviation (Orchid CICO without DI vs. with DI) | Check-out deviation (Orchid CICO without DI vs. with DI) |
---|---|---|---|---|---|---|---|---|---|---|---|
5K MARC BIB Create | 2 min 51 sec | 4 min 30 sec | 1 min 56 sec | 5 min 01 sec | +11% | 0.817 | 0.961 | 1.417 | 1.442 | +143% | +96% |
5K MARC BIB Update | 2 min 27 sec | 4 min 2 sec | 2 min 51 sec | 4 min 52 sec | +20% | 0.747 | 0.855 | 1.094 | 1.339 | +117% | +84% |
10K MARC BIB Create | 4 min 55 sec | 9 min 25 sec | 3 min 57 sec | 9 min 06 sec | -3% | 0.842 | 1.058 | 1.574 | 1.624 | +168% | +124% |
10K MARC BIB Update | 4 min 50 sec | 8 min 10 sec | 4 min 57 sec | 9 min 22 sec | +14% | 0.541 | 0.916 | 1.026 | 1.398 | +132% | +93% |
25K MARC BIB Create | 11 min 56 sec | 22 min 16 sec | 9 min 24 sec | 24 min 28 sec | +9% | 0.882 | 1.056 | 1.641 | 1.621 | +168% | +123% |
25K MARC BIB Update | 12 min 20 sec | 19 min 39 sec | 13 min 12 sec | 24 min 2 sec | +22% | 0.700 | 0.922 | 1.248 | 1.425 | +134% | +96% |
50K MARC BIB Create | 23 min 43 sec | 39 min 27 sec | 19 min 28 sec | 43 min 3 sec | +9% | 0.926 | 0.936 | 1.666 | 1.519 | +137% | +109% |
50K MARC BIB Update | 24 min 5 sec | 38 min 30 sec Completed with errors (1 item discarded) | 27 min 39 sec | 47 min 13 sec | +22% | 0.700 | 0.904 | 1.199 | 1.456 | +129% | +101% |
100K MARC BIB Create | 49 min 40 sec | 1 hour 38 min | 38 min 44 sec | 1 hour 35 min | -3% | 1.021 | 0.868 | 1.862 | 1.468 | +120% | +102% |
100K MARC BIB Update | 51 min 15 sec | 1 hour 33 min | 48 min 45 sec | 1 hour 40 min Completed with errors (1 item discarded) | +7% | 0.556 | 0.838 | 1.046 | 1.415 | +112% | +95% |
* Nolana DI and CICO results are taken from Data Import with Check-ins Check-outs Nolana.
** Orchid DI results are taken from Data Import test report (Orchid).
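The deviation columns express the relative change of a measurement against its Orchid baseline. A minimal sketch of that arithmetic follows. The function name `deviation_percent` is hypothetical, and truncating the fractional part (rather than rounding) is an assumption inferred from the table values, not something the report states.

```python
def deviation_percent(baseline: float, measured: float) -> int:
    """Relative change of `measured` vs. `baseline` in percent.

    The fractional part is truncated toward zero (assumption: this
    appears to match how the table values were derived).
    """
    return int((measured - baseline) / baseline * 100)


# 5K MARC BIB Create, DI duration in seconds:
# 4 min 30 sec without CICO vs. 5 min 01 sec with CICO.
di_5k_create = deviation_percent(4 * 60 + 30, 5 * 60 + 1)

# 10K MARC BIB Create, DI duration in seconds:
# 9 min 25 sec without CICO vs. 9 min 06 sec with CICO.
di_10k_create = deviation_percent(9 * 60 + 25, 9 * 60 + 6)

# 10K MARC BIB Create, Check-in average response time in seconds:
# 0.394 for pure CICO vs. 1.058 with parallel DI.
ci_10k_create = deviation_percent(0.394, 1.058)

print(di_5k_create, di_10k_create, ci_10k_create)
```

A negative result means that the run with parallel load was faster than the baseline.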
Detailed CICO response time comparison
Request* | Pure CICO (avg, sec) | CICO + 100K MARC BIB Create (avg, sec) | CICO + 100K MARC BIB Update (avg, sec) |
---|---|---|---|
Check-Out Controller | 0.724 | 1.470 | 1.415 |
Check-In Controller | 0.394 | 0.870 | 0.838 |
POST_circulation/check-out-by-barcode (Submit_barcode_checkout) | 0.251 | 0.532 | 0.507 |
POST_circulation/check-in-by-barcode (Submit_barcode_checkin) | 0.173 | 0.402 | 0.390 |
GET_circulation/loans (Submit_barcode_checkout) | 0.127 | 0.278 | 0.276 |
GET_users (Get_check_in_page) | 0.022 | 0.092 | 0.093 |
GET_inventory/items (Submit_barcode_checkin) | 0.045 | 0.113 | 0.113 |
GET_inventory/items (Submit_barcode_checkout) | 0.048 | 0.112 | 0.114 |
GET_automated-patron-blocks (Submit_patron_barcode) | 0.021 | 0.049 | 0.047 |
GET_configurations/entries (Get_check_in_page) | 0.013 | 0.044 | 0.040 |
GET_users (Submit_patron_barcode) | 0.015 | 0.040 | 0.039 |
GET_circulation/requests_status_openAwaitingPickup (Submit_patron_barcode) | 0.018 | 0.035 | 0.033 |
*Top-10 requests were taken for analysis.
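To see which requests are affected most by parallel data import, the rows above can be ranked both by absolute added latency and by relative slowdown. The sketch below uses a subset of the rows (pure CICO vs. CICO + 100K MARC BIB Create). The dictionary layout and helper names are illustrative assumptions, not part of the test tooling.

```python
# (pure CICO, CICO + 100K MARC BIB Create) average response times in seconds,
# copied from a subset of the rows in the table above.
timings = {
    "POST_circulation/check-out-by-barcode": (0.251, 0.532),
    "POST_circulation/check-in-by-barcode": (0.173, 0.402),
    "GET_circulation/loans": (0.127, 0.278),
    "GET_users (Get_check_in_page)": (0.022, 0.092),
    "GET_configurations/entries": (0.013, 0.044),
}


def added_latency(pair: tuple[float, float]) -> float:
    """Absolute increase in seconds when DI runs in parallel."""
    pure, with_di = pair
    return with_di - pure


def slowdown(pair: tuple[float, float]) -> float:
    """Ratio of the response time with DI to the pure CICO response time."""
    pure, with_di = pair
    return with_di / pure


# Request names ordered from most to least affected.
by_added = sorted(timings, key=lambda r: added_latency(timings[r]), reverse=True)
by_ratio = sorted(timings, key=lambda r: slowdown(timings[r]), reverse=True)

for request in by_added:
    pair = timings[request]
    print(f"{request}: +{added_latency(pair):.3f} sec, x{slowdown(pair):.2f}")
```

The two rankings answer different questions: the absolute ranking shows where most of the added wall-clock time goes, while the ratio highlights lightweight requests that slow down disproportionately.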
Response time
DI MARC BIB Create + CICO
DI Bib Update + CICO
Service CPU Utilization
Average CPU usage did not exceed 130% for any module. Spikes of up to 170% can be observed in mod-data-import CPU usage at the beginning of the Data Import jobs.
CPU usage is about 2 times lower than in the same test for Nolana (see Data Import with Check-ins Check-outs Nolana).
DI MARC BIB Create + CICO
MARC BIB Update + CICO
Service Memory Utilization
A memory utilization increase was observed; it is caused by the preceding module restarts (the daily cluster shutdown process).
DI MARC BIB Create + CICO
MARC BIB Update + CICO
DB CPU Utilization
Average DB CPU usage during data import is about 95%.
DI MARC BIB Create + CICO
MARC BIB Update + CICO
DB Connections
Average connection count during data import is about 270 connections.
DI MARC BIB Create + CICO
MARC BIB Update + CICO
DB load
DI MARC BIB Create + CICO
Top SQL-queries:
update "marc_records_lb" set "content" = cast($1 as jsonb) where "id" = cast($2 as uuid)
INSERT INTO fs09000000_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15)
INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id) VALUES ($1, $2)
RDS log file:
MARC BIB Update + CICO
Top SQL-queries:
insert into "marc_records_lb" ("id", "content") values (cast($1 as uuid), cast($2 as jsonb)) on conflict ("id") do update set "content" = cast($3 as jsonb)
INSERT INTO fs09000000_mod_source_record_manager.journal_records (id, job_execution_id, source_id, source_record_order, entity_type, entity_id, entity_hrid, action_type, action_status, error, action_date, title, instance_id, holdings_id, order_id) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15)
INSERT INTO fs09000000_mod_source_record_manager.events_processed (handler_id, event_id) VALUES ($1, $2)
RDS log file:
Appendix
Infrastructure
Records count:
- mod_source_record_storage.marc_records_lb = 22618121
- mod_source_record_storage.raw_records_lb = 22650140
- mod_source_record_storage.records_lb = 22650140
- mod_source_record_storage.marc_indexers = 98256911 (all records)
- mod_source_record_storage.marc_indexers with field_no 010 = 139135
- mod_source_record_storage.marc_indexers with field_no 035 = 4272473
- mod_inventory_storage.authority = 7402975
- mod_inventory_storage.holdings_record = 22027125
- mod_inventory_storage.instance = 20986866
- mod_inventory_storage.item = 22130108
PTF environment: ncp3
- 9 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
- 2 database instances, one reader and one writer

Name | API Name | Memory GiB | vCPUs | max_connections |
---|---|---|---|---|
R6G Extra Large | db.r6g.xlarge | 32 GiB | 4 vCPUs | 2731 |

- MSK ptf-kakfa-3
  - 4 m5.2xlarge brokers in 2 zones
  - Apache Kafka version 2.8.0
  - EBS storage volume per broker 300 GiB
  - auto.create.topics.enable=true
  - log.retention.minutes=480
  - default.replication.factor=3
- Kafka topics partitioning: 2 partitions for DI topics
Modules memory and CPU parameters
Modules | Version | Task Definition | Running Tasks | CPU | Memory | MemoryReservation | MaxMetaspaceSize | Xmx |
---|---|---|---|---|---|---|---|---|
mod-inventory-storage | 26.0.0 | 10 | 2 | 1024 | 2208 | 1952 | 512 | 1440 |
mod-inventory | 20.0.4 | 8 | 2 | 1024 | 2880 | 2592 | 512 | 1814 |
mod-source-record-storage | 5.6.5 | 24 | 2 | 2048 | 4096 | 3688 | 512 | 3076 |
mod-quick-marc | 3.0.0 | 5 | 1 | 128 | 2288 | 2176 | 512 | 1664 |
mod-source-record-manager | 3.6.2 | 16 | 2 | 1024 | 4096 | 3688 | 512 | 3076 |
mod-di-converter-storage | 2.0.2 | 5 | 2 | 128 | 1024 | 896 | 128 | 768 |
mod-data-import | 2.7.1 | 8 | 1 | 256 | 2048 | 1844 | 512 | 1292 |
okapi | 5.0.1 | 6 | 3 | 1024 | 1684 | 1440 | 512 | 922 |
nginx-okapi | 2022.03.02 | 6 | 2 | 128 | 1024 | 896 | - | - |
pub-okapi | 2022.03.02 | 6 | 2 | 128 | 1024 | 896 | - | 768 |
mod-feesfines | 18.2.1 | 1 | 2 | 128 | 1024 | 896 | 128 | 768 |
mod-patron-blocks | 1.8.0 | 1 | 2 | 1024 | 1024 | 896 | 128 | 768 |
mod-pubsub | 2.7.0 | 1 | 2 | 1024 | 1536 | 1440 | 512 | 922 |
mod-authtoken | 2.12.0 | 1 | 2 | 512 | 1440 | 1152 | 128 | 922 |
mod-circulation-storage | 15.0.2 | 1 | 2 | 1024 | 1536 | 1440 | 512 | 896 |
mod-circulation | 23.3.2 | 1 | 2 | 1024 | 1024 | 896 | 128 | 768 |
mod-configuration | 5.9.0 | 1 | 2 | 128 | 1024 | 896 | 128 | 768 |
mod-users | 19.0.0 | 1 | 2 | 128 | 1024 | 896 | 128 | 768 |
mod-remote-storage | 1.7.1 | 1 | 2 | 128 | 1872 | 1692 | 512 | 1178m |
Methodology/Approach
To test the DI baseline, JMeter scripts were used with 5 min pauses between the tests.
Additional links
Grafana dashboard:
MARC Bib Create + CICO
MARC Bib Update + CICO