Table of Contents |
---|
Overview
This document contains the results of testing Data Import for MARC Bibliographic records in the Orchid release to detect the baseline for ocp3.
Jira Legacy | ||||||
---|---|---|---|---|---|---|
|
...
- Duration for DI correlates with number of the records imported (100k records- 32 min, 250k - 1 hour 33 min, 500k - 3 hours 33 min). Multitenant DI could be performed successfully for up to 9 jobs in parallel. If jobs are big they will start one by one in order for each tenant but processed in parallel on 3 tenants. Small DI (1 record) could be finished faster not in order. Duration Response time for Check-In/Check-Out is prolonged twice (for Check-In from 0.517s to 1.138s, for Check-Out from 0.796s to 1.552s) during DI.
- This has The increase in memory utilization increasing was due to previous modules restarting (everyday cluster shot down process) the scheduled cluster shutdown. no memory leak is suspected for all of the DI modules.
- Average CPU usage for the test with 500k records Created for mod-di-converter-storage was about 462%, and for all other modules did not exceed 150 %. We can observe spikes in CPU usage of mod-data-import at the beginning of the Data Import jobs up to 400%.
- Approximately DB CPU usage is up to 95%.
...
Test #3 With CI/CO 20 users and DI 25k records on each of the 3 tenants
Test#3 | Duration CI/CO Response Time with DI | Duration CI/CO Response Time without DI |
---|---|---|
Check-In | 1.138 s | 0.517 s |
Check-Out | 1.552 s | 0.796 s |
Test#3 | DI Duration with CI/CO | DI Duration without CI/CO* |
---|---|---|
Tenant _001 | 20 min | 14 min (18 min for run 2) |
Tenant _022 | 19 min | 16 min (18 min for run 2) |
Tenant _033 | 16 min | 16 min (15 min for run 2) |
* - Same approach testing DI: 3 DI jobs total on 3 tenants without CI/CO. Start the second job after the first one reaches 30%, and start another job on a third tenant after the first job reaches 60% completion. DI file size: 25k
Memory Utilization
This has The increase in memory utilization increasing was due to previous modules restarting (everyday cluster shot down process) the scheduled cluster shutdown. no memory leak is suspected for DI modules.
...
Service CPU Utilization
MARC BIB CREATE
Average CPU usage for the test with 500k records Created for mod-di-converter-storage was about 462%, and for all other modules did not exceed 150 %. We can observe spikes in CPU usage of mod-data-import at the beginning of the Data Import jobs up to 400%.
Test#1 250k, 500k records DI
...
- tenant0_mod_source_record_storage.marc_records_lb = 9674629
- tenant2_mod_source_record_storage.marc_records_lb = 0
- tenant3_mod_source_record_storage.marc_records_lb = 0
- tenant0_mod_source_record_storage.raw_records_lb = 9604805
- tenant2_mod_source_record_storage.raw_records_lb = 0
- tenant3_mod_source_record_storage.raw_records_lb = 0
- tenant0_mod_source_record_storage.records_lb = 9674677
- tenant2_mod_source_record_storage.records_lb = 0
- tenant3_mod_source_record_storage.records_lb = 0
- tenant0_mod_source_record_storage.marc_indexers = 620042011
- tenant2_mod_source_record_storage.marc_indexers = 0
- tenant3_mod_source_record_storage.marc_indexers = 0
- tenant0_mod_source_record_storage.marc_indexers with field_no 010 = 3285833
- tenant2_mod_source_record_storage.marc_indexers with field_no 010 = 0
- tenant3_mod_source_record_storage.marc_indexers with field_no 010 = 0
- tenant0_mod_source_record_storage.marc_indexers with field_no 035 = 19241844
- tenant2_mod_source_record_storage.marc_indexers with field_no 035 = 0
- tenant3_mod_source_record_storage.marc_indexers with field_no 035 = 0
- tenant0_mod_inventory_storage.authority = 4
- tenant2_mod_inventory_storage.authority = 0
- tenant3_mod_inventory_storage.authority = 0
- tenant0_mod_inventory_storage.holdings_record = 9592559
- tenant2_mod_inventory_storage.holdings_record = 16
- tenant3_mod_inventory_storage.holdings_record = 16
- tenant0_mod_inventory_storage.instance = 9976519
- tenant2_mod_inventory_storage.instance = 32
- tenant3_mod_inventory_storage.instance = 32
- tenant0_mod_inventory_storage.item = 10787893
- tenant2_mod_inventory_storage.item = 19
- tenant3_mod_inventory_storage.item = 19
PTF -environment ocp3
- 10 m6i.2xlarge EC2 instances located in US East (N. Virginia)us-east-1
2 database instances, one reader, and one writer
Name API Name Memory GIB vCPUs max_connections R6G Extra Large db.r6g.xlarge 32 GiB 4 vCPUs 2731 - MSK ptf-kakfa-3
- 4 m5.2xlarge brokers in 2 zones
Apache Kafka version 2.8.0
EBS storage volume per broker 300 GiB
- auto.create.topics.enable=true
- log.retention.minutes=480
- default.replication.factor=3
- Kafka topics partitioning: - 2 partitions for DI topics
...
Test 3: Run CICO on one tenant, DI jobs 3 tenants, including the one that runs CICO. Start the second job after the first one reaches 30%, and start another job on a third tenant after the first job reaches 60% completion. CICO: 20 users- 2 hours, DI file size: 25k. In Addition to test #3 run DI with the same pattern on 3 tenants but without CI/CO.