Skip to end of banner
Go to start of banner

Data Import Splitting Feature test report (Orchid) ocp3

Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 29 Next »


Overview

This document contains the results of testing Data Import Splitting Feature for MARC Bibliographic records in the Orchid release to detect the baseline for ocp3. PERF-644 - Getting issue details... STATUS PERF-645 - Getting issue details... STATUS PERF-647 - Getting issue details... STATUS PERF-646 - Getting issue details... STATUS PERF-671 - Getting issue details... STATUS

Splitting feature documentation Detailed Release Notes for Data Import Splitting Feature

Summary

  • Duration for DI correlates with number of the records imported (100k records- 38 min, 250k - 1 hour 32 min, 500k - 3 hours 29 min).
  • ---------Multitenant DI could be performed successfully for up to 9 jobs in parallel. If jobs are big they will start one by one in order for each tenant but processed in parallel on 3 tenants. Small DI (1 record) could be finished faster not in order.  Duration for Check-In/Check-Out is prolonged twice during DI.
  • This has memory utilization increasing due to previous modules restarting (everyday cluster shot down process) no memory leak is suspected for all of the modules.
  • Average CPU usage for mod-inventory -was 144%, mod-di-converter-storage was about 107%, and for all other modules did not exceed 100 %. We can observe spikes in CPU usage of mod-data-import at the beginning of the Data Import jobs up to 260%.
  • Approximately DB CPU usage is up to 95%.

Recommendations and Jiras

MODDATAIMP-924 - Getting issue details... STATUS

Results

Test #

Profile

Splitting Feature EnabledResults

Splitting Feature Disabled

ResultsBefore Splitting Feature releasedResults
1

100K MARC Create

PTF - Create 237 min -39 minCompleted40 minCompleted32-33 minutesCompleted
1

250K MARC Create 

PTF - Create 21 hour 32 minCompleted1 hour 41 minCompleted1 hour 33 min - 1 hour 57 minCompleted
1500K MARC CreatePTF - Create 23 hours 29 minCompleted*3 hours 55 minCompleted3 hours 33 minCompleted
2Multitenant MARC Create (100k, 50k, and 1 record)PTF - Create 22 hours 40 minCompleted*

3 hours 1 minCompleted
3CI/CO + DI MARC Create (20 users CI/CO, 25k records DI on 3 tenants)PTF - Create 2



24 minCompleted *
4

100K MARC Update (Create new file)

PTF - Updates Success - 1

58 min 25 sec

57 min 19 sec

Completed1 hour 3 minCompleted--
4

250K MARC Update

PTF - Updates Success - 1

2 hours 2 min **


2 hours 12 min

Completed with errors **

Completed

1 hour 53 minCompleted--
4500K MARC UpdatePTF - Updates Success - 1

4 hours 43 min

4 hours 38 minutes

Completed

Completed

5 hour 59 minCompleted--

 * - One record on one tenant could be discarded with error: io.netty.channel.StacklessClosedChannelException MODDATAIMP-748 - Getting issue details... STATUS ?


 ** -  up to 10 items were discarded with the error: io.vertx.core.impl.NoStackTraceThrowable: Cannot get actual Item by id: org.folio.inventory.exceptions.InternalServerErrorException: Access for user 'data-import-system-user' (f3486d35-f7f7-4a69-bcd0-d8e5a35cb292) requires permission: inventory-storage.items.item.get



Data Import Robustness Enhancement  PERF-646 - Getting issue details... STATUS

25K records RECORDS_PER_SPLIT_FILE
Number of concurent tenantsJob profile 500
1K
5K
10K
1 TenantPTF - Create 2

















2 TenantsPTF - Create 2

















3 TenantsPTF - Create 2



















Test #3 With CI/CO 20 users and DI 25k records on each of the 3 tenants

Test#3Duration with DIDuration without DI
Check-In1.1380.517
Check-Out1.5520.796
Test#3DI Duration with CI/CODI Duration without CI/CO*
Tenant _120 min14 min (18 min for run 2)
Tenant _219 min16 min (18 min for run 2)
Tenant _316 min16 min (15 min for run 2)

 * - Same approach testing DI: 3 DI jobs total on 3 tenants without CI/CO. Start the second job after the first one reaches 30%, and start another job on a third tenant after the first job reaches 60% completion. DI file size: 25k

Memory Utilization

This has memory utilization increasing due to previous modules restarting (everyday cluster shot down process) no memory leak is suspected for DI modules.

MARC BIB CREATE

Test#1 100k, 250k, 500k records DI

Test#2 Multitenant  DI (9 concurrent jobs)


Test#3 With CI/CO


Service CPU Utilization 

MARC BIB CREATE

Average CPU usage for mod-inventory -was 144%, mod-di-converter-storage was about 107%, and for all other modules did not exceed 100 %. We can observe spikes in CPU usage of mod-data-import at the beginning of the Data Import jobs up to 260%.

Test#1 500k records DI


Test#2 Multitenant


Test#3 With CI/CO


Instance CPU Utilization

Test#1 500k records DI

Test#2 Multitenant DI (9 concurrent jobs)



RDS CPU Utilization 

MARC BIB CREATE

Approximately DB CPU usage is up to 95%

Test#1  500k records DI

Test#2 Multitenant  DI (9 concurrent jobs)


Test#3 With CI/CO


RDS Database Connections

MARC BIB CREATE
 For DI  job Create- Maximum 535 connections count.

Test#1  500k records DI

Test#2 Multitenant


Test#3 With CI/CO


Appendix

Infrastructure ocp3

Records count :

  • tenant0_mod_source_record_storage.marc_records_lb = 9674629
  • tenant2_mod_source_record_storage.marc_records_lb = 0
  • tenant3_mod_source_record_storage.marc_records_lb = 0
  • tenant0_mod_source_record_storage.raw_records_lb = 9604805
  • tenant2_mod_source_record_storage.raw_records_lb = 0
  • tenant3_mod_source_record_storage.raw_records_lb = 0
  • tenant0_mod_source_record_storage.records_lb = 9674677
  • tenant2_mod_source_record_storage.records_lb = 0
  • tenant3_mod_source_record_storage.records_lb = 0
  • tenant0_mod_source_record_storage.marc_indexers =  620042011
  • tenant2_mod_source_record_storage.marc_indexers =  0
  • tenant3_mod_source_record_storage.marc_indexers =  0
  • tenant0_mod_source_record_storage.marc_indexers with field_no 010 = 3285833
  • tenant2_mod_source_record_storage.marc_indexers with field_no 010 = 0
  • tenant3_mod_source_record_storage.marc_indexers with field_no 010 = 0
  • tenant0_mod_source_record_storage.marc_indexers with field_no 035 = 19241844
  • tenant2_mod_source_record_storage.marc_indexers with field_no 035 = 0
  • tenant3_mod_source_record_storage.marc_indexers with field_no 035 = 0
  • tenant0_mod_inventory_storage.authority = 4
  • tenant2_mod_inventory_storage.authority = 0
  • tenant3_mod_inventory_storage.authority = 0
  • tenant0_mod_inventory_storage.holdings_record = 9592559
  • tenant2_mod_inventory_storage.holdings_record = 16
  • tenant3_mod_inventory_storage.holdings_record = 16
  • tenant0_mod_inventory_storage.instance = 9976519
  • tenant2_mod_inventory_storage.instance = 32
  • tenant3_mod_inventory_storage.instance = 32 
  • tenant0_mod_inventory_storage.item = 10787893
  • tenant2_mod_inventory_storage.item = 19
  • tenant3_mod_inventory_storage.item = 19

PTF -environment ocp3 

  • 10 m6i.2xlarge EC2 instances located in US East (N. Virginia)us-east-1
  • 2 database  instances, one reader, and one writer

    NameAPI NameMemory GIBvCPUsmax_connections
    R6G Extra Largedb.r6g.xlarge32 GiB4 vCPUs2731
  • MSK ptf-kakfa-3
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3
  • Kafka topics partitioning: - 2 partitions for DI topics

Before Splitting Feature released

Module
ocp3-pvt
Mon Sep 11 09:33:28 UTC 2023
Task Def. RevisionModule VersionTask CountMem Hard LimitMem Soft limitCPU unitsXmxMetaspaceSizeMaxMetaspaceSizeR/W split enabled
mod-remote-storage13mod-remote-storage:2.0.324920447210243960512512false
mod-agreements8mod-agreements:5.5.2215921488128968384512false
mod-data-import7mod-data-import:2.7.11204818442561292384512false
mod-search30mod-search:2.0.1225922480204814405121024false
mod-authtoken7mod-authtoken:2.13.021440115251292288128false
mod-configuration7mod-configuration:5.9.12102489612876888128false
mod-inventory-storage1mod-inventory-storage:26.1.0-SNAPSHOT.66502208195210241440384512false
mod-circulation-storage15mod-circulation-storage:16.0.122880259215361814384512false
mod-source-record-storage11mod-source-record-storage:5.6.725600500020483500384512false
mod-calendar7mod-calendar:2.4.22102489612876888128false
mod-inventory12mod-inventory:20.0.622880259210241814384512false
mod-circulation9mod-circulation:23.5.622880259215361814384512false
mod-di-converter-storage8mod-di-converter-storage:2.0.52102489612876888128false
mod-pubsub8mod-pubsub:2.9.12153614401024922384512false
mod-users8mod-users:19.1.12102489612876888128false
mod-patron-blocks8mod-patron-blocks:1.8.021024896102476888128false
mod-source-record-manager9mod-source-record-manager:3.6.425600500020483500384512false
nginx-edge7nginx-edge:2023.06.1421024896128000false
mod-quick-marc7mod-quick-marc:3.0.01228821761281664384512false
nginx-okapi7nginx-okapi:2023.06.1421024896128000false
okapi-b8okapi:5.0.13168414401024922384512false
mod-feesfines7mod-feesfines:18.2.12102489612876888128false
mod-patron7mod-patron:5.5.22102489612876888128false
mod-notes7mod-notes:5.0.121024896128952384512false
pub-okapi7pub-okapi:2023.06.142102489612876800false


Service versions for Splitting Feature test

Module
ocp3-pvt
Mon Sep 25 12:43:06 UTC 2023
Task Def. RevisionModule VersionTask CountMem Hard LimitMem Soft limitCPU unitsXmxMetaspaceSizeMaxMetaspaceSizeR/W split enabled
mod-data-import10mod-data-import:2.7.2-SNAPSHOT.1371204818442561292384512false
mod-search30mod-search:2.0.1225922480204814405121024false
mod-configuration8mod-configuration:5.9.12102489612876888128false
mod-bulk-operations7mod-bulk-operations:1.0.623072260010241536384512false
mod-inventory-storage1mod-inventory-storage:26.1.0-SNAPSHOT.66502208195210241440384512false
mod-circulation-storage15mod-circulation-storage:16.0.122880259215361814384512false
mod-source-record-storage12mod-source-record-storage:5.6.725600500020483500384512false
mod-calendar7mod-calendar:2.4.22102489612876888128false
mod-inventory12mod-inventory:20.0.622880259210241814384512false
mod-circulation9mod-circulation:23.5.622880259215361814384512false
mod-di-converter-storage8mod-di-converter-storage:2.0.52102489612876888128false
mod-pubsub9mod-pubsub:2.9.12153614401024922384512false
mod-users9mod-users:19.1.12102489612876888128false
mod-patron-blocks9mod-patron-blocks:1.8.021024896102476888128false
mod-source-record-manager12mod-source-record-manager:3.6.5-SNAPSHOT.24525600500020483500384512false
mod-quick-marc7mod-quick-marc:3.0.01228821761281664384512false
nginx-okapi7nginx-okapi:2023.06.1421024896128000false
okapi-b8okapi:5.0.13168414401024922384512false
mod-feesfines8mod-feesfines:18.2.12102489612876888128false
mod-notes7mod-notes:5.0.121024896128952384512false
pub-okapi7pub-okapi:2023.06.142102489612876800false


Methodology/Approach

To set splitting feature: Detailed Release Notes for Data Import Splitting Feature

Test 1: Manually tested 100k, 250k, and 500k records files started one by one on one tenant only.

Test 2: Manually tested 100k+50k+1 record files DI started simultaneously on every 3 tenants (9 jobs total).

Test 3: Run CICO on one tenant, DI jobs 3 tenants, including the one that runs CICO. Start the second job after the first one reaches 30%, and start another job on a third tenant after the first job reaches 60% completion. CICO: 20 users, DI file size: 25k


  • No labels