Data Import test report (Orchid) baseline for ocp3


Overview

This document contains the results of testing Data Import (DI) for MARC Bibliographic records in the Orchid release to establish the baseline for ocp3. Related ticket: PERF-662.

Summary

  • DI duration grows with the number of records imported (100k records: 32 min; 250k: 1 hour 33 min; 500k: 3 hours 33 min).
  • Multitenant DI completed successfully with up to 9 jobs in parallel. Large jobs start one by one, in order, within each tenant, but are processed in parallel across the 3 tenants. A small DI job (1 record) can finish sooner, out of order.
  • Response time for Check-In/Check-Out (CI/CO) roughly doubles during DI (Check-In from 0.517 s to 1.138 s, Check-Out from 0.796 s to 1.552 s).
  • The increase in memory utilization was due to the scheduled cluster shutdown. No memory leak is suspected for DI modules.
  • Average CPU usage during the 500k-record Create test was about 462% for mod-di-converter-storage and did not exceed 150% for any other module. CPU usage of mod-data-import spikes up to 400% at the beginning of Data Import jobs.
  • DB CPU usage reaches approximately 95%.
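To make the relationship between record count and duration concrete, the reported durations can be converted into average throughput. The following is a minimal sketch that uses only the figures quoted in the first summary bullet; the names `DURATIONS_MIN` and `throughput` are illustrative and not part of any test tooling.

```python
# Durations reported in the summary, in minutes, keyed by number of records.
DURATIONS_MIN = {
    100_000: 32,
    250_000: 93,    # 1 hour 33 min
    500_000: 213,   # 3 hours 33 min
}


def throughput(records: int, minutes: float) -> float:
    """Return the average import throughput in records per second."""
    return records / (minutes * 60)


for records, minutes in DURATIONS_MIN.items():
    rate = throughput(records, minutes)
    print(f"{records:>7} records: {minutes:>3} min, {rate:.1f} records/s")
```

With these figures, average throughput falls from about 52 records/s at 100k records to about 39 records/s at 500k, so duration grows somewhat faster than the record count.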

Recommendations and Jiras

It is recommended to increase CPU units for mod-di-converter-storage to 512.

Results

| Test # | Scenario | Profile | Duration (ocp3) | Results |
| --- | --- | --- | --- | --- |
| 1 | 100K MARC Create | PTF - Create 2 | 32-33 minutes | Completed |
| 1 | 250K MARC Create | PTF - Create 2 | 1 hour 33 min - 1 hour 57 min | Completed |
| 1 | 500K MARC Create | PTF - Create 2 | 3 hours 33 min | Completed |
| 2 | Multitenant MARC Create (100k, 50k, and 1 record) | PTF - Create 2 | 3 hours 1 min | Completed |
| 3 | CI/CO + DI MARC Create (20 users CI/CO, 25k records DI on 3 tenants) | PTF - Create 2 | 24 min | Completed * |


 * - One record on one tenant could be discarded with error: io.netty.channel.StacklessClosedChannelException

Test #3 With CI/CO 20 users and DI 25k records on each of the 3 tenants

| Test #3 | CI/CO Response Time with DI | CI/CO Response Time without DI |
| --- | --- | --- |
| Check-In | 1.138 s | 0.517 s |
| Check-Out | 1.552 s | 0.796 s |

| Test #3 | DI Duration with CI/CO | DI Duration without CI/CO * |
| --- | --- | --- |
| Tenant _1 | 20 min | 14 min (18 min for run 2) |
| Tenant _2 | 19 min | 16 min (18 min for run 2) |
| Tenant _3 | 16 min | 16 min (15 min for run 2) |

 * - DI without CI/CO was tested with the same approach: 3 DI jobs in total on 3 tenants. The second job starts after the first one reaches 30% completion, and the job on the third tenant starts after the first job reaches 60% completion. DI file size: 25k records.
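The mutual impact of the two workloads in Test #3 can be expressed as simple ratios. The sketch below uses only the Test #3 figures reported above (run 1 values for DI without CI/CO); the dictionary layout and the `slowdown` helper are illustrative assumptions, not part of the test tooling.

```python
# CI/CO response times in seconds: (with DI, without DI).
CICO_RESPONSE_S = {
    "Check-In": (1.138, 0.517),
    "Check-Out": (1.552, 0.796),
}

# DI durations in minutes: (with CI/CO, without CI/CO, run 1).
DI_DURATION_MIN = {
    "Tenant _1": (20, 14),
    "Tenant _2": (19, 16),
    "Tenant _3": (16, 16),
}


def slowdown(with_load: float, baseline: float) -> float:
    """Return the ratio of the loaded measurement to the baseline."""
    return with_load / baseline


for name, (with_di, without_di) in CICO_RESPONSE_S.items():
    print(f"{name}: {slowdown(with_di, without_di):.2f}x slower with DI")

for tenant, (with_cico, without_cico) in DI_DURATION_MIN.items():
    print(f"{tenant}: DI {slowdown(with_cico, without_cico):.2f}x the duration with CI/CO")
```

On these numbers, Check-In is about 2.2 times slower and Check-Out about 1.95 times slower during DI, while DI itself takes between 1.0 and about 1.4 times as long when CI/CO runs concurrently.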

Memory Utilization

The increase in memory utilization was due to the scheduled cluster shutdown. No memory leak is suspected for DI modules.

MARC BIB CREATE

Test#1 100k, 250k, 500k records DI

Test#2 Multitenant  DI (9 concurrent jobs)

Test#3 With CI/CO

Service CPU Utilization 

MARC BIB CREATE

Average CPU usage during the 500k-record Create test was about 462% for mod-di-converter-storage and did not exceed 150% for any other module. CPU usage of mod-data-import spikes up to 400% at the beginning of Data Import jobs.

Test#1  250k, 500k records DI

Test#2 Multitenant

Test#3 With CI/CO

Instance CPU Utilization

Test#1  250k, 500k records DI

Test#2 Multitenant DI (9 concurrent jobs)

RDS CPU Utilization 

MARC BIB CREATE

DB CPU usage reaches approximately 95%.

Test#1  250k, 500k records DI

Test#2 Multitenant  DI (9 concurrent jobs)

Test#3 With CI/CO

RDS Database Connections

MARC BIB CREATE
For the DI Create job, the maximum connection count was 520.

Test#1  250k, 500k records DI

Test#2 Multitenant

Test#3 With CI/CO