Data Import test report - NLA


Overview

This document contains the results of testing Data Import for NLA (PERF-599).

Summary

  • Job duration varied by up to 1.5 minutes between test runs when importing 5k-10k records (for example, from 8 min 48 sec to 10 min 19 sec for the 10k file).
  • Memory utilization was stable; no memory leak is suspected in any of the modules.
  • Average CPU usage did not exceed 130% for any module.
  • DB CPU usage reached approximately 97%.
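
The run-to-run spread quoted above can be rechecked from the durations in the Results table. The following is a minimal sketch; the helper names `to_seconds` and `spread_seconds` are illustrative and not part of any test tooling.

```python
import re

def to_seconds(text: str) -> int:
    """Convert a duration such as '10 min 19 sec' or '53 sec' to seconds."""
    minutes = re.search(r"(\d+)\s*min", text)
    seconds = re.search(r"(\d+)\s*sec", text)
    total = int(minutes.group(1)) * 60 if minutes else 0
    total += int(seconds.group(1)) if seconds else 0
    return total

def spread_seconds(durations) -> int:
    """Difference between the slowest and fastest run, in seconds."""
    secs = [to_seconds(d) for d in durations]
    return max(secs) - min(secs)

# Durations copied from the Results table (Test 1, Test 2, Test 3).
runs = {
    "For_LA_edeposit_updates_5k.mrc": ["5 min 44 sec", "4 min 57 sec", "6 min 9 sec"],
    "For_LA_edeposit_updates_10k.mrc": ["9 min 7 sec", "8 min 48 sec", "10 min 19 sec"],
}

spreads = {name: spread_seconds(d) for name, d in runs.items()}
for name, spread in spreads.items():
    print(f"{name}: spread between runs = {spread} sec")
```

The largest spread comes from the 10k file, which is the pair of values cited in the first bullet.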

Recommendations and Jiras


Investigate the cases where instances can return status code 404 during data import (PERF-609).

Results

| Files | Records number | Profile | Test 1 duration | Test 2 duration | Test 3 duration |
|---|---|---|---|---|---|
| For_LA_edeposit_updates_1k.mrc | 1000 | LMS LA edeposit records update | 53 sec | 56 sec | 1 min 13 sec |
| For_LA_edeposit_updates_5k.mrc | 5000 | LMS LA edeposit records update | 5 min 44 sec | 4 min 57 sec | 6 min 9 sec |
| For_LA_edeposit_updates_10k.mrc | 10000 | LMS LA edeposit records update | 9 min 7 sec | 8 min 48 sec | 10 min 19 sec |
| For_DISC_HRID_match_1.mrc | 1 | DISC HRID match | 2 sec | 2 sec | 2 sec |
| For_DISC_HRID_match_12.mrc | 12 | DISC HRID match | 2 sec | 3 sec | 3 sec |
| For_DISC_HRID_match_251.mrc | 251 | DISC HRID match | 12 sec | 15 sec | 12 sec |
| For_DISC_HRID_match_1k.mrc | 1000 | DISC HRID match | 43 sec | 1 min | 47 sec |
| For_DISC_NewNonEdepositRecords_5.mrc | 5 | DISC New NON edeposit records | 3 sec | 3 sec | 2 sec |
| NewEDepositRecords_13.mrc | 13 | DISC New edeposit records | 3 sec | 3 sec | 3 sec |
| NewEDepositRecords_54.mrc | 54 | DISC New edeposit records | 5 sec | 7 sec | 4 sec |
| NewEDepositRecords_74.mrc | 74 | DISC New edeposit records | 5 sec | 7 sec | 4 sec |
| NewEDepositRecords_77.mrc | 77 | DISC New edeposit records | 6 sec | 9 sec | 4 sec |
| NewEDepositRecords_100.mrc | 100 | DISC New edeposit records | 13 sec | 11 sec | 5 sec |
| NewEDepositRecords_200.mrc | 200 | DISC New edeposit records | 13 sec | 17 sec | 8 sec |

 * The order of jobs in the table corresponds to the order of jobs on the graphs, where each job is marked by its record count.
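
To compare jobs of different sizes, the durations can be turned into an average throughput in records per second. The sketch below does this for the three LA edeposit update files only; the durations are the table values converted to seconds by hand, and the function name `mean_throughput` is illustrative.

```python
from statistics import mean

# (records, [Test 1, Test 2, Test 3] durations in seconds), taken from the
# Results table for the "LMS LA edeposit records update" profile.
la_updates = {
    "For_LA_edeposit_updates_1k.mrc": (1000, [53, 56, 73]),
    "For_LA_edeposit_updates_5k.mrc": (5000, [344, 297, 369]),
    "For_LA_edeposit_updates_10k.mrc": (10000, [547, 528, 619]),
}

def mean_throughput(records: int, durations_sec) -> float:
    """Records per second, based on the mean duration of the runs."""
    return records / mean(durations_sec)

throughput = {
    name: mean_throughput(records, durations)
    for name, (records, durations) in la_updates.items()
}

for name, value in throughput.items():
    print(f"{name}: {value:.1f} records/sec")
```

This is only a rough normalisation: the very small files are dominated by fixed job start-up time, so their throughput figures would not be comparable with the larger ones.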

Memory Utilization

Memory utilization for mod-source-record-storage increased by 1% (from 48% to 49%) at the beginning of the 10000-record update. Memory utilization of all other modules was stable.

Service CPU Utilization 

Average CPU usage did not exceed 91% for any module.

Instance CPU Utilization

Average CPU usage did not exceed 24%.

RDS CPU Utilization 

DB CPU usage reached approximately 97%.


RDS Database Connections

The maximum connection count was 490.

Appendix

Infrastructure

Records count:

  • mod_source_record_storage.marc_records_lb = 7300919
  • mod_source_record_storage.raw_records_lb = 7300919
  • mod_source_record_storage.records_lb = 7300919
  • mod_source_record_storage.marc_indexers = 245032159 (all records)
  • mod_source_record_storage.marc_indexers with field_no 010 = 1008129
  • mod_source_record_storage.marc_indexers with field_no 035 = 8968420
  • mod_inventory_storage.authority = 852215
  • mod_inventory_storage.holdings_record = 6091403
  • mod_inventory_storage.instance = 5581816
  • mod_inventory_storage.item = 5705915

PTF environment: nptf

  • 8 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
  • 1 database instance, writer

    | Name | API Name | Memory | vCPUs | max_connections |
    |---|---|---|---|---|
    | R6G Extra Large | db.r6g.xlarge | 32 GiB | 4 | 2731 |
  • MSK ptf-kakfa-3
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3
  • Kafka topics partitioning: 2 partitions for DI topics


Modules memory and CPU parameters

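| Modules | Version | Task Definition | Running Tasks | CPU | Memory | MemoryReservation | MaxMetaspaceSize | Xmx |
|---|---|---|---|---|---|---|---|---|
| mod-inventory-storage | 26.0.0 | 1 | 2 | 1024 | 2208 | 1952 | 384 | 1440 |
| mod-inventory | 20.0.4 | 3 | 2 | 1024 | 2880 | 2592 | 512 | 1814 |
| mod-source-record-storage | 5.6.6 | 5 | 2 | 1024 | 4096 | 3688 | 512 | 3076 |
| mod-quick-marc | 3.0.0 | 1 | 1 | 128 | 2288 | 2176 | 512 | 1664 |
| mod-source-record-manager | 3.6.3 | 5 | 2 | 1024 | 4096 | 3688 | 512 | 3076 |
| mod-di-converter-storage | 2.0.2 | 2 | 2 | 128 | 1024 | 896 | 128 | 768 |
| mod-data-import | 2.7.1 | 1 | 1 | 256 | 2048 | 1844 | 512 | 1292 |
| okapi | 5.0.1 | 2 | 3 | 1024 | 1684 | 1440 | 512 | 922 |
| nginx-okapi | 2022.03.02 | 1 | 2 | 128 | 1024 | 896 | - | - |
| pub-okapi | 2022.03.02 | 1 | 2 | 128 | 1024 | 896 | - | 768 |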
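
One way to read these parameters is to compare each module's JVM heap limit (Xmx) with its container memory limit (Memory). The sketch below assumes both columns use the same unit and uses the values as read from the table above; the name `heap_share` is illustrative. nginx-okapi is left out because it has no Xmx value.

```python
# module: (Memory, Xmx), as read from the table above.
# Assumption: both values are expressed in the same unit.
modules = {
    "mod-inventory-storage": (2208, 1440),
    "mod-inventory": (2880, 1814),
    "mod-source-record-storage": (4096, 3076),
    "mod-quick-marc": (2288, 1664),
    "mod-source-record-manager": (4096, 3076),
    "mod-di-converter-storage": (1024, 768),
    "mod-data-import": (2048, 1292),
    "okapi": (1684, 922),
    "pub-okapi": (1024, 768),
}

def heap_share(memory: int, xmx: int) -> float:
    """Fraction of the container memory limit taken by the JVM heap limit."""
    return xmx / memory

shares = {name: heap_share(mem, xmx) for name, (mem, xmx) in modules.items()}

for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: Xmx is {share:.0%} of Memory")
```

The remainder of the container limit has to cover metaspace, thread stacks, and other non-heap memory, so a share close to 100% would be a reason to look more closely at that module.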

Methodology/Approach

JMeter scripts were used to test the DI baseline.

  • 5 min pauses between the tests

Each test was repeated 3 times.
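
The procedure above can be sketched as a small driver that launches JMeter in non-GUI mode and waits between runs. This is an illustration under stated assumptions, not the script used for this report: the test plan name `di_baseline.jmx` and the results file names are hypothetical, and the 5-minute pause is assumed to apply between consecutive runs.

```python
import subprocess
import time
from typing import Callable, List

PAUSE_SECONDS = 5 * 60  # 5 min pause between tests
REPEATS = 3             # each test is repeated 3 times

def jmeter_command(test_plan: str, results_file: str) -> List[str]:
    """Build a JMeter CLI call: -n non-GUI mode, -t test plan, -l results log."""
    return ["jmeter", "-n", "-t", test_plan, "-l", results_file]

def run_series(
    test_plan: str,
    repeats: int = REPEATS,
    pause: int = PAUSE_SECONDS,
    run: Callable = subprocess.run,
    sleep: Callable = time.sleep,
) -> List[List[str]]:
    """Run the test plan `repeats` times, pausing between consecutive runs."""
    commands = []
    for i in range(1, repeats + 1):
        # Hypothetical naming scheme: one results file per run.
        cmd = jmeter_command(test_plan, f"results_run{i}.jtl")
        run(cmd, check=True)  # raises if JMeter exits with a non-zero code
        commands.append(cmd)
        if i < repeats:
            sleep(pause)      # no pause after the last run
    return commands

# Example (requires JMeter on PATH and an existing test plan):
# run_series("di_baseline.jmx")
```

Passing `run` and `sleep` as parameters keeps the loop testable without starting JMeter or waiting for the real pauses.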