PTF - Data Export Test Report (Orchid)

Overview

This document contains the results of baseline Data Export (MARC BIB) tests on the Orchid release.

Ticket: PERF-669


Summary

  • Data export job durations show no degradation for any of the DE files (1k, 100k, 500k).
  • Maximum CPU utilization for the mod-data-export module was observed during the DE job with 500k instances: 10%.
    • Average utilization for mod-data-export:
      • Service CPU Utilization: 10%
      • Memory Utilization: 72%
      • DB CPU Utilization: 18%
      • DB Connections: 165

Recommendations & Jiras

  • Investigate negative values in the 'Failed' column in the UI (PERF-676)

Test Results

Profile used for testing - "srs - holdings and items"

Test  File  Duration, 1st set of tests  Duration, 2nd set of tests
1     1k    30s                         27s
2     100k  48m 22s                     47m 51s
3     500k  3h 53m 22s                  4h 0m 26s
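
The durations reported above can be converted to seconds and to an approximate throughput, which makes the two sets of tests easier to compare. This is a minimal sketch: the helper names parse_duration and records_per_second are illustrative and not part of any FOLIO tooling, and it assumes that 1k, 100k, and 500k mean exactly 1,000, 100,000, and 500,000 instances.

```python
import re

# Durations copied from the results above: record count -> (1st set, 2nd set).
RESULTS = {
    1_000: ("30s", "27s"),
    100_000: ("48m 22s", "47m 51s"),
    500_000: ("3h 53m 22s", "4h 0m 26s"),
}

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> int:
    """Convert a string such as '3h 53m 22s' into a number of seconds."""
    parts = re.findall(r"(\d+)\s*([hms])", text)
    if not parts:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(int(value) * _UNIT_SECONDS[unit] for value, unit in parts)


def records_per_second(records: int, duration: str) -> float:
    """Approximate export throughput for one job (assumed exact record count)."""
    return records / parse_duration(duration)


if __name__ == "__main__":
    for records, (first, second) in RESULTS.items():
        print(
            f"{records:>7} records: "
            f"{records_per_second(records, first):.1f} rec/s (1st set), "
            f"{records_per_second(records, second):.1f} rec/s (2nd set)"
        )
```

Throughput computed this way is only a rough indicator, since it ignores job start-up overhead, which weighs more heavily on the 1k file.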

Instance CPU Utilization

Service CPU Utilization

Average mod-data-export: 10%

Memory Utilization

Average mod-data-export: 72%

DB CPU Utilization

Average mod-data-export: 18%

DB Connections

Average: 165

DB Load

SQL queries

Top-SQL statement: 

SELECT fs09000000_mod_inventory_storage.count_estimate(?)

Additional information

In the UI, all jobs have the status 'completed', with the 'Failed' column showing '-1' for the 1k and 100k jobs and '-12' for the 500k job.

In the DB, the exported value is higher than the total, which is why the 'Failed' column shows a negative value.

"status": "COMPLETED",
  "progress": {
    "total": 500000,
    "failed": -12,
    "exported": 500012
  }
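
The numbers above are consistent with 'failed' being derived as total minus exported (500000 - 500012 = -12). The sketch below checks that relationship for the 500k job. The derivation rule is an assumption inferred from these values and is not confirmed against the mod-data-export source; derived_failed and is_over_exported are hypothetical helper names.

```python
import json

# Progress object for the 500k job, copied from the report.
JOB_JSON = """
{
  "status": "COMPLETED",
  "progress": {"total": 500000, "failed": -12, "exported": 500012}
}
"""


def derived_failed(progress: dict) -> int:
    """Assumed rule (not confirmed): failed = total - exported."""
    return progress["total"] - progress["exported"]


def is_over_exported(progress: dict) -> bool:
    """True when more records were exported than the job's total."""
    return progress["exported"] > progress["total"]


if __name__ == "__main__":
    progress = json.loads(JOB_JSON)["progress"]
    print("derived failed:", derived_failed(progress))
    print("matches stored value:", derived_failed(progress) == progress["failed"])
    print("over-exported:", is_over_exported(progress))
```

A check like is_over_exported could be run against each completed job to flag executions affected by the issue.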

Methodology/Approach

To get baseline numbers for Data Export, three files containing instance IDs were exported in the main tenant with 1 user.

The following query was used to get the status and time range of the export jobs:

SELECT jsonb->>'status',
       jsonb->>'startedDate' AS startedDate,
       jsonb->>'completedDate' AS completedDate
FROM [tenant_id]_mod_data_export.job_executions
WHERE jsonb->>'jobProfileName' = 'srs - holdings and items'
ORDER BY jsonb->>'startedDate' DESC
LIMIT 10;
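
A job's duration is the difference between the completedDate and startedDate values returned by this query. The following sketch shows that calculation. It assumes the timestamps are ISO 8601 strings with an explicit UTC offset, which is not confirmed by this report; the sample values are hypothetical and were not taken from the test run, and job_duration is an illustrative helper name.

```python
from datetime import datetime, timedelta


def job_duration(started: str, completed: str) -> timedelta:
    """Return completedDate - startedDate for one job_executions row.

    Assumes both values are ISO 8601 strings with an explicit offset,
    for example '2023-01-01T10:00:00+00:00'.
    """
    return datetime.fromisoformat(completed) - datetime.fromisoformat(started)


if __name__ == "__main__":
    # Hypothetical timestamps, chosen only to illustrate the calculation.
    started = "2023-01-01T10:00:00+00:00"
    completed = "2023-01-01T10:48:22+00:00"
    print(job_duration(started, completed))
```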

Test preparation: 

  • Three files were prepared by running the following query once per LIMIT value: SELECT id FROM [tenant_id]_mod_inventory_storage.instance WHERE jsonb->>'source'='MARC' LIMIT 1000|100000|500000;
  • All tests were carried out sequentially
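
The LIMIT 1000|100000|500000 notation stands for three separate statements, one per file. As a sketch, they could be generated as follows. The function name build_id_queries is hypothetical, and the tenant ID fs09000000 is used only as an example because it appears in the top-SQL statement earlier in this report.

```python
from typing import List

# One query per export file: 1k, 100k, and 500k instance IDs.
FILE_SIZES = (1_000, 100_000, 500_000)

QUERY_TEMPLATE = (
    "SELECT id FROM {tenant_id}_mod_inventory_storage.instance "
    "WHERE jsonb->>'source'='MARC' LIMIT {limit};"
)


def build_id_queries(tenant_id: str) -> List[str]:
    """Return the three ID-selection queries for the given tenant schema prefix."""
    # Reject anything that is not a plain identifier, since the tenant ID
    # is interpolated directly into the schema name.
    if not tenant_id.isidentifier():
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return [
        QUERY_TEMPLATE.format(tenant_id=tenant_id, limit=size)
        for size in FILE_SIZES
    ]


if __name__ == "__main__":
    for query in build_id_queries("fs09000000"):
        print(query)
```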

Infrastructure

PTF environment: ncp5

  • 9 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
  • 2 database instances: one reader and one writer

    Name             API Name       Memory GiB  vCPUs    max_connections
    R6G Extra Large  db.r6g.xlarge  32 GiB      4 vCPUs  2731
  • Number of connections for mod-source-record-manager and mod-source-record-storage: 30
  • MSK ptf-kakfa-3
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0
    • EBS storage volume per broker: 300 GiB
    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3
  • Kafka topics partitioning: 2 partitions for DI topics

Modules memory and CPU parameters

ncp5-pvt, Mon Sep 18 10:17:13 UTC 2023

Module                     Task Def. Revision  Version  Task Count  Mem Hard Limit  Mem Soft Limit  CPU units  Xmx   MetaspaceSize  MaxMetaspaceSize  R/W split enabled
mod-authtoken              8                   2.13.0   2           1440            1152            512        922   88             128               FALSE
mod-users-bl               8                   7.5.0    2           1440            1152            512        922   88             128               FALSE
mod-inventory-storage      12                  26.0.0   2           4096            3690            2048       3076  384            512               FALSE
mod-source-record-storage  27                  5.6.7    2           5600            5000            2048       3500  384            512               FALSE
mod-source-record-manager  18                  3.6.4    2           5600            5000            2048       3500  384            512               FALSE
okapi-b                    8                   5.0.1    3           1684            1440            1024       922   384            512               FALSE
mod-data-export            6                   4.7.1    1           1024            896             1024       768   88             128               FALSE