PTF - Data Export Test Report (Orchid)

Overview

This document contains the results of testing Data Export (MARC BIB) on the Orchid release using the baseline Data Export tests.

Ticket: https://folio-org.atlassian.net/browse/PERF-669



Summary

  • Data export job durations show no degradation for any of the DE files (1k, 100k, 500k).

  • Maximum CPU utilization for the mod-data-export module (10%) was observed during the DE job with 500k instances.

    • Average utilization for mod-data-export:

      • Service CPU Utilization: 10%

      • Memory Utilization: 72%

      • DB CPU Utilization: 18%

      • DB Connections: 165

Recommendations & Jiras

Test Results

Profile used for testing - "srs - holdings and items"

| Test | File | Duration, 1st set of tests | Duration, 2nd set of tests |
|------|------|----------------------------|----------------------------|
| 1    | 1k   | 30s                        | 27s                        |
| 2    | 100k | 48m 22s                    | 47m 51s                    |
| 3    | 500k | 3h 53m 22s                 | 4h 0m 26s                  |
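
The two sets of runs can also be compared numerically. The following is a minimal Python sketch, not part of the original test tooling, that converts the reported durations into seconds and computes the relative change of the second set against the first. The helper names `parse_duration` and `relative_change` are hypothetical.

```python
import re

# Durations as reported in the Test Results section: (set 1, set 2).
RESULTS = {
    "1k": ("30s", "27s"),
    "100k": ("48m 22s", "47m 51s"),
    "500k": ("3h 53m 22s", "4h 0m 26s"),
}

UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> int:
    """Convert a string such as '3h 53m 22s' into a number of seconds."""
    parts = re.findall(r"(\d+)([hms])", text)
    if not parts:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(int(value) * UNIT_SECONDS[unit] for value, unit in parts)


def relative_change(first: str, second: str) -> float:
    """Percentage change of the second duration relative to the first."""
    a, b = parse_duration(first), parse_duration(second)
    return (b - a) / a * 100.0


if __name__ == "__main__":
    for size, (first, second) in RESULTS.items():
        print(f"{size}: {relative_change(first, second):+.1f}%")
```

For the 500k file, the second run was 424 seconds (about 3%) longer than the first; the 1k and 100k runs were slightly shorter in the second set.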

Instance CPU Utilization

Service CPU Utilization

Average mod-data-export: 10%

Memory Utilization

Average mod-data-export: 72%

DB CPU Utilization

Average mod-data-export: 18%

DB Connections

Average: 165

DB Load

SQL queries

Top-SQL statement: 

SELECT fs09000000_mod_inventory_storage.count_estimate(?)

Additional information

In the UI, all jobs have the status "Completed", but the 'Failed' column shows '-1' for the 1k and 100k jobs and '-12' for the 500k job.

In the DB, the exported value is higher than the total, which is why the 'Failed' column shows a negative value.

"status": "COMPLETED",
  "progress": {
    "total": 500000,
    "failed": -12,
    "exported": 500012
  }
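
The negative value is consistent with the 'Failed' count being derived as total minus exported. That derivation is an assumption inferred from the numbers above, not confirmed against the mod-data-export source. A minimal Python sketch of the check, with hypothetical helper names:

```python
import json

# Progress block as shown above for the 500k job.
job = json.loads("""
{
  "status": "COMPLETED",
  "progress": {"total": 500000, "failed": -12, "exported": 500012}
}
""")


def derived_failed(progress: dict) -> int:
    """Assumed derivation: records not exported = total - exported."""
    return progress["total"] - progress["exported"]


def is_over_exported(progress: dict) -> bool:
    """True when more records were exported than the job's total."""
    return progress["exported"] > progress["total"]


progress = job["progress"]
print(derived_failed(progress), is_over_exported(progress))
```

A check like `is_over_exported` could be used to flag completed jobs whose counters are inconsistent before their 'Failed' value is trusted.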

Methodology/Approach

To obtain baseline numbers for Data Export, 3 files with instance IDs were exported in the main tenant by 1 user.

The following query was used to get the status and time range of the export jobs:

SELECT jsonb->>'status',jsonb->>'startedDate' AS startedDate,jsonb->>'completedDate' AS completedDate
FROM [tenant_id]_mod_data_export.job_executions
WHERE jsonb->>'jobProfileName'='srs - holdings and items'
ORDER BY jsonb->>'startedDate' desc LIMIT 10;
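
The job duration follows from the two timestamps returned by this query. A minimal Python sketch, assuming the dates are ISO 8601 strings with milliseconds and a UTC offset; the sample values below are hypothetical and not taken from the test run:

```python
from datetime import datetime

# Assumed timestamp layout, e.g. 2023-09-18T10:00:00.000+00:00.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def job_duration_seconds(started: str, completed: str) -> int:
    """Whole seconds elapsed between startedDate and completedDate."""
    start = datetime.strptime(started, TIMESTAMP_FORMAT)
    end = datetime.strptime(completed, TIMESTAMP_FORMAT)
    return int((end - start).total_seconds())


def format_duration(seconds: int) -> str:
    """Render seconds in the report's style, e.g. '48m 22s'."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}h {minutes}m {secs}s"
    if minutes:
        return f"{minutes}m {secs}s"
    return f"{secs}s"


# Hypothetical sample row.
elapsed = job_duration_seconds(
    "2023-09-18T10:00:00.000+00:00",
    "2023-09-18T10:48:22.000+00:00",
)
print(format_duration(elapsed))
```

If the stored timestamps use a different layout, `TIMESTAMP_FORMAT` would need to be adjusted accordingly.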

Test preparation: 

  • 3 files were prepared with the following query, using a LIMIT of 1000, 100000, or 500000 respectively: SELECT id FROM [tenant_id]_mod_inventory_storage.instance WHERE jsonb->>'source'='MARC' LIMIT 1000|100000|500000;

  • All tests were carried out sequentially
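
The three ID files differ only in the LIMIT value. As a sketch, the corresponding queries can be generated in Python as follows; `TENANT_ID` stands in for the `[tenant_id]` placeholder, and the output file names are hypothetical:

```python
# Placeholder: replace with the real tenant schema prefix.
TENANT_ID = "[tenant_id]"

# File label -> number of instance IDs to select.
FILE_SIZES = {"1k": 1000, "100k": 100000, "500k": 500000}


def build_id_query(tenant_id: str, limit: int) -> str:
    """Return the instance-ID query from the test preparation step."""
    return (
        f"SELECT id FROM {tenant_id}_mod_inventory_storage.instance "
        f"WHERE jsonb->>'source'='MARC' LIMIT {limit};"
    )


# Hypothetical file names mapped to the query that fills each file.
queries = {
    f"instance_ids_{label}.csv": build_id_query(TENANT_ID, limit)
    for label, limit in FILE_SIZES.items()
}

for file_name, query in queries.items():
    print(file_name, "<-", query)
```

Without an ORDER BY clause, PostgreSQL does not guarantee which rows a LIMIT returns, so regenerated files may contain different instance IDs.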

Infrastructure

PTF environment: ncp5

  • 9 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1

  • 2 database instances: one reader and one writer

  • Number of connections for mod-source-record-manager and mod-source-record-storage: 30

  • MSK ptf-kakfa-3

    • 4 m5.2xlarge brokers in 2 zones

    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true

    • log.retention.minutes=480

    • default.replication.factor=3

  • Kafka topics partitioning: 2 partitions for DI topics

Modules memory and CPU parameters

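ncp5-pvt, Mon Sep 18 10:17:13 UTC 2023

| Module                    | Task Def. Revision | Version | Task Count | Mem Hard Limit | Mem Soft Limit | CPU units | Xmx  | MetaspaceSize | MaxMetaspaceSize | R/W split enabled |
|---------------------------|--------------------|---------|------------|----------------|----------------|-----------|------|---------------|------------------|-------------------|
| mod-authtoken             | 8                  | 2.13.0  | 2          | 1440           | 1152           | 512       | 922  | 88            | 128              | FALSE             |
| mod-users-bl              | 8                  | 7.5.0   | 2          | 1440           | 1152           | 512       | 922  | 88            | 128              | FALSE             |
| mod-inventory-storage     | 12                 | 26.0.0  | 2          | 4096           | 3690           | 2048      | 3076 | 384           | 512              | FALSE             |
| mod-source-record-storage | 27                 | 5.6.7   | 2          | 5600           | 5000           | 2048      | 3500 | 384           | 512              | FALSE             |
| mod-source-record-manager | 18                 | 3.6.4   | 2          | 5600           | 5000           | 2048      | 3500 | 384           | 512              | FALSE             |
| okapi-b                   | 8                  | 5.0.1   | 3          | 1684           | 1440           | 1024      | 922  | 384           | 512              | FALSE             |
| mod-data-export           | 6                  | 4.7.1   | 1          | 1024           | 896            | 1024      | 768  | 88            | 128              | FALSE             |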