PTF - Data Export Test Report (Poppy)

Overview

This document contains the results of testing Data Export (MARC BIB) on the Poppy release with files of 1k, 100k, and 500k records. Three CSV files were prepared to run Data Export with the Default instances export job profile and the srs - holdings and items job profile. Because some processes were running in the background during the first set of tests, an additional set of tests with 1k and 100k files was run for each job profile. The graphs in this report show only the second set of tests.

Ticket: PERF-748


Summary

  • Durations of DE jobs with 1k, 100k, and 500k files show no significant change compared with Orchid. No token issues were observed. DE jobs with the 500k file finished with FAIL status for both job profiles.
  • The FAIL status for the DE srs - holdings and items job profile with the 500k file is likely related to the large volume of data transferred to the S3 bucket. It will be investigated in a story created by the Firebird team.
  • Average CPU utilization for mod-data-export did not exceed 18%, spikes included. During the 100k tests it was 10-12%.
  • Average memory consumption for mod-data-export was close to 100%.
  • Average DB CPU utilization was 18%, with about 200 DB connections. Spikes to 40% were observed every 15 minutes during the tests.

Recommendations & Jiras

  • The Firebird team previously investigated Data Export failures/stops triggered by a large file (> 196K) in MDEXP-658 and MDEXP-587.
  • The long-query problem (414 error) in the 500k test with the srs - holdings and items job profile should be resolved by MDEXP-607.

Test Results

This table contains job durations (hh:mm:ss) and statuses for the two job profiles. A dash marks a test that was not run.

Profile / CSV file                                    Poppy set 1           Poppy set 2
                                                      Duration  Status      Duration  Status
DE MARC Bib (Default instances export job profile)
  1kDE.csv                                            00:00:08  COMPLETED   00:00:23  COMPLETED
  100kDE.csv                                          00:15:36  COMPLETED   00:15:23  COMPLETED
  500kDE.csv                                          00:57:25  FAIL        -         -
DE MARC Bib (srs - holdings and items)
  1kDE.csv                                            00:00:29  COMPLETED   00:00:38  COMPLETED
  100kDE.csv                                          00:47:23  COMPLETED   00:52:57  COMPLETED
  500kDE.csv                                          04:11:09  FAIL        -         -

Comparison

This table compares job durations (hh:mm:ss) between the Orchid and Poppy releases. A dash marks a cell with no value.

Profile / CSV file                                    Orchid                Poppy set 1           Delta          Poppy set 2           Delta
                                                      Duration  Status      Duration  Status      Orchid/set 1   Duration  Status      Orchid/set 2
DE MARC Bib (Default instances export job profile)
  1kDE.csv                                            -         -           00:00:08  COMPLETED   -              00:00:23  COMPLETED   -
  100kDE.csv                                          -         -           00:15:36  COMPLETED   -              00:15:23  COMPLETED   -
  500kDE.csv                                          -         -           00:57:25  FAIL        -              -         -           -
DE MARC Bib (srs - holdings and items)
  1kDE.csv                                            00:00:27  COMPLETED   00:00:29  COMPLETED   + 00:00:02     00:00:38  COMPLETED   + 00:00:11
  100kDE.csv                                          00:47:51  COMPLETED   00:47:23  COMPLETED   - 00:00:28     00:52:57  COMPLETED   + 00:05:06
  500kDE.csv                                          04:00:26  COMPLETED   04:11:09  FAIL        + 00:10:43     -         -           -
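The DELTA values in the comparison table can be reproduced from the hh:mm:ss durations. The following is a minimal Python sketch, not the tooling used for this report; the helper names `parse_hms` and `format_delta` are hypothetical and introduced only for illustration.

```python
from datetime import timedelta


def parse_hms(value: str) -> timedelta:
    """Parse an hh:mm:ss duration string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_delta(baseline: str, candidate: str) -> str:
    """Return candidate minus baseline as a signed hh:mm:ss string."""
    diff = int((parse_hms(candidate) - parse_hms(baseline)).total_seconds())
    sign = "-" if diff < 0 else "+"
    diff = abs(diff)
    return f"{sign} {diff // 3600:02d}:{diff % 3600 // 60:02d}:{diff % 60:02d}"


# Orchid vs Poppy set 1, srs - holdings and items profile (values from the table)
print(format_delta("00:47:51", "00:47:23"))  # - 00:00:28
print(format_delta("04:00:26", "04:11:09"))  # + 00:10:43
```

A negative delta means the Poppy run was faster than the Orchid run for the same file.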


Instance CPU Utilization

Service CPU Utilization

Memory Utilization

This graph shows that memory utilization of mod-data-export did not exceed 102% during the test with the 100k file and grew to 112% by the end of the test. Memory consumption did not grow with the 500k file.

DB CPU Utilization

During the tests, spikes were observed on the DB every 15 minutes. Average CPU utilization was 18%.


DB Connections

The average number of DB connections was 200.



DB Load

SQL queries

Top SQL statements:

inventory - go to tables loans - check are there any

autovacuum: VACUUM fs09000000_mod_inventory_storage.instance (to prevent wraparound)

WITH deleted_rows AS ( delete from marc_indexers mi where exists( select ? from marc_records_tracking mrt where mrt.is_dirty = ? and mrt.marc_id = mi.marc_id and mrt.version > mi.version ) returning mi.marc_id), deleted_rows2 AS ( delete from marc_indexers mi where exists( select ? from records_lb where records_lb.id = mi.marc_id and records_lb.state = ? ) returning mi.marc_id) INSERT IN

SELECT fs09000000_mod_inventory_storage.count_estimate(?)

with "cte" as (select count(*) from "records_lb" where ("records_lb"."external_id" in (cast($1 as uuid), cast($2 as uuid), cast($3 as uuid), cast($4 as uuid), cast($5 as uuid), cast($6 as uuid), cast($7 as uuid), cast($8 as uuid), cast($9 as uuid), cast($10 as uuid), cast($11 as uuid), cast($12 as uuid), cast($13 as uuid), cast($14 as uuid), cast($15 as uuid), cast($16 as uuid), cast($17 as uuid), cast($18 as uuid), cast($19 as uuid), cast($20 as uuid), cast($21 as uuid), cast($22 as uuid), cast

Errors / Additional information

  • During the test with DE MARC Bib (Default instances export job profile), the UI showed this message:
    • 2023-12-01T10:05:09.426+00:00 ERROR Export is completed with errors: some records have failed to export: number of failed records: 6
  • During the test with DE MARC Bib (srs - holdings and items), the UI showed this message:
    • 2023-12-01T17:38:18.129+00:00 ERROR Error while getting items by holding ids Exception while calling...
      message: Get invalid response with status: 414
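HTTP status 414 means the request URI was too long, which fits a query built from many holding ids. One common way to avoid it is to split the ids into batches so that each request's query stays under a length budget. The sketch below only illustrates that idea; it is an assumption and not necessarily the approach taken in MDEXP-607. The function name `batch_by_length`, the `" or "` separator, and the character budget are hypothetical.

```python
from typing import Iterable, Iterator, List


def batch_by_length(ids: Iterable[str], max_chars: int,
                    separator: str = " or ") -> Iterator[List[str]]:
    """Yield groups of ids whose separator-joined length fits in max_chars.

    An id longer than max_chars is still yielded, alone in its own batch.
    """
    batch: List[str] = []
    length = 0
    for item in ids:
        added = len(item) + (len(separator) if batch else 0)
        if batch and length + added > max_chars:
            yield batch
            batch, length = [], 0
            added = len(item)
        batch.append(item)
        length += added
    if batch:
        yield batch


# Hypothetical example: five 36-character ids with an 80-character budget
sample_ids = ["0" * 36] * 5
print([len(group) for group in batch_by_length(sample_ids, max_chars=80)])  # [2, 2, 1]
```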

Methodology/Approach

Three files were prepared with the query below, run once per LIMIT value (1000, 100000, 500000): SELECT id FROM [tenant_id]_mod_inventory_storage.instance where jsonb->>'source'='MARC' LIMIT 1000|100000|500000;
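As an illustration, the three concrete queries can be generated from one template. This is a sketch only: the helper `build_queries` is hypothetical, the file names follow the CSV names used in the result tables, and the tenant id `fs09000000` is taken from the SQL statements shown earlier in this report.

```python
QUERY_TEMPLATE = (
    "SELECT id FROM {tenant_id}_mod_inventory_storage.instance "
    "WHERE jsonb->>'source'='MARC' LIMIT {limit};"
)


def build_queries(tenant_id: str, limits=(1000, 100000, 500000)) -> dict:
    """Map each target CSV file name to the query that selects its instance ids."""
    return {
        f"{limit // 1000}kDE.csv": QUERY_TEMPLATE.format(tenant_id=tenant_id, limit=limit)
        for limit in limits
    }


for file_name, query in build_queries("fs09000000").items():
    print(file_name, "<-", query)
```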

All tests were carried out sequentially with each job profile. 

The following query was used to get the status and time range of export jobs:

SELECT jsonb->>'status',jsonb->>'startedDate' AS startedDate,jsonb->>'completedDate' AS completedDate
FROM [REPLACE_tenant_id_HERE]_mod_data_export.job_executions
WHERE jsonb->>'jobProfileName'='[REPLACE_WITH_DE_JOB_HERE]'
ORDER BY jsonb->>'startedDate' desc LIMIT 10;
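The job duration reported in the tables is the difference between completedDate and startedDate. A minimal Python sketch of that calculation follows. It assumes both values are ISO 8601 strings with a numeric UTC offset, like the timestamps in the error messages above; the function name `job_duration` and the sample timestamps are hypothetical.

```python
from datetime import datetime


def job_duration(started: str, completed: str) -> str:
    """Return completed minus started as hh:mm:ss (assumes ISO 8601 input)."""
    delta = datetime.fromisoformat(completed) - datetime.fromisoformat(started)
    total = int(delta.total_seconds())
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


# Hypothetical timestamps, chosen to give a 15 minute 23 second job
print(job_duration("2023-12-01T10:00:00.000+00:00",
                   "2023-12-01T10:15:23.000+00:00"))  # 00:15:23
```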

Infrastructure

PTF environment: pcp1

  • 10 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
  • 2 database instances, writer/reader

    Name           Memory GiB  vCPUs    max_connections
    db.r6g.xlarge  32 GiB      4 vCPUs  2731
  • MSK tenant
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3

This table contains the modules with their memory and CPU parameters (pcp1-pvt, Wed Nov 22 08:06:06 UTC 2023).

Module                     Task Def. Revision  Module Version  Task Count  Mem Hard Limit  Mem Soft limit  CPU units  Xmx   MetaspaceSize  MaxMetaspaceSize  R/W split enabled
mod-data-export            11                  4.8.1           1           1024            896             1024       768   88             128               FALSE
mod-authtoken              13                  2.14.1          2           1440            1152            512        922   88             128               FALSE
mod-users-bl               9                   7.6.0           2           1440            1152            512        922   88             128               FALSE
mod-inventory-storage      12                  27.0.3          2           4096            3690            2048       3076  384            512               FALSE
mod-inventory              11                  20.1.3          2           2880            2592            1024       1814  384            512               FALSE
mod-source-record-storage  15                  5.7.3           2           5600            5000            2048       3500  384            512               FALSE
mod-source-record-manager  14                  3.7.4           2           5600            5000            2048       3500  384            512               FALSE
nginx-okapi                9                   2023.06.14      2           1024            896             128        0     0              0                 FALSE
okapi-b                    11                  5.1.2           3           1684            1440            1024       922   384            512               FALSE