Overview

...

This document contains the results of testing Data Export (MARC BIB) in the Quesnelia release with files of 1k, 100k, and 500k records. Three CSV files were prepared to run Data Export with the Default instances export job profile and the srs - holdings and items job profile.

...

  • DE jobs perform dramatically better in the Quesnelia release than in Poppy. No token issues occurred. All jobs with files of 1k, 100k, and 500k records completed successfully.
  • The improvement depends on file size and job profile, ranging from 4 to 9 times shorter durations. An additional test was run with a job profile created by a script to check the consistency of the results.
  • The average CPU utilization of mod-data-export depends on file size and job profile. Exporting 100k records: 92% with the Default instances export job profile and 63% with srs - holdings and items. Exporting 500k records: 434% with the Default instances export job profile and 296% with srs - holdings and items.
  • Average memory consumption of mod-data-export was close to 100%, almost the same as in the Poppy release.
  • Average DB utilization was 17% with 100k records and 33% with 500k records. DB connections: 1360, compared with 200 in Poppy.

Recommendations & Jiras

  • Consider increasing the CPU allocation of the mod-data-export module to smooth out spikes when exporting large files.

Test Results

This table contains durations for jobs with 3 job profiles. 

...

Profile | CSV file | Poppy (set 2) duration | Poppy (set 2) status | Quesnelia duration | Quesnelia status | Delta Poppy/Quesnelia, hh:mm:ss
DE MARC Bib (Default instances export job profile) | 1kDE.csv | 00:00:08 | COMPLETED | 00:00:02 | COMPLETED | 00:00:06, 4 times improvement
DE MARC Bib (Default instances export job profile) | 100kDE.csv | 00:15:36 | COMPLETED | 00:02:17 | COMPLETED | 00:13:19, 7 times improvement
DE MARC Bib (Default instances export job profile) | 500kDE.csv | 00:57:25 | FAIL | 00:05:10 | COMPLETED | -
DE MARC Bib (srs - holdings and items) | 1kDE.csv | 00:00:29 | COMPLETED | 00:00:04 | COMPLETED | 00:00:25, 7 times improvement
DE MARC Bib (srs - holdings and items) | 100kDE.csv | 00:47:23 | COMPLETED | 00:05:13 | COMPLETED | 00:42:10, 9 times improvement
DE MARC Bib (srs - holdings and items) | 500kDE.csv | 04:11:09 | FAIL | 00:08:58 | COMPLETED | -
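The delta column and the improvement factors can be reproduced from the durations in the results table. The sketch below is an illustration, not part of the original test tooling; it assumes the factor is the Poppy duration divided by the Quesnelia duration, rounded to the nearest whole number, and it skips the 500k rows because the Poppy jobs failed. The helper names (parse_hms, format_hms, compare) are hypothetical.

```python
from datetime import timedelta


def parse_hms(value: str) -> timedelta:
    """Parse an 'hh:mm:ss' duration string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_hms(delta: timedelta) -> str:
    """Format a non-negative timedelta as 'hh:mm:ss'."""
    total = int(delta.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def compare(poppy: str, quesnelia: str) -> tuple[str, int]:
    """Return the time saved and the rounded speed-up factor (assumed definition)."""
    before, after = parse_hms(poppy), parse_hms(quesnelia)
    factor = round(before / after)  # timedelta / timedelta gives a float ratio
    return format_hms(before - after), factor


# Completed Poppy/Quesnelia pairs copied from the results table.
rows = [
    ("Default, 1k", "00:00:08", "00:00:02"),
    ("Default, 100k", "00:15:36", "00:02:17"),
    ("srs - holdings and items, 1k", "00:00:29", "00:00:04"),
    ("srs - holdings and items, 100k", "00:47:23", "00:05:13"),
]

for label, poppy, quesnelia in rows:
    saving, factor = compare(poppy, quesnelia)
    print(f"{label}: {saving}, {factor} times improvement")
```

Running it prints the four deltas listed in the table (00:00:06, 00:13:19, 00:00:25, 00:42:10) with factors 4, 7, 7, and 9. The 100k Default ratio is about 6.8 before rounding, so the factors are approximate.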

...


Service CPU Utilization


Default instances export job profile with 500k file

Module | CPU, %
mod-data-export-b | 434.91
mod-inventory-b | 10.79
mod-source-record-manager-b | 1.74
mod-source-record-storage-b | 1.5
okapi-b | 0.96
mod-users-bl-b | 0.63
mod-authtoken-b | 0.61
mod-inventory-storage-b | 0.39
nginx-okapi | 0.24
pub-okapi | 0.16


srs - holdings and items with 500k file

Module | CPU, %
mod-data-export-b | 296.31
mod-inventory-b | 12.79
mod-source-record-manager-b | 1.85
mod-source-record-storage-b | 1.49
okapi-b | 1.1
mod-authtoken-b | 0.95
mod-users-bl-b | 0.69
mod-inventory-storage-b | 0.54
nginx-okapi | 0.32
pub-okapi | 0.21

...

TOP 20 modules

Module | CPU, %
mod-data-export-b | 296.31
mod-consortia-b | 19.03
mod-inventory-b | 12.79
mod-dcb-b | 8.17
mod-quick-marc-b | 7.99
mod-pubsub-b | 7.05
mod-users-b | 6.85
mod-audit-b | 6.18
mod-kb-ebsco-java-b | 5.78
edge-dematic-b | 4.5
mod-erm-usage-harvester-b | 4.49
mod-organizations-storage-b | 4.04
mod-licenses-b | 3.51
mod-permissions-b | 3.06
mod-data-export-spring-b | 2.78
mod-lists-b | 2.77
mod-organizations-b | 2.74
mod-user-import-b | 2.7
mod-tags-b | 2.57
mod-sender-b | 2.5

Mod-data-export-b

With the Default instances export job profile, mod-data-export used 92% CPU with the 100k file and 434% with the 500k file.

With the job profile "Export for Data Import updates" (created by a script): 33% with the 100k file and 202% with the 500k file.


With the srs - holdings and items job profile: 63% with the 100k file and 296% with the 500k file.


Memory Utilization



Module | Memory, %
mod-data-export-b | 97
mod-inventory-b | 55
okapi-b | 42
mod-source-record-manager-b | 42
mod-users-bl-b | 32
mod-source-record-storage-b | 27
mod-authtoken-b | 25
mod-inventory-storage-b | 14
nginx-okapi | 5
pub-okapi | 4

TOP 20 modules srs - holdings and items with 500k file

Module | Memory, %
mod-data-export-b | 97.51
mod-data-export-worker-b | 89.71
mod-dcb-b | 78.72
mod-consortia-b | 78.13
mod-oa-b | 76.73
mod-orders-b | 65.1
mod-copycat-b | 62.06
mod-calendar-b | 56.01
mod-agreements-b | 55.19
mod-invoice-b | 55.1
mod-permissions-b | 55.07
mod-circulation-item-b | 53.84
mod-erm-usage-harvester-b | 52.36
mod-service-interaction-b | 51.65
mod-notes-b | 50.5
mod-orders-storage-b | 50.49
mod-users-b | 49.88
mod-tags-b | 47.66
mod-inventory-b | 45.58
mod-audit-b | 44.97



This graph contains DE-related modules.

...

Three files were prepared with the query SELECT id FROM [tenant_id]_mod_inventory_storage.instance where jsonb->>'source'='MARC' LIMIT 1000|100000|500000; using one LIMIT value per file.
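One way to produce the three files is to run the query once with LIMIT 500000 and slice the result, as sketched below. This is an illustration only: the helper name write_id_files and the one-UUID-per-line format without a header row are assumptions, not details stated in this report. Slicing one result makes each smaller file a subset of the larger ones, whereas separate LIMIT queries without ORDER BY may return different, unpredictable subsets of rows in PostgreSQL.

```python
from pathlib import Path
from typing import Iterable

# File names and record counts used in this report.
FILE_SIZES = {"1kDE.csv": 1_000, "100kDE.csv": 100_000, "500kDE.csv": 500_000}


def write_id_files(instance_ids: Iterable[str], out_dir: Path) -> dict[str, int]:
    """Write the first N instance UUIDs into each file, one UUID per line.

    Hypothetical helper: instance_ids is assumed to be the `id` column already
    fetched with the SELECT ... LIMIT 500000 query. Returns ids written per file.
    """
    ids = list(instance_ids)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for name, limit in FILE_SIZES.items():
        chunk = ids[:limit]  # fewer ids than `limit` simply yields a shorter file
        (out_dir / name).write_text("".join(f"{instance_id}\n" for instance_id in chunk))
        written[name] = len(chunk)
    return written
```

Because every file is cut from the same list, a job over 1kDE.csv exports exactly the first 1000 records that also appear in the 100k and 500k files.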

All tests were carried out sequentially with each job profile on the main tenant fs09000000.

The following queries were used to get the status and time range of the export jobs:

...

```sql
-- Replace [tenant] with the tenant ID (fs09000000 in these tests).
SELECT exported AS filesize,
       completed_date - started_date AS duration,
       job_profile_name,
       status,
       started_date,
       completed_date
FROM [tenant]_mod_data_export.job_executions
WHERE job_profile_name = 'srs - holdings and items'
ORDER BY completed_date DESC;

SELECT exported AS filesize,
       completed_date - started_date AS duration,
       job_profile_name,
       status,
       started_date,
       completed_date
FROM [tenant]_mod_data_export.job_executions
WHERE job_profile_name = 'Default instances export job profile'
ORDER BY completed_date DESC;

SELECT exported AS filesize,
       completed_date - started_date AS duration,
       job_profile_name,
       status,
       started_date,
       completed_date
FROM [tenant]_mod_data_export.job_executions
WHERE job_profile_name = 'Export for Data Import updates(test)1'
ORDER BY completed_date DESC;
```
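The three queries differ only in the job_profile_name filter. If the rows for all three profiles are fetched together, the newest run per profile can be picked out in one pass, as in the sketch below. It assumes each row is a dict keyed by the selected column names and that started_date and completed_date arrive as datetime objects (which is how common PostgreSQL drivers return timestamp columns); the function name latest_runs and the example row are hypothetical.

```python
from datetime import datetime

# Job profile names used in the three queries.
PROFILES = (
    "Default instances export job profile",
    "srs - holdings and items",
    "Export for Data Import updates(test)1",
)


def latest_runs(rows: list[dict]) -> dict[str, tuple[int, str, str]]:
    """Return (filesize, hh:mm:ss duration, status) of the newest run per profile.

    Rows of other profiles and rows without completed_date are skipped.
    """
    latest: dict[str, dict] = {}
    for row in rows:
        name = row["job_profile_name"]
        if name not in PROFILES or row["completed_date"] is None:
            continue
        best = latest.get(name)
        if best is None or row["completed_date"] > best["completed_date"]:
            latest[name] = row

    summary = {}
    for name, row in latest.items():
        seconds = int((row["completed_date"] - row["started_date"]).total_seconds())
        hours, rest = divmod(seconds, 3600)
        minutes, secs = divmod(rest, 60)
        summary[name] = (row["filesize"], f"{hours:02d}:{minutes:02d}:{secs:02d}", row["status"])
    return summary


# Hypothetical row shaped like a job_executions result (dates are made up).
example = [{
    "filesize": 100000,
    "job_profile_name": "srs - holdings and items",
    "status": "COMPLETED",
    "started_date": datetime(2024, 1, 1, 10, 0, 0),
    "completed_date": datetime(2024, 1, 1, 10, 5, 13),
}]
print(latest_runs(example))
```

Keeping only the newest completed run per profile mirrors the ORDER BY completed_date DESC ordering of the queries, where the first row is the latest job.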

Infrastructure

PTF environment: qcp1

  • 10 m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
  • 2 database instances, writer/reader

    Name | Memory, GiB | vCPUs | max_connections
    db.r6g.xlarge | 32 | 4 | 2731

    Data set for fs09000000

    • Instances: 25606331
    • Items: 26779913
    • Holdings: 25576735
  • MSK tenant
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3

...