Data Import in Central and Member Tenants

Overview

This document contains the results of testing Data Import in central and member tenants with ECS and the DI file-splitting feature enabled in the Poppy release.

Tickets: PERF-753 and PERF-754

Summary

  • DI create and update jobs perform better than in the Poppy release tests without ECS enabled. See the comparison table for details.
  • DI Create job results in the central tenant and the member tenant do not differ much for files from 1k to 50k records. DI with the 100k records file ran faster in the central tenant.
  • DI Update jobs run faster in the central tenant; for the 100k records file the job was about 5 minutes faster than in the member tenant.
  • Service CPU utilization of mod-inventory in the central tenant didn't exceed 130% during DI Update jobs. In the member tenant it was 160%.
  • CPU utilization decreased for the majority of modules in both central and member tenants compared with the non-ECS results. The only module that shows insignificant growth is mod-inventory-storage (+6%).
  • After an additional set of 100k DI create jobs on pcon (to populate the DB with instances on member tenants), a growth trend in CPU utilization was observed, followed by an Out Of Memory issue for mod-inventory after a 361% spike.
  • Memory consumption for mod-inventory was 100% on average for both tenants.
  • Memory consumption increased for mod-source-record-storage (20%) and mod-source-record-manager (10%) in both central and member tenants compared with the non-ECS results. It decreased for the data-import module (15%).
  • RDS utilization was 98% with DI MARC Bib Create jobs and 94% with DI MARC Bib Update jobs in the central tenant, and 95% with DI MARC Bib Create jobs and 90% with DI MARC Bib Update jobs in the member tenant.
  • OpenSearch Service CPU utilization was 97% and memory consumption was 99%.

Recommendations and Jiras

  • After an additional set of 100k DI create jobs on pcon (to populate the DB with instances on member tenants), a growth trend in CPU utilization was observed, followed by an Out Of Memory issue for mod-inventory after a 361% spike.
  • Jira ticket PERF-764 was created for this issue and investigated. The results of the heap dump analysis are attached to the ticket as a comment.
  • Consider allocating more CPU units to mod-data-import, given that CPU utilization for the mod-data-import module exceeded 100% during DI Update jobs.

Test Runs 

| Test # | Scenario | Load level | Comment |
|---|---|---|---|
| 1 | DI MARC Bib Create | 1K, 5K, 10K, 25K, 50K, 100K consecutively (with 5 min pause) | Central tenant only |
| 2 | DI MARC Bib Update | 1K, 5K, 10K, 25K, 50K, 100K consecutively (with 5 min pause) | Central tenant only |
| 3 | DI MARC Bib Create | 1K, 5K, 10K, 25K, 50K, 100K consecutively (with 5 min pause) | Member tenant only |
| 4 | DI MARC Bib Update | 1K, 5K, 10K, 25K, 50K, 100K consecutively (with 5 min pause) | Member tenant only |
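The load pattern in each test run (one job per file size, with a 5 minute pause between jobs) can be sketched as a small driver loop. This is only an illustration: `run_job` is a hypothetical callable standing in for whatever uploads a file and waits for the DI job to finish; it is not a FOLIO API, and the report does not say how the jobs were actually triggered.

```python
import time
from typing import Callable, Iterable, List

# File sizes from the "Load level" column, in the order they were run.
LOAD_LEVELS = ["1K", "5K", "10K", "25K", "50K", "100K"]
PAUSE_SECONDS = 5 * 60  # "5 min pause" between consecutive jobs


def run_consecutively(
    levels: Iterable[str],
    run_job: Callable[[str], None],
    pause_seconds: float = PAUSE_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Run one job per load level, pausing between jobs but not after the last.

    run_job is a hypothetical callable (an assumption of this sketch) that
    starts the DI job for the given level and blocks until it completes.
    """
    ordered = list(levels)
    completed = []
    for index, level in enumerate(ordered):
        run_job(level)
        completed.append(level)
        if index < len(ordered) - 1:
            sleep(pause_seconds)
    return completed
```

Passing `sleep` as a parameter keeps the loop testable without waiting for real pauses.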

Test Results

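| Profile | MARC File | DI Duration, central tenant (hh:mm:ss) | DI Duration, member tenant (hh:mm:ss) |
|---|---|---|---|
| DI MARC Bib Create (PTF - Create 2) | 1K.mrc | 00:00:38 | 00:00:33 |
| DI MARC Bib Create (PTF - Create 2) | 5K.mrc | 00:02:13 | 00:02:02 |
| DI MARC Bib Create (PTF - Create 2) | 10K.mrc | 00:03:54 | 00:03:54 |
| DI MARC Bib Create (PTF - Create 2) | 25K.mrc | 00:09:44 | 00:10:03 |
| DI MARC Bib Create (PTF - Create 2) | 50K.mrc | 00:18:49 | 00:18:50 |
| DI MARC Bib Create (PTF - Create 2) | 100K.mrc | 00:37:46 | 00:39:33 |
| DI MARC Bib Update (PTF - Updates Success - 1) | 1K.mrc | 00:00:44 | 00:00:33 |
| DI MARC Bib Update (PTF - Updates Success - 1) | 5K.mrc | 00:02:26 | 00:02:39 |
| DI MARC Bib Update (PTF - Updates Success - 1) | 10K.mrc | 00:04:57 | 00:05:20 |
| DI MARC Bib Update (PTF - Updates Success - 1) | 25K.mrc | 00:12:05 | 00:13:21 |
| DI MARC Bib Update (PTF - Updates Success - 1) | 50K.mrc | 00:24:27 | 00:26:43 |
| DI MARC Bib Update (PTF - Updates Success - 1) | 100K.mrc | 00:49:15 | 00:54:29 |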
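The durations above can be converted into throughput to compare tenants on a common scale. The following is a minimal sketch, assuming each file holds the number of records its name suggests (100K.mrc = 100,000 records); the helper names are illustrative and not part of any test tooling.

```python
def to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss duration string into whole seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def records_per_second(records: int, duration: str) -> float:
    """Average throughput of a job that processed `records` in `duration`."""
    return records / to_seconds(duration)


# 100K.mrc durations copied from the Test Results table.
CREATE_100K = {"central": "00:37:46", "member": "00:39:33"}
UPDATE_100K = {"central": "00:49:15", "member": "00:54:29"}

if __name__ == "__main__":
    for name, durations in (("Create", CREATE_100K), ("Update", UPDATE_100K)):
        for tenant, duration in durations.items():
            rate = records_per_second(100_000, duration)
            print(f"{name} 100K, {tenant} tenant: {rate:.1f} records/s")
```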

Comparison Table

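DI Duration (hh:mm:ss)

| Profile | MARC File | pcon, central tenant | pcon, member tenant | pcp1, central tenant | Delta pcon/pcp1, central tenant |
|---|---|---|---|---|---|
| DI MARC Bib Create (PTF - Create 2) | 1K.mrc | 00:00:38 | 00:00:33 | 00:00:39 | 00:00:01 |
| DI MARC Bib Create (PTF - Create 2) | 5K.mrc | 00:02:13 | 00:02:02 | 00:02:39 | 00:00:26 |
| DI MARC Bib Create (PTF - Create 2) | 10K.mrc | 00:03:54 | 00:03:54 | 00:05:00 | 00:01:06 |
| DI MARC Bib Create (PTF - Create 2) | 25K.mrc | 00:09:44 | 00:10:03 | 00:11:15 | 00:01:31 |
| DI MARC Bib Create (PTF - Create 2) | 50K.mrc | 00:18:49 | 00:18:50 | 00:22:16 | 00:03:27 |
| DI MARC Bib Create (PTF - Create 2) | 100K.mrc | 00:37:46 | 00:39:33 | 00:49:58 | 00:12:12 |
| DI MARC Bib Update (PTF - Updates Success - 1) | 1K.mrc | 00:00:44 | 00:00:33 | 00:00:34 | 00:00:10 |
| DI MARC Bib Update (PTF - Updates Success - 1) | 5K.mrc | 00:02:26 | 00:02:39 | 00:02:28 | 00:00:02 |
| DI MARC Bib Update (PTF - Updates Success - 1) | 10K.mrc | 00:04:57 | 00:05:20 | 00:05:31 | 00:00:34 |
| DI MARC Bib Update (PTF - Updates Success - 1) | 25K.mrc | 00:12:05 | 00:13:21 | 00:14:50 | 00:02:45 |
| DI MARC Bib Update (PTF - Updates Success - 1) | 50K.mrc | 00:24:27 | 00:26:43 | 00:32:53 | 00:08:26 |
| DI MARC Bib Update (PTF - Updates Success - 1) | 100K.mrc | 00:49:15 | 00:54:29 | 01:14:39 | 00:25:24 |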

* The results of DI without Check-in/Check-out in the Poppy release were taken from the report "Data Import with Check-ins Check-outs (Poppy)".
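The Delta column of the comparison table can be reproduced from the central tenant durations on pcon and pcp1. The sketch below assumes the delta is the absolute difference between the two durations, which matches the rows in the table; the function names are illustrative, and `percent_faster` is an extra metric that the report itself does not state.

```python
from datetime import timedelta


def parse_hms(value: str) -> timedelta:
    """Parse an hh:mm:ss string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_hms(delta: timedelta) -> str:
    """Format the magnitude of a timedelta as hh:mm:ss."""
    total = int(abs(delta).total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def delta_hms(pcon: str, pcp1: str) -> str:
    """Absolute difference between two durations, as in the Delta column."""
    return format_hms(parse_hms(pcp1) - parse_hms(pcon))


def percent_faster(pcon: str, pcp1: str) -> float:
    """How much shorter the pcon run is, as a percentage of the pcp1 run.

    A negative value means the pcon run took longer.
    """
    base = parse_hms(pcp1).total_seconds()
    return (base - parse_hms(pcon).total_seconds()) / base * 100


if __name__ == "__main__":
    # 100K.mrc update job, central tenant: pcon vs pcp1.
    print(delta_hms("00:49:15", "01:14:39"))
    print(f"{percent_faster('00:49:15', '01:14:39'):.0f}% faster")
```

Because the column holds magnitudes, the 1K update row (00:00:44 on pcon against 00:00:34 on pcp1) shows 00:00:10 even though the pcon run was the slower one there.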

Service CPU Utilization

CPU utilization decreased for the majority of modules in both central and member tenants compared with the non-ECS test results. The only module that shows insignificant growth is mod-inventory-storage (+6%).

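| Module | Central, Create Jobs, ECS | Central, Create Jobs, Non-ECS | Central, Update Jobs, ECS | Central, Update Jobs, Non-ECS | Member, Create Jobs, ECS | Member, Create Jobs, Non-ECS | Member, Update Jobs, ECS | Member, Update Jobs, Non-ECS |
|---|---|---|---|---|---|---|---|---|
| mod-inventory-b | 101% | 125% | 132% | 220% | 112% | 125% | 158% | 220% |
| mod-inventory-storage-b | 31% | 25% | 36% | 25% | 36% | 25% | 32% | 25% |
| mod-source-record-storage-b | 52% | 60% | 36% | 50% | 48% | 60% | 33% | 50% |
| mod-source-record-manager-b | 29% | 35% | 22% | 45% | 29% | 35% | 20% | 45% |
| mod-di-converter-storage-b | 68% | 80% | 68% | 90% | 56% | 80% | 52% | 90% |
| mod-data-import | 135% | 200% | 254% | 96% (25k file) | 179% | 200% | 161% | 96% (25k file) |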

This table provides the average CPU utilization in ECS and Non-ECS tests for the 100k records file.
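The ECS versus Non-ECS comparison can be checked by subtracting the two columns for each module. The sketch below uses only the central tenant create-job columns from the table above and reports the change in percentage points; the dictionary layout and function names are assumptions made for illustration.

```python
from typing import Dict, Tuple

# Average CPU utilization (%) for central tenant create jobs,
# copied from the table above as (ECS, Non-ECS).
CENTRAL_CREATE_CPU: Dict[str, Tuple[int, int]] = {
    "mod-inventory-b": (101, 125),
    "mod-inventory-storage-b": (31, 25),
    "mod-source-record-storage-b": (52, 60),
    "mod-source-record-manager-b": (29, 35),
    "mod-di-converter-storage-b": (68, 80),
    "mod-data-import": (135, 200),
}


def point_change(ecs: int, non_ecs: int) -> int:
    """Change in percentage points when moving from Non-ECS to ECS."""
    return ecs - non_ecs


def modules_with_growth(table: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Return only the modules whose CPU utilization is higher with ECS."""
    return {
        module: point_change(ecs, non_ecs)
        for module, (ecs, non_ecs) in table.items()
        if ecs > non_ecs
    }


if __name__ == "__main__":
    for module, (ecs, non_ecs) in CENTRAL_CREATE_CPU.items():
        print(f"{module}: {point_change(ecs, non_ecs):+d} points")
```

For these columns, `modules_with_growth` returns only mod-inventory-storage-b with a change of 6 points, which is the +6% mentioned above.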

Central tenant

During create jobs, the highest CPU utilization was by mod-inventory with the 100k records file: 99%. During update jobs, mod-inventory utilized 130% with the 100k records file. Spikes were observed in mod-data-import at the very beginning of each job, which was expected. The highest spike, 250%, occurred in the update job with the 100k records file.


Create jobs: 

The highest resource utilization, 101%, was observed for mod-inventory. At the end of the test, the mod-quick-marc-b module (98%) begins to consume more than the other modules. This behaviour was observed for all create jobs.

Average for mod-inventory-b - 101%, mod-inventory-storage-b - 31%, mod-source-record-storage-b - 52%, mod-source-record-manager-b - 29%, mod-di-converter-storage-b - 68%, mod-data-import - 135% spike for the 100k job.

Non-ECS tests results*: Average for mod-inventory-b - 125%, mod-inventory-storage-b - 25%, mod-source-record-storage-b - 60%, mod-source-record-manager-b - 35%, mod-di-converter-storage-b - 80%, mod-data-import - 200% spike for the 100k job and 86% spike for the 25k job.

Update jobs:

The highest resource utilization, 134%, was observed for mod-inventory.

Average for mod-inventory-b - 132%, mod-inventory-storage-b - 36%, mod-source-record-storage-b - 36%, mod-source-record-manager-b - 22%, mod-di-converter-storage-b - 68%, mod-data-import - 254% spike for the 100k job and 85% spike for the 25k job.

Non-ECS tests results*: Average for mod-inventory-b - 220%, mod-inventory-storage-b - 25%, mod-source-record-storage-b - 50%, mod-source-record-manager-b - 45%, mod-di-converter-storage-b - 90%, mod-data-import - 96% spike for the 25k job.

Member tenant

During create jobs, the highest CPU utilization was by mod-inventory with the 100k records file: 117%. During update jobs, mod-inventory utilized 160% with the 100k records file. Spikes were observed in mod-data-import at the very beginning of each job, which was expected.

Create jobs: 

The highest resource utilization, 112%, was observed for mod-inventory. At the end of the test, the mod-quick-marc-b module (122%) begins to consume more than the other modules. This behaviour was observed for all create jobs.

Average for mod-inventory-b - 112%, mod-inventory-storage-b - 36%, mod-source-record-storage-b - 48%, mod-source-record-manager-b - 29%, mod-di-converter-storage-b - 56%, mod-data-import - 179% spike for the 100k job and 70% spike for the 25k job.

Non-ECS tests results*: Average for mod-inventory-b - 125%, mod-inventory-storage-b - 25%, mod-source-record-storage-b - 60%, mod-source-record-manager-b - 35%, mod-di-converter-storage-b - 80%, mod-data-import - 200% spike for the 100k job.

Update jobs:

The highest resource utilization, 158%, was observed for mod-inventory.

Average for mod-inventory-b - 158%, mod-inventory-storage-b - 32%, mod-source-record-storage-b - 33%, mod-source-record-manager-b - 20%, mod-di-converter-storage-b - 52%, mod-data-import - 161% spike for the 100k job and 73% spike for the 25k job.

Non-ECS tests results*: Average for mod-inventory-b - 220%, mod-inventory-storage-b - 25%, mod-source-record-storage-b - 50%, mod-source-record-manager-b - 45%, mod-di-converter-storage-b - 90%, mod-data-import - 96% spike for the 25k job.

Memory Utilization

Memory consumption increased for mod-source-record-storage (20%) and mod-source-record-manager (10%) in both central and member tenants compared with the non-ECS test results. It decreased for the data-import module (15%).

This table provides the average memory consumption in ECS and Non-ECS tests. The Non-ECS value for mod-di-converter-storage-b is not available, as indicated by "-".

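| Module | ECS, central tenant, Create Jobs | ECS, central tenant, Update Jobs | ECS, member tenant, Create Jobs | ECS, member tenant, Update Jobs | Non-ECS |
|---|---|---|---|---|---|
| mod-inventory-b | 96% | 96% | 101% | 101% | 90% |
| mod-inventory-storage-b | 18% | 21% | 20% | 23% | 18% |
| mod-source-record-storage-b | 64% | 74% | 71% | 71% | 46% |
| mod-source-record-manager-b | 49% | 49% | 46% | 43% | 38% |
| mod-di-converter-storage-b | 32% | 33% | 33% | 33% | - |
| mod-data-import | 38% | 38% | 35% | 37% | 53% |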

Central tenant

Memory consumption for the mod-inventory module grew gradually to 96% during create jobs and didn't change during update jobs. No memory leaks were detected.