Data Import test report (Poppy)


Overview

This document contains the results of testing Data Import (DI) for MARC Bibliographic records in the Poppy release. Ticket: PERF-712.

Summary

  • DI duration increases with the number of records imported.
  • The increase in memory utilization was due to the scheduled cluster shutdown. No memory leak is suspected for DI modules.
  • Average CPU utilization of the modules did not exceed 150% for any Create or Update job. Spikes at the beginning of each job, caused by the mod-data-import module, are expected.
  • DB CPU usage reaches approximately 95% for all jobs with files of more than 10k records.
  • DI-related services have higher average CPU utilization in Poppy than in Orchid, especially mod-inventory.
  • Errors occurred in the DI Create job with the 100k file and in the DI Update jobs with 25k files because of a timeout (Opening SQLConnection failed: Timeout).
  • No significant difference in Data Import duration was observed between tests with ASYNC_PROCESSOR_MAX_WORKERS_COUNT set to 1 and to 5.
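
To put the first point in numbers, the Poppy durations of the completed Create jobs (tests #1-6) can be converted to throughput. A minimal Python sketch, assuming the durations below were converted to seconds by hand from the Results table; the throughput helper is hypothetical:

```python
# Poppy durations of completed MARC BIB Create jobs (tests #1-6),
# converted by hand from the Results table to seconds.
poppy_create_sec = {
    1_000: 39,     # 39 sec
    2_000: 61,     # 1 min 01 sec
    5_000: 142,    # 2 min 22 sec
    10_000: 269,   # 4 min 29 sec
    25_000: 638,   # 10 min 38 sec
    50_000: 1226,  # 20 min 26 sec
}


def throughput(records: int, seconds: int) -> float:
    """Hypothetical helper: records imported per second for one job."""
    return records / seconds


for records, seconds in poppy_create_sec.items():
    rate = throughput(records, seconds)
    print(f"{records:>6} records: {seconds:>5} s, {rate:.1f} records/s")
```

Throughput rises from about 25.6 records/s for the 1k job to about 40.8 records/s for the 50k job, so in this range duration grows somewhat less than proportionally to the number of records.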

Recommendations and Jiras

  1. Investigate the timeout issues. Ticket created: MODINV-924.
  2. Check memory trends for mod-source-record-storage-b and mod-inventory during additional DI tests without the nightly cluster shutdown.
  3. Increase the CPU unit allocation for the mod-inventory, mod-di-converter-storage, and mod-quick-marc services.
  4. Use a larger DB instance type (scale up from db.r6g.xlarge to db.r6g.2xlarge).

Results

Test # | Job                  | Profile                   | Duration, Orchid with R/W split enabled (07/09/2023) | Duration, Poppy | Difference, % / sec | Results
-------|----------------------|---------------------------|------------------------------------------------------|-----------------|---------------------|------------------------------
1      | 1k MARC BIB Create   | PTF - Create 2            | -                                                    | 39 sec          | -                   | Completed
2      | 2k MARC BIB Create   | PTF - Create 2            | -                                                    | 1 min 01 sec    | -                   | Completed
3      | 5k MARC BIB Create   | PTF - Create 2            | 2 min 23 sec                                         | 2 min 22 sec    | 0.88% / 1 sec       | Completed
4      | 10k MARC BIB Create  | PTF - Create 2            | 5 min 12 sec                                         | 4 min 29 sec    | 18.86% / 43 sec     | Completed
5      | 25k MARC BIB Create  | PTF - Create 2            | 11 min 45 sec                                        | 10 min 38 sec   | 11.38% / 67 sec     | Completed
6      | 50k MARC BIB Create  | PTF - Create 2            | 23 min 36 sec                                        | 20 min 26 sec   | 15.18% / 190 sec    | Completed
7      | 100k MARC BIB Create | PTF - Create 2            | 49 min 28 sec                                        | 2 hours 46 min  | -                   | Cancelled (stopped by user) *
8      | 1k MARC BIB Update   | PTF - Updates Success - 1 | -                                                    | 34 sec          | -                   | Completed
9      | 2k MARC BIB Update   | PTF - Updates Success - 1 | -                                                    | 1 min 09 sec    | -                   | Completed
10     | 5k MARC BIB Update   | PTF - Updates Success - 1 | 2 min 48 sec                                         | 2 min 31 sec    | 6.66% / 17 sec      | Completed
11     | 10k MARC BIB Update  | PTF - Updates Success - 1 | 5 min 23 sec                                         | 5 min 13 sec    | 1.84% / 10 sec      | Completed
12     | 25k MARC BIB Update  | PTF - Updates Success - 1 | 14 min 12 sec                                        | 12 min 27 sec   | 14% / 105 sec       | Completed with errors *
13     | 25k MARC BIB Update  | PTF - Updates Success - 1 | -                                                    | 2 min 15 sec    | -                   | Completed with errors *
14     | 25k MARC BIB Update  | PTF - Updates Success - 1 | -                                                    | 12 min          | -                   | Cancelled (stopped by user) *

* All jobs that completed with errors or were cancelled showed the same error in the UI: io.vertx.core.impl.NoStackTraceThrowable: Timeout

Test #14 was stopped manually from the UI. Two additional 25k MARC BIB Update tests were carried out to confirm that the 25k update does not work properly and fails with the same issue.
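
The seconds value in the Difference column is the Orchid duration minus the Poppy duration, so a positive value means the Poppy job finished faster. As a minimal sketch, the seconds values can be recomputed from the duration strings; the helper names to_seconds and difference_sec are hypothetical, and the parser assumes the "hours / min / sec" wording used in the table:

```python
import re

# Seconds per unit for the duration wording used in the Results table.
_UNIT_SECONDS = {"hour": 3600, "hours": 3600, "min": 60, "sec": 1}


def to_seconds(duration: str) -> int:
    """Hypothetical helper: '2 hours 46 min' or '4 min 29 sec' -> seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+)\s*(hours?|min|sec)", duration):
        total += int(value) * _UNIT_SECONDS[unit]
    return total


def difference_sec(orchid: str, poppy: str) -> int:
    """Orchid duration minus Poppy duration; positive means Poppy was faster."""
    return to_seconds(orchid) - to_seconds(poppy)


# Create jobs that have both an Orchid and a Poppy duration in the table.
create_jobs = {
    "5k": ("2 min 23 sec", "2 min 22 sec"),
    "10k": ("5 min 12 sec", "4 min 29 sec"),
    "25k": ("11 min 45 sec", "10 min 38 sec"),
    "50k": ("23 min 36 sec", "20 min 26 sec"),
}

for job, (orchid, poppy) in create_jobs.items():
    print(f"{job}: {difference_sec(orchid, poppy)} sec")
```

Running it prints 1, 43, 67 and 190 sec for the 5k, 10k, 25k and 50k Create jobs, matching the table; the same call on the 5k, 10k and 25k Update rows gives 17, 10 and 105 sec.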


Memory Utilization

The increase in memory utilization was due to the scheduled cluster shutdown. No memory leak is suspected for DI modules.

MARC BIB CREATE

Tests #1-7

1k, 2k, 5k, 10k, 25k, 50k, 100k records

MARC BIB UPDATE

Tests #8-14

1k, 2k, 5k, 10k, 25k, 25k, 25k records

Service CPU Utilization 

MARC BIB CREATE

Tests #1-7

1k, 2k, 5k, 10k, 25k, 50k, 100k records

CPU utilization for all modules returned to baseline levels after all tests. The highest utilization, 170%, was observed for the mod-quick-marc-b module during the 5k DI Create job.

Average CPU utilization: mod-inventory-b 125%, mod-inventory-storage-b 25%, mod-source-record-storage-b 60%, mod-source-record-manager-b 35%, mod-di-converter-storage-b 80%. mod-data-import spiked to 200% during the 100k job.


MARC BIB UPDATE

Tests #8-14

1k, 2k, 5k, 10k, 25k, 25k, 25k records