Overview

Per PERF-267, test Eholdings exports (PERF-273) of 10K records to understand how the workflow behaves before and during a crash of the mod-data-export-worker task, if it crashes at all.

How long does it take to export 10K records?
What happens to the running job? Will it be able to resume and complete successfully when the new task is spun up?
Look for a memory trend and use it to decide how many concurrent jobs are needed to reach the tipping point.

Infrastructure

10 m6i.2xlarge EC2 instances (changed; in Lotus it was m5.xlarge)
2 db.r6.xlarge database instances, one reader and one writer
MSK
- 4 m5.2xlarge brokers in 2 zones
- auto.create.topics.enable = true
- log.retention.minutes = 120
- 2 partitions per DI topic

Software Versions

mod-data-export-worker:1.4.1
mod-data-export-spring:1.4.1
mod-agreements:5.2.0
mod-notes:3.1.0

Results

Summary

This is the initial test report for the Eholdings export functionality.
Approximately 10K records can be exported in 30 minutes (we tested with 9,631 titles from package eholdings/packages/53-1094073, which took 27 minutes).
A 2K-record export completed in about 4 minutes.
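To make the timings above easier to compare, the sketch below turns them into records-per-minute rates and extrapolates a full 10K export from the larger run. It is only arithmetic on the reported numbers; the helper names are hypothetical, and the constant-rate assumption is ours (the 2K duration is approximate, so the two rates should not be over-interpreted).

```python
# Reported runs: (record count, duration in minutes). Durations are approximate.
runs = {
    "9,631 titles (package 53-1094073)": (9_631, 27.0),
    "2K export": (2_000, 4.0),
}

def records_per_minute(records: int, minutes: float) -> float:
    """Average export throughput in records per minute."""
    return records / minutes

def estimated_minutes(records: int, rate: float) -> float:
    """Time to export `records` at a constant `rate` (records/minute); assumes no slowdown."""
    return records / rate

for name, (records, minutes) in runs.items():
    print(f"{name}: {records_per_minute(records, minutes):.0f} records/min")

# Extrapolate a 10K export from the larger (more representative) run.
slow_rate = records_per_minute(9_631, 27.0)
print(f"10K estimate: {estimated_minutes(10_000, slow_rate):.0f} min")
```

With these inputs the larger run averages about 357 records/min versus about 500 records/min for the 2K run, and the 10K estimate comes out at roughly 28 minutes, in line with the 30-minute figure above.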
The system is unstable and often fails during the export with the symptoms described in MODEXPW-170:
- The job appears completed and the file download link is active (as for job #000034 on the screen below); the exported file can be downloaded and contains the full data set.
- Start time and End time become equal.
- In the DB, the job status is still "in progress".
- This status never changes and blocks all subsequent jobs (they get the status "scheduled") until you restart mod-data-export-worker and mod-data-export-spring and then explicitly change the job status in the DB to "FAILED".
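The symptom list above can be read as a detection rule for stuck jobs. The sketch below encodes it under stated assumptions: the `Job` record, its field names, and the status strings ("IN_PROGRESS", "SCHEDULED") are hypothetical stand-ins, not the actual mod-data-export-spring schema, and "blocked" is simplified to "every scheduled job while a stuck job exists".

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

# Hypothetical job record; field names and status strings are assumptions.
@dataclass
class Job:
    name: str
    status: str                    # e.g. "IN_PROGRESS", "SCHEDULED", "FAILED"
    start_time: datetime
    end_time: Optional[datetime]
    has_download_link: bool

def is_stuck(job: Job) -> bool:
    """MODEXPW-170 pattern: download link is ready and start == end,
    but the stored status still says the job is running."""
    return (
        job.status == "IN_PROGRESS"
        and job.has_download_link
        and job.end_time is not None
        and job.start_time == job.end_time
    )

def blocked_jobs(jobs: List[Job]) -> List[Job]:
    """Simplification: while any job is stuck, every SCHEDULED job is treated as blocked."""
    if not any(is_stuck(j) for j in jobs):
        return []
    return [j for j in jobs if j.status == "SCHEDULED"]

# Example data modeled on the symptoms above (timestamps are illustrative only).
t = datetime(2000, 1, 1, 12, 0)
jobs = [
    Job("000034", "IN_PROGRESS", t, t, True),
    Job("000035", "SCHEDULED", t, None, False),
]
print([j.name for j in jobs if is_stuck(j)])   # ['000034']
print([j.name for j in blocked_jobs(jobs)])    # ['000035']
```

A check like this only flags the condition; per the report, recovery still requires restarting both modules and setting the status to "FAILED" by hand.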
Memory trend: memory usage grows. When the mod-data-export-worker container starts, memory usage is at 15%, and after a 10K export finishes it is at 29%. It is hard to determine a reliable trend because jobs usually get stuck and the container has to be restarted to proceed.
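As a rough answer to the "tipping point" question in the overview, the two memory readings can be extrapolated linearly. This is a minimal sketch assuming each 10K export leaves the same amount of retained memory (29% - 15% = 14 percentage points); the test could not confirm that growth is linear, and whether concurrent jobs add memory the same way as back-to-back ones is untested. `exports_until` is a hypothetical helper.

```python
import math

# Observed readings (percent of container memory).
BASELINE_PCT = 15.0    # right after mod-data-export-worker starts
AFTER_10K_PCT = 29.0   # after one 10K export finishes

def exports_until(limit_pct: float,
                  baseline: float = BASELINE_PCT,
                  per_export: float = AFTER_10K_PCT - BASELINE_PCT) -> int:
    """Largest number of 10K exports n with baseline + n * per_export <= limit_pct,
    ASSUMING linear memory growth per export."""
    if per_export <= 0:
        raise ValueError("per-export growth must be positive")
    return math.floor((limit_pct - baseline) / per_export)

print(exports_until(100.0))  # 6
print(exports_until(90.0))   # 5
```

Under this assumption the container would hit its memory limit after about six 10K exports without a restart; a test with several concurrent jobs is needed to check the estimate.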
Failover: the mod-data-export-worker container was restarted to simulate a crash. The ongoing job got stuck with the result described above (the download link is active, but the job stays in progress forever).

This screenshot shows three export attempts.

Notable observations

There is no way to track export progress, for example how many records have been transferred so far.
There is no way to check the size of the exported file. Only the jobId and job time contain useful information.

The UI does not update by itself, only after a page reload, which means additional back-end calls and more resource usage.