Overview
Per PERF-267, test eHoldings exports (PERF-273) of 10K records to understand the workflow's behavior before and after the mod-data-export-worker task crashes, if it crashes at all.
- How long does it take to export 10K records?
- What happens to the running job? Will it be able to resume and complete successfully when the new task is spun up?
- Look for a memory trend and use it to decide on the number of concurrent jobs needed to reach the tipping point.
Infrastructure
- 10 m6i.2xlarge EC2 instances (changed from the m5.xlarge used in the Lotus release)
- 2 db.r6.xlarge database instances, one reader and one writer
- MSK
- 4 m5.2xlarge brokers in 2 zones
- auto.create.topics.enable = true
- log.retention.minutes=120
- 2 partitions per DI topic
Software Versions
- mod-data-export-worker v1.4.1
- mod-data-export-spring v1.4.1
- mod-agreements v5.2.0
- mod-notes v3.1.0
Results
Summary
- This is the initial test report for the eHoldings export functionality.
- Approximately 10K records can be exported in 30 minutes (tested with 9,631 titles from package eholdings/packages/53-1094073, which took 27 minutes).
- A 2K-record export completed in about 4 minutes.
- The system is unstable and often fails during the export with the symptoms of MODEXPW-170:
- The job is reported as completed and the file download link is active (as with job #000034 in the screenshot below); the exported file can be downloaded with the full set of data.
- Start time and End time become equal.
- In the DB, the job status remains IN_PROGRESS.
- This status never changes and blocks subsequent jobs (they get the status SCHEDULED) until you restart mod-data-export-worker and mod-data-export-spring and explicitly change the job status to FAILED in the DB (see the sketch after this list).
- Memory trend: memory usage grows. When the mod-data-export-worker container starts, memory usage is at about 15%; after finishing a 10K export it is at about 29%. It is hard to determine a longer-term trend because jobs usually get stuck and the container has to be restarted to proceed (see the monitoring sketch below).
- Failover: the mod-data-export-worker container was restarted to simulate a crash. The ongoing job got stuck with the result described above (the download link is active, but the job stays in progress forever).
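To unblock the queue after such a failure, the stuck job's status has to be flipped to FAILED directly in the DB. Below is a minimal sketch of that step; the schema, table, and column names (diku_mod_data_export_spring.job, id, status) are assumptions for illustration and must be verified against the actual deployment before running.

    # Sketch: mark a stuck export job FAILED so queued jobs can proceed.
    # ASSUMPTION: schema/table/column names below are placeholders --
    # verify them against your deployment's DB first.
    import psycopg2

    JOB_ID = "00000000-0000-0000-0000-000000000000"  # UUID of the stuck job

    conn = psycopg2.connect(host="db-writer.example.com", dbname="folio",
                            user="folio_admin", password="***")
    try:
        # "with conn" commits the transaction on success, rolls back on error
        with conn, conn.cursor() as cur:
            cur.execute(
                "UPDATE diku_mod_data_export_spring.job "
                "SET status = 'FAILED' WHERE id = %s",
                (JOB_ID,),
            )
            print(f"rows updated: {cur.rowcount}")
    finally:
        conn.close()

Restart mod-data-export-worker and mod-data-export-spring after the update so the scheduled jobs are picked up again.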
The screenshot below shows three export attempts.
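To make the memory trend easier to quantify between restarts, container memory can be sampled from CloudWatch instead of being read off dashboards ad hoc. A sketch, assuming the module runs as an ECS service (the cluster and service names are placeholders):

    # Sketch: sample average ECS memory utilization for mod-data-export-worker
    # over the last hour. ClusterName/ServiceName values are placeholders.
    from datetime import datetime, timedelta, timezone

    import boto3

    cw = boto3.client("cloudwatch")
    now = datetime.now(timezone.utc)
    resp = cw.get_metric_statistics(
        Namespace="AWS/ECS",
        MetricName="MemoryUtilization",
        Dimensions=[
            {"Name": "ClusterName", "Value": "perf-cluster"},
            {"Name": "ServiceName", "Value": "mod-data-export-worker"},
        ],
        StartTime=now - timedelta(hours=1),
        EndTime=now,
        Period=60,  # one datapoint per minute
        Statistics=["Average"],
    )
    # Print the datapoints in chronological order
    for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], f"{point['Average']:.1f}%")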
Notable observations
- There is no way to track export progress, e.g. how many records have been transferred so far (see the polling sketch below).
- There is no way to check how large the exported file is; only the jobId and job time contain useful information.
- The UI does not update by itself, only after a page refresh, which means additional back-end calls and more resource usage.
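In the absence of built-in progress tracking, the job status can at least be polled from the back end. A sketch, assuming standard Okapi headers and a GET /data-export-spring/jobs/{id} endpoint that returns a "status" field (verify the path and response fields against the module version's API docs):

    # Sketch: poll an export job until it leaves the IN_PROGRESS/SCHEDULED
    # states. ASSUMPTION: endpoint path and "status" field are illustrative.
    import time

    import requests

    OKAPI_URL = "https://okapi.example.com"  # placeholder Okapi gateway
    HEADERS = {
        "x-okapi-tenant": "diku",            # placeholder tenant
        "x-okapi-token": "<token>",          # placeholder auth token
    }
    JOB_ID = "00000000-0000-0000-0000-000000000000"

    while True:
        resp = requests.get(f"{OKAPI_URL}/data-export-spring/jobs/{JOB_ID}",
                            headers=HEADERS, timeout=30)
        resp.raise_for_status()
        status = resp.json().get("status")
        print(f"job {JOB_ID}: {status}")
        if status not in ("IN_PROGRESS", "SCHEDULED"):
            break
        time.sleep(15)  # poll every 15 seconds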