eHoldings export report [Morning Glory]



Overview

Per PERF-267, test eHoldings exports (PERF-273) of 10K records to understand the workflow behavior before and when the mod-data-export-worker task crashes, if it crashes at all.

  • How long does it take to export 10K records?
  • What happens to the running job? Can it resume and complete successfully once a new task is spun up?
  • Look for a memory trend and use it to decide how many concurrent jobs are needed to reach the tipping point.


Infrastructure

  • 10 m6i.2xlarge EC2 instances (changed; in Lotus it was m5.xlarge)
  • 2 db.r6.xlarge database instances, one reader and one writer
  • MSK
    • 4 m5.2xlarge brokers in 2 zones
    • auto.create.topics.enable=true
    • log.retention.minutes=120
    • 2 partitions per DI topic


Software Versions

  • mod-data-export-worker:1.4.1
  • mod-data-export-spring:1.4.1
  • mod-agreements:5.2.0
  • mod-notes:3.1.0


Results

Summary 

  • This is the initial test report for the eHoldings export functionality.
  • Approximately 10K records can be exported in 30 minutes: exporting package eholdings/packages/53-1094073 with 9,631 titles took 27 minutes.
  • A 2K export completed in about 4 minutes.
  • The system is unstable and often fails during the export with the symptoms described in MODEXPW-170:
    • The job completes and the file download link is active (as in job #000034 in the screenshot below); the exported file can be downloaded with the full data set.
    • Start time and end time become equal.
    • In the DB, the job status remains in progress.
    • This status never changes and blocks all subsequent jobs (they get the status "scheduled") until you restart mod-data-export-worker and mod-data-export-spring and then explicitly change the job status in the DB to "FAILED".
  • Memory trend: memory usage grows. When the mod-data-export-worker container starts, memory usage is at 15%; after a 10K export finishes, it is at 29%. It is hard to determine a memory trend because jobs usually get stuck and the container has to be restarted to proceed.
  • Failover: the mod-data-export-worker container was restarted to simulate a crash. The ongoing job got stuck with the result described above (the download link is active, but the job stays in progress forever).
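
The figures in this summary can be cross-checked with a few lines of arithmetic. The sketch below is not part of the test tooling; the function names are hypothetical. The memory projection assumes that usage grows linearly with each 10K export and is never reclaimed. That assumption rests on a single data point (15% to 29%) and is only a rough bound, since the report could not establish a real trend.

```python
# Figures taken from the Summary section of this report.
TITLES_EXPORTED = 9_631       # titles in package eholdings/packages/53-1094073
DURATION_MIN = 27             # observed duration of that export, in minutes
SMALL_EXPORT_RECORDS = 2_000  # size of the smaller export
SMALL_EXPORT_MIN = 4          # approximate duration of the smaller export
MEM_START_PCT = 15.0          # memory usage right after container start
MEM_AFTER_PCT = 29.0          # memory usage after one 10K export


def throughput(records: int, minutes: float) -> float:
    """Average number of records exported per minute."""
    return records / minutes


def projected_minutes(records: int, rate_per_min: float) -> float:
    """Time needed to export `records` at a constant rate."""
    return records / rate_per_min


def exports_until_limit(start_pct: float, after_pct: float,
                        limit_pct: float = 100.0) -> float:
    """Number of exports until memory reaches `limit_pct`.

    ASSUMPTION: each export adds the same amount of memory and
    nothing is freed in between. This is not a measured trend.
    """
    growth_per_export = after_pct - start_pct
    return (limit_pct - start_pct) / growth_per_export


if __name__ == "__main__":
    large_rate = throughput(TITLES_EXPORTED, DURATION_MIN)
    small_rate = throughput(SMALL_EXPORT_RECORDS, SMALL_EXPORT_MIN)
    print(f"10K-scale export rate: {large_rate:.1f} titles/min")
    print(f"2K export rate:        {small_rate:.1f} records/min")
    print(f"Projected 10K time:    "
          f"{projected_minutes(10_000, large_rate):.1f} min")
    print(f"Exports until 100% memory (linear assumption): "
          f"{exports_until_limit(MEM_START_PCT, MEM_AFTER_PCT):.2f}")
```

With these inputs the large export averages roughly 357 titles per minute, which projects to about 28 minutes for a full 10K records and is consistent with the 30-minute estimate. The 2K export ran at about 500 records per minute, so throughput may not be constant across export sizes. Under the linear assumption, about six 10K exports would fill the container's memory.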





The screenshot shows three export attempts.



Notable observations

  • There is no way to track export progress, for example how many records have been transferred so far.
  • There is no way to check the size of the exported file. Only the job ID and job time carry useful information.
  • The UI does not update by itself, only after a page reload, which means additional calls to the back end and more resource usage.