Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Table of Contents
outlinetrue

Overview

  1. These tests are run to investigate the performance difference of mod-data-export with logLevel=info vs logLevel=warn for Juniper release. This testing was part of
    Jira Legacy
    serverSystem JiraJIRA
    columnIdsissuekey,summary,issuetype,created,updated,duedate,assignee,reporter,priority,status,resolution
    columnskey,summary,type,created,updated,due,assignee,reporter,priority,status,resolution
    serverId01505d01-b853-3c2e-90f1-ee9b165564fc
    keyMDEXP-394
    where we observed that mod-data-export is continuously writing a lot of data to log. For 100K DE job, mod-data-export is writing 42Million records. This could also result in a crash if there is not enough CPU, memory allocated to mod-data-export.

Image Removed

      2. Check how many jobs can run in parallel -

Jira Legacy
serverSystem Jira
columnIdsissuekey,summary,issuetype,created,updated,duedate,assignee,reporter,priority,status,resolution
columnskey,summary,type,created,updated,due,assignee,reporter,priority,status,resolution
serverId01505d01-b853-3c2e-90f1-ee9b165564fc
keyPERF-142
so data-export does not break

  1. In mod-data-export log4j2.properties file was modified with rootLogger.level = warn, logger.netty.level = warn, status = warn


Backend:

  • mod-data-export-4.1.1 (snapshot version)
  • mod-source-record-storage-5.1.4
  • mod-source-record-manager-3.1.3
  • okapi-4.8.2
  • mod-authtoken-2.8.0

...

  • 8 million inventory records
  • 74 FOLIO back-end modules deployed in 144 ECS services
  • 3 okapi ECS services
  • 12 m5.large  EC2 instances
  • writer db.r6g.xlarge 1 reader db.r6g.xlarge AWS RDS instance
  • INFO logging level / WARN looging levellogging level
  • mod-data-export Soft memory limit - 360 MB Hard memory limit - 512 MB
  • CPU - 128

High-Level Summary

  1. With WARN level logging, seeing 9% improvement in memory utilization
  2. data-export/job-executions API is polled every 3 seconds vs 5 seconds in Goldenrod. This means there are more requests overall but it also improves data-export execution. Data-export job completes faster(around 2 seconds) compared to the previous release. Other than that, data-export is stable and see performance improvements compared to the Goldenrod release.
  3. A new feature added for data-export is it will kill failed job every 6 hours
  4. How many data-export jobs can run in parallel? Multiple jobs can run in parallel and 2.75 Million instance records can be exported at any given point in time. If we try to export more than that for example 3 Million instance records, mod-data-export crashes with OOM(Out Of Memory)utilization
  5. With WARN level logging, 70% improvement in CPU utilization
  6. With WARN level logging, No bumps in the memory were observed. Memory utilization stays stable for multiple Data Export job runs.
  7. Improvement by 42Million records for WARN. 42 Million fewer records were written
  8. With WARN level logging, there is improvement in JVM's Garbage collection

Test Runs

1 user - INFO level logging vs WARN level logging

Test

Total instances

Duration

mod-data-export log level

Duration (Total time to complete exporting all instances, holdings, and items)Total records logged in CloudWatchDE Job profile
1.100,000
30 minINFO2.97 minutes
INFO1 hour 6 minutes42.5 MillionHoldings and item
2.100,000
30 min
WARN
18
57 minutes

 Slow API taking more than 100 ms to return for 1 user:

...

Fast queries involved during data-export job:

...

CPU Utilization

CPU utilization increase as the number of instances increases from 100 to 500k. CPU reaches maximum for 100k, 500k instances but there is not much difference for 1 user and 2 users.

Image Removed

Service Memory Utilization

Memory increases gradually as we start running tests especially for mod-data-export and then stabilizes. For all other modules such as mod-source-record-storage, the okapi remains constant between 80% - 100%.

Image Removed

Check how many jobs can run in parallel

Multiple jobs can run in parallel but data-export fails if trying to export 3 Million instance records with the below configuration.

Current memory allocation to mod-data-export service in ECS task definition container:
Soft memory limit - 360 MB
Hard memory limit - 512 MB

Memory Utilization gradually increases from 101% to 141% as we increase the number of instance records where it eventually crashes.

...

When trying to export 3M records, POST data-export/file-definitions/d63d8a83-e339-44b2-8a2f-41caaf080221/upload fails with 503

Appendix

...

120KHoldings and item


Total records logged for 100K, INFO vs WARN

INFO - ~42.5 Million records

Image Added

WARN - ~120K records

Image Added


Service Memory Utilization

For mod-data-export, 9% improvement in memory for WARN level logging

Memory Utilization INFO vs WARN

Image Added

CPU Utilization

For mod-data-export, 70% improvement in CPU utilization for WARN level logging

INFO level logging

Image Added

WARN level logging

Image Added


Heap dump analysis using JXRay

4.6% improvement in Garbage collection with WARN level logging. That means, mod-data-export is generating less Unreachable garbage objects. WARN level logging is effectively utilization 12 MB of JVM memory. While with INFO level logging, this 12 MB is wasted. This is only for 100K run. As we run the Data Export job continuously, this 12MB could be much larger.

INFO level:

Image Added


WARN level:

Image Added


Other than this, with WARN level logging, mod-data-export will create fewer Object instances in JVM. For detailed comparison see mod-data-export-with-info-logging.html vs mod-data-export-with-warn-logging.html


Investigate GC with default info level logging

How much metaspace size should we allocate to mod-data-export?

Around 20%-25% of Xmx. Currently, mod-data-export is running with Xmx=388Mb and XX:MetaspaceSize=88Mb which is 22%. 


We can check metaspace capacity with jstat

jstat -gcmetacapacity (PID)

For example:

root@2335b20abe22:/usr/ms# jstat -gcmetacapacity 60

   MCMN       MCMX        MC       CCSMN      CCSMX       CCSC     YGC   FGC    FGCT    CGC    CGCT     GCT   

       0.0   143360.0    70272.0        0.0    81920.0     9344.0 20690   451   43.211     -        -  143.184

MC=70272Kb means currently, mod-data-export is using ~70MB of metspacesize


Checked how much percent of metaspacesize is used by mod-data-export

root@2335b20abe22:/usr/ms# jstat -gcutil 60 | awk '{print($5)}'

97.00

M=97.00 means 97% of metaspace size is used. Metaspace size allocated soft and hard is 88MB.


Investigated GC algorithm by using ShenandoahGC. But found out that JVM with OpenJDK package does not support it. By default it uses G1

Investigated another GC algorithm by using ZGC. After applying in the task definition, mod-data-export did become stable but it crashes after updating job profile in UI.

References

https://jxray.com/

http://www.mastertheboss.com/java/solving-java-lang-outofmemoryerror-metaspace-error/

https://wiki.openjdk.java.net/display/zgc/Main#Main-JDK12

https://hub.packtpub.com/getting-started-with-z-garbage-collectorzgc-in-java-11-tutorial/