[FOLIO-2264] mod-agreements crashes, out of memory Created: 17/Sep/19  Updated: 03/Jun/20  Resolved: 20/Sep/19

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Task Priority: TBD
Reporter: Ian Hardy Assignee: Ian Hardy
Resolution: Done Votes: 0
Labels: ci, devops, platform-backlog
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Attachments: File agreements-inspect.json    
Sprint: CP: sprint 72
Story Points: 2
Development Team: Core: Platform

 Description   

mod-agreements crashes shortly after it is enabled due to an out-of-memory error. Make sure mod-agreements is getting enough memory and is not bumping into the container's memory limit.



 Comments   
Comment by Ian Hardy [ 18/Sep/19 ]

After a conversation with steve.osguthorpe: this could be caused by hitting the system-wide memory limit during tenant initialization. Steve will try limiting memory usage in mod-agreements as a test.

Comment by Ian Ibbotson (Use this one) [ 18/Sep/19 ]

Outstanding questions:

  • 8 containers were killed: were they all mod-agreements? If so, how were they being [re]started?
  • The version of mod-agreements
  • Number of processor cores
  • Total system RAM for the machine (not the container)
  • Container process startup params (e.g. number of CPUs, memory, etc.), if limited
  • JVM startup parameters (-Xmx and -Xms especially)

Can you also let me know if you are running the full suite of FOLIO modules (i.e. all ~50 containers)?
Does the container work as expected if restarted without any other change?
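Several of the details asked about above (processor count, effective maximum heap, JVM startup flags) can be read from inside the running JVM. A minimal Java sketch, assuming a hypothetical class name `JvmDiagnostics`; it uses only standard `java.lang` and `java.lang.management` APIs, and the processor count reflects container CPU limits only on container-aware JVMs:

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class JvmDiagnostics {
    // Formats a byte count as whole mebibytes (truncating any remainder).
    static String toMiB(long bytes) {
        return (bytes / (1024 * 1024)) + " MiB";
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Number of CPUs the JVM believes it may use.
        System.out.println("Available processors: " + rt.availableProcessors());
        // Upper bound the heap may grow to, i.e. the effective -Xmx.
        System.out.println("Max heap: " + toMiB(rt.maxMemory()));
        // Flags passed to the JVM at startup, e.g. [-Xmx256m].
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
    }
}
```

An empty argument list would match the "no JVM args" case, in which case the maximum heap is whatever default the JVM derives from the memory it can see.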
Comment by Eric Valuk [ 18/Sep/19 ]

To answer a lot of your questions: we use AWS ECS to maintain a desired number of containers, and this desired value is 1 for mod-agreements. All of the containers I scanned through were mod-agreements containers whose last log entries fell during the time of our issues. The module was failing health checks and was restarted several times. That is my interpretation of the 8 different containers.

We identified that mod-agreements requires a 1028 MiB soft limit and a 1280 MiB hard limit to run without issue. We applied these values after our issues yesterday morning, so we are still tracking the behavior.

mod-agreements version: 1.9.0
Instance: m5.large, 2 vCPU and 8 GiB memory
128 CPU units allocated to mod-agreements; 512 MiB soft limit and no hard limit at the time of the failure.
We do not currently pass any JVM args to the modules.

We do not have all 50 modules running on this instance, but they are running in the larger deployment. There are approximately 15 containers on this instance, including mod-agreements.

Hopefully that answers your questions; let me know if you need more info.

Comment by steve.osguthorpe [ 18/Sep/19 ]

Hi Eric Valuk, thanks for this. I think we have got to the bottom of this: it's to do with metaspace usage by the JVM. Metaspace is memory that Java processes allocate directly from native (system) memory; it is not part of the heap sized by the -Xms/-Xmx parameters. By default it is "unbounded" unless a limit is specified explicitly when starting the JVM process. I am going to do some more testing to come up with sensible values for mod-agreements, but this could actually affect any module that does not explicitly limit metaspace for the JVM.
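The unbounded-by-default behaviour can be checked from inside a JVM. A minimal sketch, assuming a HotSpot JVM (which exposes metaspace as a non-heap memory pool named `Metaspace`) and a hypothetical class name `MetaspaceReport`. `MemoryUsage.getMax()` returns -1 when no limit is defined; starting the JVM with a flag such as `-XX:MaxMetaspaceSize=128m` (value chosen only for illustration) should make it report a bound instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class MetaspaceReport {
    // A max of -1 (any negative value) means the pool has no defined limit.
    static String describeMax(long max) {
        return max < 0 ? "unbounded" : (max / (1024 * 1024)) + " MiB";
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // On non-HotSpot JVMs this pool may be named differently or absent.
            if (pool.getName().equals("Metaspace")) {
                MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
                if (usage != null) {
                    System.out.println("Metaspace used: " + usage.getUsed() / (1024 * 1024) + " MiB");
                    System.out.println("Metaspace max: " + describeMax(usage.getMax()));
                }
            }
        }
    }
}
```

Running it once with no flags and once with `java -XX:MaxMetaspaceSize=128m MetaspaceReport` shows the difference between the default and an explicit limit.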

For now I have reduced thread counts in mod-agreements and specified some hard limits for buffers, as well as moving the buffers out of metaspace and into the heap. We have also limited the JVM on the Index Data-controlled reference environments to 256M of RAM, which has allowed things to operate within the default ~700M container size. These changes obviously hurt the app's performance, though, so I will be changing them to adapt to the resources available.
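A back-of-the-envelope estimate shows why a 256M heap can fit in a ~700M container while still leaving room for non-heap memory, and why thread counts and buffer limits matter. This is a rough sketch under stated assumptions: only the 256 MiB heap and ~700 MiB container size come from this issue; the metaspace size, thread count and direct-buffer figures are hypothetical, 1024 KiB per thread is the common 64-bit Linux default for `-Xss`, and the code cache, GC structures and other native allocations are not modelled. The class name `FootprintEstimate` is also hypothetical.

```java
public class FootprintEstimate {
    // Rough JVM footprint in MiB: heap + metaspace + worst-case thread stacks + direct buffers.
    // Deliberately ignores code cache, GC bookkeeping and other native overhead.
    static long estimateMiB(long heapMiB, long metaspaceMiB, int threads,
                            long stackKiB, long directMiB) {
        long stacksMiB = (threads * stackKiB) / 1024;
        return heapMiB + metaspaceMiB + stacksMiB + directMiB;
    }

    public static void main(String[] args) {
        long containerMiB = 700; // approximate default container size mentioned in this issue
        // Hypothetical inputs apart from the 256 MiB heap: 128 MiB metaspace,
        // 100 threads at 1024 KiB each, 64 MiB of direct buffers.
        long total = estimateMiB(256, 128, 100, 1024, 64);
        System.out.println("Estimated footprint: " + total + " MiB of " + containerMiB + " MiB");
        System.out.println(total <= containerMiB ? "fits" : "exceeds container limit");
    }
}
```

With these inputs the estimate is 548 MiB; raising only the heap to 384 MiB brings it to 676 MiB, which leaves very little margin for the native overhead the sketch ignores.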

Comment by Ian Hardy [ 18/Sep/19 ]

In our environments (32 GB RAM, 8 CPUs, running all 50 modules, heap size set to 384 MB for mod-agreements) we were seeing only mod-agreements get killed when a host memory limit for the container was set. Steve has made some changes in the latest snapshot of mod-agreements to set limits, which, in combination with lowering the -Xmx value to 256 in folio-ansible, seem to give mod-agreements enough headroom for now.

Comment by Ian Hardy [ 19/Sep/19 ]

This also affects mod-licenses. Steve has recommended new defaults for JAVA_OPTIONS. Will verify builds tomorrow morning.

Generated at Thu Feb 08 23:19:24 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.