[FOLIO-2264] mod-agreements crashes, out of memory Created: 17/Sep/19 Updated: 03/Jun/20 Resolved: 20/Sep/19 |
|
| Status: | Closed |
| Project: | FOLIO |
| Components: | None |
| Affects versions: | None |
| Fix versions: | None |
| Type: | Task | Priority: | TBD |
| Reporter: | Ian Hardy | Assignee: | Ian Hardy |
| Resolution: | Done | Votes: | 0 |
| Labels: | ci, devops, platform-backlog | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original estimate: | Not Specified | ||
| Attachments: |
|
| Sprint: | CP: sprint 72 |
| Story Points: | 2 |
| Development Team: | Core: Platform |
| Description |
|
mod-agreements crashes shortly after it's enabled due to an out-of-memory error. Make sure agreements is getting enough memory and is not bumping into the container's memory limit. |
| Comments |
| Comment by Ian Hardy [ 18/Sep/19 ] |
|
After a conversation with steve.osguthorpe, this could be caused by bumping into the system-wide memory limit during tenant init. Steve will try limiting memory usage in mod-agreements as a test. |
| Comment by Ian Ibbotson (Use this one) [ 18/Sep/19 ] |
|
Outstanding questions:
|
| Comment by Eric Valuk [ 18/Sep/19 ] |
|
To answer a lot of the questions: we use AWS ECS to maintain a desired number of containers, and this desired value is 1 for mod-agreements. All of the containers that I scanned through were mod-agreements containers whose last logged entries were during the time of our issues. The module was failing health checks and attempted to restart several times; that is my interpretation of the 8 different containers. We identified that mod-agreements requires a 1028 soft limit and a 1280 hard limit to run without issue. We implemented these values recently, after our issues yesterday morning, so we are still tracking the behavior. Regarding mod-agreements-1.9.0: we do not have all 50 modules running on this instance, but they are running in the larger deployment. There are approximately 15 containers on the instance, including mod-agreements. Hopefully that answers your questions; let me know if you need more info. |
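For reference, ECS expresses these limits in the task definition's container definition: `memoryReservation` is the soft limit and `memory` is the hard limit at which ECS kills the container. A minimal sketch of the values Eric describes (assuming the units are MiB, which is what ECS uses) might look like:

```json
{
  "containerDefinitions": [
    {
      "name": "mod-agreements",
      "memoryReservation": 1028,
      "memory": 1280
    }
  ]
}
```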
| Comment by steve.osguthorpe [ 18/Sep/19 ] |
|
Hi Eric Valuk Thanks for this. I think we have got to the bottom of this, and it's to do with metaspace usage by the JVM. This is memory that Java processes use directly; it is not part of the heap space allocated by the '-Xm(s/x)' parameters and is instead taken from system memory. By default this is unbounded unless specified explicitly when starting the JVM process. I am going to do some more testing to come up with sensible values for mod-agreements, but this could actually affect any module not explicitly limiting the metaspace for the JVM. For now I have reduced thread counts in mod-agreements and also specified some hard limits for buffers, as well as moving the buffers out of metaspace and into heap. We have also limited the JVM on the index-data-controlled reference environments to 256M of RAM, and this has allowed things to operate in the default ~700M container size. These changes obviously have a detrimental effect on the app in terms of performance, though, so I will be changing these to be more introspective of the resourcing available. |
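A sketch of the kind of JVM flags being discussed; the specific values here are illustrative assumptions, not the tested defaults Steve later settled on:

```shell
# Illustrative only: cap heap, metaspace, and direct (off-heap) buffer memory
# so the total JVM footprint stays inside a ~700M container.
export JAVA_OPTIONS="-Xmx256m -XX:MaxMetaspaceSize=128m -XX:MaxDirectMemorySize=64m"
echo "$JAVA_OPTIONS"
```

Unlike `-Xmx`, metaspace and direct buffers are allocated outside the heap, which is why a container can be killed even when the heap limit is respected.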
| Comment by Ian Hardy [ 18/Sep/19 ] |
|
In our environments (32 GB RAM, 8 CPUs, running all 50 modules; heap size set to 384 for mod-agreements) we were seeing just mod-agreements get killed when a host memory limit for the container was set. Steve has made some changes to the latest snapshot of mod-agreements to set limits, which, in combination with lowering the -Xmx value to 256 in folio-ansible, seem to give agreements enough headroom for now. |
| Comment by Ian Hardy [ 19/Sep/19 ] |
|
Also affects mod-licenses. Steve has recommended new defaults for the JAVA_OPTIONS. Will verify builds tomorrow AM. |