[FOLIO-2334] Spike: Investigate using JVM features to manage container memory Created: 31/Oct/19 Updated: 03/Jun/20 Resolved: 18/Nov/19 |
|
| Status: | Closed |
| Project: | FOLIO |
| Components: | None |
| Affects versions: | None |
| Fix versions: | None |
| Type: | Task | Priority: | P2 |
| Reporter: | David Crossley | Assignee: | David Crossley |
| Resolution: | Done | Votes: | 0 |
| Labels: | devops, platform-backlog |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original estimate: | Not Specified |
| Issue links: |
|
| Sprint: | CP: sprint 76 |
| Story Points: | 5 |
| Development Team: | Core: Platform |
| Description |
|
To enable the JVM to use sensible defaults in a container environment, Java 10 introduced the "UseContainerSupport" flag, which was backported to Java 8 (8u191+). It is used in conjunction with "MaxRAMPercentage". Steps to explore this: 1. Roll two new base FOLIO JVM images (folioci/openjdk8-jre-alpine and folioci/openjdk8-jre). Outcome:
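A minimal sketch of what such a base image might look like (the parent image tag, entrypoint, and jar path are illustrative assumptions, not the actual folioci build files):

```dockerfile
# Hypothetical sketch of a FOLIO base JVM image.
# On 8u191+ UseContainerSupport is enabled by default, so the JVM sizes
# itself from the container's cgroup memory limit rather than host RAM.
FROM openjdk:8-jre-alpine

# Cap the heap at a fraction of the container memory limit; modules can
# override JAVA_OPTIONS in their LaunchDescriptor.
ENV JAVA_OPTIONS="-XX:MaxRAMPercentage=66.0"

# Example entrypoint only; real module images supply their own jar.
ENTRYPOINT ["sh", "-c", "java $JAVA_OPTIONS -jar /usr/verticles/module.jar"]
```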
|
| Comments |
| Comment by David Crossley [ 15/Nov/19 ] |
|
Results of spike investigation: Built new docker image folioci/alpine-jre-openjdk8:latest. Verified with various modules (and a local Vagrant VM), rebuilding each module on it and launching its new local docker container with JAVA_OPTIONS: "-XX:MaxRAMPercentage=66.0 -XX:+PrintFlagsFinal". (Note that the value must be "66.0", not "66" — MaxRAMPercentage takes a double.) Verified that this sets "MaxHeapSize" to 66% of the "Memory" value specified in the module's default LaunchDescriptor, so some modules may need to reassess their LaunchDescriptor "Memory" setting. Also verified that existing LD settings will still operate as they do now (probably better, with this newer underlying docker image). Some notes for the roll-out process: each module can adopt the new docker image and adjust to the new LD settings. A follow-up job is then to adjust their folio-ansible group_vars settings, which override these. As shown above, the current group_vars settings will continue to operate until then. |
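As a quick sanity check of the arithmetic (the 512 MiB Memory value here is just an illustrative example, not any particular module's setting):

```shell
# Illustrative only: expected MaxHeapSize for a hypothetical
# LaunchDescriptor "Memory" of 512 MiB with -XX:MaxRAMPercentage=66.0.
MEMORY=536870912        # 512 MiB in bytes (LaunchDescriptor "Memory" is bytes)
PERCENT=66.0            # must be a double ("66.0"), not an integer ("66")
awk -v m="$MEMORY" -v p="$PERCENT" 'BEGIN { printf "%d\n", m * p / 100 }'
# prints 354334801 (about 338 MiB of heap; the remaining third is non-heap)
```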
| Comment by David Crossley [ 15/Nov/19 ] |
|
Some useful resources discovered while investigating: https://medium.com/adorsys/usecontainersupport-to-the-rescue-e77d6cfea712 https://stackoverflow.com/a/55463537 |
| Comment by David Crossley [ 15/Nov/19 ] |
|
Still investigating some other FOLIO modules. |
| Comment by David Crossley [ 18/Nov/19 ] |
|
See the results of this spike listed above, and this ticket's issue Description, which has been modified to specify the Outcome and next steps. Please wait for
|
| Comment by David Crossley [ 29/Nov/19 ] |
|
steve.osguthorpe asked on
Further summary to accompany the results listed above: During testing we added "-XX:+PrintFlagsFinal" and inspected the docker logs. These show that "MaxHeapSize" is correctly set to 66% of the container memory allocation. After rollout of the new base docker image and the MaxRAMPercentage setting, we monitor the folio-snapshot-load reference environment each day. Every hour we review 'docker stats' for all modules, which shows their total memory usage remaining below that level. For a longer-running system there would probably be an increase, as non-heap memory is further utilised and returned. We also grep each module's docker logs every hour to ensure there is no "java.lang.OutOfMemoryError". So we believe that this use of UseContainerSupport and MaxRAMPercentage provides an appropriate way to manage the memory allocation: it reserves one-third of the container memory for non-heap use. Developers can adjust the total container memory via their LaunchDescriptor to raise it if they need more, or hopefully trim it down to provide a leaner system. The 66% figure is an average estimate, and could also be adjusted for certain modules. (When the other devops people return from the Thanksgiving break, they might be able to expand my answers.) |
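A sketch of the hourly log check described above. To keep it self-contained this uses a temporary file in place of real `docker logs` output; the log content, and iterating over containers, are left as assumptions:

```shell
# Hypothetical sketch of the hourly OOM check. A real run would do
#   docker logs "$container" 2>&1 | grep ...
# for each container listed by `docker ps`; here a temp file stands in.
LOG=$(mktemp)
printf 'INFO  module started\njava.lang.OutOfMemoryError: Java heap space\n' > "$LOG"

if grep -q 'java.lang.OutOfMemoryError' "$LOG"; then
  echo "OOM detected"
else
  echo "no OOM"
fi
rm -f "$LOG"
# prints "OOM detected" for this simulated log
```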
| Comment by David Crossley [ 29/Nov/19 ] |
|
steve.osguthorpe also asked on
|
| Comment by John Malconian [ 02/Dec/19 ] |
|
steve.osguthorpe is most likely referring to this error that occurs in the container log when using the old fabric8-based container: cat: can't open '/sys/fs/cgroup/memory/memory.memsw.limit_in_bytes': No such file or directory This specific error is generated when cgroup swap accounting is not enabled in the host kernel. That is the default for recent Debian/Ubuntu kernels (it can be enabled via a kernel parameter). The old fabric8 base image had hooks for setting this control group because it was originally written for Red Hat/CentOS, where cgroup swap accounting is enabled by default. At any rate, we are no longer using a fabric8-based base image, so this error should no longer appear. We do not set any container limitations on swap anyway - just RAM. Control group accounting for RAM is enforced. |
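For reference, the kernel parameter in question on Debian/Ubuntu is `swapaccount=1`; a sketch of the GRUB change (followed by `update-grub` and a reboot), should anyone actually need swap accounting:

```shell
# /etc/default/grub — enable cgroup memory and swap accounting
GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"
```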
| Comment by steve.osguthorpe [ 03/Dec/19 ] |
|
David Crossley - Thanks for the confirmation. |
| Comment by Wayne Schneider [ 03/Dec/19 ] |
|
steve.osguthorpe there are two keys in the module descriptor that can communicate that kind of information to the external operators: launchDescriptor/dockerArgs/HostConfig/Memory: total memory allocation for the container. One rule of thumb, proposed by Craig McNally, which seems sensible to me, is that you set the size for a single tenant with a standard workload (whatever that means to you), with the expectation that the operator will scale containers horizontally to meet higher demand. Like all rules of thumb, it probably won't work for every circumstance, but it seems a reasonable starting point. Beyond that, you can of course communicate specific resource needs in the module README. Does that address your concerns? |
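For illustration, this is where that Memory key sits in a ModuleDescriptor. The values, and the `env` entry carrying JAVA_OPTIONS, are example assumptions, not a recommendation for any particular module:

```json
{
  "launchDescriptor": {
    "dockerArgs": {
      "HostConfig": {
        "Memory": 536870912
      }
    },
    "env": [
      { "name": "JAVA_OPTIONS", "value": "-XX:MaxRAMPercentage=66.0" }
    ]
  }
}
```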
| Comment by steve.osguthorpe [ 03/Dec/19 ] |
|
Wayne Schneider Thanks. That seems completely reasonable to me. |