Spike: Investigate using JVM features to manage container memory

Description

To let the JVM derive sensible defaults from a container's resource limits, Java 10 introduced the "UseContainerSupport" flag, which was backported to Java 8 (8u191+). It should be used in conjunction with "MaxRAMPercentage".
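As a rough sketch of how these two flags interact (the numbers below are assumptions for illustration, not recommended values): with UseContainerSupport active, the JVM reads the container's memory limit and sizes the heap as MaxRAMPercentage of that limit.

```shell
# Illustration only: how -XX:MaxRAMPercentage maps a container memory
# limit to an approximate max heap size. Values are assumed examples.
container_mem_mb=800      # container limit (e.g. dockerArgs HostConfig/Memory)
max_ram_percentage=50     # as in -XX:MaxRAMPercentage=50.0
heap_mb=$(( container_mem_mb * max_ram_percentage / 100 ))
echo "JVM max heap would be roughly ${heap_mb} MB"
```

With these assumed values the JVM would cap the heap near 400 MB, leaving the remainder of the container's allocation for non-heap memory.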

The following steps explore that:

1. Roll 2 new base FOLIO JVM images (folioci/openjdk8-jre-alpine and folioci/openjdk8-jre)
2. Update a Dockerfile in two test modules (which use each of those)
3. Test launching containers with legacy settings in the LaunchDescriptor
4. Test launching containers with new settings in the LaunchDescriptor

Outcome:

  • Update documentation on dev.folio.org covering which memory settings should be used in individual module descriptors (MDs) and how the Dockerfile should be updated to utilize the new base image, which includes an updated JVM that supports those settings.

  • Update https://folio-org.atlassian.net/browse/FOLIO-2315 and linked issues to link to this documentation.

Environment

None

Potential Workaround

None


Activity


steve.osguthorpe December 3, 2019 at 6:11 PM

Thanks. That seems completely reasonable to me.

Wayne Schneider December 3, 2019 at 2:03 PM
Edited

There are two keys in the module descriptor that can communicate that kind of information to external operators:

  • launchDescriptor/dockerArgs/HostConfig/Memory: the total memory allocation for the container.
  • launchDescriptor/env: you can set the JAVA_OPTIONS environment variable as you see fit. For example, you could use MinRAMPercentage instead of MaxRAMPercentage if you felt that was more appropriate.
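For illustration, a LaunchDescriptor fragment combining those two keys might look like the following (the Memory value, in bytes, and the percentage are assumed examples, not recommendations):

```json
{
  "launchDescriptor": {
    "dockerArgs": {
      "HostConfig": {
        "Memory": 357913941
      }
    },
    "env": [
      {
        "name": "JAVA_OPTIONS",
        "value": "-XX:MaxRAMPercentage=66.0"
      }
    ]
  }
}
```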

One proposed rule of thumb, which seems sensible to me, is to set the size for a single tenant with a standard workload, whatever that means to you, with the expectation that the operator will scale containers horizontally to meet higher demand. Like all rules of thumb, it probably won't work for every circumstance, but it seems a reasonable starting point.

Beyond that, you can of course communicate specific resource needs in the module README.

Does that address your concerns?

steve.osguthorpe December 3, 2019 at 10:00 AM

- Thanks for the confirmation.
- Thank you too for the expansion; yes, it is.
I do, however, have another question, and it's to do with the non-heap settings. Metaspace (which used to be called PermGen) is always set to a high number (the maximum), which basically removes any upper limit. So even though you've specified max RAM, that only applies to the heap. Are there any plans/recommendations for us to incorporate that setting? Should I, as a developer, just set my max RAM percentage relative to the value in the memory section of the descriptor? How do external ops teams know that at 800 MB we can only have 50% for heap, but at 3 GB they can allocate up to 90%?
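As a sketch of the budgeting question raised above (all values are assumptions): since MaxRAMPercentage only governs the heap, a module could pair it with an explicit Metaspace cap so the whole container allocation is accounted for.

```shell
# Hypothetical budget for a container's memory across heap and non-heap
# regions. Every value here is an assumed example, not a recommendation.
container_mem_mb=800
heap_pct=50                    # as in -XX:MaxRAMPercentage=50.0
metaspace_mb=128               # as in an explicit -XX:MaxMetaspaceSize=128m cap
heap_mb=$(( container_mem_mb * heap_pct / 100 ))
other_mb=$(( container_mem_mb - heap_mb - metaspace_mb ))
echo "heap=${heap_mb}MB metaspace<=${metaspace_mb}MB remaining=${other_mb}MB"
# The corresponding (hypothetical) env setting would be something like:
# JAVA_OPTIONS="-XX:MaxRAMPercentage=50.0 -XX:MaxMetaspaceSize=128m"
```

The "remaining" figure is what stays available for thread stacks, code cache, and other native allocations under these assumptions.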

John Malconian December 2, 2019 at 4:32 PM

This is most likely referring to the following error, which occurs in the container log when using the old fabric8-based container:

cat: can't open '/sys/fs/cgroup/memory/memory.memsw.limit_in_bytes': No such file or directory

This specific error is generated when cgroup swap accounting is not enabled in the host kernel. This is the default for recent Debian/Ubuntu kernels (it can be enabled via a kernel parameter). The old fabric8 base image had hooks for setting this control group because it was originally written for Red Hat/CentOS, where cgroup swap accounting is enabled by default. At any rate, we are no longer using a fabric8-based base image, so this error should no longer appear. We do not set any container limits on swap anyway, only RAM. Control group accounting for RAM is enforced.

David Crossley November 29, 2019 at 12:58 AM

also asked on https://folio-org.atlassian.net/browse/MODLIC-8 "whether the CGroup is correctly available". Could you please explain how to verify that?
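One possible check, run from inside the container, is to look for the cgroup memory limit file the JVM reads. This is a sketch, not from the ticket: the path depends on whether the host uses cgroup v1 or v2.

```shell
# Report whichever cgroup memory limit file is visible to this process.
# cgroup v1 exposes memory.limit_in_bytes; cgroup v2 exposes memory.max.
if [ -r /sys/fs/cgroup/memory/memory.limit_in_bytes ]; then
  cgroup_msg="cgroup v1 limit: $(cat /sys/fs/cgroup/memory/memory.limit_in_bytes)"
elif [ -r /sys/fs/cgroup/memory.max ]; then
  cgroup_msg="cgroup v2 limit: $(cat /sys/fs/cgroup/memory.max)"
else
  cgroup_msg="no cgroup memory limit visible"
fi
echo "$cgroup_msg"
```

If a limit file is present and readable, a container-aware JVM (8u191+) can pick it up; an unlimited value (a very large number, or "max" on cgroup v2) means no container memory limit was set.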

Done

Details


Development Team: Core: Platform

Created October 31, 2019 at 7:22 AM
Updated June 3, 2020 at 4:40 PM
Resolved November 18, 2019 at 2:41 PM