[FOLIO-2185] SPIKE: how to maintain resource deployment descriptors in the module repository? Created: 22/Jul/19  Updated: 03/Jun/20  Resolved: 06/Sep/19

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Task Priority: P3
Reporter: Jakub Skoczen Assignee: Wayne Schneider
Resolution: Done Votes: 0
Labels: platform-backlog
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Blocks
blocks FOLIO-2219 Create script to build Kubernetes dep... Closed
blocks FOLIO-2234 Add LaunchDescriptor settings to each... Closed
blocks FOLIO-2235 Add LaunchDescriptor settings to each... Closed
Relates
relates to FOLIO-2237 Modify CI script which publishes mini... Closed
relates to FOLIO-2334 Spike: Investigate using JVM features... Closed
relates to FOLIO-2236 Add new dev doc to explain LaunchDesc... Closed
relates to FOLIO-2241 Update group_vars files in folio-ansi... Closed
relates to FOLIO-1729 Use container memory limits to manage... Closed
relates to FOLIO-2183 integrate backend-module pipeline int... Closed
relates to FOLIO-2242 Update group_vars files in folio-ansi... Closed
relates to FOLIO-1698 determine additional module metadata ... Closed
relates to FOLIO-2240 SPIKE: figure out how to share templa... Closed
Sprint: CP: sprint 71
Story Points: 3
Development Team: Core: Platform

 Description   

This is a follow-up to FOLIO-2183 Closed . We would like module developers to author these deployment descriptors and provide appropriate settings for things like memory, CPU, etc.

We have decided that the way to address this problem is by keeping the settings in the ModuleDescriptor and providing a stand-alone script that will generate a K8s YAML descriptor from the MD. ( FOLIO-2219 Closed )
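To illustrate the idea, a generator along the lines of FOLIO-2219 might translate the launchDescriptor settings into a Kubernetes Deployment such as the one below. This is a sketch only: the field mapping and names (mod-login, image tag) are illustrative, not the actual script's output.

```yaml
# Hypothetical output of an MD-to-K8s generator (FOLIO-2219).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mod-login
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mod-login
  template:
    metadata:
      labels:
        app: mod-login
    spec:
      containers:
        - name: mod-login
          image: mod-login:1.0.0    # from dockerImage "${artifactId}:${version}"
          ports:
            - containerPort: 8081   # from PortBindings "8081/tcp"
          resources:
            limits:
              memory: 256Mi         # from HostConfig.Memory (268435456 bytes)
          env:                      # from the env key of the launchDescriptor
            - name: DB_HOST
              value: postgres
```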



 Comments   
Comment by Ian Hardy [ 30/Aug/19 ]

Tasks:

1. Select a test module (ideally one that requires both environment variables and arguments; mod-login is a good candidate)
2. Fill in the launch descriptor with the necessary environment variables and arguments
3. Test that the LD can be used to launch a Docker container by having Okapi deploy the modified LD
3.5. Make sure the LD doesn't break existing folio-ansible roles
4. Create the Kubernetes YAML ( FOLIO-2219 Closed )

Comment by Jakub Skoczen [ 03/Sep/19 ]

Wayne Schneider, Ian Hardy: can we get an example of how all the necessary information would be contained in the launch descriptor before we start rolling this out? It can be a pseudo (imaginary) example.

Comment by Wayne Schneider [ 03/Sep/19 ]

Here is an example for mod-login:

  "launchDescriptor": {
    "dockerImage": "${artifactId}:${version}",
    "dockerArgs": {
      "HostConfig": {
        "Memory": 268435456,
        "PortBindings": { "8081/tcp":  [{ "HostPort": "%p" }] }
      }
    },
    "dockerCMD": [ "verify.user=true" ],
    "dockerPull": false,
    "env": [
      { "name": "JAVA_OPTIONS", "value": "-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap" },
      { "name": "DB_HOST", "value": "10.0.2.15" },
      { "name": "DB_PORT", "value": "5432" },
      { "name": "DB_USERNAME", "value": "folio_admin" },
      { "name": "DB_PASSWORD", "value": "folio_admin" },
      { "name": "DB_DATABASE", "value": "okapi_modules" },
      { "name": "DB_QUERYTIMEOUT", "value": "30000" },
      { "name": "DB_CHARSET", "value": "UTF8" },
      { "name": "DB_MAXPOOLSIZE", "value": "5" }
    ]
  }
Comment by Wayne Schneider [ 03/Sep/19 ]

The example above works well for Okapi deployment on Vagrant. Some issues:

  • The Memory setting is in bytes, for gods' sake (268435456 bytes = 256 MiB).
  • I chose to put the memory limit in the HostConfig rather than adding it as part of JAVA_OPTIONS – this is obviously more portable. For Java 8, this requires using the UseCGroupMemoryLimitForHeap experimental VM option. This behavior is the default for Java 10+ (cf. https://bugs.openjdk.java.net/browse/JDK-8146115). This would incidentally close FOLIO-1729 Closed .
  • The settings in the env key are meant to show all the available variables, along with providing reasonable defaults. I'm not too happy with the DB_HOST default – that's the default host for Vagrant – but localhost doesn't make sense, as you would never run the DB in the container.

Ian Hardy, David Crossley – thoughts? Does this seem workable?

Comment by Ian Hardy [ 04/Sep/19 ]

One proposal for the DB host is to set up a Service in K8s called "postgres" that points to the PG database. In the Vagrant boxes, we could create a "postgres" alias that resolves to localhost. That way we could set the default host to the same value in both cases. This might be a little more convenient, but it doesn't really solve the problem of having to pick some value that doesn't work "out of the box", since it depends on specific infrastructure configuration.
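On the Kubernetes side, the "postgres" name could be provided by a Service; a minimal sketch, assuming an external database reachable at a placeholder hostname (db.example.com is not a real target here):

```yaml
# Hypothetical Service giving modules a stable "postgres" hostname.
# ExternalName simply aliases the name via DNS; a selector-based
# Service would be used instead if Postgres ran inside the cluster.
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  type: ExternalName
  externalName: db.example.com
```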

Comment by Wayne Schneider [ 04/Sep/19 ]

Here's what I found in my testing:

  • Ian's postgres filler value works just fine on a Vagrant box with an entry in /etc/hosts, so that's kind of nice
  • Okapi accepts either the empty string, null, or an EnvEntry with no value key for the DB_QUERYTIMEOUT environment entry. RMB, however, apparently does not – it throws an error trying to convert the value to a number:
    04 Sep 2019 15:33:29:123 ERROR PostgresClient [4370eqId] For input string: "null"
    java.lang.NumberFormatException: For input string: "null"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:1.8.0_181]
        at java.lang.Integer.parseInt(Integer.java:580) ~[?:1.8.0_181]
        at java.lang.Integer.parseInt(Integer.java:615) ~[?:1.8.0_181]
        at org.folio.rest.tools.utils.Envs.lambda$0(Envs.java:48) ~[mod-login-fat.jar:?]
        at java.util.Map.forEach(Map.java:630) ~[?:1.8.0_181]
        at java.util.Collections$UnmodifiableMap.forEach(Collections.java:1505) ~[?:1.8.0_181]
        at org.folio.rest.tools.utils.Envs.allDBConfs(Envs.java:43) ~[mod-login-fat.jar:?]
        at org.folio.rest.persist.PostgresClient.init(PostgresClient.java:458) ~[mod-login-fat.jar:?]
        at org.folio.rest.persist.PostgresClient.<init>(PostgresClient.java:182) ~[mod-login-fat.jar:?]
        at org.folio.rest.persist.PostgresClient.getInstanceInternal(PostgresClient.java:334) ~[mod-login-fat.jar:?]
        at org.folio.rest.persist.PostgresClient.getInstance(PostgresClient.java:353) ~[mod-login-fat.jar:?]
        at org.folio.rest.impl.TenantAPI.tenantExists(TenantAPI.java:143) ~[mod-login-fat.jar:?]
        at org.folio.rest.impl.TenantAPI.lambda$3(TenantAPI.java:218) ~[mod-login-fat.jar:?]
        at io.vertx.core.impl.ContextImpl.lambda$wrapTask$2(ContextImpl.java:339) ~[mod-login-fat.jar:?]
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) [mod-login-fat.jar:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404) [mod-login-fat.jar:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463) [mod-login-fat.jar:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886) [mod-login-fat.jar:?]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [mod-login-fat.jar:?]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
    

This is actually kind of a problem: we still have some potentially quite long-running queries, so setting a timeout might break things. What are your opinions? The options are:

  1. Omit the DB_QUERYTIMEOUT environment variable from the LaunchDescriptor. The behavior will be that there is no query timeout, so long running queries will just spin until complete.
  2. Set the timeout to something high, like 300000ms (5 minutes). Some queries still might break.
  3. Set the timeout to something fairly reasonable, like 60000ms, and accept that some queries will almost certainly break.
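To make the failure mode concrete, here is a minimal, self-contained sketch. The parseTimeout helper is a hypothetical defensive variant, not RMB's actual code; the parseInt call reproduces what Envs.java effectively does with a "null" value.

```java
// Integer.parseInt rejects the literal string "null", which is what
// RMB receives when the EnvEntry carries no usable value.
public class QueryTimeoutParse {

    // Hypothetical defensive parse: treat "", null, and "null" as unset.
    static Integer parseTimeout(String raw) {
        if (raw == null || raw.isEmpty() || "null".equals(raw)) {
            return null; // interpret "unset" as no query timeout
        }
        return Integer.valueOf(raw);
    }

    public static void main(String[] args) {
        System.out.println(parseTimeout("60000")); // prints 60000
        System.out.println(parseTimeout("null"));  // prints null
        try {
            Integer.parseInt("null"); // what the stack trace above shows
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage()); // For input string: "null"
        }
    }
}
```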
Comment by Wayne Schneider [ 04/Sep/19 ]

After further conversation, we've settled on this as a model:

  "launchDescriptor": {
    "dockerImage": "${artifactId}:${version}",
    "dockerPull": false,
    "dockerCMD": [ "verify.user=true" ],
    "dockerArgs": {
      "HostConfig": {
        "Memory": 268435456,
        "PortBindings": { "8081/tcp": [ { "HostPort": "%p" } ] }
      }
    },
    "env": [
      { "name": "JAVA_OPTIONS", "value": "-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap" },
      { "name": "DB_HOST", "value": "postgres" },
      { "name": "DB_PORT", "value": "5432" },
      { "name": "DB_USERNAME", "value": "folio_admin" },
      { "name": "DB_PASSWORD", "value": "folio_admin" },
      { "name": "DB_DATABASE", "value": "okapi_modules" },
      { "name": "DB_QUERYTIMEOUT", "value": "60000" },
      { "name": "DB_CHARSET", "value": "UTF-8" },
      { "name": "DB_MAXPOOLSIZE", "value": "5" }
    ]
  }

I will do some further testing to see what changes would need to be made to prevent Vagrant box builds and the "single server install" document from breaking.

Comment by Wayne Schneider [ 04/Sep/19 ]

Since the environment variables set on Okapi's /_/env endpoint override environment variables in the LaunchDescriptor, the only change required in folio-ansible is to remove the JAVA_OPTIONS memory settings in the group_vars files as module descriptors are updated. No changes are necessary in the single-server install runbook in folio-infrastructure.

Comment by Wayne Schneider [ 06/Sep/19 ]

The Ansible roles affected by this change are okapi-register-modules and okapi-deployment, both of which modify the stock LaunchDescriptor for a module based on keys in the folio_modules variable. They operate by overriding the env key and/or the dockerCMD key.

Changing this behavior would be tricky, and I'm not sure it's desirable – overriding is more straightforward and less error-prone than somehow trying to merge the env and dockerCMD lists. Therefore, I think the solution is to update the group_vars files in folio-ansible now to include the -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap options as well as the memory settings, then give ourselves the housekeeping task of cleaning up the group_vars files after FOLIO-2234 Closed and FOLIO-2235 Closed are complete.

I've created FOLIO-2241 Closed for the needed update, and FOLIO-2242 Closed for the eventual cleanup. And with that, I think this spike is closed.

Comment by steve.osguthorpe [ 13/Sep/19 ]

Can I also suggest changing the base image for Java 8 to inherit from https://hub.docker.com/r/fabric8/java-alpine-openjdk8-jdk? Using that image as a base allows default VM memory allocations to be calculated as a ratio of the container limit, emulating the container memory capabilities of Java 10+.
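A sketch of what such a change might look like in a module's Dockerfile. The JAVA_MAX_MEM_RATIO setting and /deployments convention come from the fabric8 image's startup script; the jar name and ratio value are illustrative assumptions, not a tested configuration.

```dockerfile
# Hypothetical Dockerfile: inherit from the fabric8 base image so the
# JVM heap is sized as a ratio of the container memory limit on Java 8.
FROM fabric8/java-alpine-openjdk8-jdk

# Let the JVM use up to 75% of the container memory limit for heap.
ENV JAVA_MAX_MEM_RATIO=75
ENV JAVA_APP_JAR=mod-login-fat.jar

COPY target/mod-login-fat.jar /deployments/
```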

Comment by Wayne Schneider [ 13/Sep/19 ]

Thanks for the comment, steve.osguthorpe. We do have an open issue for that ( FOLIO-1544 Closed ), which has been stalled for a while. Perhaps it is time to pick it back up.

We also have on our backlog a move to OpenJDK 11 ( FOLIO-1772 Closed ), which would also address the issue of Java honoring container memory limits.

Generated at Thu Feb 08 23:18:50 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.