[FOLIO-2250] RMB modules crash on tenant init with updated LaunchDescriptor Created: 10/Sep/19  Updated: 03/Jun/20  Resolved: 11/Sep/19

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Bug Priority: P2
Reporter: Wayne Schneider Assignee: Wayne Schneider
Resolution: Done Votes: 0
Labels: devops, platform-backlog
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Blocks
blocks FOLIO-2234 Add LaunchDescriptor settings to each... Closed
Relates
relates to FOLIO-2315 Re-assess the memory allocation in de... Blocked
Sprint: CP: sprint 72
Story Points: 2
Development Team: Core: Platform

 Description   

Reported by David Crossley:
Today I started on FOLIO-2234 to add the new LaunchDescriptors, beginning with mod-notes.
I then ran an additional test of this first one via the Jenkins folio-testing-test job:
https://jenkins-aws.indexdata.com/job/Automation/job/folio-testing-test/21/console
However, it fails with:

"POST request for mod-notes-2.7.0-SNAPSHOT.134 /_/tenant failed with Connection refused: /10.36.1.114:9152"

I can successfully test the upgrade from SNAPSHOT-133 to SNAPSHOT-134 locally with the "testing-backend" VM, but of course that runs under Vagrant, not in the reference environment.



 Comments   
Comment by Wayne Schneider [ 10/Sep/19 ]

The container crashed during tenant init. Container log:

10 Sep 2019 04:11:36:122 INFO  TenantAPI [115771eqId] sending... postTenant for diku
10 Sep 2019 04:11:36:124 INFO  PostgresClient [115773eqId] DB config read from environment variables
10 Sep 2019 04:11:36:135 INFO  PostgresClient [115784eqId] postgreSQLClientConfig = {"maxPoolSize":5,"port":5432,"host":"10.36.1.114","username":"folio_admin","database":"okapi_modules","password":"..."}
10 Sep 2019 04:11:36:249 INFO  BaseSQLClient [115898eqId] Creating configuration for 10.36.1.114:5432

Not very illuminating.

Comment by Wayne Schneider [ 10/Sep/19 ]

The problem appears to be an out-of-memory (OOM) error during tenant init. I was not able to reproduce it on a Vagrant VM, but I can reproduce it reliably in the reference environment in AWS.

I will do some further testing. The problem may be that the container memory limit is set to the same value as the maximum heap size (-Xmx); perhaps the container needs more headroom?
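The suspected failure mode comes down to simple arithmetic: if the container limit equals -Xmx, the heap alone may fill the entire container, leaving nothing for JVM memory outside the heap (metaspace, thread stacks, code cache and so on). The following is a minimal Python sketch, assuming the heap should take at most about 75% of the limit (the rule of thumb proposed in this ticket); `heap_fraction` and `has_headroom` are hypothetical helpers, not part of Okapi or RMB.

```python
# Hypothetical helpers, not part of Okapi or RMB. The 0.75 threshold is the
# rule of thumb from this ticket, not a JVM guarantee.
MIB = 1024 * 1024


def heap_fraction(container_limit_bytes: int, max_heap_bytes: int) -> float:
    """Share of the container memory limit that the Java heap may fill."""
    if container_limit_bytes <= 0:
        raise ValueError("container limit must be positive")
    return max_heap_bytes / container_limit_bytes


def has_headroom(container_limit_bytes: int, max_heap_bytes: int,
                 max_fraction: float = 0.75) -> bool:
    """True if the heap can use at most max_fraction of the container limit,
    leaving the rest for non-heap JVM memory (metaspace, thread stacks, ...)."""
    return heap_fraction(container_limit_bytes, max_heap_bytes) <= max_fraction


# Failing setup described here: container limit equal to -Xmx256m.
print(heap_fraction(256 * MIB, 256 * MIB))   # 1.0
print(has_headroom(256 * MIB, 256 * MIB))    # False
# A 512 MiB limit with -Xmx384m caps the heap at exactly 75%.
print(has_headroom(512 * MIB, 384 * MIB))    # True
```

The check deliberately ignores how much non-heap memory a given module actually needs; it only flags configurations where the heap ceiling leaves no room at all.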

Comment by Wayne Schneider [ 10/Sep/19 ]

If I remove the -Xmx Java option from the command line, the container launches and stays up through tenant init. Monitoring its memory usage, I found that it seems to run just fine with the container limit set to 256M.

I believe we need to set the Memory key in the LaunchDescriptor to roughly 1.33x (4/3 of) the -Xmx setting, so that the max heap is about 75% of the memory available to the container. Unfortunately, this is more a rule of thumb than anything else. Since Memory is specified in bytes, this would mean:

-Xmx256m: Memory = 357913941 (about 341 MiB)
-Xmx384m: Memory = 536870912 (exactly 512 MiB)
-Xmx512m: Memory = 715827883 (about 683 MiB)
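The conversion from an -Xmx option to a Memory value can be reproduced with a short Python sketch. It assumes the Memory key sits under `dockerArgs.HostConfig.Memory` (the Docker Engine API field, in bytes) and that the heap option is passed through a `JAVA_OPTIONS` environment variable; that layout and the helper names (`xmx_to_bytes`, `memory_limit`, `launch_descriptor_fragment`) are illustrative assumptions, not a confirmed FOLIO tool.

```python
import json
import re

# Hypothetical helpers (not an official FOLIO/Okapi tool): derive the
# LaunchDescriptor Memory value from an -Xmx option so the max heap is
# about 75% of the container limit, i.e. limit = heap * 4 / 3.
_XMX = re.compile(r"^-Xmx(\d+)([kKmMgG]?)$")
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def xmx_to_bytes(option: str) -> int:
    """Parse an option such as '-Xmx256m' into a byte count."""
    match = _XMX.match(option)
    if match is None:
        raise ValueError(f"not an -Xmx option: {option!r}")
    amount, unit = match.groups()
    return int(amount) * _UNITS[unit.lower()]


def memory_limit(option: str) -> int:
    """Return 4/3 of the max heap in bytes, rounded to the nearest byte."""
    quotient, remainder = divmod(xmx_to_bytes(option) * 4, 3)
    # remainder is 0, 1 or 2 (thirds of a byte); round up only for 2/3.
    return quotient + (1 if remainder == 2 else 0)


def launch_descriptor_fragment(option: str) -> dict:
    """Assumed layout: Docker HostConfig.Memory plus a JAVA_OPTIONS env var."""
    return {
        "dockerArgs": {"HostConfig": {"Memory": memory_limit(option)}},
        "env": [{"name": "JAVA_OPTIONS", "value": option}],
    }


for opt in ("-Xmx256m", "-Xmx384m", "-Xmx512m"):
    print(opt, memory_limit(opt))
# -Xmx256m 357913941
# -Xmx384m 536870912
# -Xmx512m 715827883

print(json.dumps(launch_descriptor_fragment("-Xmx256m"), indent=2))
```

Integer arithmetic via `divmod` keeps the rounding exact: 4/3 of 256 MiB is 357913941.33 bytes and becomes 357913941, while 4/3 of 512 MiB is 715827882.67 bytes and becomes 715827883, matching the values listed above.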

What do you think, David Crossley?

Comment by David Crossley [ 10/Sep/19 ]

Good discovery. Okay, I will document that and test with various modules today.

Comment by David Crossley [ 10/Sep/19 ]

Wayne Schneider: Is this a temporary fix? Could these 1.33x memory settings be reduced after the FOLIO-2242 cleanup (which removes the -Xmx settings from the folio-ansible group_vars)?

Comment by Wayne Schneider [ 11/Sep/19 ]

Yes, I think this could be temporary. I hate to have all that cleanup to do afterwards, though.

Comment by David Crossley [ 11/Sep/19 ]

We will probably revisit the MDs (ModuleDescriptors) again anyway, after FOLIO-2237 adjusts the script that generates the README snippet for Docker Hub.

Revising the LaunchDescriptor (LD) Memory settings will help bring the overall memory requirement (e.g. for folio-install) back down.
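The overall requirement is a sum over modules, and the 4/3 rule grows each module's container limit by a third of its heap. A minimal Python sketch, using hypothetical module names and heap sizes (not actual folio-install values), compares limits equal to the heap with limits at 4/3 of the heap:

```python
from fractions import Fraction

MIB = 1024 * 1024

# Hypothetical modules and heap sizes (MiB), for illustration only.
HEAPS_MIB = {"module-a": 256, "module-b": 384, "module-c": 512}


def total_memory_bytes(heaps_mib: dict, ratio: Fraction) -> int:
    """Sum of per-module container limits (ratio * heap), each rounded to the nearest byte."""
    return sum(round(heap * MIB * ratio) for heap in heaps_mib.values())


same_as_heap = total_memory_bytes(HEAPS_MIB, Fraction(1))
with_headroom = total_memory_bytes(HEAPS_MIB, Fraction(4, 3))
print(same_as_heap // MIB, with_headroom // MIB)   # 1152 1536
print((with_headroom - same_as_heap) // MIB)       # 384
```

With these made-up numbers, the 4/3 rule raises the total from 1152 MiB to 1536 MiB; using `Fraction` keeps the per-module arithmetic exact before rounding.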

Comment by David Crossley [ 11/Sep/19 ]

The new LaunchDescriptors are now in place for mod-notes, mod-users, mod-login, and mod-circulation.
Tested again via https://jenkins-aws.indexdata.com/job/Automation/job/folio-testing-test/23/console
with a successful build.

So it seems that this ticket can be closed. Thanks for your great work.

Generated at Thu Feb 08 23:19:18 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.