[FOLIO-3091] Vagrant builds time out in tenant init Created: 24/Mar/21  Updated: 24/Apr/21  Resolved: 24/Apr/21

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Bug
Priority: TBD
Reporter: Wayne Schneider
Assignee: John Malconian
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Blocks
is blocked by FOLIO-3103 Restore mod-search/ui-inventory-es on... Closed
is blocked by MSEARCH-86 daily refenv build failed 20210402 mo... Closed
Duplicate
is duplicated by FOLIO-3123 Deploy/enable modules failing for Vag... Closed
Relates
relates to MSEARCH-83 mod-search breaks if authtoken module... Closed
Sprint: DevOps Sprint 110, DevOps Sprint 111, DevOps Sprint 112
Development Team: FOLIO DevOps

 Description   

Error message:

snapshot: fatal: [default]: FAILED! => {"changed": false, "content": "", "elapsed": 900, "msg": "Status code was -1 and not [200]: Connection failure: timed out", "redirected": false, "status": -1, "url": "http://10.0.2.15:9130/_/proxy/tenants/diku/install?deploy=true&tenantParameters=loadReference%3Dtrue%2CloadSample%3Dtrue"}

This happens for all builds based on platform-complete. It does not happen for the other builds.
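
For reference, the failing URL can be unpacked to see exactly which tenant parameters the build passes to Okapi. The following is a minimal Python sketch using only the standard library; the helper name describe_install_call is made up for illustration and is not part of the build tooling.

```python
from urllib.parse import urlsplit, parse_qs

# URL copied verbatim from the error message in this ticket.
FAILED_URL = (
    "http://10.0.2.15:9130/_/proxy/tenants/diku/install"
    "?deploy=true&tenantParameters=loadReference%3Dtrue%2CloadSample%3Dtrue"
)


def describe_install_call(url):
    """Split an Okapi install URL into tenant, deploy flag, and tenant parameters.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)  # parse_qs percent-decodes the values
    # Path layout seen in the error: /_/proxy/tenants/<tenant>/install
    tenant = parts.path.strip("/").split("/")[3]
    raw_params = query.get("tenantParameters", [""])[0]
    tenant_params = dict(
        pair.split("=", 1) for pair in raw_params.split(",") if pair
    )
    return {
        "tenant": tenant,
        "deploy": query.get("deploy", ["false"])[0],
        "tenant_parameters": tenant_params,
    }


info = describe_install_call(FAILED_URL)
print(info)
```

The decoded parameters show that the failing call asks for both reference and sample data to be loaded for the diku tenant, which matters for the later comments about sample data and Kafka.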



 Comments   
Comment by Wayne Schneider [ 24/Mar/21 ]

In a test build, Okapi was OOM-killed.

Comment by Wayne Schneider [ 24/Mar/21 ]

Increasing the VM size to 20GB allows the build to complete. Testing now with packer/Jenkins.

Comment by Wayne Schneider [ 24/Mar/21 ]

Even with 20GB the tenant init times out. More testing required.

Comment by Wayne Schneider [ 29/Mar/21 ]

This issue has now cropped up with the testing-backend build:

testing-backend: fatal: [default]: FAILED! => {"changed": false, "content": "", "elapsed": 900, "msg": "Status code was -1 and not [200]: Connection failure: timed out", "redirected": false, "status": -1, "url": "http://10.0.2.15:9130/_/proxy/tenants/diku/install?deploy=true&tenantParameters=loadSample%3Dtrue%2CloadReference%3Dtrue"}

Comment by Wayne Schneider [ 30/Mar/21 ]

In testing, it appears that mod-search is flipping out and never returning from the tenant init call. Why this is not happening with the AWS builds is a bit mysterious.

Comment by Wayne Schneider [ 30/Mar/21 ]

Investigation suggests that mod-search cannot return from the tenant init call until mod-authtoken is enabled for the tenant and the mod-search system user can log in and return a token. A couple of possible considerations:

  • A system with users, login, and permissions can run without mod-authtoken (it is not a required interface). Permissions are simply not enforced under those circumstances. It seems like mod-search should be able to manage that situation.
  • If mod-search truly requires a token for some reason of its own, then perhaps it needs to require the authtoken interface.

Comment by Wayne Schneider [ 31/Mar/21 ]

The issue only comes up if there are messages in Kafka (as, for example, when inventory data are created with the loadSample=true tenant parameter). The module seems to go into a tight loop trying to get a token from Okapi, and only breaks out of it when mod-authtoken is finally initialized. It seems like the loop consumes all available cycles and the module is not able to return from tenant initialization.
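
To illustrate the difference between a tight retry loop and one that yields the CPU between attempts, here is a minimal Python sketch of retrying with exponential backoff. It is not mod-search's code; fetch_token, flaky_fetch, and all delay values are hypothetical stand-ins chosen for the example.

```python
import time


def get_token_with_backoff(fetch_token, max_attempts=8, base_delay=0.5,
                           max_delay=30.0, sleep=time.sleep):
    """Call fetch_token until it succeeds, waiting longer after each failure.

    fetch_token is a hypothetical callable that returns a token string, or
    raises an exception while the auth module is not yet available.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch_token()
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            sleep(delay)  # wait instead of retrying immediately
            delay = min(delay * 2, max_delay)  # double the wait, up to a cap


# Stand-in for a token request that fails three times before succeeding.
calls = {"n": 0}


def flaky_fetch():
    calls["n"] += 1
    if calls["n"] <= 3:
        raise ConnectionError("authtoken not ready")
    return "dummy-token"


waits = []  # record the requested delays instead of really sleeping
token = get_token_with_backoff(flaky_fetch, sleep=waits.append)
print(token, waits)
```

With pauses between attempts, a module in this situation would leave cycles free for other work, such as finishing the tenant init call, while it waits for the token to become available.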

Comment by Wayne Schneider [ 01/Apr/21 ]

Raised MSEARCH-83.

Comment by Wayne Schneider [ 02/Apr/21 ]

This is not the whole explanation, however, as mod-search is not part of the testing-backend and testing Vagrant builds.

Comment by Wayne Schneider [ 02/Apr/21 ]

In the testing-backend build, the tenant init eventually succeeds, but it takes more than 15 minutes, so the Ansible play times out!

Comment by Wayne Schneider [ 02/Apr/21 ]

Updated timeout setting for tenant init. There are still issues with mod-search, however, so holding this open until those can be resolved.
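
As a rough illustration of what a longer client-side timeout looks like, the sketch below rebuilds the install request from the error message and sends it with a configurable timeout. The actual change was made in the Ansible play and is not shown in this ticket; the function names, the 3600-second value, and the module list in the request body are assumptions for the example.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_install_request(okapi_url, tenant, modules, tenant_parameters):
    """Build a POST request for the tenant install endpoint (hypothetical helper)."""
    query = urlencode({
        "deploy": "true",
        "tenantParameters": ",".join(
            f"{key}={value}" for key, value in tenant_parameters.items()
        ),
    })
    url = f"{okapi_url}/_/proxy/tenants/{tenant}/install?{query}"
    body = json.dumps(modules).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


def install_modules(request, timeout_seconds=3600):
    """Send the request; timeout_seconds is an illustrative value, not the real setting.

    Note that urllib applies the timeout to blocking socket operations
    (connect, read), not to the total duration of the call.
    """
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return json.load(response)


request = build_install_request(
    "http://10.0.2.15:9130",
    "diku",
    [{"id": "mod-search", "action": "enable"}],  # assumed example body
    {"loadReference": "true", "loadSample": "true"},
)
print(request.full_url)
# install_modules(request)  # requires a running Okapi instance
```

The 900 seconds reported in the error matches the 15-minute limit mentioned above, so any replacement value needs to exceed the observed tenant init time with some margin.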

Comment by Wayne Schneider [ 06/Apr/21 ]

Until MSEARCH-86 is resolved, mod-search is not in the reference builds, so there is no way to make further progress on this issue.

Comment by Wayne Schneider [ 20/Apr/21 ]

John Malconian I think this is the issue you are now working on, so reassigning to you.

Comment by Wayne Schneider [ 24/Apr/21 ]

This appears to be resolved for the folio/snapshot Vagrant build. The folio/testing and folio/testing-backend builds now have a different issue (FOLIO-3133, Closed), which I think is unrelated. Closing this issue.

Thanks John Malconian!

Generated at Thu Feb 08 23:25:32 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.