[FOLIO-3112] edge-nlb create NLB target groups: refenv build sometimes fails Created: 12/Apr/21  Updated: 04/May/21  Resolved: 04/May/21

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Bug Priority: TBD
Reporter: David Crossley Assignee: Wayne Schneider
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Relates
relates to FOLIO-3080 expose edge-connexion's via a network... Closed
Sprint: DevOps Sprint 112, DevOps Sprint 113
Development Team: FOLIO DevOps

 Description   

On several occasions, reference environment daily builds have failed at the task "edge-nlb create NLB target groups".

A manual re-run fixes it.

Some examples:
2021-04-11 879-folio-snapshot
2021-04-07 876-folio-snapshot-load
2021-04-05 873-folio-snapshot-load

failed: [10.36.1.85] (item={'protocol': 'tcp', 'port': 9000}) => {"ansible_loop_var": "item", "changed": false, "item": {"port": 9000, "protocol": "tcp"}, "module_stderr": "Traceback (most recent call last):\n  File \"<stdin>\", line 102, in <module>\n  File \"<stdin>\", line 94, in _ansiballz_main\n  File \"<stdin>\", line 40, in invoke_module\n  File \"/usr/lib/python3.8/runpy.py\", line 207, in run_module\n    return _run_module_code(code, init_globals, run_name, mod_spec)\n  File \"/usr/lib/python3.8/runpy.py\", line 97, in _run_module_code\n    _run_code(code, mod_globals, init_globals,\n  File \"/usr/lib/python3.8/runpy.py\", line 87, in _run_code\n    exec(code, run_globals)\n  File \"/tmp/ansible_elb_target_group_payload_d3w07goe/ansible_elb_target_group_payload.zip/ansible/modules/cloud/amazon/elb_target_group.py\", line 828, in <module>\n  File \"/tmp/ansible_elb_target_group_payload_d3w07goe/ansible_elb_target_group_payload.zip/ansible/modules/cloud/amazon/elb_target_group.py\", line 822, in main\n  File \"/tmp/ansible_elb_target_group_payload_d3w07goe/ansible_elb_target_group_payload.zip/ansible/modules/cloud/amazon/elb_target_group.py\", line 706, in create_or_update_target_group\nTypeError: 'NoneType' object is not subscriptable\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}


 Comments   
Comment by David Crossley [ 22/Apr/21 ]

I have not seen a repeat of this behaviour during recent daily builds since this ticket was opened.

 

Comment by David Crossley [ 25/Apr/21 ]

This behaviour happened again today, 2021-04-24, with both folio-snapshot #897 and folio-snapshot-load #898.

I did re-run the folio-snapshot build.

Comment by Wayne Schneider [ 28/Apr/21 ]

I've added retries into the task that seems to fail consistently, in case there is a race condition that is causing the task to fail. Leaving this in review for some time to see if we continue to have failures.
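
For illustration only, the retry approach described above can be sketched in Python. This is not the actual change to the Ansible role: the helper names `run_with_retries` and `make_flaky_task`, the `TargetGroupArn` key, and the default of 5 retries are all assumptions made for the example. The ticket does not state the configured retry count or delay. The simulated failure reproduces only the class of error seen in the traceback (subscripting `None`), not its real cause.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(task: Callable[[], T], retries: int = 5, delay: float = 0.0) -> T:
    """Run `task`, retrying up to `retries` more times if it raises.

    Hypothetical helper: the retry count and delay are assumptions,
    not values taken from the ticket.
    """
    retries_left = retries
    while True:
        try:
            return task()
        except Exception as exc:
            if retries_left == 0:
                # Out of retries: surface the last failure to the caller.
                raise
            print(f"FAILED - RETRYING ({retries_left} retries left): {exc!r}")
            retries_left -= 1
            time.sleep(delay)


def make_flaky_task(failures: int) -> Callable[[], str]:
    """Build a task that fails `failures` times, then succeeds (hypothetical)."""
    state = {"calls": 0}

    def task() -> str:
        state["calls"] += 1
        if state["calls"] <= failures:
            result = None  # stand-in for a lookup that returned nothing
            return result["TargetGroupArn"]  # raises TypeError, as in the traceback
        return "ok"

    return task


if __name__ == "__main__":
    # One transient failure followed by success on the second attempt.
    print(run_with_retries(make_flaky_task(failures=1)))
```

A retry of this kind only masks a transient failure. If the task fails on every attempt, the last exception is still raised and the build fails as before.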

Comment by Wayne Schneider [ 29/Apr/21 ]

This is promising; it looks like the retry was successful last night:

TASK [edge-nlb : create NLB target groups] *************************************
FAILED - RETRYING: create NLB target groups (5 retries left).

(it succeeded on the second attempt)

Comment by Wayne Schneider [ 30/Apr/21 ]

Same behavior again overnight April 29-30: an initial failure followed by a successful retry.

Comment by David Crossley [ 03/May/21 ]

On Sunday, the weekly build of "folio-r1-2021-release #8" had this NLB problem.

Probably its folio-infrastructure branch was created prior to the recent fix.

I did a manual re-run of the Jenkins job, which was successful.

However, I chickened out of fixing the branch.

Comment by Wayne Schneider [ 03/May/21 ]

Thanks, David! I cherry-picked the commit that fixes this, so hopefully we won't see this again.
