[FOLIO-3514] Jenkins jobs fail 2022-06-08 platform and refenv builds Created: 08/Jun/22 Updated: 13/Jun/22 Resolved: 13/Jun/22 |
|
| Status: | Closed |
| Project: | FOLIO |
| Components: | None |
| Affects versions: | None |
| Fix versions: | None |
| Type: | Bug | Priority: | TBD |
| Reporter: | David Crossley | Assignee: | John Malconian |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original estimate: | Not Specified | ||
| Sprint: | DevOps Sprint 142, DevOps Sprint 141 |
| Development Team: | FOLIO DevOps |
| RCA Group: | TBD |
| Description |
|
The hourly plaform builds started failed with the 01:19 UTC platform-complete job (and each one since) during the "Build FOLIO instance" stage. The daily folio-snapshot also failed similarly. The folio-snapshot-2 rebuild job has been disabled to preserve the existing. They seem to emit different messages and fail at different Ansible tasks (centred around setting up Kafka infrastructure). |
| Comments |
| Comment by John Malconian [ 08/Jun/22 ] |
|
These jobs appear to be failing because of an issue with SSH to the target host. Possibly something in the environment has changed. |
| Comment by John Malconian [ 08/Jun/22 ] |
|
On the other hand, the okapi-docker-container ansible role which runs right before the kafka-zk role is able to copy a j2 template to the target system with no issues. Very weird. |
| Comment by John Malconian [ 08/Jun/22 ] |
|
We are running into this Ubuntu kernel issue documented here. https://bugs.launchpad.net/ubuntu/+source/linux-aws/+bug/1977919 |
| Comment by John Malconian [ 08/Jun/22 ] |
|
Reverting to the previous AWS Ubuntu Focal AMI fixes the problem. I did this by hardcoding the AMI in the launch_ec2 playbook in folio-infrastructure: roles:
- role: launch_ec2_instance
ec2_security_groupids: ["{{ okapi_ec2_security_groupid }}", "{{ postgres_ec2_security_groupid }}", "{{ stripes_ec2_security_groupid }}"]
ec2_group: "{{ ec2_group }}"
ec2_hostgroups: "tag_Env_ci,tag_Group_{{ ec2_group }},tag_Okapi_true,tag_Postgres_true"
ec2_instance_type: "{{ instance_type }}"
# Hardcode AMI temporarily. See https://folio-org.atlassian.net/browse/FOLIO-3514
#ec2_ami: "{{ latest_ami.image_id }}"
ec2_ami: ami-01f18be4e32df20e2
ec2_instance_tags: "{{ instance_tags }}"
This should only be temporary. I suspect the issue will be fixed within a day or two. |
| Comment by John Malconian [ 13/Jun/22 ] |
|
Confirmed Ubuntu has released fixed kernel and AWS AMI. Reversed changed so that we once again get the latest Ubuntu Focal AMI. |