[FOLIO-2732] bogus 1-minute-long "build folio platform" builds Created: 18/Aug/20  Updated: 25/Aug/20  Resolved: 25/Aug/20

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Task Priority: P2
Reporter: Zak Burke Assignee: John Malconian
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Attachments: Text File snapshot-fail-log.txt     Text File snapshot-success-log.txt     Text File testing-fail-log.txt     Text File testing-success-log.txt    
Sprint: DevOps: Sprint 95, DevOps: Sprint 96
Development Team: FOLIO DevOps

 Description   

On multiple occasions over the last several weeks, the nightly builds for folio-snapshot and folio-testing have cruised through the "Build FOLIO platform" step in only 1 minute, when it normally takes ~25, and reported success even though the resulting build was in fact unusable.

Logs:



 Comments   
Comment by David Crossley [ 19/Aug/20 ]

One clue is that this seems to not be a problem for the "folio-testing-core-backend" and "folio-snapshot-core" builds.

Comment by John Malconian [ 25/Aug/20 ]

The issue impacts all reference builds that are launched on ec2 instances. In the pipeline, we look for an ec2 instance with a certain tag. If that tag is found, we terminate the existing instance before launching a new one via the Ansible ec2 module. The Ansible ec2 module launches a new instance only if no instance with the same tag is already running. So sometimes a new instance is not launched because the module detects an existing instance that is still in the process of shutting down. Ansible returns no failure (by design), and the playbook continues.
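The silent no-op described above can be illustrated with a small simulation. This is only a sketch of the behavior as described in this comment, not the Ansible ec2 module's actual code: the function name, the state names, and the "i-new" instance id are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical lifecycle states that this sketch treats as "instance still exists".
# Assumption: a tagged instance that is shutting down is still counted.
ACTIVE_STATES = {"pending", "running", "shutting-down"}


def launch_if_absent(states_by_id: Dict[str, str]) -> Optional[str]:
    """Return a new (made-up) instance id, or None if a tagged instance still counts."""
    if any(state in ACTIVE_STATES for state in states_by_id.values()):
        # No exception is raised here: the caller sees a "successful" no-op.
        return None
    return "i-new"


# The old instance was told to terminate but has not finished shutting down yet.
print(launch_if_absent({"i-old": "shutting-down"}))  # None
# The old instance is fully terminated, so a launch actually happens.
print(launch_if_absent({"i-old": "terminated"}))     # i-new
```

Because the first call returns without an error, a pipeline that does not inspect the result would carry on with no new instance, which matches the 1-minute "successful" builds reported here.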

I've made the following modifications to counteract this condition:

1. Increase the sleep time after terminating the existing instance from 60 seconds to 120 seconds, to ensure the previous instance has been completely terminated.
2. Print some additional debug info to the Ansible log when launching an instance.
3. Add an explicit fail condition in Ansible if a new instance is not launched.
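
A minimal sketch of modifications 1 and 3, for illustration only. It is not the pipeline's Ansible code, and every name in it (wait_until_terminated, require_launched, get_state) is hypothetical. Note one deliberate difference: instead of a fixed 120-second sleep, the sketch polls the instance state up to a 120-second limit, which is a variation on the change described above rather than what was actually implemented.

```python
import time
from typing import Callable, Optional


def wait_until_terminated(
    get_state: Callable[[], str],
    timeout_s: float = 120.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_state() until it reports "terminated" or timeout_s has elapsed.

    Returns True if the instance terminated in time, False otherwise.
    """
    waited = 0.0
    while get_state() != "terminated":
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s
    return True


def require_launched(instance_id: Optional[str]) -> str:
    """Fail loudly if the launch step did not produce a new instance."""
    if not instance_id:
        raise RuntimeError("no new instance was launched; aborting build")
    return instance_id
```

In a pipeline, get_state would query the cloud provider for the old instance's state, and require_launched would wrap whatever the launch step returns, so that a skipped launch stops the build instead of letting it continue against nothing.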

Closing for now. If this continues to be a problem, we can reopen.
