[FOLIO-2552] Intermittent failure on s3 cleanup job Created: 02/Apr/20  Updated: 03/Jun/20  Resolved: 01/May/20

Status: Open
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Story Priority: TBD
Reporter: Ian Hardy Assignee: Ian Hardy
Resolution: Done Votes: 0
Labels: devops-backlog
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Sprint: DevOps: sprint 85, DevOps: sprint 86, DevOps: sprint 87, DevOps: sprint 88
Development Team: FOLIO DevOps

 Description   

Ian Hardy 9:15 AM
Just had a look through some of the logs of builds that had to be restarted, and they seem to be failing on the s3 task I added for mod-data-export (the task deletes the bucket/keys before recreating the bucket for a new build). Kind of odd, since it works when it's re-run.

TASK [s3-data-export : delete bucket and keys] *********************************
FATAL: command execution failed


 Comments   
Comment by Ian Hardy [ 02/Apr/20 ]

Hi David Crossley, I added some retries on this task, which I think has been causing some of those failures at the 4-minute mark of the test builds. In testing, it didn't break the build, but I guess we won't know right away if that fixes it.

Comment by David Crossley [ 13/Apr/20 ]

There were no problems since the day that you did that. Today there was a similar failure:

TASK [s3-data-export : delete bucket and keys] *********************************
FAILED - RETRYING: delete bucket and keys (3 retries left).
FATAL: command execution failed

A re-run of the folio-snapshot build was successful.

Comment by Ian Hardy [ 13/Apr/20 ]

Thanks David Crossley. Interesting that it never actually did the retries it's supposed to do.

Comment by Wayne Schneider [ 24/Apr/20 ]

Keeping open through the end of the sprint. Failure is too intermittent to pin down. If no further issues, we'll close it and open a new issue if it recurs.

Comment by Wayne Schneider [ 01/May/20 ]

No further recurrence since April 13, closing issue.

Comment by David Crossley [ 03/May/20 ]

Murphy strikes ...

Today 2020-05-03 https://jenkins-aws.indexdata.com/job/Automation/job/folio-q1-2020-release/17/console

TASK [s3-data-export : delete bucket and keys] *********************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: WaiterError: Waiter BucketNotExists failed: Max attempts exceeded
fatal: [localhost -> localhost]: FAILED! => {"boto3_version": "1.12.45", "botocore_version": "1.15.45", "changed": false, "msg": "An error occurred waiting for the bucket to be deleted.: Waiter BucketNotExists failed: Max attempts exceeded"}
	to retry, use: --limit @/home/jenkins/workspace/Automation/folio-q1-2020-release/folio-infrastructure/CI/ansible/release-q1-2020.retry
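
For context, the WaiterError above comes from a polling loop: after the delete request, the client repeatedly checks whether the bucket still exists and gives up after a fixed number of attempts. A minimal Python sketch of that pattern follows. The names `wait_until_gone` and `bucket_exists`, and the delay and attempt values, are hypothetical stand-ins for illustration, not the actual botocore implementation or its settings.

```python
import time


class WaiterTimeout(Exception):
    """Raised when the resource still exists after the last poll."""


def wait_until_gone(bucket_exists, max_attempts=20, delay=5.0, sleep=time.sleep):
    """Poll bucket_exists() until it returns False.

    bucket_exists is a hypothetical callable supplied by the caller.
    Returns the number of polls used. Raises WaiterTimeout if the bucket
    is still reported as present after max_attempts polls.
    """
    for attempt in range(1, max_attempts + 1):
        if not bucket_exists():
            return attempt
        # Only sleep when another poll will follow.
        if attempt < max_attempts:
            sleep(delay)
    raise WaiterTimeout(f"bucket still exists after {max_attempts} attempts")
```

When calling boto3 directly, the real waiter accepts a `WaiterConfig` argument with `Delay` and `MaxAttempts`, for example `client.get_waiter("bucket_not_exists").wait(Bucket=name, WaiterConfig={"Delay": 5, "MaxAttempts": 40})`. Whether the s3_bucket Ansible module exposes those settings is not confirmed in this ticket.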

Comment by Ian Hardy [ 04/May/20 ]

Ha, figures. Thanks David Crossley. Maybe I should just rewrite this task to drop to the shell and run an aws cli command, and give up on the s3_bucket ansible module for this particular application.
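
A rough sketch of what the shell-out idea could look like, written in Python rather than as the actual Ansible task. It assumes the AWS CLI is installed and already has credentials; the function names and the bucket name in the usage line are hypothetical and not taken from the folio-infrastructure repository.

```python
import subprocess


def build_remove_bucket_command(bucket):
    """Return the AWS CLI argv that deletes a bucket and all of its keys."""
    if not bucket or "/" in bucket:
        raise ValueError(f"invalid bucket name: {bucket!r}")
    # `aws s3 rb --force` empties the bucket before removing it.
    return ["aws", "s3", "rb", f"s3://{bucket}", "--force"]


def remove_bucket(bucket, run=subprocess.run):
    """Run the command; subprocess.run raises CalledProcessError on a non-zero exit."""
    return run(build_remove_bucket_command(bucket), check=True)
```

Usage would be something like `remove_bucket("example-data-export-bucket")`. Note that `--force` does not delete versioned objects, and the command exits non-zero if the bucket does not exist, so a first-time build would need to tolerate that case.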

Generated at Thu Feb 08 23:21:29 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.