[FOLIO-2896] Rancher unavailable in hosted dev environment Created: 02/Dec/20  Updated: 07/Dec/20  Resolved: 07/Dec/20

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Bug Priority: P2
Reporter: John Malconian Assignee: John Malconian
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Sprint: DevOps: Sprint 103
Development Team: FOLIO DevOps

 Description   

Rancher was reported as unavailable starting sometime between 11/26 and 11/27.



 Comments   
Comment by John Malconian [ 02/Dec/20 ]

Looking at reporting data, it appears Rancher became completely unavailable around 11/26-11/27. During the period of 11/26-12/1, CPU usage was pinned at 100% on the MySQL RDS instance that is used by the Rancher K3s cluster. All Rancher pods were stuck in a CrashLoopBackOff cycle that prevented Rancher from ever becoming fully available. During this same period, the K3s cluster itself was functional.

Log file analysis and troubleshooting revealed very few clues as to the root cause. After hours of review, chasing dead ends, and additional analysis from jroot, no resolution could be determined. At that point, I provisioned a new Rancher K3s cluster and re-imported the FOLIO EKS cluster. Stanislav Miroshnichenko did the rest by re-creating all the team projects.

After additional log review, the previous K3s cluster was terminated (2 x EC2 instances and 1 x RDS MySQL instance).

Comment by John Malconian [ 02/Dec/20 ]