|
Based on reporting data, it appears Rancher became completely unavailable around 11/26-11/27. From 11/26 through 12/1, CPU usage was pinned at 100% on the MySQL RDS instance used by the Rancher K3s cluster. All Rancher pods were stuck in a CrashLoopBackOff cycle, which prevented Rancher from ever becoming fully available. During this same period, the K3s cluster itself was functional.
Log file analysis and troubleshooting revealed very few clues as to the root cause. After hours of review, chasing dead ends, and additional analysis from jroot, no resolution could be found. At that point, I provisioned a new Rancher K3s cluster and re-imported the FOLIO EKS cluster. Stanislav Miroshnichenko did the rest by re-creating all the team projects.
After additional log review, the previous K3s cluster was terminated (2 x EC2 instances and 1 x RDS MySQL instance).
|