Off-hour Environment Downscale

1. Introduction

  • Purpose: 

The purpose of this document is to establish guidelines and procedures for the scheduled shutdown of Rancher environments and shared resources. It sets out the instructions and requirements that team members must follow during the shutdown period, and it describes the self-service tools that will be made available for resuming environments.

  • Scope: 

This document applies to all teams supported by Kitfox that use Rancher environments and shared resources (RDS, Kafka, OpenSearch). It outlines the policy, objectives, and procedures for the scheduled shutdown, as well as the requirements for advance notice and the permissions granted to teams for self-service tasks. The document also covers emergencies, compliance, and consequences related to the shutdown policy.

Current Rancher env: https://folio-org.atlassian.net/wiki/pages/viewpage.action?pageId=1396467

  • Document Revision History:

5/5/23: initial version

2. Environment Shutdown Policy

  • Policy Statement:

The Kitfox team implements a scheduled shutdown of Rancher environments on weekends to achieve cost savings and optimize resource allocation. This policy ensures that the company's cloud resources are utilized efficiently and aligns with our commitment to cost-effective operations.

  • Objectives:

    • Cost Savings: By shutting down Rancher environments and shared resources during weekends and off hours, we aim to reduce cloud resource consumption and associated costs, contributing to overall cost savings for the organization.
    • Resource Optimization: The scheduled shutdown helps optimize resource allocation by temporarily pausing non-critical workloads and freeing up cloud resources for other purposes, ensuring efficient utilization of available resources.

3. Shutdown Schedule (WIP)

  1. Effective Date: April 21, 2023
  2. Initial Shutdown Period:
    • Start Time: 6:00 PM ET, Friday
    • End Time: 6:30 PM ET, Sunday
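To make the window concrete, the following is a minimal Python sketch that checks whether a given moment falls inside the shutdown period listed above. It is only an illustration of the schedule as currently written (which is still marked WIP), not the logic used by the Kitfox automation. The function name `in_shutdown_window` and the choice of `America/New_York` as the zone for "ET" are assumptions made for this example.

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

# Assumption: "ET" in the schedule means the America/New_York zone.
ET = ZoneInfo("America/New_York")

# Values taken from the schedule above.
EFFECTIVE_DATE = datetime(2023, 4, 21, tzinfo=ET).date()
FRIDAY, SATURDAY, SUNDAY = 4, 5, 6   # datetime.weekday(): Monday == 0
START_TIME = time(18, 0)             # 6:00 PM ET, Friday
END_TIME = time(18, 30)              # 6:30 PM ET, Sunday


def in_shutdown_window(moment: datetime) -> bool:
    """Return True if a timezone-aware `moment` is inside the weekend window.

    Hypothetical helper for illustration only. The start time is treated as
    inclusive and the end time as exclusive.
    """
    local = moment.astimezone(ET)
    if local.date() < EFFECTIVE_DATE:
        return False  # policy not yet in effect
    weekday, clock = local.weekday(), local.time()
    if weekday == FRIDAY:
        return clock >= START_TIME
    if weekday == SATURDAY:
        return True
    if weekday == SUNDAY:
        return clock < END_TIME
    return False


if __name__ == "__main__":
    now = datetime.now(tz=ET)
    state = "inside" if in_shutdown_window(now) else "outside"
    print(f"{now:%A %H:%M %Z} is {state} the shutdown window")
```

A team working in another timezone can pass its own timezone-aware timestamps; the helper converts them to ET before comparing.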

4. Self-Service Tools for Environment Shutdown

  1. Self-Service Tools (https://jenkins-aws.indexdata.com/job/Rancher/job/Automation/job/start-stop-project/)
  2. The job is documented at "How to start and activate Rancher environment". It notifies the Kitfox channel when an environment is started.
  3. The self-service job is available to Tech Leads, Scrum Masters, POs, and QA Leads. It takes the following parameters:
    1. action: Choose the action to perform on the project: "start" if you need to work with the environment during stop hours, "stop" after you have finished your work.
    2. clusterName: Select the target cluster where the Kubernetes namespace is located.
    3. projectName: Select the target project where the Kubernetes namespace is located.
  4. Access and Permissions
    1. Designated roles: Tech Lead, Scrum Master, PO, QA Lead
    • Rancher Environment stop/start: Team members with the designated roles will have permission to start and stop Rancher environments using the self-service tools. This allows them to initiate the shutdown and subsequent resumption of the environments.
    • Shared Cloud Resources Pause/Resume: Team members with the designated roles will have the privilege to manage shared cloud resources, such as RDS, Kafka, OpenSearch, and other relevant services. This includes provisioning, configuration, and monitoring of these resources within the Rancher environment.
    • TBD: Revocation of permissions/privileges
  5. Training and Documentation
    1. Training Plan: Do we need it? The idea is to ensure that team members with the designated roles are familiar with the self-service tools and the management of shared cloud resources. If yes, what should it include?
    2. Documentation: a wiki page with step-by-step instructions on how to use the self-service tools for environment shutdown/resume operations
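
For teams that would rather script the self-service job than click through the Jenkins UI, here is a minimal Python sketch that validates the three parameters described above and builds a trigger URL. It assumes the job can be started through Jenkins' standard `buildWithParameters` endpoint, which this document does not confirm. The helper name `build_trigger_url` and the cluster and project names in the usage line are hypothetical, and authentication (a Jenkins user and API token, sent with a POST request) is deliberately left out.

```python
from urllib.parse import urlencode

# Job URL as given in this document.
JOB_URL = (
    "https://jenkins-aws.indexdata.com/job/Rancher/job/Automation/"
    "job/start-stop-project/"
)
ALLOWED_ACTIONS = ("start", "stop")


def build_trigger_url(action: str, cluster_name: str, project_name: str) -> str:
    """Build a buildWithParameters URL for the start/stop job.

    Hypothetical helper: it only assembles the URL, it does not call Jenkins.
    """
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"action must be one of {ALLOWED_ACTIONS}, got {action!r}")
    if not cluster_name or not project_name:
        raise ValueError("clusterName and projectName are both required")
    query = urlencode(
        {
            "action": action,
            "clusterName": cluster_name,
            "projectName": project_name,
        }
    )
    # Assumption: the job accepts remote parameterized builds at this endpoint.
    return f"{JOB_URL}buildWithParameters?{query}"


if __name__ == "__main__":
    # "example-cluster" and "example-team" are placeholder values.
    print(build_trigger_url("start", "example-cluster", "example-team"))
```

Validating `action` before any request is sent keeps a typo from silently leaving an environment running over the weekend.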

5. Compliance and Consequences

  1. Compliance with Shutdown Policy: All team members are expected to comply with the environment shutdown policy and adhere to the specified schedule and procedures.
  2. Consequences of Non-Compliance: Non-compliance with the shutdown policy may result in developers being forced to code in COBOL for the rest of their lives.




Maccabee Levine
July 5, 2023

@Mark Veksler @Yogesh Kumar 

Yogesh Kumar
July 5, 2023

The purpose of the self-serve tool is to enable teams to continue their longer-term testing, such as harvest, without having to shut down their environments.

Marc Johnson
July 6, 2023

Does that mean that not all environments will use the standard shutdown schedule?

Maccabee Levine
July 6, 2023

@Marc Johnson Yes, I believe the intent is to let them have timezone flexibility.  The environment request process does ask if the dev team will need shorter downtime than the standard, which would have budget impact.  Otherwise if the hours are just shifting, no budget impact.

Marc Johnson
July 6, 2023

Ok.

My interpretation of the document as it stands is that the scheduled shutdown was intended to be a widely adopted default.

The impression I'm getting now is that it's an opt-in by the development team.

I think it could be useful to make explicit which of these approaches is being advocated.

Maccabee Levine
July 6, 2023

@Marc Johnson I agree that is confusing and should be clarified.  I remember thinking that a change of hours of the same duration has the same budget impact, so I stopped caring.  @Yogesh Kumar @Mark Veksler I think it is up to Kitfox how the team can request non-standard off-hours, assuming that it will be the same every week and therefore a self-service stop/start every time is undesirable.  Or else to say we don't support that, but I think it would annoy some dev teams. 

If you do want to support that, in terms of documentation it could be as simple as adding a request for the specific non-standard downtime hours to the environment request doc template.

Maccabee Levine
July 14, 2023

@Yogesh Kumar @Steffen Köhler and I edited the new environment request process to indicate that if the dev team wants a different regular weekend shutdown schedule of the same length, they can do so via a Rancher ticket.