2023-05-12 Meeting Notes

Attendees:

Off-Hours Guidelines

D4. Define guidelines/best practices around pausing/stopping environments when they're not in use - e.g. off-hours/weekends/etc.

Since last week:

Today:

  • Good?
    • Yogesh Kumar still working on a few details with Kitfox
    • Required info change ok.
  • Should release environments also have off-hours?  (From Kitfox sprint review: Peter Murray discussed bringing to PC and TC whether we bring the same weekend suspension to the hosted release environments.  Peter will look at what savings that would be.)
    • Do we have a way to determine how much those envs are used overnight and on weekends?  They are more publicly known.  
    • Saving dollars, why not?  Multiple time zones, harder
    • Maybe start with weekends only?   Peter Murray will estimate cost savings.  No actual env changes for that right now.
      • Look at logs also, see who accesses them over the weekend.
      • Maybe higher use nearer a flower release.
      • Look at snapshot, snapshot-2, and the two release environments
    • Get input from PC, SIGs, CC, on the same timeline as the rest of the input.
    • Don't shut down on weekends any system that is used for demos on the public website/wiki.  Consider changing that website language?  Flag as a question for CC (www.folio.org).
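
The weekend-only option can be sized with simple arithmetic before any usage logs are reviewed. A minimal sketch, assuming each environment has a roughly flat hourly cost while running and costs nothing while suspended (storage and other always-on charges are ignored). The function name and the example rate are hypothetical, not figures from the meeting:

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in a week


def weekend_suspension_savings(hourly_cost, suspended_hours_per_week=48, weeks=52):
    """Estimate savings from suspending one environment every weekend.

    hourly_cost is an assumed flat running cost in dollars per hour.
    Returns (savings_over_period, fraction_of_running_hours_saved).
    """
    if not 0 <= suspended_hours_per_week <= HOURS_PER_WEEK:
        raise ValueError("suspended hours must fit within one week")
    savings = hourly_cost * suspended_hours_per_week * weeks
    fraction = suspended_hours_per_week / HOURS_PER_WEEK
    return savings, fraction


# Hypothetical rate of $1.00/hour, used only to show the calculation.
savings, fraction = weekend_suspension_savings(hourly_cost=1.00)
print(f"{fraction:.1%} of running hours saved, ${savings:,.2f} per year")
```

A full 48-hour weekend is 48 of 168 hours, a little under 29% of the week, so that is the ceiling on compute savings per environment under these assumptions. Multiple time zones shorten the window that counts as "weekend" for everyone, which lowers the suspended hours accordingly.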

Budgets / Cost Anomaly Detection

D5. Create AWS Budgets and AWS Budget Alerts for daily and monthly spend rates

D6. Explore AWS Cost Anomaly Detection and Rightsizing Recommendations

Last time:

  • How would teams estimate costs?  AWS Calculator lets you put together a set of AWS services and create a link to share the estimate.  But you have to know what you are asking for: X size databases, OpenSearch, etc.
    • All teams under folio-dev: Projects(Namespaces).  Hard for teams to do the calculation / effort.
    • Consensus: calculator may be too complicated, better for creating an entirely new cluster.  Never mind.
    • Standard environment is composed of X components.  Can estimate that cost.  If they need to load 100 million records into the DB, that will cost more.
    • Yogesh Kumar to ask Kitfox to determine standard environment cost, i.e. Bama.  Maybe a price list, where the standard environment uses X resources and has X monthly cost.  A team may need to increase the size of the DB, so the DB would be a higher proportion of the cost.
    • Or simplify.  Maybe just two "recipes", and what the monthly cost was for those recipes.  "Standard" and "Premium".  Or "Dev" vs "Testing".  Start from there.
  • ACRG periodically review actuals vs. budget?  Annual and each flower release?
    • For annual budget, need to look at roadmap, features planning to implement, how many dev teams.  Based on prior year's actuals, estimate next year's budget.  Given inflation and AWS pricing changes, maybe a 10% increase if all else stable. 
      • Repeat the eval after each flower release.
      • Let CC know updated estimates for total costs.
    • Could break into flower release, and budget per team.  But maybe not needed per team at that stage.
  • Anomaly detection and cost budget alerts?
    • Discuss next week.
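
The "recipes" price list and the annual estimate described above can be sketched as a small calculation. This is only an illustration: the recipe prices, team names, and function names are hypothetical placeholders, and the 10% figure is the "all else stable" increase mentioned in the notes.

```python
# Hypothetical monthly prices per recipe, in dollars. Placeholders only;
# real values would come from the Kitfox price list once it exists.
RECIPE_MONTHLY_COST = {"standard": 1000.0, "premium": 2500.0}


def annual_cost(team_recipes, price_list=RECIPE_MONTHLY_COST):
    """Sum 12 months of cost for each team's chosen recipe."""
    return 12 * sum(price_list[recipe] for recipe in team_recipes.values())


def next_year_budget(prior_year_actuals, increase=0.10):
    """Project next year's budget from prior actuals plus a flat increase."""
    return prior_year_actuals * (1 + increase)


# Hypothetical teams, used only to show the calculation.
teams = {"team-a": "standard", "team-b": "premium"}
print(f"Recipe-based annual cost: ${annual_cost(teams):,.2f}")
print(f"Budget from $100,000 actuals: ${next_year_budget(100_000.0):,.2f}")
```

Repeating the evaluation after each flower release would mean rerunning the same calculation with updated actuals and team counts.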

Today:

  • Review new section on budget page, "Review by AWS Cost Review Group".  Accurate?
    • Edited, now good.
  • Review draft environment "price list" / "recipes" from Yogesh Kumar if ready.
    • Look at next week.
  • What work has to be done on the budgets, budget alerts, anomaly detection, rightsizing recommendations?
    • Two alerts are set up: RDS and OpenSearch.  They come in as emails and Slack notifications.
    • Also an alert set up for a budget that is everything other than RDS, OpenSearch and Compute.
    • Anomaly detection has been set up.  Group agrees it works for now, improve with each iteration.
  • Regular review?
    • ACRG should annually review the Budgets, Budget Alerts, Cost Anomaly Detection.  Look at Rightsizing Recommendations after each flower release.
  • No need for team-specific alerts.  Regular alerts would point us to the team responsible.
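
For reference, a monthly budget with an email alert has roughly the following shape when created through the AWS Budgets API. This is a minimal sketch, not the configuration Kitfox actually set up: the helper name, account ID, budget name, limit, threshold, and email address are all hypothetical, and Slack delivery and per-service filters are not shown.

```python
def monthly_budget_request(account_id, name, limit_usd, email, threshold_pct=80.0):
    """Build keyword arguments for the AWS Budgets create_budget call.

    Only builds the request; nothing is sent to AWS. With boto3 installed
    and credentials configured, it could be passed as:
        boto3.client("budgets").create_budget(**request)
    """
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": name,
            "BudgetLimit": {"Amount": f"{limit_usd:.2f}", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            {
                # Alert when actual spend passes threshold_pct of the limit.
                "Notification": {
                    "NotificationType": "ACTUAL",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": threshold_pct,
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [
                    {"SubscriptionType": "EMAIL", "Address": email},
                ],
            }
        ],
    }


# Hypothetical values, used only to show the shape of the request.
request = monthly_budget_request(
    account_id="123456789012",
    name="example-monthly-budget",
    limit_usd=5000,
    email="alerts@example.org",
)
print(request["Budget"]["BudgetLimit"])
```

Keeping the request builder separate from the API call makes the budget definitions easy to review annually without touching the AWS account.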

Reviewing Environments to Shut Down

D2. Define a process for reviewing existing tools and environments for candidates to be shut down (e.g. when a team leaves the project or the env is no longer needed)

Last time / Prior:

  • Mark Veksler to draft guidelines on who should have permissions for which operations in AWS, i.e. what each team will be allowed to do.  Link from ACRG doc.

Today:

  • Permissions on operations in AWS?  Right now AWS is just Kitfox, but Jenkins jobs are available to dev teams.
  • Kitfox prefers a self-service model.
  • Yogesh Kumar will update the environment lifecycle document to indicate what dev teams can do.

AWS Environments that are not for dev teams.  Do we apply similar processes?  (If so how?)

  • Do we need any other procedures for those four environments?  (snapshot, -2, and the two releases)
  • Consensus: no.

Environment requests for the existing team environments when the new procedures are approved

  • After we go live with the process, ask teams to submit an environment request for each of their existing environments, starting with X release (to be defined during the community review process).

Off-hours shutdown during weekday evenings

  • Kitfox is looking at this.  What would work best for each team?  Ticket pending.
  • We can at least look at the findings and decide whether to do something or not.  Harder because of the geographical spread of teams.