Platform, DevOps and Release Management (UXPROD-1814)

[UXPROD-1827] CI-integrated continuous deployment (Q3, FOLIO setup) Created: 03/Oct/18  Updated: 16/Sep/20  Resolved: 11/Oct/19

Status: Closed
Project: UX Product
Components: None
Affects versions: None
Fix versions: Q3 2019
Parent: Platform, DevOps and Release Management

Type: New Feature Priority: P2
Reporter: Jakub Skoczen Assignee: Jakub Skoczen
Resolution: Done Votes: 0
Labels: NFR, cap-mvp, ci, platform-backlog, po-mvp, q3-2019, q3.1-2019
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Blocks
blocks UXPROD-1817 Preview capability for PRs (Q3, initi... Closed
blocks UXPROD-2066 CI architecture and process to allow ... Closed
blocks UXPROD-2118 Preview capability for PRs (Q4, final... Closed
blocks FOLIO-2011 initial roll out of PR preview capabi... Closed
is blocked by FOLIO-2182 document the AWS/EKS FOLIO deployment... Open
is blocked by FOLIO-1519 Automatic loading of sample and refer... Closed
is blocked by FOLIO-2053 add AWS K8s deployment configuration ... Closed
is blocked by FOLIO-2055 create an Ansible role for backend-mo... Closed
is blocked by FOLIO-2056 create a pipeline for a front-end mod... Closed
is blocked by FOLIO-2086 Set up HA FOLIO Rancher Closed
is blocked by FOLIO-2089 SPIKE: determine FOLIO log aggregatio... Closed
is blocked by FOLIO-2090 Create and document AWS VPC template ... Closed
is blocked by FOLIO-2129 SPIKE: Ingress approach for FOLIO and... Closed
is blocked by FOLIO-2130 SPIKE: how to clean up unused contain... Closed
is blocked by FOLIO-1408 deploy FOLIO on Kubernetes and docume... Closed
is blocked by FOLIO-1596 enable platform builds for all ui-* m... Closed
is blocked by FOLIO-2054 Stand up cluster for CI using AWS K8s... Closed
is blocked by FOLIO-2069 Deploy Okapi to K8s cluster Closed
is blocked by FOLIO-2117 Kubernetes integration (Q2, container... Closed
is blocked by FOLIO-2131 SPIKE: how to deploy containers based... Closed
Cloners
is cloned by FOLIO-2118 CI-integrated continuous deployment (... Closed
Relates
relates to OKAPI-728 Add filtering capabilities to /_/prox... Open
relates to FOLIO-1519 Automatic loading of sample and refer... Closed
relates to FOLIO-1548 SPIKE: a lighter-weight folio/testing... Closed
relates to FOLIO-1597 Add module dependency resolution qual... Closed
relates to FOLIO-2036 Include custom backend modules in PR ... Closed
relates to FOLIO-2057 SPIKE: explore AWS K8s cost models Closed
relates to OKAPI-670 Report dependency info for a module Open
relates to UXPROD-2066 CI architecture and process to allow ... Closed
relates to FOLIO-1576 folio-ansible: allow for URL deployment Closed
relates to FOLIO-2200 Implement K8 auto-scaling on EKS cluster Closed
Epic Link: Platform, DevOps and Release Management
Back End Estimate: XXL < 30 days
Back End Estimator: Jakub Skoczen
Development Team: Core: Platform
PO Rank: 10
Rank: Lehigh (MVP Summer 2020): R2

 Description   

Problem statement

Current FOLIO CI infrastructure and reference environments have several limitations that make it hard to scale them to the point where a fully continuous and isolated deployment of development builds (e.g. from PRs and feature branches, for both UI and backend modules) is possible. Those limitations include:

  • the reference envs (e.g. snapshot) are recreated in full nightly, in a top-down approach (starting from a platform and resolving its dependencies, which are deployed one by one).
  • the reference envs are primarily single-tenant (the so-called 'diku' demo tenant)
  • the reference envs are constructed from artefacts based on shared branches (master) only and all those artefacts are shared at runtime by the single demo tenant

This makes it impossible to implement certain development process capabilities (like the PR previews, see FOLIO-1993 Closed ) without a list of fairly severe limitations (see the description of the PR preview PoC for a comprehensive list of those limitations at https://dev.folio.org/guides/branch-preview/)

Proposed approach

We would like to revamp the deployment and orchestration infrastructure, create a new, clustered and multi-tenant reference environment for development and integration purposes, and update the CI processes (Jenkins, Ansible, etc.) to allow for a more continuous and isolated deployment of development artefacts.

Orchestration

Based on prior work by Jason Root (TAMU) and Mark Stacy (Core: platform/Colorado, see FOLIO-1408 Closed ), who have experimented with various orchestration tools, we have concluded that Kubernetes (K8s) has become the de facto standard for container orchestration and is the orchestration tool of choice across major cloud vendors. It seems very likely that many organisations will use K8s for production-ready FOLIO deployments. K8s also brings many benefits for development deployments: a rich ecosystem of tools that ease provisioning of dependencies (e.g. Helm) and widely accepted practices and processes for deployment of multiple development builds.

The Core: platform team has undertaken a focused effort to ease K8s integration across the FOLIO platform. This effort is tracked on UXPROD-1823 Draft .

Clustered reference environment

This work is being done outside of the Platform team, see FOLIO-2053 Closed

CI process

We would like to extend the CI process in a way that allows us to:

  • deploy a backend container immediately after a successful automatic build (Jenkins/CI) on the clustered reference environment. This would include both snapshot and release builds and could be scaled up to allow for a deployment of feature branch builds (at specific points in the PR lifecycle). The deployed container should be registered with Okapi running on the clustered reference environment in order to allow for it being used as a dependency in appropriate tenant configurations.
  • create an independent tenant, on the clustered reference environment, for each platform- and ui- module build (of all kinds, including release, snapshot, and feature branch) to allow for running the particular build in isolation (including sample data set isolation). In the case of ui- modules, which are not self-contained units, the build process must be able to embed the module in an appropriate platform to form a complete Stripes bundle. Such a bundle should be exposed to users after a successful build. Okapi remains responsible for providing backend service dependencies in the created tenant.
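
The two capabilities above can be sketched as a plan of Okapi requests. This is a minimal illustration, not the pipeline that was built: the endpoint paths follow the Okapi guide as we understand it, while `OkapiCall`, `plan_backend_deploy`, `plan_build_tenant`, and all module and tenant ids are hypothetical names introduced here. Nothing is sent over the network; the functions only describe which calls a CI job would make.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class OkapiCall:
    """One HTTP request a CI job would send to Okapi (hypothetical helper)."""
    method: str
    path: str
    body: Any


def plan_backend_deploy(module_descriptor: dict, container_url: str) -> list[OkapiCall]:
    """Register a freshly built, already running backend container with Okapi."""
    module_id = module_descriptor["id"]
    return [
        # Make the module known to the Okapi proxy.
        OkapiCall("POST", "/_/proxy/modules", module_descriptor),
        # Tell Okapi discovery where the running container can be reached.
        OkapiCall(
            "POST",
            "/_/discovery/modules",
            {"srvcId": module_id, "instId": module_id, "url": container_url},
        ),
    ]


def plan_build_tenant(tenant_id: str, module_ids: list[str]) -> list[OkapiCall]:
    """Create an isolated tenant for one build and enable its modules."""
    return [
        OkapiCall("POST", "/_/proxy/tenants", {"id": tenant_id}),
        OkapiCall(
            "POST",
            f"/_/proxy/tenants/{tenant_id}/install",
            [{"id": module_id, "action": "enable"} for module_id in module_ids],
        ),
    ]


if __name__ == "__main__":
    # Hypothetical module and tenant ids, for illustration only.
    descriptor = {"id": "mod-example-1.0.0-SNAPSHOT.1", "name": "example"}
    calls = plan_backend_deploy(descriptor, "http://mod-example.internal:8081")
    calls += plan_build_tenant("build_123", ["mod-example-1.0.0-SNAPSHOT.1"])
    for call in calls:
        print(call.method, call.path)
```

Keeping the plan separate from the HTTP client makes it easy to review or dry-run what a build would change on the shared cluster before anything is executed.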

See FOLIO-2054 Closed , FOLIO-2055 Closed and FOLIO-2056 Closed

Further reading

PR previews: UXPROD-1817 Closed
Kubernetes integration: UXPROD-1823 Draft
FOLIO RM and CI concepts: https://docs.google.com/document/d/1au2hG4gPekyZ_HxAU7s6sc4NOvRaVovUcBnMLa1iR7E/edit?usp=sharing



 Comments   
Comment by Wayne Schneider [ 03/Oct/18 ]

One way this might work:

  • set up a PostgreSQL instance to back the FOLIO system
  • set up an nginx server to serve bundles and proxy Okapi for each tenant
  • create an Okapi cluster. A new release of Okapi triggers a rolling upgrade of the cluster.
  • create one or more "reference tenants" on the Okapi cluster. These tenants might represent the "snapshot" FOLIO build using prerelease artifacts, "snapshot-stable" using prerelease artifacts that pass regression tests, and "release" using only released artifacts (to use our current structure – could refactor this).
  • each new release of a backend module (including snapshots) spins up a Docker container accessible at a URL based on module id (e.g. mod-authtoken-1_5_2-SNAPSHOT_52.aws.indexdata.com:8081). The db connection string is included as environment variables for the container (or as a config file in the container). A module descriptor and deployment descriptor are posted to the Okapi cluster.
  • A snapshot release of a backend module triggers an attempt to upgrade the module for the "snapshot" tenant. For storage modules, upgrade scripts must be in place if the data structure or database schema changes (e.g. due to data structure changes or indexes).
  • Each new release of a frontend module (including snapshots) causes a module descriptor to be posted to the Okapi cluster
  • Each new release of a frontend module triggers an attempt to upgrade the module for the "snapshot" tenant
  • The module is upgraded in Okapi to pick up any new permissions
  • A new webpack is built for the tenant that includes the updated module
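
The URL scheme in the backend module bullet above can be sketched as follows. The mapping (dots in the module id replaced by underscores) is inferred from the single example given; the `http://` scheme, the default port, and the helper names `container_host` and `deployment_descriptor` are assumptions made for illustration.

```python
def container_host(module_id: str, domain: str = "aws.indexdata.com") -> str:
    """Derive a per-module host name from a module id.

    Dots in the version part are replaced with underscores so that they are
    not read as DNS label separators (inferred from the example in the text).
    """
    return f"{module_id.replace('.', '_')}.{domain}"


def deployment_descriptor(module_id: str, port: int = 8081,
                          domain: str = "aws.indexdata.com") -> dict:
    """Build a deployment descriptor pointing Okapi at the running container."""
    url = f"http://{container_host(module_id, domain)}:{port}"  # scheme is assumed
    return {"srvcId": module_id, "instId": module_id, "url": url}


if __name__ == "__main__":
    print(deployment_descriptor("mod-authtoken-1.5.2-SNAPSHOT.52"))
```

Because the host name is a pure function of the module id, any pipeline step can compute where a given build is reachable without a lookup service.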

This is just a first pass at how we might do it. Other thoughts?

Comment by Anton Emelianov (Inactive) [ 16/Oct/18 ]
  1. We need a folio system where dev teams and product owners can collaborate before changes are ready to be committed to a master branch
  2. This system should be based on the “next-release” set of modules. It will be dev team responsibility to push the latest versions of modules that are needed for collaboration, testing and PO acceptance.
  3. A multi-tenant Folio system will be accessed by multiple development teams.
  4. Each team should be able to manage/deploy a set of modules without affecting other teams.
  5. Dev team should be able to have access to more than one tenant to allow multiple teams to collaborate on the same project
  6. Dev team should NOT have access to AWS infrastructure. No IAM accounts will be provisioned for devs.
  7. A dev team should be able to deploy module from a branch (not master) to a tenant.
  8. Both UI and backend module deployment should be automated.
  9. Location of a system should be considered as we can have devs and PO spread over 9 time zones.
  10. The process of requesting/provisioning of a tenant should be automated
  11. All operations should be done from CLI interface
  12. Automated deployment of a module process should be driven by tenant credentials.
  13. Automated deployment of a module should be fast and allow multiple iterations per working day.
  14. Developers should be able to request a tenant provisioning via CLI
  15. Tenants should be automatically removed if no modules were deployed for ?? days
  16. Developers cannot expect prolonged continuity of a tenant (no more than ??) because the system will be rebuilt frequently.
  17. Complete teardown/rebuild will happen if a new “next-release” version of Folio is available
  18. It should be easy for developers to identify that the system has been refreshed and they need to redeploy their modules.
  19. Initially, the system will not handle schema migrations
Comment by Ann-Marie Breaux (Inactive) [ 18/Oct/18 ]

One thing that Anya and I discussed a couple days ago.

For PO/Tester testing of newly-fixed bugs and new functionality (story reviews), the closer to the developer, the better. Having a place that we can test where the developer knows the new code has been released, and that we (think) will work in the context of the rest of FOLIO is good. Per conversations with Anton, we should ideally catch these types of problems at folio-testing at the latest, preferably sooner than that.

In terms of exploratory testing and usability, it seems like that should happen on as stable and refined a version of FOLIO as possible - basically the public face of FOLIO that we want the community to see, and where we expect to have the main integration problems stabilized and the main bugs already identified and removed. It seems like exploratory testing should happen at a point like snapshot-stable, later in the process than the story review testing.

Comment by Anton Emelianov (Inactive) [ 18/Oct/18 ]

Ann-Marie Breaux, you're exactly right. These two use cases require two different environments and we're missing the first one.

Comment by John Malconian [ 18/Oct/18 ]

If there is a need to preview functionality and test code before committing to master, the Pull Request stage would seem like a logical choice to build and deploy preview artifacts. The process would be something like:

1. submit PR
2. build and deploy PR artifacts.
3. provision unique tenant id for PR.
4. enable PR artifacts alongside/in lieu of next-release artifacts for tenant
5a. if PO and dev are satisfied with preview and PR passes whatever automated tests are configured to run for that repo, then the PR is merged to master and the tenant is automatically deleted. A proper release of that artifact can then be cut and added to next-release.
5b. If preview mode of artifact is not satisfactory, the PR can either be closed OR the dev can commit updates to the branch that opened the PR and a new tenant is enabled with revised artifacts and reviewed.
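
The lifecycle in steps 1 to 5b can be sketched as a mapping from PR events to actions. This is an illustration only: the event names, the tenant id format, and the choice to recreate the tenant under the same id on each update are assumptions, and `preview_tenant_id` and `actions_for` are hypothetical helpers.

```python
import re


def preview_tenant_id(repo: str, pr_number: int) -> str:
    """Derive a unique tenant id for a PR, e.g. ui-users PR 42 -> ui_users_pr_42."""
    slug = re.sub(r"[^a-z0-9]+", "_", repo.lower()).strip("_")
    return f"{slug}_pr_{pr_number}"


def actions_for(event: str, repo: str, pr_number: int) -> list[str]:
    """Return the ordered actions for a PR event (names are illustrative)."""
    tenant = preview_tenant_id(repo, pr_number)
    provision = [
        f"build and deploy artifacts for {repo}#{pr_number}",   # step 2
        f"create tenant {tenant}",                              # step 3
        f"enable PR artifacts for tenant {tenant}",             # step 4
    ]
    if event == "opened":
        return provision
    if event == "updated":
        # Step 5b: revised artifacts get a fresh tenant; here the old one is
        # dropped first so the id can be reused (an assumption of this sketch).
        return [f"delete tenant {tenant}"] + provision
    if event in ("merged", "closed"):
        # Step 5a (or an abandoned PR): the preview tenant is removed.
        return [f"delete tenant {tenant}"]
    raise ValueError(f"unknown PR event: {event}")


if __name__ == "__main__":
    for step in actions_for("opened", "ui-users", 42):
        print(step)
```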

Comment by Jakub Skoczen [ 22/Oct/18 ]

John Malconian Precisely, I think the approach of "PR previews" would give us the most flexibility in terms of isolated and unobstructed PO-Developer communication. In general terms the flow of work could be something like:

1. Developer issues PR
2. Automatic checks are run (lint, SQ, etc)
3. PR preview is deployed
4. PO reviews and accepts feature
5. PR is merged to master
6. Feature is deployed on the next nightly build (folio-snapshot and, after passing automated integration tests, folio-snapshot-stable) along with all other accepted features
7. Feature is scheduled for the next release and when released it becomes available on the releases environment (folio-release)

John Malconian How complex is it to build "PR preview" functionality?

Comment by Jakub Skoczen [ 22/Oct/18 ]

Adam Dickmeiss John Malconian Wayne Schneider Anton Emelianov Matthew Jones Zak Burke thanks for your time and availability today.

We had a long meeting about this and related issues today. There's been a lot of discussion about how a new type of "clustered" continuous deployment environment could help us address some of the issues we are facing today, but we realise that building such an environment is a long-term project that may not be within the capacity we have in Q4. As such we decided to:

  • propose a plan for enabling PR "platform" builds for FOLIO UI modules. John will put a more concrete plan in writing and share it for feedback before the Wednesday devops call. This was not a successful undertaking last time, but it is more promising now because a) we have the UI code better organised (platform-core and stripes-framework, stripes-cli tooling) and b) we have decided to skip integration tests – which create additional complexity – and only do simple builds and backend dependency resolution. We want to keep this task as simple and small as possible so it can be rolled out quickly. Once rolled out, it should enable previews (for POs and testers) based on the PR builds, before things hit master branches and before they get deployed to shared environments. A process for rolling out a new major (breaking) version of a backend dependency needs to be accounted for here.
  • initiate the design work on the "clustered" continuous deployment environment. Wayne has agreed to capture the discussion and initial thoughts exchanged during the call. The document needs to be fleshed out to the point where we can wrap our heads around the scope of the work so we can try to estimate it.
Comment by John Malconian [ 23/Oct/18 ]

I do not think that there are any viable short-term options to address the issue of PO/dev collaboration for two reasons. 1. There will easily be resource allocation issues trying to deploy new backend dependencies that are not already deployed as part of the next-release (or snapshot build, for that matter). Frontend code that is being tested will inevitably want to rely on a new version of a backend module that is not part of the next-release build. 2. Mixing and matching dependencies between a next-release set of modules, the module being tested and any newly deployed backend modules that the frontend module would rely upon will result in frequent dependency resolution conflicts.

I can offer the following two additional PR quality gates in the short-term:

  • For ui-* modules, include the tested module in the platform that it belongs to (platform-core or platform-complete) and build the stripes bundle. A failed build will fail the PR. This is something we implemented several months ago and have since disabled. Previously we based the build on a 'snapshot'. This time around, however, the build would be based on the 'next-release' branch. Re-enabling will require reimplementing some pipeline code that was rolled back. We may also want to think about creating new stripes platforms for acquisitions and erm, etc.
  • Module dependency resolution checking utilizing a tenant install endpoint in Okapi (simulate-mode). The process would look something like the following:

1) deploy an instance of okapi (probably in a container) for each PR.
2) pull all module descriptors from folio-registry.
3) generate a module descriptor for the PR's module and post to local instance of okapi.
4) generate a list of stripes modules from 'next-release' branch of platform-core or platform-complete to enable in addition/in lieu of the local module we are testing.
5) create a tenant on local okapi instance
6) use tenant's install endpoint to simulate deployment
7) tear down local okapi instance

If there is a dependency resolution conflict, the PR fails. If there is no dependency resolution conflict, but the new version of the module is not included in list of modules to enable, a warning message is generated that signifies that no modules are prepared to use the new iteration of the module. This is useful for backend modules that increment the interface version.
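
The pass, warn, and fail outcomes described above can be sketched as a small decision function. It assumes the simulation step (a tenant install request with `simulate=true`, per our reading of the Okapi guide) has already been run and reduced to two inputs: whether resolution succeeded, and which module ids Okapi proposed to enable. `Gate` and `evaluate_simulation` are hypothetical names.

```python
from enum import Enum


class Gate(Enum):
    PASS = "pass"
    WARN = "warn"
    FAIL = "fail"


def evaluate_simulation(pr_module_id: str, resolved: bool,
                        enabled_ids: list[str]) -> tuple[Gate, str]:
    """Turn a simulated install result into a PR quality-gate outcome."""
    if not resolved:
        # Okapi could not satisfy the dependencies: the PR fails.
        return Gate.FAIL, "dependency resolution conflict"
    if pr_module_id not in enabled_ids:
        # Resolution worked, but nothing pulled in the PR's module version.
        return Gate.WARN, f"no modules are prepared to use {pr_module_id}"
    return Gate.PASS, "dependencies resolved"


if __name__ == "__main__":
    # Hypothetical module ids, for illustration only.
    print(evaluate_simulation("mod-users-16.0.0-SNAPSHOT.7", True,
                              ["mod-users-15.2.0", "folio_users-2.17.0"]))
```

Separating the warning from the failure keeps a backend PR that increments an interface version mergeable while still flagging that no consumer has been updated yet.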

Comment by Jakub Skoczen [ 24/Oct/18 ]

John Malconian I like the idea overall but I would like to clarify a couple of things:

  • when using "next-release" of platform-core or platform-complete, how would you generate the list of backend modules that need to be installed (simulated) in Okapi? Would it be based on the actual dependencies reported by the ui- modules (okapiInterfaces)? Or okapi-install.json?
  • as an extension to the process you propose, if the simulation is successful, could we add an optional step that tries to really "install" the backend modules on the Okapi instance? We are likely to hit the resource constraints for platform-complete, but if we focus on platform-core only it could potentially work, no?
Comment by John Malconian [ 24/Oct/18 ]

Jakub Skoczen On the first point, the list of backend modules would be based on okapiInterfaces. Maintaining a static list is cumbersome. I think one open question might be whether to filter on backend module "releases" only (exclude snapshot versions). I think that may be too limiting, however.
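
A minimal sketch of deriving the requirement list from okapiInterfaces, rather than from a static list, might look like this. It assumes each ui- module declares its interfaces under `stripes.okapiInterfaces` in package.json, with space-separated acceptable versions; the function names and the "-SNAPSHOT" check used for the releases-only filter are illustrative assumptions.

```python
def required_interfaces(packages: list[dict]) -> dict[str, set[str]]:
    """Merge okapiInterfaces from several parsed package.json documents.

    Returns a mapping of interface name to the set of versions that at least
    one ui- module declares it can work with.
    """
    required: dict[str, set[str]] = {}
    for pkg in packages:
        interfaces = pkg.get("stripes", {}).get("okapiInterfaces", {})
        for name, versions in interfaces.items():
            required.setdefault(name, set()).update(versions.split())
    return required


def releases_only(module_ids: list[str]) -> list[str]:
    """Optional filter for the open question: drop snapshot builds."""
    return [module_id for module_id in module_ids if "-SNAPSHOT" not in module_id]


if __name__ == "__main__":
    # Hypothetical package.json fragments, for illustration only.
    ui_a = {"stripes": {"okapiInterfaces": {"users": "15.0"}}}
    ui_b = {"stripes": {"okapiInterfaces": {"users": "15.0 16.0", "inventory": "10.0"}}}
    print(required_interfaces([ui_a, ui_b]))
    print(releases_only(["mod-users-15.2.0", "mod-users-16.0.0-SNAPSHOT.7"]))
```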

On the second point, note that I intend to implement a lightweight okapi instance - essentially fire up and tear down an okapi docker container as needed. Module deployment would overcomplicate the simplicity of this solution and I'm not sure I see much additional value in deploying the modules at this stage.

Comment by Heikki Levanto [ 24/Oct/18 ]

I haven't followed all this discussion, but I think we have put together a fairly decent process for releasing back end modules when they are ready to be used, and we should use released versions whenever possible. Running all random snapshots is a recipe for disaster.

Comment by Theodor Tolstoy (One-Group.se) [ 04/Jul/19 ]

This might need the NFR tag to prevent early implementers from ranking it.

Generated at Fri Feb 09 00:18:51 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.