Problem statement
Current FOLIO CI infrastructure and reference environments have several limitations that make it hard to scale them to the point where a fully continuous and isolated deployment of development builds (e.g. from PRs and feature branches, for both UI and backend modules) is possible. Those limitations include:
- the reference environments (e.g. snapshot) are recreated in full nightly, in a top-down approach (starting from a platform and resolving its dependencies, which are then deployed one by one)
- the reference environments are primarily single-tenant (the so-called 'diku' demo tenant)
- the reference environments are constructed from artefacts built from shared branches (master) only, and all those artefacts are shared at runtime by the single demo tenant
This makes it impossible to implement certain development process capabilities (like PR previews, see FOLIO-1993, Closed) without a list of fairly severe limitations (see the description of the PR preview PoC at https://dev.folio.org/guides/branch-preview/ for a comprehensive list of those limitations).
Proposed approach
We would like to revamp the deployment and orchestration infrastructure, create a new, clustered and multi-tenant reference environment for development and integration purposes, and update the CI processes (Jenkins, Ansible, etc.) to allow for a more continuous and isolated deployment of development artefacts.
Orchestration
Based on the prior work by Jason Root (TAMU) and Mark Stacy (Core: Platform/Colorado, see FOLIO-1408, Closed), who experimented with various orchestration tools, we have concluded that Kubernetes (K8s) has become the de facto standard for container orchestration and is the orchestration tool of choice across major cloud vendors. It seems very likely that many organisations will use K8s for production-ready FOLIO deployments. K8s also brings many benefits for development deployments: a rich ecosystem of tools that ease provisioning of dependencies (e.g. Helm) and widely accepted practices and processes for deploying multiple development builds.
The Core: Platform team has undertaken a focused effort to ease K8s integration across the FOLIO platform. This effort is tracked in UXPROD-1823 (Draft).
Clustered reference environment
This work is being done outside of the Platform team, see FOLIO-2053 (Closed).
CI process
We would like to extend the CI process in a way that allows us to:
- deploy a backend container immediately after a successful automatic build (Jenkins/CI) on the clustered reference environment. This would include both snapshot and release builds and could be scaled up to allow for deployment of feature branch builds (at specific points in the PR lifecycle). The deployed container should be registered with the Okapi running on the clustered reference environment so that it can be used as a dependency in appropriate tenant configurations.
- create an independent tenant, on the clustered reference environment, for each platform- and ui- module build (of all kinds, including release, snapshot and feature branch) to allow for running the particular build in isolation (including sample data set isolation). In the case of ui- modules, which are not self-contained units, the build process must be able to embed the module in an appropriate platform to form a complete Stripes bundle. Such a bundle should be exposed to users after a successful build. Okapi remains responsible for providing backend service dependencies in the created tenant.
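As an illustration of the first step, registering a freshly built backend module with Okapi comes down to posting a ModuleDescriptor to the proxy API. The sketch below is a minimal, hedged example: the cluster URL, module ids and interface versions are made up, and only a tiny subset of the descriptor fields is shown.

```python
import json

# Hypothetical address of Okapi on the clustered reference environment.
OKAPI_URL = "http://okapi.folio-dev.example:9130"

def module_descriptor(module_id, provides, requires):
    """Build a minimal ModuleDescriptor for POST /_/proxy/modules."""
    return {
        "id": module_id,
        "provides": [{"id": i, "version": v} for i, v in provides.items()],
        "requires": [{"id": i, "version": v} for i, v in requires.items()],
    }

def register_module(session, descriptor):
    """Register a freshly built module with Okapi so tenants can enable it.
    'session' is any object with a requests-style post() method."""
    return session.post(
        OKAPI_URL + "/_/proxy/modules",
        data=json.dumps(descriptor),
        headers={"Content-Type": "application/json"},
    )

# Example: a snapshot build of mod-users (ids and versions are illustrative).
md = module_descriptor(
    "mod-users-17.1.0-SNAPSHOT.123",
    provides={"users": "15.0"},
    requires={"permissions": "5.2"},
)
```

The CI job would call register_module once per successful build; the same descriptor can later be referenced by any tenant configuration on the cluster.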
See FOLIO-2054 (Closed), FOLIO-2055 (Closed) and FOLIO-2056 (Closed).
Further reading
PR previews: UXPROD-1817 (Closed)
Kubernetes integration: UXPROD-1823 (Draft)
FOLIO RM and CI concepts: https://docs.google.com/document/d/1au2hG4gPekyZ_HxAU7s6sc4NOvRaVovUcBnMLa1iR7E/edit?usp=sharing
|
|
One way this might work:
- set up a PostgreSQL instance to back the FOLIO system
- set up an nginx server to serve bundles and proxy Okapi for each tenant
- create an Okapi cluster. A new release of Okapi triggers a rolling upgrade of the cluster.
- create one or more "reference tenants" on the Okapi cluster. These tenants might represent the "snapshot" FOLIO build using prerelease artifacts, "snapshot-stable" using prerelease artifacts that pass regression tests, and "release" using only released artifacts (to use our current structure – could refactor this).
- each new release of a backend module (including snapshots) spins up a Docker container accessible at a URL based on module id (e.g. mod-authtoken-1_5_2-SNAPSHOT_52.aws.indexdata.com:8081). The db connection string is included as environment variables for the container (or as a config file in the container). A module descriptor and deployment descriptor are posted to the Okapi cluster.
- A snapshot release of a backend module triggers an attempt to upgrade the module for the "snapshot" tenant. For storage modules, upgrade scripts must be in place if the data structure or database schema changes (e.g. new fields or indexes).
- Each new release of a frontend module (including snapshots) causes a module descriptor to be posted to the Okapi cluster.
- Each new release of a frontend module triggers an attempt to upgrade the module for the "snapshot" tenant:
- The module is upgraded in Okapi to pick up any new permissions
- A new webpack bundle is built for the tenant that includes the updated module
This is just a first pass at how we might do it. Other thoughts?
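To make the proposal above a bit more concrete, here is a hedged sketch of two of its pieces: the per-container URL scheme (the domain and port are the examples given above, not a committed design) and the request body for Okapi's tenant install endpoint (POST /_/proxy/tenants/{tenant}/install), which Okapi uses to resolve dependencies and upgrade a tenant.

```python
def container_url(module_id, domain="aws.indexdata.com", port=8081):
    """Map a module id to a hostname following the scheme above, e.g.
    mod-authtoken-1.5.2-SNAPSHOT.52 -> mod-authtoken-1_5_2-SNAPSHOT_52."""
    return "http://{}.{}:{}".format(module_id.replace(".", "_"), domain, port)

def install_payload(module_ids, action="enable"):
    """Body for POST /_/proxy/tenants/{tenant}/install; Okapi resolves
    dependencies and reports the full set of actions it would take."""
    return [{"id": m, "action": action} for m in module_ids]
```

A CI job could post install_payload(["mod-users-17.1.0-SNAPSHOT.123"]) to the "snapshot" tenant after each successful build to attempt the upgrade described above.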
|
- We need a folio system where dev teams and product owners can collaborate before changes are ready to be committed to a master branch
- This system should be based on the “next-release” set of modules. It will be the dev teams' responsibility to push the latest versions of modules that are needed for collaboration, testing and PO acceptance.
- A multi-tenant Folio system will be accessed by multiple development teams.
- Each team should be able to manage/deploy a set of modules without affecting other teams.
- Dev team should be able to have access to more than one tenant to allow multiple teams to collaborate on the same project
- Dev teams should NOT have access to AWS infrastructure. No IAM accounts will be provisioned for devs.
- A dev team should be able to deploy module from a branch (not master) to a tenant.
- Both UI and backend module deployment should be automated.
- Location of a system should be considered as we can have devs and PO spread over 9 time zones.
- The process of requesting/provisioning of a tenant should be automated
- All operations should be done from CLI interface
- Automated deployment of a module process should be driven by tenant credentials.
- Automated deployment of a module should be fast and allow multiple iterations per working day.
- Developers should be able to request a tenant provisioning via CLI
- Tenants should be automatically removed if no modules were deployed for ?? days
- Developers cannot expect prolonged continuity of a tenant (no more than ??) because the system will be rebuilt frequently.
- Complete teardown/rebuild will happen if a new “next-release” version of Folio is available
- It should be easy for developers to identify that the system has been refreshed and they need to redeploy their modules.
- Initially, the system will not handle schema migrations
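The tenant-expiry rule in the list above could be as simple as the following sketch. The idle threshold is deliberately a parameter, since the number of days is still an open question (the "??" above).

```python
from datetime import datetime, timedelta

def should_remove(last_deploy: datetime, now: datetime, max_idle_days: int) -> bool:
    """A tenant is reclaimed once no module has been deployed to it for
    more than max_idle_days days (the exact threshold is still undecided)."""
    return now - last_deploy > timedelta(days=max_idle_days)
```

A nightly cleanup job would run this check against each tenant's last deployment timestamp and tear down any tenant for which it returns True.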
|
|
One thing that Anya and I discussed a couple days ago.
For PO/Tester testing of newly-fixed bugs and new functionality (story reviews), the closer to the developer, the better. Having a place where we can test, where the developer knows the new code has been released and that we (think) will work in the context of the rest of FOLIO, is good. Per conversations with Anton, we should ideally catch these types of problems at folio-testing at the latest, preferably sooner than that.
In terms of exploratory testing and usability, it seems like that should happen on as stable and refined a version of FOLIO as possible – basically the public face of FOLIO that we want the community to see, where we expect the main integration problems to be stabilized and the main bugs already identified and removed. It seems like exploratory testing should happen at a point like snapshot-stable, later in the process than story review testing.
|
|
Ann-Marie Breaux, you're exactly right. These two use cases require two different environments and we're missing the first one.
|
|
If there is a need to preview functionality and test code before committing to master, the Pull Request stage would seem like a logical choice to build and deploy preview artifacts. The process would be something like:
1. submit PR
2. build and deploy PR artifacts.
3. provision unique tenant id for PR.
4. enable PR artifacts alongside/in lieu of next-release artifacts for tenant
5a. if PO and dev are satisfied with preview and PR passes whatever automated tests are configured to run for that repo, then the PR is merged to master and the tenant is automatically deleted. A proper release of that artifact can then be cut and added to next-release.
5b. If preview mode of artifact is not satisfactory, the PR can either be closed OR the dev can commit updates to the branch that opened the PR and a new tenant is enabled with revised artifacts and reviewed.
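Step 3 above needs a deterministic, collision-free tenant id per PR. A minimal sketch follows; the exact id constraints Okapi enforces are an assumption here (lowercase alphanumerics and underscores, starting with a letter), so treat this as illustrative only.

```python
import re

def pr_tenant_id(repo: str, pr_number: int) -> str:
    """Derive a unique tenant id for a PR preview, assuming tenant ids
    must be lowercase alphanumerics/underscores starting with a letter."""
    slug = re.sub(r"[^a-z0-9]+", "_", repo.lower()).strip("_")
    return "pr_{}_{}".format(slug, pr_number)
```

Because the id is derived from the repository and PR number, re-running the pipeline for an updated PR maps to the same tenant, which makes the teardown in steps 5a/5b straightforward.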
|
|
John Malconian Precisely, I think the approach of "PR previews" would give us the most flexibility in terms of isolated and unobstructed PO-Developer communication. In general terms the flow of work could be something like:
1. Developer issues PR
2. Automatic checks are run (lint, SQ, etc)
3. PR preview is deployed
4. PO reviews and accepts feature
5. PR is merged to master
6. Feature is deployed on the next nightly build (folio-snapshot and, after passing automated integration tests, folio-snapshot-stable) along with all other accepted features
7. Feature is scheduled for the next release and when released it becomes available on the releases environment (folio-release)
John Malconian How complex is it to build "PR preview" functionality?
|
|
Adam Dickmeiss John Malconian Wayne Schneider Anton Emelianov Matthew Jones Zak Burke thanks for your time and availability today.
We had a long meeting about this and related issues today. There's been a lot of discussion about how a new type of "clustered" continuous deployment environment could help us address some of the issues we are facing today, but we realise that building such an environment is a long-term project that may not be within the capacity we have in Q4. As such we decided to:
- propose a plan for enabling PR "platform" builds for FOLIO UI modules. John will put a more concrete plan in writing and share it for feedback before the Wednesday devops call. This was not a successful undertaking last time, but it is more promising now because a) we have the UI code better organised (platform-core and stripes-framework, stripes-cli tooling) and b) we have decided to skip integration tests – which create additional complexity – and only do simple builds and backend dependency resolution. We want to keep this task as simple and small as possible so it can be rolled out quickly. Once rolled out, it should allow previews (for POs and testers) based on the PR builds, before things hit master branches and before they get deployed to shared environments. A process for rolling out a new major (breaking) version of a backend dependency needs to be accounted for here.
- initiate the design work on the "clustered" continuous deployment environment. Wayne has agreed to capture the discussion and initial thoughts exchanged during the call. The document needs to be fleshed out to the point where we can wrap our heads around the scope of the work so we can try to estimate it.
|
|
I do not think that there are any viable short-term options to address the issue of PO/dev collaboration for two reasons. 1. There will easily be resource allocation issues trying to deploy new backend dependencies that are not already deployed as part of the next-release (or snapshot build, for that matter). Frontend code that is being tested will inevitably want to rely on a new version of a backend module that is not part of the next-release build. 2. Mixing and matching dependencies between a next-release set of modules, the module being tested and any newly deployed backend modules that the frontend module would rely upon will result in frequent dependency resolution conflicts.
I can offer the following two additional PR quality gates in the short-term:
- For ui-* modules, include the tested module in the platform that it belongs to (platform-core or platform-complete) and build the stripes bundle. A failed build will fail the PR. This is something we implemented several months ago and have since disabled. Previously we based the build on a 'snapshot'. This time around, however, the build would be based on the 'next-release' branch. Re-enabling will require reimplementing some pipeline code that was rolled back. We may also want to think about creating new stripes platforms for acquisitions and erm, etc.
- Module dependency resolution checking utilizing a tenant install endpoint in Okapi (simulate-mode). The process would look something like the following:
1) deploy an instance of Okapi (probably in a container) for each PR.
2) pull all module descriptors from folio-registry.
3) generate a module descriptor for the PR's module and post it to the local instance of Okapi.
4) generate a list of stripes modules from the 'next-release' branch of platform-core or platform-complete to enable in addition to/in lieu of the local module we are testing.
5) create a tenant on the local Okapi instance
6) use the tenant's install endpoint to simulate deployment
7) tear down the local Okapi instance
If there is a dependency resolution conflict, the PR fails. If there is no dependency resolution conflict, but the new version of the module is not included in the list of modules to enable, a warning message is generated signifying that no modules are prepared to use the new iteration of the module. This is useful for backend modules that increment an interface version.
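The pass/warn logic described above could be sketched as follows, assuming the simulated install returns Okapi's usual list of {id, action} objects (a simplification of the real response):

```python
def check_simulation(simulated, pr_module_id):
    """Inspect the result of install?simulate=true for a PR's module.
    'simulated' is the list of {id, action} objects Okapi returns."""
    enabled = {m["id"] for m in simulated if m.get("action") == "enable"}
    if pr_module_id in enabled:
        return "ok"
    # Resolution succeeded, but nothing picked up the new module version.
    return "warning: no modules are prepared to use " + pr_module_id
```

A dependency conflict would already have surfaced as an error from the simulate call itself; this check only covers the warning case described above.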
|
|
John Malconian I like the idea overall but I would like to clarify a couple of things:
- when using "next-release" of platform-core or platform-complete, how would you generate the list of backend modules that need to be installed (simulated) in Okapi? Would it be based on the actual dependencies reported by the ui- modules (okapiInterfaces)? Or okapi-install.json?
- as an extension to the process you propose: if the simulation is successful, could we add an optional step that tries to actually "install" the backend modules on the Okapi instance? We are likely to hit resource constraints for platform-complete, but if we focus on platform-core only it could potentially work, no?
|
|
Jakub Skoczen On the first point, the list of backend modules would be based on okapiInterfaces. Maintaining a static list is cumbersome. I think one open question might be whether to filter on backend module "releases" only (exclude snapshot versions). I think that may be too limiting, however.
On the second point, note that I intend to implement a lightweight okapi instance - essentially fire up and tear down an okapi docker container as needed. Module deployment would overcomplicate the simplicity of this solution and I'm not sure I see much additional value in deploying the modules at this stage.
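For the okapiInterfaces-based approach, resolution essentially means matching the interfaces a UI module requires against the interfaces the registry's module descriptors provide. A simplified sketch (real ModuleDescriptors carry interface versions too, which are ignored here):

```python
def backend_modules_for(required_interfaces, registry):
    """Pick one providing module per required interface from a list of
    module descriptors; returns (module ids, unresolved interfaces)."""
    needed = set(required_interfaces)
    picked = []
    for md in registry:
        provides = {p["id"] for p in md.get("provides", [])}
        if provides & needed:
            picked.append(md["id"])
            needed -= provides
    return picked, needed

# Illustrative registry entries (ids and interfaces are made up).
registry = [
    {"id": "mod-users-17.1.0", "provides": [{"id": "users"}]},
    {"id": "mod-circulation-16.0.0", "provides": [{"id": "circulation"}]},
]
mods, unresolved = backend_modules_for(["users", "circulation"], registry)
```

Any interfaces left in 'unresolved' would fail the PR check outright, since the simulated install could never satisfy them.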
|
|
I haven't followed all this discussion, but I think we have put together a fairly decent process for releasing back end modules when they are ready to be used, and we should use released versions whenever possible. Running all random snapshots is a recipe for disaster.
|
|
This might need the NFR tag to prevent early implementers from ranking it.
|