[FOLIO-2336] continuous folio-snapshot reference env Created: 04/Nov/19  Updated: 03/Jun/20  Resolved: 10/Feb/20

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Task Priority: P2
Reporter: Jakub Skoczen Assignee: Ian Hardy
Resolution: Done Votes: 0
Labels: platform-backlog
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Blocks
is blocked by FOLIO-2338 SPIKE: prepare a design for continuou... Closed
is blocked by MODINVSTOR-397 TenantLoading fails on 2nd pass (not ... Closed
is blocked by MODTEMPENG-37 mod-template-engine errors out on upd... Closed
Relates
relates to RMB-527 Integration test for RMB upgrade Open
Sprint: CP: sprint 77, CP: sprint 78, CP: sprint 79, CP: sprint 80/81
Story Points: 3
Development Team: Core: Platform

 Description   

Task

Set up a new environment called 'folio-snapshot-continuous' that will operate on the new K8s FOLIO infrastructure. The environment should operate in a fashion similar to the existing 'folio-snapshot', with the following differences:

  • the environment will be updated immediately when a new backend snapshot container and/or a new UI snapshot artifact is built
  • the environment is updated incrementally: new backend modules are installed using the upgrade method
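The upgrade-method flow described above can be sketched against Okapi's `/_/proxy/tenants/{tenant}/install` endpoint. This is a minimal illustration, not the actual pipeline: the Okapi URL, tenant id, and module version below are made-up, and in a real call the `tenantParameters` value would be URL-encoded before being sent.

```python
# Sketch of an incremental upgrade via Okapi's install endpoint.
# All concrete values (URL, tenant, module ids) are illustrative.

def build_install_request(okapi_url, tenant, module_ids, simulate=False,
                          tenant_parameters="loadReference=true"):
    """Build the URL and JSON body for POST /_/proxy/tenants/{tenant}/install.

    Each body entry asks Okapi to enable (or upgrade to) the given module
    version; Okapi resolves dependencies and disables superseded versions.
    """
    params = [f"tenantParameters={tenant_parameters}"]
    if simulate:
        # Dry-run: Okapi reports what it would do without changing the tenant.
        params.append("simulate=true")
    url = f"{okapi_url}/_/proxy/tenants/{tenant}/install?" + "&".join(params)
    body = [{"id": mod_id, "action": "enable"} for mod_id in module_ids]
    return url, body

url, body = build_install_request(
    "http://okapi:9130", "snapshot",
    ["mod-inventory-storage-19.2.0-SNAPSHOT.123"])
print(url)
print(body)
```

The same request with `simulate=True` could be used as a pre-flight check before committing the tenant to an upgrade.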


 Comments   
Comment by Jakub Skoczen [ 04/Nov/19 ]

Ian Hardy Wayne Schneider John Malconian Guys, can you please help me flesh out this issue? I'd like us to capture the details of how this env would work – including what events would trigger what action (commits/PRs to individual mod- and ui- repos, commits/PRs to platform-complete, etc.) and how the upgrade of individual modules is performed (e.g. Okapi upgrade vs. install endpoint, etc.).

Comment by Ian Hardy [ 04/Nov/19 ]

Here are some thoughts to get started; interested to hear what others think as well.

Since snapshot is a top-down build from the platform, I think we'd have a tenant for snapshot, then start an update by building platform-complete/core and posting the resulting list of modules to the install endpoint to trigger an update on the snapshot tenant. We'd probably want to make sure the Stripes builds take place in the Kubernetes cluster, since building Stripes is fairly expensive.

Ideally we'd do this for a commit to an individual module (either ui or mod), since that would be truly continuous, but I can see backing off from that if overlapping updates cause any problems. It would be interesting to try.

Some more open questions:

  • when does okapi get upgraded?
  • how long is data persisted?
Comment by Ian Hardy [ 12/Nov/19 ]

Since migrations aren't implemented by the modules, we could start by discarding and rebuilding the tenant every few hours. Jakub Skoczen Are you OK with proceeding this way, or would you prefer to hold off until upgrades are available?

Comment by Jakub Skoczen [ 18/Nov/19 ]

Ian Hardy I think we should not discard the tenant but upgrade the modules within the tenant. This is an important differentiating factor from the existing snapshot environment.

The assumption is that even if some modules do not provide migrations during "init", they will provide compatible sample and reference data that will overlay the existing data.

Comment by Ian Hardy [ 21/Nov/19 ]

Increased the nginx ingress controller's proxy-body-size and proxy-read-timeout. Was seeing a 504 from nginx on the call to /_/proxy/tenant/{}/install.

List of available nginx annotations for configuration: https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#custom-timeouts
Used this approach for updating the controller: https://stackoverflow.com/questions/49918313/413-error-with-kubernetes-and-nginx-ingress-controller
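The two annotations mentioned above could be expressed on the Ingress resource roughly as follows. This is an illustrative config fragment, not the actual manifest from the cluster; the resource name and values are assumptions.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: okapi
  annotations:
    # Allow large module-list POSTs to /_/proxy/tenants/{tenant}/install
    nginx.ingress.kubernetes.io/proxy-body-size: "16m"
    # Long install calls were returning 504 before this timeout was raised
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
```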

Also needed to update the timeout for the AWS ALB instance; increased it to 300s in the AWS console.

Comment by Ian Hardy [ 22/Nov/19 ]

Frontend: https://snapshot-core.s3.amazonaws.com/index.html
snapshot_core_admin/admin

Preserving the tenant here as Jakub Skoczen recommends and upgrading modules.

A couple of notes to myself on problems that came up:

  • timeouts set too short on various proxies, as described above
  • a broken or insufficiently cleaned-up tenant leaves roles behind in RDS; modules will fail during init if the role they're trying to create already exists
  • need to decide what action, if any, to take when integration tests pass/fail
Comment by Ian Hardy [ 25/Nov/19 ]

Looks like loading sample/reference data is not idempotent; below are logs from mod-inventory-storage on tenant init (trying to upgrade to the latest snapshot). Right now loadSample/loadReference are called for all modules at once when the module list is posted to `/_/proxy/tenants/{tenant}/install`.

One approach might be to get a list of currently enabled modules before posting the list to install, then post the required modules one at a time, setting the data-loading parameters depending on whether each module is already present. Another might be to insist that the data-loading scripts are idempotent. Open to other suggestions.
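The first approach could be sketched as below: send loadSample only for modules that are not already enabled on the tenant, and loadReference every time. The function names and the assumed module-id format are illustrative, not the ticket's actual implementation.

```python
import re

def product_name(mod_id):
    """'mod-users-17.1.0-SNAPSHOT.42' -> 'mod-users' (assumed id format:
    product name followed by a dash and a version starting with a digit)."""
    return re.sub(r"-\d.*$", "", mod_id)

def plan_module_installs(enabled_ids, desired_ids):
    """Pair each desired module with the tenantParameters for its own
    install call: reference data always, sample data only on first enable."""
    enabled_products = {product_name(m) for m in enabled_ids}
    plan = []
    for mod_id in desired_ids:
        fresh = product_name(mod_id) not in enabled_products
        params = "loadReference=true" + (",loadSample=true" if fresh else "")
        plan.append((mod_id, params))
    return plan

plan = plan_module_installs(
    ["mod-users-17.0.0"],
    ["mod-users-17.1.0-SNAPSHOT.42", "mod-tags-1.0.0"])
# mod-users is an upgrade (reference only); mod-tags is new (reference + sample)
```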

25 Nov 2019 13:37:14:061 ERROR PostgreSQLConnection$failQueryPromise$1 reqId=276355/proxy;361332/tenant Setting error on future java.util.concurrent.CompletableFuture@4bf04b04[Not completed, 1 dependents]

25 Nov 2019 13:37:14:062 ERROR PostgreSQLConnection Error , message -> ErrorMessage(fields=[(Severity, ERROR), (V, ERROR), (SQLSTATE, 23505), (Message, duplicate key value violates unique constraint "instance_pkey"), (Detail, Key (id)=(8be05cf5-fb4f-4752-8094-8e179d08fb99) already exists.), (s, snapshot_core_mod_inventory_storage), (t, instance), (n, instance_pkey), (File, nbtinsert.c), (Line, 534), (Routine, _bt_check_unique)])

25 Nov 2019 13:37:14:062 INFO LogUtil [46529247eqId] org.folio.rest.RestVerticle start invoking postInstanceStorageInstances

25 Nov 2019 13:37:14:062 ERROR PostgreSQLConnection Error on connection

com.github.jasync.sql.db.postgresql.exceptions.GenericDatabaseException: ErrorMessage(fields=[(Severity, ERROR), (V, ERROR), (SQLSTATE, 23505), (Message, duplicate key value violates unique constraint "instance_pkey"), (Detail, Key (id)=(8be05cf5-fb4f-4752-8094-8e179d08fb99) already exists.), (s, snapshot_core_mod_inventory_storage), (t, instance), (n, instance_pkey), (File, nbtinsert.c), (Line, 534), (Routine, _bt_check_unique)])
Comment by Ian Hardy [ 12/Dec/19 ]

Reported problems with mod-inventory and mod-template-engine, then deleted the previous tenant for a clean start.

New tenant is available here again:
Frontend: https://snapshot-core.s3.amazonaws.com/index.html
snapshot_core_admin/admin

Need to do some more rebuilds/updates when fresh snapshots are published to verify.

Comment by Ian Hardy [ 04/Feb/20 ]

todo:

  • verify that the parameter to turn off loadSample is working properly (seeing some duplicates)
  • do call loadReference every time (Marc and Jakub suggest this should be idempotent)
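One way to make repeated loadReference calls idempotent is for the loader to PUT each record by its fixed UUID, so a re-run overwrites the record instead of colliding on the primary key (the "instance_pkey" duplicates in the logs above). A hedged sketch, with an illustrative endpoint path and record shape:

```python
# Idempotent reference-data loading: PUT-by-id is safe to repeat,
# unlike POST, which fails with SQLSTATE 23505 on the second pass.
# The base URL, collection path, and record are assumptions.

def upsert_requests(base_url, collection, records):
    """Yield (method, url, body) tuples for an idempotent data load."""
    for rec in records:
        yield ("PUT", f"{base_url}/{collection}/{rec['id']}", rec)

reqs = list(upsert_requests(
    "http://okapi:9130", "instance-storage/instances",
    [{"id": "8be05cf5-fb4f-4752-8094-8e179d08fb99", "title": "Example"}]))
```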
Generated at Thu Feb 08 23:19:55 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.