[FOLIO-2662] test automatic migration for Goldenrod Created: 29/Jun/20 Updated: 18/Aug/20 Resolved: 07/Aug/20 |
|
| Status: | Closed |
| Project: | FOLIO |
| Components: | None |
| Affects versions: | None |
| Fix versions: | None |
| Type: | Task | Priority: | P2 |
| Reporter: | Jakub Skoczen | Assignee: | jroot |
| Resolution: | Done | Votes: | 0 |
| Labels: | devops-backlog | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original estimate: | Not Specified | ||
| Issue links: |
|
| Sprint: | DevOps: sprint 94 |
| Development Team: | FOLIO DevOps |
| Affected Institution: | TAMU |
| Description |
|
Issue used to capture the upgrade testing results, and for linking back any bug tickets when they get created. |
| Comments |
| Comment by Jakub Skoczen [ 15/Jul/20 ] |
|
Asked Brandon Tharp if the migration could be retested early next week since Jason is away. |
| Comment by jroot [ 29/Jul/20 ] |
|
This is ongoing. My DRAFT process for testing an upgrade is the following:

1. Grab the install.json and okapi-install.json files from the desired release branch of folio-org's platform-complete repo: https://github.com/folio-org/platform-complete
2. Clone the appropriate Libraries FOLIO release repo from here: <Private Git repo URL, alternative here: https://github.com/folio-org/folio-install/tree/kube-rancher/alternative-install/kubernetes-rancher/TAMU>
3. Copy the JSON files from step 1 into the /deploy-jobs/create-deploy/install folder.
4. Update the install.json and okapi-install.json under /deploy-jobs/create-deploy-pubsub/install to include only the desired version of pubsub.
5. Build and push two create-deploy Docker containers to the VMware Integrated Containers registry: qX-202X-pubsub and qX-202X-test. Example Docker commands:

```shell
docker build -t vic.library.tamu.edu/folio/create-deploy:q2-2020-test .
docker push vic.library.tamu.edu/folio/create-deploy:q2-2020-test
docker build -t vic.library.tamu.edu/folio/create-deploy:q2-2020-pubsub .
docker push vic.library.tamu.edu/folio/create-deploy:q2-2020-pubsub
```

6. In Rancher Dev, deploy the "create-upgrade-pubsub" K8s Job to the appropriate upgrade-testing namespace first, using the Diku/Tamu-tenant-config secret and the qX-202X-pubsub image mentioned above. In the Job configuration, set Completions, Parallelism, and Back Off Limit to 1.
7. If it succeeded, deploy the "create-upgrade-diku/tamu" K8s Job to the same upgrade-testing namespace second, using the Diku/Tamu-tenant-config secret and the qX-202X-test image mentioned above. Again set Completions, Parallelism, and Back Off Limit to 1, and set a long Active Deadline Seconds for the Job (I use 10000).
8. If any of it fails, get the logs from the Splunk Rancher index and/or from the containers themselves and record them. FOLIO Jira issues will need to be filed.

To roll back:

1. In Rancher Dev, spin down the Okapi container in the appropriate upgrade-testing namespace, and spin down the Postgres Okapi database and Postgres modules database containers in their corresponding "postgres-modules/okapi" namespaces.
2. Restore the Postgres data volumes in vSphere from the snapshots taken the night before. You can find the volume names for the databases under the Volumes tab in Rancher Dev, FOLIO project.
3. In Rancher Dev, spin the two database containers back up to 1 pod each, then the Okapi container to 1 pod.

Note: If any new modules need deploying, copy an existing older version of the Workload in Rancher Dev and update the name and tag. You can also update the "workloads.yaml" provided under the YAML folder in the appropriate Libraries FOLIO Git repo, and import it into the appropriate testing namespace in Rancher Dev. |
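Step 4 above (trimming the pub-sub install lists) can be sketched with jq. The sample data and the `startswith` filter below are assumptions for illustration, not part of the documented process; the real files come from the platform-complete release branch:

```shell
# Sample install.json in the shape of an Okapi install descriptor list
# (entries and versions here are made up for illustration).
cat > install.json <<'EOF'
[
  {"id": "mod-pubsub-1.3.3", "action": "enable"},
  {"id": "mod-users-17.0.0", "action": "enable"}
]
EOF

# Keep only the mod-pubsub entry for /deploy-jobs/create-deploy-pubsub/install
# (jq filter is an assumption; repeat the same step for okapi-install.json).
jq '[.[] | select(.id | startswith("mod-pubsub"))]' install.json > install-pubsub.json
cat install-pubsub.json
```

The same filter works unchanged on okapi-install.json, since both files share the `{"id": ..., "action": ...}` descriptor shape.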
| Comment by jroot [ 29/Jul/20 ] |
|
Many Jira issues have been created in the course of testing; for example: https://folio-org.atlassian.net/browse/MODORDSTOR-161. The majority of these have been closed after being retested successfully. There appear to still be issues with mod-pubsub, mod-feesfines and mod-circ. |
| Comment by jroot [ 31/Jul/20 ] |
|
During more testing and log tracing, it was determined that when upgrading a whole instance at once, these components/modules needed to be upgraded for the tenant first: okapi v3.x. The reason is that the new SYS permissions introduced by Okapi 3.x do not exist yet. Okapi does not seem to apply this order of operations for you with that in mind, so it must be done purposefully. Incremental upgrades of modules for a tenant do not seem to trigger these same issues. Details on the errors are in the Comments section of these tickets:

After having successfully upgraded two tenants with data, I am not able to log in to either tenant after building the new Q2 front-end. Hitting these errors:

From the FOLIO UI: "Sorry, the information entered does not match our records."

From the mod-users pod log:

```
INFO: loadDbSchema: Loaded templates/db_scripts/schema.json OK
21:31:34 INFO CQLWrapper CQL >>> SQL: username==tamu_admin >>> WHERE lower(f_unaccent(users.jsonb->>'username')) LIKE lower(f_unaccent('tamu\_admin')) LIMIT 10 OFFSET 0
Jul 30, 2020 9:31:34 PM org.folio.cql2pgjson.CQL2PgJSON loadDbSchema
INFO: loadDbSchema: Loaded templates/db_scripts/schema.json OK
21:31:34 INFO CQLWrapper CQL >>> SQL: username==tamu_admin >>> WHERE lower(f_unaccent(users.jsonb->>'username')) LIKE lower(f_unaccent('tamu\_admin')) LIMIT 10 OFFSET 0
21:31:34 ERROR PgUtil current transaction is aborted, commands ignored until end of transaction block
io.vertx.pgclient.PgException: current transaction is aborted, commands ignored until end of transaction block
	at io.vertx.pgclient.impl.codec.ErrorResponse.toException(ErrorResponse.java:29) ~[mod-users-fat.jar:?]
	at io.vertx.pgclient.impl.codec.QueryCommandBaseCodec.handleErrorResponse(QueryCommandBaseCodec.java:57) ~[mod-users-fat.jar:?]
	at io.vertx.pgclient.impl.codec.PgDecoder.decodeError(PgDecoder.java:233) ~[mod-users-fat.jar:?]
	at io.vertx.pgclient.impl.codec.PgDecoder.decodeMessage(PgDecoder.java:122) [mod-users-fat.jar:?]
	at io.vertx.pgclient.impl.codec.PgDecoder.channelRead(PgDecoder.java:102) [mod-users-fat.jar:?]
	at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251) [mod-users-fat.jar:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [mod-users-fat.jar:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [mod-users-fat.jar:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [mod-users-fat.jar:?]
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) [mod-users-fat.jar:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [mod-users-fat.jar:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [mod-users-fat.jar:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [mod-users-fat.jar:?]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [mod-users-fat.jar:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [mod-users-fat.jar:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [mod-users-fat.jar:?]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [mod-users-fat.jar:?]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [mod-users-fat.jar:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) [mod-users-fat.jar:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650) [mod-users-fat.jar:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576) [mod-users-fat.jar:?]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [mod-users-fat.jar:?]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [mod-users-fat.jar:?]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [mod-users-fat.jar:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [mod-users-fat.jar:?]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
Jul 30, 2020 9:31:37 PM org.folio.cql2pgjson.CQL2PgJSON loadDbSchema
```

Before the upgrade, login capability was tested and successful. |
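The okapi-first ordering described above can be done explicitly with a one-element install call for the tenant before posting the full module list. The payload shape below is the standard Okapi install descriptor; the Okapi version string and the commented-out curl call are placeholders, not a documented step:

```shell
# Build a one-element install payload that upgrades only Okapi for the tenant
# ("okapi-3.x.y" is a placeholder for the actual Okapi 3.x module id).
jq -n '[{"id": "okapi-3.x.y", "action": "enable"}]' > okapi-first.json
cat okapi-first.json

# Then post it before the full install.json (shown commented as a dry run):
# curl -w '\n' -D - -X POST -d @okapi-first.json \
#   -H "X-Okapi-Token: <tokenValue>" \
#   http://okapi:9130/_/proxy/tenants/<tenantId>/install
```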
| Comment by jroot [ 03/Aug/20 ] |
|
It was noted that when doing an Okapi upgrade for the instance, the newer Okapi module version gets enabled automatically for the supertenant but not for the institutional tenants. Enabling the newer Okapi module for the tenant made no difference to whether the Q2 tenant upgrade succeeded. The upgrade still reports as successful, but I still cannot log in... There is a bug filed against mod-users for this (https://folio-org.atlassian.net/browse/MODUSERS-213). |
| Comment by Wayne Schneider [ 07/Aug/20 ] |
|
Here are some notes on Index Data's experience with Goldenrod upgrades, mostly confirming jroot's notes above: Our experience attempting to upgrade from Fameflower to Goldenrod in place is rather mixed. There seem to be some issues with the order in which you proceed, due to the Okapi major version upgrade, the minor version upgrades to mod-permissions and mod-authtoken that are required to support module permissions, and changes to mod-pubsub. The safest high-level upgrade order seems to be:
Detailed procedure
To resolve, create a permissionsUser record with the appropriate permissions:
```json
{
  "userId": "<UUID of pub-sub user>",
  "permissions": [
    "source-storage.events.post",
    "source-records-manager.events.post",
    "inventory.events.post",
    "circulation.events.post",
    "patron-blocks.events.post"
  ]
}
```
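A minimal sketch of validating and posting this record, assuming the standard mod-permissions `/perms/users` endpoint behind Okapi; the UUID is a placeholder, and the live call is left commented out:

```shell
# permissionsUser record with the five pub-sub event permissions
# (userId is a placeholder UUID).
cat > perms-user.json <<'EOF'
{
  "userId": "00000000-0000-0000-0000-000000000000",
  "permissions": [
    "source-storage.events.post",
    "source-records-manager.events.post",
    "inventory.events.post",
    "circulation.events.post",
    "patron-blocks.events.post"
  ]
}
EOF

# Sanity-check the JSON before posting; jq -e exits non-zero on bad input.
jq -e '.permissions | length == 5' perms-user.json

# POST through Okapi (standard mod-permissions API; shown as a dry run):
# curl -w '\n' -D - -X POST -d @perms-user.json \
#   -H 'Content-Type: application/json' \
#   -H "X-Okapi-Token: <tokenValue>" \
#   http://okapi:9130/perms/users
```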
Finally, post the full upgrade install.json file from the platform-complete q2-2020 branch with a command like:

```shell
curl -w '\n' -D - -X POST -d @install.json \
  -H "X-Okapi-Token: <tokenValue>" \
  http://okapi:9130/_/proxy/tenants/<tenantId>/install
```

Note that this request can take anywhere from several minutes to several hours to return, depending on the size of the tenant dataset, so plan accordingly. |
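The `<tokenValue>` used above comes back in the `x-okapi-token` response header of a login call (POST to `/authn/login` with an `X-Okapi-Tenant` header). A minimal sketch of pulling the token out of the response headers, using canned headers in place of a live Okapi:

```shell
# Canned response headers standing in for:
#   curl -sS -D headers.txt -o /dev/null \
#     -H "X-Okapi-Tenant: <tenantId>" -H 'Content-Type: application/json' \
#     -d '{"username": "<adminUser>", "password": "<adminPassword>"}' \
#     http://okapi:9130/authn/login
printf 'HTTP/1.1 201 Created\r\nx-okapi-token: abc.def.ghi\r\nContent-Length: 0\r\n' > headers.txt

# Extract the token value (header names are case-insensitive; strip the CR).
TOKEN=$(awk -F': ' 'tolower($1) == "x-okapi-token" {print $2}' headers.txt | tr -d '\r')
echo "$TOKEN"
```

The resulting value then replaces `<tokenValue>` in the install call above.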