Async install
DRAFT - This is a work in progress
Overview
Module initialization or upgrade can take a long time, which in turn makes the install/upgrade service call take a long time: multiple hours in some cases. This document captures thoughts from an investigation into whether to make install/upgrade asynchronous. It also covers the ability to "continue on failure" and to obtain a complete list of failures across all modules being installed or upgraded.
See OKAPI-804 and OKAPI-845.
Scope
The purpose of the async install is to address the problem of long-running tenant init operations. Okapi's own HTTP client has no timeout for that operation, but two other parties may give up on such a long call: clients calling Okapi, and gateways sitting between Okapi and a backend module. The async install deals only with the first problem. Solving the second requires each module to work in an async fashion or to use some other means of transport.
Persistence
To address this, and also to be able to monitor the progress of an install that may call the tenant interface of multiple modules, it would be good to "persist" the operation.
Backwards Compatibility
The install and upgrade services (/_/proxy/tenants/<tenant>/install and /_/proxy/tenants/<tenant>/upgrade) can be kept as they are, with no changes to existing behavior. They currently use POST to perform the operation, and we'd like to use the same method to initiate the async operation. It could be as simple as passing a flag "async=true" as a query parameter. The RAML definition will be "identical" for async and sync mode. Everything that follows, however, applies to "async" only.
High Level Approach
- If async=true, install/upgrade will return a Location, and the status of the install can be inspected with GET on the returned location. This could be extended with DELETE to signal "abort" and remove the information about the install/upgrade operation.
- While an install is in progress, its state is persisted. The state should also remain available after the install is done, that is, after the modules' tenant init calls have completed. It could persist for as long as Okapi (the cluster) is running and be removed when Okapi or the cluster is removed.
- To list all operations since the cluster was started, it would be possible to use GET on the install/upgrade path, the same path used with POST to initiate an operation. Likewise, DELETE on that path would remove the information for all operations.
- It does not seem necessary to persist this to a database, but it could be done, in which case all install/upgrade operations could be listed.
- Split tenant init into multiple phases:
  - preInit: changes isolated to this module. Schema creation/updates and data migration scripts run during this phase.
  - commit: commit the changes made in preInit.
  - postInit: intended for more business-logic changes, loading reference/sample data, etc. Calling other modules is allowed here. By the time this phase is invoked, all dependencies should already be satisfied.
- Expand the /_/tenant API to accommodate the new multi-phase interface:
  - POST /_/tenant/preInst
  - POST /_/tenant/commit
  - POST /_/tenant/postInst
  - POST /_/tenant/abort
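The first points above (return a Location, inspect with GET, remove with DELETE, keep state only in memory) could be sketched roughly as follows. This is a minimal illustration, not Okapi code: the class `InstallJobRegistry`, its method names, the job fields, and the `/install/<job id>` shape of the Location are all assumptions made for the example.

```python
import itertools
import threading


class InstallJobRegistry:
    """In-memory record of async install operations for one tenant (hypothetical)."""

    def __init__(self, tenant):
        self._tenant = tenant
        self._jobs = {}
        self._ids = itertools.count(1)
        self._lock = threading.Lock()

    def start(self, modules):
        """POST ...?async=true: record the operation and return its Location."""
        with self._lock:
            job_id = str(next(self._ids))
            self._jobs[job_id] = {
                "id": job_id,
                "modules": list(modules),
                "status": "Pending",
            }
        # Assumed Location format: the install path plus the job id.
        return f"/_/proxy/tenants/{self._tenant}/install/{job_id}"

    def get(self, job_id):
        """GET on the Location: a copy of the job, or None if it is unknown."""
        with self._lock:
            job = self._jobs.get(job_id)
            return dict(job) if job else None

    def list_all(self):
        """GET on the install path: every operation recorded since start-up."""
        with self._lock:
            return [dict(job) for job in self._jobs.values()]

    def delete(self, job_id=None):
        """DELETE one operation, or all of them when no id is given."""
        with self._lock:
            if job_id is None:
                self._jobs.clear()
            else:
                self._jobs.pop(job_id, None)
```

Because the registry lives in process memory, its contents disappear when the process stops, which matches the "persist as long as Okapi is running" option. A clustered deployment would need a shared store instead of a plain dictionary.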
Multi-phase Tenant Initialization
preInit
Okapi invokes preInit in parallel for each module being installed/upgraded. Since preInit only involves changes isolated to the module, there is no need to account for dependencies here, which is what allows the calls to run in parallel. The ability to roll back changes applied in this phase must be supported, e.g. via DB transactions, copy-on-write, etc.
commit
Details of how this works are TBD and depend on the remediation approach taken in preInit.
abort
Instead of committing the preInit changes, here we're rolling them back.
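One possible way the preInit, commit and abort phases fit together is sketched below. It assumes that a single preInit failure causes every module to be aborted and that abort is safe to call on a module whose preInit failed. Neither rule is stated in this draft. The callables `pre_init`, `commit` and `abort` are hypothetical stand-ins for the calls to each module's tenant interface.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pre_init_phase(modules, pre_init, commit, abort):
    """Run pre_init for all modules in parallel, then commit or abort them all.

    pre_init, commit and abort are hypothetical callables taking a module id;
    pre_init raises an exception on failure.
    Returns (status per module, failure message per failed module).
    """
    failures = {}
    with ThreadPoolExecutor(max_workers=max(1, len(modules))) as pool:
        futures = {module: pool.submit(pre_init, module) for module in modules}
        for module, future in futures.items():
            error = future.exception()  # blocks until this module has finished
            if error is not None:
                # Keep going so that every failure is collected, not just the first.
                failures[module] = str(error)

    if failures:
        # Assumption: one failure rolls back every module's preInit changes.
        for module in modules:
            abort(module)
        status = {m: "Failed" if m in failures else "Aborted" for m in modules}
        return status, failures

    for module in modules:
        commit(module)
    return {m: "Committed" for m in modules}, failures
```

Waiting for every future before deciding is what gives the "complete list of failures" mentioned in the overview. Stopping at the first error would be faster but would hide the remaining ones.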
postInit
Okapi invokes postInit for each module being installed/upgraded in a bottom-up fashion. So if module A depends on module B, A.postInit stays in the "Pending" state until B.postInit is "Done". If module B fails, both module A and module B will have status "Failed", along with some message/context.
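The bottom-up ordering and the failure propagation described above can be illustrated with a topological sort. The sketch below uses Python's standard `graphlib` module. The function name `run_post_init`, the `depends_on` mapping, and the `post_init` callable are hypothetical, and the modules are processed one at a time for simplicity rather than in parallel where the dependency graph would allow it.

```python
from graphlib import TopologicalSorter


def run_post_init(depends_on, post_init):
    """Invoke post_init for each module, dependencies first.

    depends_on maps a module id to the set of module ids it requires.
    post_init is a hypothetical callable that raises on failure.
    Returns (status per module, message per failed module).
    """
    modules = set(depends_on)
    for deps in depends_on.values():
        modules.update(deps)
    status = {module: "Pending" for module in modules}
    messages = {}

    # static_order() yields every module after all of its dependencies.
    for module in TopologicalSorter(depends_on).static_order():
        failed = sorted(
            dep for dep in depends_on.get(module, ()) if status[dep] == "Failed"
        )
        if failed:
            # A failed dependency fails the dependent module without calling it.
            status[module] = "Failed"
            messages[module] = "dependency failed: " + ", ".join(failed)
            continue
        status[module] = "PostInit"
        try:
            post_init(module)
            status[module] = "Done"
        except Exception as exc:
            status[module] = "Failed"
            messages[module] = str(exc)
    return status, messages
```

With `depends_on = {"A": {"B"}, "B": set()}`, B is processed before A, and a failure in B marks A as "Failed" as well. Modules that do not depend on B are still run.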
Schemas
install_progress
What Okapi returns when retrieving the status/progress of an asynchronous install/upgrade.
TBD - Could be an extension of the TenantModuleDescriptor.json format with an addition of a status. Another option is to encapsulate this in an object that also has an overallStatus field.
Status Enumeration
- Pending
- PreInit
- Committed
- Aborted
- PostInit
- Done
- Failed
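Since the install_progress schema is still TBD, the following is only a sketch of the second option mentioned above: a wrapper object with an overallStatus field and a per-module status. The field name `modules`, the helper names, and the roll-up rule ("Failed" if any module failed, "Done" if all are done, otherwise "Pending") are assumptions for illustration.

```python
from enum import Enum


class Status(Enum):
    PENDING = "Pending"
    PRE_INIT = "PreInit"
    COMMITTED = "Committed"
    ABORTED = "Aborted"
    POST_INIT = "PostInit"
    DONE = "Done"
    FAILED = "Failed"


def overall_status(module_statuses):
    """Hypothetical roll-up of per-module statuses into one overall status."""
    statuses = set(module_statuses)
    if Status.FAILED in statuses:
        return Status.FAILED
    if statuses == {Status.DONE}:
        return Status.DONE
    return Status.PENDING  # assumption: anything unfinished reports as Pending


def install_progress(modules):
    """Build a progress document from a list of (module id, Status) pairs."""
    return {
        "overallStatus": overall_status(status for _, status in modules).value,
        "modules": [
            {"id": module_id, "status": status.value}
            for module_id, status in modules
        ],
    }
```

Each entry in `modules` mirrors a TenantModuleDescriptor entry reduced to its `id`, plus the proposed `status` addition.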
APIs
Interface | Method | Path | Request | Response | Description | Notes |
---|---|---|---|---|---|---|
okapi | POST | /_/proxy/tenants/<id>/install?async=true | | | Start an asynchronous install | Behavior stays the same if async=true is not provided |
okapi | GET | /_/proxy/tenants/<id>/install | | | Get the status of an install | |
okapi | DELETE | /_/proxy/tenants/<id>/install | | | Abort an install | |
okapi | POST | /_/proxy/tenants/<id>/upgrade?async=true | | | Start an asynchronous upgrade | |
okapi | GET | /_/proxy/tenants/<id>/upgrade | | | Get the status of an upgrade | |
okapi | DELETE | /_/proxy/tenants/<id>/upgrade | | | Abort an upgrade | |
tenant | POST | /_/tenant/pre | | | | |
tenant | POST | /_/tenant/commit | | | | |
tenant | POST | /_/tenant/abort | | | | |
tenant | POST | /_/tenant/post | | | | |
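From a client's point of view, the okapi rows could be exercised along these lines. The sketch only builds the requests and does not send them. The base URL, the function names, and the example tenant are assumptions. The path follows the /_/proxy/tenants/<tenant>/install form given under Backwards Compatibility.

```python
from urllib.parse import quote
from urllib.request import Request

OKAPI_URL = "http://localhost:9130"  # assumption: a locally running Okapi


def install_request(tenant, modules_json, operation="install", asynchronous=True):
    """Build the POST that starts an install or upgrade for a tenant."""
    path = f"/_/proxy/tenants/{quote(tenant, safe='')}/{operation}"
    if asynchronous:
        path += "?async=true"  # the proposed flag; omit it for today's behavior
    return Request(
        OKAPI_URL + path,
        data=modules_json.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def status_request(location):
    """Build the GET that polls the Location returned by the async POST."""
    return Request(OKAPI_URL + location, method="GET")


request = install_request("diku", '[{"id": "mod-a-1.0.0", "action": "enable"}]')
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. A client would then read the Location header from the response and poll it with `status_request` until the operation reports a final status.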