Quartz Scheduling Implementation in mod-data-export-spring

Overview

The mod-data-export-spring module was originally implemented using the Spring scheduler. This approach works, but because it is an in-memory scheduler it has drawbacks:

  1. Issues with restarts - a tenant's schedules are loaded into the service's memory when the module is enabled for the tenant. If something happens to the container and it is restarted/redeployed, all schedules stored in memory are lost, and module enabling has to be called again for all tenants in order to reload the schedules. This must be done after every restart/redeploy; scheduled jobs won't run until it is.
  2. Scalability - it's not possible to run multiple instances of the module. Each instance has its own scheduler working independently, so jobs are triggered by every instance, which results in duplicate job runs.

Solution

To solve these problems, either a custom solution with a persistence layer for schedules needs to be implemented, or an existing framework can be used.

For our case it was decided to use the Quartz scheduler http://www.quartz-scheduler.org/ . It's an open-source scheduling library which supports running in a clustered environment, stores schedules in a JDBC store, and provides an API that is convenient for our scheduling needs.

Quartz clustering configuration: http://www.quartz-scheduler.org/documentation/quartz-2.3.0/configuration/ConfigJDBCJobStoreClustering.html
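As an illustration, a minimal clustered JDBC-store configuration might look like the fragment below. The property names come from the Quartz documentation; the specific values (delegate class, data source name, check-in interval) are assumptions for a PostgreSQL setup, not the module's actual configuration.

```properties
# Persist schedules in a JDBC job store instead of RAM
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
org.quartz.jobStore.dataSource = quartzDS
org.quartz.jobStore.tablePrefix = QRTZ_

# Enable clustering: instances coordinate through the shared quartz tables
org.quartz.jobStore.isClustered = true
org.quartz.jobStore.clusterCheckinInterval = 20000

# Each instance needs a unique id; AUTO generates one per instance
org.quartz.scheduler.instanceId = AUTO
```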

There's a separate schema with the quartz tables, shared by all tenants.

For easier identification, jobs are stored with a job_group containing the tenantId and export type.

This can be used when bulk operations need to be applied to jobs (for example, it is currently used to delete all schedules of a specific tenant when the module is disabled with the purge operation).
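A minimal sketch of this naming convention is shown below. The exact separator and group format are assumptions for illustration, not the module's actual code.

```java
// Hypothetical sketch of the job_group convention described above:
// the group id combines tenantId and export type, which makes it easy
// to find all jobs of one tenant for bulk operations (e.g. purge).
public class JobGroupNames {

    // Compose a group id from tenant and export type, e.g. "diku_BURSAR_FEES_FINES"
    public static String jobGroup(String tenantId, String exportType) {
        return tenantId + "_" + exportType;
    }

    // Match all groups belonging to a tenant, useful for bulk deletes on module disable
    public static boolean belongsToTenant(String group, String tenantId) {
        return group.startsWith(tenantId + "_");
    }
}
```

With the Quartz API itself, an equivalent bulk lookup can be done via GroupMatcher.groupStartsWith(tenantId) to collect the tenant's JobKeys before deleting them.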

Migration to quartz

With quartz all schedules are stored in the DB, so there's no need to reload the schedules from tenant's configurations on each restart. It needs to be done only once: on module upgrade from version which does not support quartz to version supporting quartz. This will be done automatically in scope of module enabling based on moduleTo/moduleFrom tenant attributes.

If for some reason schedule reloading needs to be forced for a tenant, it can be done by setting "forceSchedulesReload=true" in the module enable request, e.g.:

$OKAPI_URL/_/proxy/tenants/diku/install?tenantParameters=forceSchedulesReload=true

[
	{
		"id": "mod-data-export-spring-3.0.0-SNAPSHOT",
		"action": "enable"
	}
]
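The decision of whether to reload schedules on module enable could be sketched roughly as follows. This is illustrative only: the parameter name forceSchedulesReload comes from this document, while the assumption that quartz support appeared in major version 3 and the simplified version parsing are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the reload decision made during module enable for a tenant.
public class SchedulesReloadDecision {

    // Extract the major version from a module id like "mod-data-export-spring-3.0.0-SNAPSHOT"
    static int majorVersion(String moduleId) {
        Matcher m = Pattern.compile("(\\d+)\\.\\d+\\.\\d+").matcher(moduleId);
        if (!m.find()) {
            throw new IllegalArgumentException("no version found in " + moduleId);
        }
        return Integer.parseInt(m.group(1));
    }

    // Reload when upgrading from a pre-quartz version (assumed < 3 here),
    // or when explicitly forced via the tenant parameter
    public static boolean shouldReload(String moduleFrom, boolean forceSchedulesReload) {
        if (forceSchedulesReload) {
            return true;
        }
        return moduleFrom != null && majorVersion(moduleFrom) < 3;
    }
}
```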

Module disabling for tenant

A tenant's schedules are deleted as part of disabling the module for the tenant, e.g.:

$OKAPI_URL/_/proxy/tenants/diku/install?purge=true

[
	{
		"id": "mod-data-export-spring-3.0.0-SNAPSHOT",
		"action": "disable"
	}
]

NOTE: If disabling with purge is not invoked for mod-data-export-spring, the tenant's scheduled jobs will keep attempting to run in the background even after the tenant itself is deleted.

Testing

  • Regression testing (manual, karate, e2e tests)
  • Testing for NFR:
    1. ability to work after restarts without re-enabling the module
    2. multiple running instances are supported
    3. scheduling multiple jobs for the same time works - verified by scheduling 1000 jobs for the same time

Demo: mod-data-export-spring-quartz.mp4

  • Performance testing: TBD

Open Questions

  • New module version release

One of the benefits of scheduling with quartz is that schedules are stored in the DB and don't need to be reloaded after restarts; the scheduler starts working automatically. This is the desired behavior, but it might cause issues when deploying new releases that include changes which must be applied on module enable for a tenant (such as DB updates): from container startup until the module is enabled for the tenant, job scheduling might not work as expected (jobs may fail or be processed incorrectly).

If such changes are made, the rollout issues need to be evaluated, and the module will probably need additional code changes. One idea for dealing with this problem is to add versioning to quartz jobs: when the module is enabled for a tenant, all of its quartz jobs are updated with the new version, and the scheduler performs an additional version check. If a job's version does not match the new module version, the job is simply skipped. This would prevent job execution before the module is enabled for the tenant and the necessary updates are applied. We decided not to implement this solution beforehand because it adds complexity, is not needed right now, and may never be needed.
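The proposed (and intentionally unimplemented) job-versioning guard could be sketched as follows. The class and method names are hypothetical, as is the idea of storing the module version alongside each job.

```java
// Illustrative sketch of the proposed job-version guard (not implemented).
// Each job would store the module version it was scheduled under; the
// scheduler would skip jobs whose version does not match the version of
// the currently deployed module, until the module is re-enabled for the
// tenant and its jobs are re-versioned.
public class JobVersionGuard {

    public static boolean shouldExecute(String jobVersion, String currentModuleVersion) {
        // Unknown or stale version means the tenant's jobs were not yet
        // updated by module enable, so the job is skipped for now.
        return jobVersion != null && jobVersion.equals(currentModuleVersion);
    }
}
```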

NFR Score Card

https://ebscoinddev.atlassian.net/wiki/spaces/GSE/pages/221609985/UXPROD-3944+NFR+Scorecard