DRAFT: MARC migrations: horizontal scaling support

Background

Authority record remapping was implemented in mod-marc-migrations according to the design "MARC records migration (authority)".

Currently, vertical scaling is supported by increasing the chunk size, the chunk-processing parallelism, and the resources allocated to the module. In the current implementation, chunk data is prepared and read sequentially; only remapping, saving to file, and the related DB queries run in parallel. All files are uploaded to external storage when the job ends.
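A minimal sketch of this single-instance flow in plain Java, assuming a chunk is simply a range of record positions. `LocalChunkPipeline`, `Chunk`, and `remapAndSave` are hypothetical names, and the `ExecutorService` stands in for the module's actual parallel step configuration. Chunks are built sequentially, remapping runs on a fixed thread pool sized by the parallelism setting, and the list of files to upload is only available after every chunk has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the current flow: sequential chunk preparation,
// parallel remapping, one upload step at the end of the job.
public class LocalChunkPipeline {

    // Hypothetical chunk descriptor: a half-open range of record positions.
    record Chunk(int id, int fromRecord, int toRecord) {}

    // Sequential part: split totalRecords into chunks of chunkSize.
    static List<Chunk> prepareChunks(int totalRecords, int chunkSize) {
        List<Chunk> chunks = new ArrayList<>();
        int id = 0;
        for (int from = 0; from < totalRecords; from += chunkSize) {
            chunks.add(new Chunk(id++, from, Math.min(from + chunkSize, totalRecords)));
        }
        return chunks;
    }

    // Stand-in for remapping a chunk and saving it to a local file.
    static String remapAndSave(Chunk chunk) {
        // Real code would read the records, remap them and write the file here.
        return "chunk-" + chunk.id() + ".mrc";
    }

    // Parallel part: remap chunks on a pool, then return the files to upload.
    static List<String> run(int totalRecords, int chunkSize, int parallelism) throws Exception {
        List<Chunk> chunks = prepareChunks(totalRecords, chunkSize);
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (Chunk chunk : chunks) {
                futures.add(pool.submit(() -> remapAndSave(chunk)));
            }
            List<String> files = new ArrayList<>();
            for (Future<String> f : futures) {
                files.add(f.get()); // wait for every chunk before the job can finish
            }
            return files; // uploaded to external storage only after all chunks are done
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(10, 4, 2)); // prints [chunk-0.mrc, chunk-1.mrc, chunk-2.mrc]
    }
}
```

Because everything runs inside one JVM, adding instances does not help a single job here; that is the limitation the proposal addresses.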

For now, only one migration job can run at a time. This limit is intentional until issue MODMARCMIG-12 is resolved.

In theory, with two app instances the module could process two jobs simultaneously, but only if the load balancer routes the second request to the second instance.

Purpose

Support distributing chunk processing across app instances.

Proposed solution

Overview

Remote partitioning using Spring Batch Integration (https://docs.spring.io/spring-batch/reference/spring-batch-integration/sub-elements.html#remote-partitioning) with Kafka.

With this approach, a batch job “manager” constructs the chunks when a job is submitted and sends the chunk metadata to Kafka. Consumers (batch job “workers”) read and process the chunks, write and upload the files, and send the processing-result metadata back to Kafka, where the “manager” consumes it to complete the job.
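The message flow can be simulated without Kafka or Spring to make the two roles concrete. In this hypothetical sketch, two `BlockingQueue`s stand in for a requests topic and a replies topic, worker threads stand in for worker instances, and the `PartitionRequest`/`PartitionReply` fields are assumptions about what the chunk and result metadata might contain. This is not the Spring Batch Integration API (which provides its own manager and worker step builders), only the exchange it implements.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Simulation only: queues replace the hypothetical Kafka "requests" and "replies" topics.
public class RemotePartitioningSketch {

    // Chunk metadata published by the manager (hypothetical fields).
    record PartitionRequest(String jobId, int chunkId, int fromRecord, int toRecord) {}

    // Processing result published by a worker (hypothetical fields).
    record PartitionReply(String jobId, int chunkId, String uploadedFile, int recordsProcessed) {}

    // Sentinel message that tells a simulated worker to stop.
    private static final PartitionRequest STOP = new PartitionRequest("", -1, 0, 0);

    // Worker: consume chunk metadata, process and "upload" the chunk, publish a reply.
    static void worker(BlockingQueue<PartitionRequest> requests,
                       BlockingQueue<PartitionReply> replies) throws InterruptedException {
        while (true) {
            PartitionRequest req = requests.take();
            if (req == STOP) {
                return;
            }
            // A real worker would read, remap, write and upload the records here.
            String file = req.jobId() + "/chunk-" + req.chunkId() + ".mrc";
            replies.put(new PartitionReply(req.jobId(), req.chunkId(), file,
                    req.toRecord() - req.fromRecord()));
        }
    }

    // Manager: build chunks, publish their metadata, then wait for one reply per chunk.
    static int runJob(String jobId, int totalRecords, int chunkSize, int workers)
            throws InterruptedException {
        BlockingQueue<PartitionRequest> requests = new LinkedBlockingQueue<>();
        BlockingQueue<PartitionReply> replies = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                worker(requests, replies);
                return null;
            });
        }
        int chunkCount = 0;
        for (int from = 0; from < totalRecords; from += chunkSize) {
            int to = Math.min(from + chunkSize, totalRecords);
            requests.put(new PartitionRequest(jobId, chunkCount++, from, to));
        }
        int processed = 0;
        for (int i = 0; i < chunkCount; i++) {
            processed += replies.take().recordsProcessed(); // aggregate worker results
        }
        for (int i = 0; i < workers; i++) {
            requests.put(STOP);
        }
        pool.shutdown();
        return processed; // all replies received: the manager can complete the job
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runJob("job-1", 10, 4, 3)); // prints 10
    }
}
```

The manager only needs to count replies against the number of chunks it published, which is why the reply routing question (which manager instance consumes the replies) matters once several instances are involved.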

Notes/pitfalls

There is currently a GitHub issue, https://github.com/spring-projects/spring-batch/issues/4133, related to running multiple jobs simultaneously. The issue reproduces (confirmed with a POC). We will probably not be affected if each app instance runs only one job.
If chunks are submitted to Kafka, processing chunks in parallel within a single instance would require concurrent consumption.
The “manager” and “worker” are intended to be separate app instances, for example configured via profiles. It will probably be fine to have one manager and one worker for each app instance. TODO: check/test how to route replies in that case, and whether replies can be consumed by a manager other than the one that created the job.
With a single app instance, remote partitioning will most likely be slower than the current solution, so a profile or environment variable should probably enable it only when there are multiple app instances; otherwise, the currently implemented approach is used.
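One way to express the toggle and the per-instance roles, as a sketch: `MARC_MIGRATION_REMOTE_PARTITIONING_ENABLED` and `MARC_MIGRATION_ROLES` are hypothetical environment variables, not existing module settings, and in the Spring application the same decision would more likely be driven by profiles. When remote partitioning is enabled, an instance takes both roles by default, matching the one manager plus one worker per instance idea.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical mode selection based on environment variables.
public class ProcessingModeSelector {

    enum Mode { LOCAL, REMOTE_PARTITIONING }

    // Which mode to use and which remote roles this instance takes.
    record Config(Mode mode, boolean manager, boolean worker) {}

    // Hypothetical variable names, not existing mod-marc-migrations settings.
    static final String REMOTE_ENABLED_VAR = "MARC_MIGRATION_REMOTE_PARTITIONING_ENABLED";
    static final String ROLES_VAR = "MARC_MIGRATION_ROLES";

    static Config select(Map<String, String> env) {
        boolean remote = Boolean.parseBoolean(env.getOrDefault(REMOTE_ENABLED_VAR, "false"));
        if (!remote) {
            // Not enabled (e.g. a single instance): keep the current local implementation.
            return new Config(Mode.LOCAL, false, false);
        }
        // Default: every instance runs one manager and one worker.
        Set<String> roles = Arrays.stream(env.getOrDefault(ROLES_VAR, "manager,worker").split(","))
                .map(String::trim)
                .collect(Collectors.toSet());
        return new Config(Mode.REMOTE_PARTITIONING, roles.contains("manager"), roles.contains("worker"));
    }

    public static void main(String[] args) {
        System.out.println(select(System.getenv()));
    }
}
```

Keeping local mode as the default means a single-instance deployment behaves exactly as today unless the flag is set explicitly.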

Alternative solution ideas (these require moving away from Spring Batch)

A scheduler that polls for chunks needing processing; this will require some locking on chunks.
Fire a single Kafka event when a migration starts and have it consumed by all app instances; this will also require some locking on chunks.
Fire chunk events to Kafka. The app instance that fires these events creates a scheduled job for each migration to monitor its processing status. Alternatively, a DB trigger could update the operation status once the chunks are processed.
On the Eureka platform, it will probably be possible to send chunk processing requests directly to other app instances.
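The chunk locking that the first two ideas need can be sketched as a conditional status update: a chunk moves from NEW to IN_PROGRESS only if it is still NEW, so concurrent pollers never claim the same chunk. In this hypothetical sketch a `ConcurrentHashMap` stands in for a chunk table (in a database this would be a conditional `UPDATE ... WHERE status = 'NEW'` or a row lock), threads stand in for app instances, and `allDone()` is the kind of check a scheduled job or DB trigger could use to complete the operation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// In-memory stand-in for a hypothetical chunk table with status-based claiming.
public class ChunkClaimSketch {

    enum Status { NEW, IN_PROGRESS, DONE }

    private final Map<Integer, Status> chunks = new ConcurrentHashMap<>();

    ChunkClaimSketch(int chunkCount) {
        for (int i = 0; i < chunkCount; i++) {
            chunks.put(i, Status.NEW);
        }
    }

    // Polling step: atomically claim any NEW chunk, or return empty if none is left.
    Optional<Integer> claimNext() {
        for (Integer id : chunks.keySet()) {
            // replace(key, expected, new) is atomic: only one caller can win a chunk.
            if (chunks.replace(id, Status.NEW, Status.IN_PROGRESS)) {
                return Optional.of(id);
            }
        }
        return Optional.empty();
    }

    void markDone(int id) {
        chunks.replace(id, Status.IN_PROGRESS, Status.DONE);
    }

    // Completion check for a scheduled job (or DB trigger) monitoring the migration.
    boolean allDone() {
        return chunks.values().stream().allMatch(s -> s == Status.DONE);
    }

    // Several "instances" poll the same table; returns the total number of claims.
    static int simulate(int chunkCount, int instances) throws InterruptedException {
        ChunkClaimSketch table = new ChunkClaimSketch(chunkCount);
        AtomicInteger claims = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            Thread t = new Thread(() -> {
                Optional<Integer> id;
                while ((id = table.claimNext()).isPresent()) {
                    claims.incrementAndGet();
                    table.markDone(id.get()); // processing would happen before this
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        if (!table.allDone()) {
            throw new IllegalStateException("unprocessed chunks left");
        }
        return claims.get(); // equals chunkCount: no chunk is claimed twice
    }
}
```

A real implementation would also need to reclaim IN_PROGRESS chunks whose owning instance died, which this sketch does not cover.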