MODINREACH-347 - Record Contribution Errors Should not Cause Contribution Jobs to Halt

Submitted:
Approved:
Status: ACCEPTED
Impact: LOW


MODINREACH-347 (repository: mod-inn-reach)

Solution Proposal and Recommendation

The concept is to proceed with the following solution:

  1. Pause and resume contribution jobs (initial and ongoing)
  2. Introduce a component/mechanism that manages retries to the central server (retry interval, maximum number of attempts (maxAmount), and other parameters) while jobs are paused
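The retry parameters in item 2 could be captured in a small policy object. The following is a minimal sketch, assuming exponential backoff between checks of the central server; the class name ContributionRetryPolicy and its fields (initialInterval, multiplier, maxInterval, and maxAttempts standing in for maxAmount) are hypothetical, since the actual retry mechanism is still to be defined.

```java
import java.time.Duration;
import java.util.Optional;

/**
 * Hypothetical retry policy for calls to the central server while a
 * contribution job is paused. Names and semantics are illustrative only.
 */
final class ContributionRetryPolicy {
    private final Duration initialInterval; // delay before the first retry
    private final double multiplier;        // growth factor applied per retry
    private final Duration maxInterval;     // upper bound for any single delay
    private final int maxAttempts;          // plays the role of "maxAmount"

    ContributionRetryPolicy(Duration initialInterval, double multiplier,
                            Duration maxInterval, int maxAttempts) {
        if (initialInterval.isNegative() || initialInterval.isZero()) {
            throw new IllegalArgumentException("initialInterval must be positive");
        }
        if (multiplier < 1.0) {
            throw new IllegalArgumentException("multiplier must be >= 1.0");
        }
        if (maxInterval.compareTo(initialInterval) < 0) {
            throw new IllegalArgumentException("maxInterval must be >= initialInterval");
        }
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.initialInterval = initialInterval;
        this.multiplier = multiplier;
        this.maxInterval = maxInterval;
        this.maxAttempts = maxAttempts;
    }

    /**
     * Returns the delay to wait before retry number {@code attempt} (1-based),
     * or an empty Optional once the retry budget is used up.
     */
    Optional<Duration> delayBeforeAttempt(int attempt) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt is 1-based");
        }
        if (attempt > maxAttempts) {
            return Optional.empty(); // give up: the job should be reported as failed
        }
        // initialInterval * multiplier^(attempt - 1), capped at maxInterval
        double millis = initialInterval.toMillis() * Math.pow(multiplier, attempt - 1);
        long capped = (long) Math.min(millis, (double) maxInterval.toMillis());
        return Optional.of(Duration.ofMillis(capped));
    }
}
```

With an initial interval of 1 s, a multiplier of 2, a 10 s cap and maxAttempts = 5, the delays are 1 s, 2 s, 4 s, 8 s and 10 s, and delayBeforeAttempt(6) returns empty.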

Option Space


Option A: Extend KafkaConsumer using its pause and resume methods
Option B: Extend springframework.kafka ConcurrentMessageListenerContainer
Option C: Spring Batch
Description

Option A: Extend KafkaConsumer directly and use its pause and resume methods.



Option B: Extend springframework.kafka. We're already using org.springframework.kafka, but we take advantage neither of its concurrency features nor of its features for pausing and starting consumers.

Our implementation of InitialContributionJobMessageConsumer could contain a single reference to the ConcurrentMessageListenerContainer. Our message processing code could easily be injected into the implementation of an AcknowledgingMessageListener.

When a message cannot be processed and committed because the central server is unavailable, we could pause the ConcurrentMessageListenerContainer and schedule a retry using a ConcurrentTaskScheduler.

Related links:
· Javadoc for ConcurrentMessageListenerContainer
· Example code using ConcurrentMessageListenerContainer to read messages from Kafka with pause and restart
· Medium post describing the above repo
· Pausing and resuming using ConcurrentMessageListenerContainer
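The pause-and-retry flow described for Option B can be sketched with the JDK alone. In this sketch PausableContainer is a hypothetical interface standing in for the pause() and resume() methods that ConcurrentMessageListenerContainer exposes, centralServerAvailable is a hypothetical health check, and a plain ScheduledExecutorService replaces ConcurrentTaskScheduler (which adapts a ScheduledExecutorService to Spring's TaskScheduler). It illustrates the idea only and is not the mod-inn-reach implementation.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for ConcurrentMessageListenerContainer#pause()/#resume(). */
interface PausableContainer {
    void pause();
    void resume();
}

/**
 * Sketch: pause all consumers when the central server is unavailable, check
 * the server on a fixed schedule, resume once it answers, stop checking after
 * maxAttempts failed checks.
 */
final class CentralServerOutageHandler {
    private final PausableContainer container;
    private final BooleanSupplier centralServerAvailable; // hypothetical health check
    private final ScheduledExecutorService scheduler;
    private final long retryIntervalMillis;
    private final int maxAttempts;
    private boolean paused;    // guarded by "this"
    private boolean exhausted; // guarded by "this"

    CentralServerOutageHandler(PausableContainer container, BooleanSupplier centralServerAvailable,
                               ScheduledExecutorService scheduler, long retryIntervalMillis,
                               int maxAttempts) {
        this.container = container;
        this.centralServerAvailable = centralServerAvailable;
        this.scheduler = scheduler;
        this.retryIntervalMillis = retryIntervalMillis;
        this.maxAttempts = maxAttempts;
    }

    /** Called by the message listener when a record cannot be contributed. */
    synchronized void onCentralServerUnavailable() {
        if (paused) {
            return; // already paused and a check is already scheduled
        }
        paused = true;
        container.pause();
        scheduleCheck(1);
    }

    synchronized boolean isPaused() { return paused; }

    synchronized boolean isExhausted() { return exhausted; }

    private void scheduleCheck(int attempt) {
        scheduler.schedule(() -> check(attempt), retryIntervalMillis, TimeUnit.MILLISECONDS);
    }

    private synchronized void check(int attempt) {
        if (centralServerAvailable.getAsBoolean()) {
            paused = false;
            container.resume();
        } else if (attempt < maxAttempts) {
            scheduleCheck(attempt + 1);
        } else {
            exhausted = true; // stay paused; alerting or failing the job is left open
        }
    }
}
```

The failed record itself also has to be processed again after resume(), for example by not acknowledging it and seeking back to its offset; that part is outside this sketch.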

Option C: Use Spring Batch. Spring Batch does much more than read messages from Kafka, but it can easily be adapted to that task; see the notes for an example of how to read from Kafka. The main reason to consider Spring Batch seriously is that, unlike extending springframework.kafka, it gives us job persistence for free in the form of a JobRepository: when the @EnableBatchProcessing annotation is used, a JobRepository is provided automatically. Spring Batch also has features that support our other requirements, such as parallel processing and pausing and restarting job execution based on business requirements, and restarting after other kinds of failure is built in.
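To illustrate what job persistence buys, here is a JDK-only sketch of restarting a job from its last checkpoint. It is not the Spring Batch API: JobCheckpointStore is a hypothetical in-memory stand-in for the kind of progress state Spring Batch keeps in its JobRepository (a database in practice), and the contributor callback is a hypothetical hook that sends one record to the central server.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical in-memory stand-in for persistent job state (a DB table in practice). */
final class JobCheckpointStore {
    private final Map<String, Integer> lastCompletedIndex = new ConcurrentHashMap<>();

    /** Index of the first record that still has to be processed for this job. */
    int nextIndex(String jobId) {
        return lastCompletedIndex.getOrDefault(jobId, -1) + 1;
    }

    void checkpoint(String jobId, int completedIndex) {
        lastCompletedIndex.put(jobId, completedIndex);
    }
}

/** Sketch of a contribution job that can be restarted where it stopped. */
final class RestartableContributionJob {
    private final JobCheckpointStore store;
    private final Consumer<String> contributor; // hypothetical: throws if a record cannot be contributed

    RestartableContributionJob(JobCheckpointStore store, Consumer<String> contributor) {
        this.store = store;
        this.contributor = contributor;
    }

    /** Returns true if every record was contributed, false if the job stopped early. */
    boolean run(String jobId, List<String> recordIds) {
        for (int i = store.nextIndex(jobId); i < recordIds.size(); i++) {
            try {
                contributor.accept(recordIds.get(i));
            } catch (RuntimeException e) {
                return false; // e.g. central server down; a later run resumes at record i
            }
            store.checkpoint(jobId, i); // persist progress after each record
        }
        return true;
    }
}
```

Spring Batch records progress per chunk rather than per record; checkpointing after every record here only keeps the example short.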

Useful links:

Pros (Benefit/Effort reduction)
  • Option A: This might seem like the simplest approach, since KafkaConsumer already has pause and resume methods.
  • Option B: ConcurrentMessageListenerContainer makes it easy to instantiate multiple concurrent consumers and to pause and resume all of them with a single call on the container.
  • Option C: Job persistence comes for free in the form of a JobRepository.
  • Option C: Spring Batch has features that support other requirements, such as parallel processing.
  • Option C: Restarting after other failures is built into Spring Batch.
Cons (Costs/Risks)
  • Option A: It doesn't cover two other requirements: saving (persisting) the job and supporting some form of concurrency.
  • Option A: KafkaConsumer is not thread-safe, so using it in concurrent scenarios may not be feasible.
  • Option B: Requires more developer effort than Option A, because some parts of the codebase must be rearranged.
  • Option C: Spring Batch's KafkaItemReader is built on KafkaConsumer, so it is not thread-safe either; implementing a concurrent scenario with Spring Batch may therefore not be easy.
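The thread-safety concern for Option A shapes how pause and resume would have to be wired: a KafkaConsumer must only be used from the thread that polls it (wakeup() is the one method documented as safe to call from another thread), so other threads can only request a pause, and the polling thread applies it. In the JDK-only sketch below, PollingConsumer is a hypothetical simplified interface; the real KafkaConsumer methods are poll(Duration), pause(Collection<TopicPartition>) and resume(Collection<TopicPartition>). A paused KafkaConsumer still has to keep polling to stay in its consumer group; poll() simply returns no records from paused partitions.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/** Hypothetical, simplified view of the KafkaConsumer calls used by the loop. */
interface PollingConsumer {
    List<String> poll();
    void pause();
    void resume();
}

/**
 * Sketch of a single-threaded poll loop: only the polling thread touches the
 * consumer; other threads merely flip a flag that the loop applies before polling.
 */
final class PausablePollLoop implements Runnable {
    private final PollingConsumer consumer;
    private final Consumer<String> recordHandler;
    private final AtomicBoolean pauseRequested = new AtomicBoolean(false);
    private final AtomicBoolean running = new AtomicBoolean(true);
    private boolean paused; // confined to the polling thread

    PausablePollLoop(PollingConsumer consumer, Consumer<String> recordHandler) {
        this.consumer = consumer;
        this.recordHandler = recordHandler;
    }

    void requestPause() { pauseRequested.set(true); }   // callable from any thread
    void requestResume() { pauseRequested.set(false); } // callable from any thread
    void stop() { running.set(false); }                  // callable from any thread

    @Override
    public void run() {
        while (running.get()) {
            boolean wantPause = pauseRequested.get();
            if (wantPause && !paused) {
                consumer.pause();
                paused = true;
            } else if (!wantPause && paused) {
                consumer.resume();
                paused = false;
            }
            // Keep polling while paused: a paused consumer just receives no records.
            for (String value : consumer.poll()) {
                recordHandler.accept(value);
            }
        }
    }
}
```

This single-consumer loop is essentially what ConcurrentMessageListenerContainer (Option B) runs for each of its child consumers, which is why Option B can offer concurrency without sharing one KafkaConsumer across threads.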
Impact (processes, data, system, timeline)


Recommendation

Of the listed options, Option B looks the most appropriate: it addresses the requirements for job resilience and, through concurrent consumers, opens the way to performance improvements in the further Quality Improving Process for contribution jobs.

Decision taken

  As a result of today's meetings with Steve Ellis and Gurleen Kaur1, it was decided to proceed with Option B (extend springframework.kafka ConcurrentMessageListenerContainer) as the first part of the solution. The retry mechanism and the impact are still to be defined.