[RRT] Jobs completing with errors

RCA Group

Not a bug

Description

Jobs completing with various errors:

  1. org.folio.processing.exceptions.EventProcessingException: Failed to handle event payload, cause event payload context does not contain MARC_BIBLIOGRAPHIC or INSTANCE data - fixed

  2. io.vertx.core.impl.NoStackTraceThrowable: Failed to retrieve MARC record by instance id: '2475a3a8-49c7-401d-ac9e-cd30a763a964', status code: 404 - fixed by MODINV-847

  3. in00000001356 io.vertx.core.impl.NoStackTraceThrowable: Current retry number 1 exceeded or equal given number 1 for the Instance update for jobExecutionId 'c03084fc-025c-42d3-b2c4-a38313c87c80'
    Job execution ID: d7138f09-c66c-46a8-98e5-6dfb4278dacc. This looks like an optimistic locking (OL) error, but it might be caused by misconfiguration; investigating further in scope of
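The retry message in item 3 can be compared with a minimal optimistic locking sketch. The sketch makes three assumptions. Each record carries a version number that the store checks on update. The retry counter starts at 0. The counter is compared with ">=" against a limit of 1, which is what the message text suggests. Under those assumptions, the error means the one allowed retry also hit a version conflict. All names here (InMemoryStore, ContendedStore, update_with_retry, RetryLimitExceeded) are hypothetical. This is not the mod-inventory implementation.

```python
from dataclasses import dataclass


class OptimisticLockError(Exception):
    """The version the writer read is no longer the stored version."""


class RetryLimitExceeded(Exception):
    """Every allowed retry hit a version conflict."""


@dataclass
class Record:
    id: str
    version: int
    data: dict


class InMemoryStore:
    """Hypothetical record store that enforces optimistic locking on update."""

    def __init__(self):
        self._records = {}

    def put(self, record):
        self._records[record.id] = Record(record.id, record.version, dict(record.data))

    def get(self, record_id):
        rec = self._records[record_id]
        return Record(rec.id, rec.version, dict(rec.data))  # detached copy

    def update(self, record):
        current = self._records[record.id]
        if current.version != record.version:
            raise OptimisticLockError(f"expected version {current.version}, got {record.version}")
        self._records[record.id] = Record(record.id, record.version + 1, dict(record.data))


class ContendedStore(InMemoryStore):
    """Hypothetical: another writer's update lands just before each of the
    next `clashes` update attempts."""

    def __init__(self, clashes):
        super().__init__()
        self.clashes = clashes

    def update(self, record):
        if self.clashes > 0:
            self.clashes -= 1
            self._records[record.id].version += 1  # concurrent write wins the race
        super().update(record)


def update_with_retry(store, record, changes, job_execution_id, max_retries=1):
    """Apply `changes`. On a version conflict, re-read the latest version and
    try again. Fail once the retry counter reaches `max_retries`."""
    retry_number = 0
    while True:
        record.data.update(changes)
        try:
            store.update(record)
            return store.get(record.id)
        except OptimisticLockError:
            if retry_number >= max_retries:
                raise RetryLimitExceeded(
                    f"Current retry number {retry_number} exceeded or equal given "
                    f"number {max_retries} for the Instance update for "
                    f"jobExecutionId '{job_execution_id}'")
            retry_number += 1
            record = store.get(record.id)  # fetch the updated version of the record


# One clash: the single retry re-reads version 2 and succeeds.
store = ContendedStore(clashes=1)
store.put(Record("in1", 1, {"title": "old"}))
updated = update_with_retry(store, store.get("in1"), {"title": "new"}, "job-1")

# Two clashes in a row: the retry also conflicts and the limit of 1 is reached.
busy = ContendedStore(clashes=2)
busy.put(Record("in2", 1, {"title": "old"}))
error = None
try:
    update_with_retry(busy, busy.get("in2"), {"title": "new"}, "job-2")
except RetryLimitExceeded as exc:
    error = str(exc)
```

In this sketch, a single clash is absorbed by the retry. Two consecutive clashes on the same record exhaust a limit of 1 and produce a message in the same form as item 3. That could happen if a parallel writer or a duplicated event keeps changing the record.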

Environment

None

Potential Workaround

None

Activity

Ann-Marie Breaux, November 6, 2023 at 5:04 PM

Thank you for all the details, . Closing this issue.

Kateryna Senchenko, November 6, 2023 at 1:34 PM

Results of the analysis in the current thread:

  1. org.folio.processing.exceptions.EventProcessingException: Failed to handle event payload, cause event payload context does not contain MARC_BIBLIOGRAPHIC or INSTANCE data - appears to be fixed. I couldn’t reproduce the issue with the same scenario (Instance matched but Holdings not matched, so the Instance is updated and Holdings are created), and it works as expected. There were issues with Holdings sub-matches and with create actions in non-match branches. These were resolved either in Orchid CSP #3 or in Poppy (possibly as a side effect of the fixes in MODSOURCE-662 and )

  2. io.vertx.core.impl.NoStackTraceThrowable: Failed to retrieve MARC record by instance id: '2475a3a8-49c7-401d-ac9e-cd30a763a964', status code: 404 - fixed in Poppy

  3. Optimistic locking (OL) exceptions - currently working with PTF to understand them better

There are a couple of possible reasons:

  • Different jobs running in parallel try to update the same record at the same time, or a user updates a record while a DI job is updating it. However, such clashes should be extremely rare. They are covered by the retry mechanism, which fetches the updated version of the record and tries the operation again.

  • Misconfiguration: module instances running in different Kafka consumer groups, so the same events are consumed more than once. We have seen this happen before, but it is very unlikely here (Carole confirmed it is not the case).

  • Kafka can duplicate events. A de-duplication mechanism exists only for create actions, so a duplicated update event will be processed again. Such cases should also be covered by the OL retry mechanism, and the record should still be processed successfully. If the retry limit is exceeded, something out of the ordinary is happening and needs further investigation.

  • mod-inventory or mod-source-record-storage was restarted and re-consumed messages that had already been processed. This should also be handled by the retry mechanism.
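The duplicate-delivery reasons above can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not FOLIO code:

- All instances of a module share one database.
- De-duplication is a set of processed create-event IDs.
- Partition assignment within a consumer group is reduced to round-robin.
- The OL retry is reduced to reapplying the update on top of the latest version.

The names ModuleState, ModuleInstance and deliver, and the group IDs, are hypothetical.

```python
from collections import defaultdict


class ModuleState:
    """Hypothetical database shared by every instance of one module."""

    def __init__(self):
        self.seen_create_ids = set()  # de-duplication table: create events only
        self.versions = {}            # record id -> current version
        self.conflicts = 0            # optimistic locking conflicts hit so far


class ModuleInstance:
    """One running instance of a module, subscribed under a consumer group id."""

    def __init__(self, group_id, state):
        self.group_id = group_id
        self.state = state

    def handle(self, event):
        s = self.state
        if event["type"] == "CREATE":
            if event["event_id"] in s.seen_create_ids:
                return  # duplicate create: skipped by de-duplication
            s.seen_create_ids.add(event["event_id"])
            s.versions[event["record_id"]] = 1
        elif event["type"] == "UPDATE":
            # No de-duplication for updates, so a redelivered update runs again.
            current = s.versions[event["record_id"]]
            if current != event["expected_version"]:
                # Version conflict: count it, then (standing in for the OL retry)
                # reapply the update on top of the latest version.
                s.conflicts += 1
            s.versions[event["record_id"]] = current + 1


def deliver(events, instances):
    """Simulated Kafka delivery: each consumer group receives every event, and
    within a group each event goes to one member (round-robin here, standing
    in for partition assignment)."""
    groups = defaultdict(list)
    for instance in instances:
        groups[instance.group_id].append(instance)
    for members in groups.values():
        for i, event in enumerate(events):
            members[i % len(members)].handle(event)


events = [
    {"type": "CREATE", "event_id": "e1", "record_id": "r1"},
    {"type": "UPDATE", "event_id": "e2", "record_id": "r1", "expected_version": 1},
]

# Correct setup: both instances share one (hypothetical) consumer group.
ok = ModuleState()
deliver(events, [ModuleInstance("mod-inventory", ok), ModuleInstance("mod-inventory", ok)])

# Misconfiguration: the instances sit in different consumer groups.
bad = ModuleState()
deliver(events, [ModuleInstance("group-a", bad), ModuleInstance("group-b", bad)])
```

With both instances in one group, each event is handled once. The record ends at version 2 with no conflicts. With the instances in different groups, the duplicate create is skipped but the update runs twice. The second run hits a version conflict, and the retry reapplies it, so the record ends at version 3. Calling deliver again with the same events on the correctly configured state models the restart case and ends the same way.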

These are all our thoughts on the topic so far. More information will be gathered and analyzed in scope of

Carole Godfrey, September 6, 2023 at 9:11 PM

Noting – issue number 3 is not due to a misconfiguration

Done

Details

Development Team

Folijet

Release

Poppy (R2 2023) Bug Fix

Created September 1, 2023 at 10:06 AM
Updated January 4, 2024 at 2:32 PM
Resolved November 6, 2023 at 5:05 PM