Close Transactions After Exceptions Are Thrown (Orchid CSP 3 Clone)

Description

There is an issue with lingering transactions when an exception is thrown inside a lambda on some code paths. It is caused by a bug in the vertx-jooq library used in SRS, tracked at https://github.com/jklingsporn/vertx-jooq/issues/197.

A symptom of this issue is that database connections remain in an "idle in transaction" state for some time after an exception has occurred: the transaction is open, but no SQL statements are being sent, so the database waits for further interaction until the connection is closed. The leading hypothesis is that the connection is only closed once the query executor, which holds a reference to the transaction object, is garbage collected. While a connection is held this way, other processes within SRS cannot use it. In a Data Import job with many errors, the total duration of the job can therefore increase considerably.
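
The failure mode described above can be modelled in plain Java. This is a simplified sketch, not the vertx-jooq code or the actual fix: FakeTransaction, leakyTransaction, safeTransaction, and stateAfterThrowingCallback are hypothetical names introduced only for illustration, and CompletableFuture stands in for the Vert.x future type. It shows how a synchronous throw inside the transaction lambda can bypass both commit and rollback, and how rolling back in the catch path closes the transaction.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class TransactionCleanupSketch {

    /** Hypothetical stand-in for a database transaction; not a vertx-jooq type. */
    static final class FakeTransaction {
        private String state = "OPEN";
        void commit() { state = "COMMITTED"; }
        void rollback() { state = "ROLLED_BACK"; }
        String state() { return state; }
    }

    /** Commits when the work succeeds and rolls back when it fails asynchronously. */
    private static <T> CompletableFuture<T> closeWhenDone(
            FakeTransaction tx, CompletableFuture<T> result) {
        return result.whenComplete((value, error) -> {
            if (error == null) {
                tx.commit();
            } else {
                tx.rollback();
            }
        });
    }

    /** Leaky variant: a synchronous throw in the callback skips commit and rollback. */
    static <T> CompletableFuture<T> leakyTransaction(
            FakeTransaction tx, Function<FakeTransaction, CompletableFuture<T>> work) {
        try {
            return closeWhenDone(tx, work.apply(tx));
        } catch (RuntimeException e) {
            // The failure is reported to the caller, but the transaction stays open.
            return CompletableFuture.failedFuture(e);
        }
    }

    /** Safe variant: a synchronous throw in the callback triggers a rollback first. */
    static <T> CompletableFuture<T> safeTransaction(
            FakeTransaction tx, Function<FakeTransaction, CompletableFuture<T>> work) {
        CompletableFuture<T> result;
        try {
            result = work.apply(tx);
        } catch (RuntimeException e) {
            tx.rollback();
            return CompletableFuture.failedFuture(e);
        }
        return closeWhenDone(tx, result);
    }

    /** Runs a callback that throws synchronously and returns the final transaction state. */
    static String stateAfterThrowingCallback(boolean safe) {
        FakeTransaction tx = new FakeTransaction();
        Function<FakeTransaction, CompletableFuture<String>> failing = t -> {
            throw new IllegalStateException("simulated failure inside the lambda");
        };
        CompletableFuture<String> outcome =
                safe ? safeTransaction(tx, failing) : leakyTransaction(tx, failing);
        // Both variants report the failure; only the cleanup differs.
        if (!outcome.isCompletedExceptionally()) {
            throw new AssertionError("expected the returned future to be failed");
        }
        return tx.state();
    }

    public static void main(String[] args) {
        System.out.println("leaky: " + stateAfterThrowingCallback(false)); // prints "leaky: OPEN"
        System.out.println("safe:  " + stateAfterThrowingCallback(true));  // prints "safe:  ROLLED_BACK"
    }
}
```

In this model the "OPEN" state after a failure corresponds to the "idle in transaction" connections seen in the database: the caller sees the error, but nothing tells the transaction to end.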

Acceptance Criteria

  • Upgrade vertx-jooq-classic-reactive to at least version 6.4.1

ORCHID Critical service patch details

  1. Describe issue impact on business: With this fix, Data import spends less time connected to the SRS database. During investigation, a bug was found in a library used in SRS, and this change upgrades the library to a newer version in which the bug has been fixed.

  2. What institutions are affected? (field “Affected Institutions” in Jira to be populated): All who use SRS or Data import

  3. What is the workaround if exists? None, jobs just continue to be slow

  4. What areas will be impacted by fix (i.e. what areas need to be retested): Confirm Data import Smoke and Critical path work as expected

  5. Brief explanation of technical implementation and the level of effort (in workdays) and technical risk (low/medium/high):
    Purpose
    There is an issue with lingering transactions when an exception is thrown inside a lambda on some code paths. It is caused by a bug in the vertx-jooq library used in SRS, tracked as jklingsporn/vertx-jooq#197.
    A symptom of this issue is that database connections remain in an "idle in transaction" state for some time after an exception has occurred: the transaction is open, but no SQL statements are being sent, so the database waits for further interaction until the connection is closed. The leading hypothesis is that the connection is only closed once the query executor, which holds a reference to the transaction object, is garbage collected. While a connection is held this way, other processes within SRS cannot use it. In a Data Import job with many errors, the total duration of the job can therefore increase considerably.
    Approach
    Upgrade vertx-jooq-classic-reactive to at least version 6.5.5
    Technical risk: Low

  6. Brief explanation of testing required and level of effort (in workdays). Provide test plan agreed with by QA Manager and PO: After the MODSOURCE and MODSOURMAN patches are applied, we need to retest the Smoke and Critical Path Data Import tests (most of which are automated), and perhaps selected Extended Manual tests. Manual testing across these MODSOURCE and MODSOURMAN changes is likely 3-5 days of work for manual QA, plus some input from the PO.

  7. What is the roll back plan in case the fix does not work? Revert to previous version

NOLANA Critical service patch details

  1. Describe issue impact on business: With this fix, Data import spends less time connected to the SRS database. During investigation, a bug was found in a library used in SRS, and this change upgrades the library to a newer version in which the bug has been fixed.

  2. What institutions are affected? (field “Affected Institutions” in Jira to be populated): All who use SRS or Data import

  3. What is the workaround if exists? None, jobs just continue to be slow

  4. What areas will be impacted by fix (i.e. what areas need to be retested): Confirm Data import Smoke and Critical path work as expected

  5. Brief explanation of technical implementation and the level of effort (in workdays) and technical risk (low/medium/high):
    Purpose
    There is an issue with lingering transactions when an exception is thrown inside a lambda on some code paths. It is caused by a bug in the vertx-jooq library used in SRS, tracked as jklingsporn/vertx-jooq#197.
    A symptom of this issue is that database connections remain in an "idle in transaction" state for some time after an exception has occurred: the transaction is open, but no SQL statements are being sent, so the database waits for further interaction until the connection is closed. The leading hypothesis is that the connection is only closed once the query executor, which holds a reference to the transaction object, is garbage collected. While a connection is held this way, other processes within SRS cannot use it. In a Data Import job with many errors, the total duration of the job can therefore increase considerably.
    Approach
    Upgrade vertx-jooq-classic-reactive to at least version 6.5.5
    Technical risk: Low

  6. Brief explanation of testing required and level of effort (in workdays). Provide test plan agreed with by QA Manager and PO: After the MODSOURCE and MODSOURMAN patches are applied, we need to retest the Smoke and Critical Path Data Import tests (most of which are automated), and perhaps selected Extended Manual tests. Manual testing across these MODSOURCE and MODSOURMAN changes is likely 3-5 days of work for manual QA, plus some input from the PO.

  7. What is the roll back plan in case the fix does not work? Revert to previous version

CSP Request Details

Nolana/Orchid CSP requested 21 June 2023. Approved 22 June 2023 by Khalilah, Mike G, Kristin M, Mark V, Debra H, and Harry K.

CSP Rejection Details

None

Potential Workaround

None

Activity

Kateryna Senchenko July 18, 2023 at 1:28 PM

No Issues with this fix were observed on Orchid bugfest. Closing this ticket.

, moving it back to CSP #3.

JenkinsNotifications June 30, 2023 at 12:58 PM

Deployed to the Orchid bf env. Moved status to In bugfix review from status Awaiting deployment. Please proceed with the verification.

Done

Details

Development Team

Folijet

Fix versions

Release

Orchid (R1 2023) Service Patch #3

RCA Group

Third party component integration

CSP Approved

Yes

Affected releases

Orchid (R1 2023)
Nolana (R3 2022)
Morning Glory (R2 2022)
Lotus (R1 2022)
Kiwi (R3 2021)

Created June 27, 2023 at 2:36 AM
Updated August 11, 2023 at 10:23 PM
Resolved June 27, 2023 at 2:38 AM