Import of 5 concurrent jobs becomes stuck

RCA Group

None

Description

This issue is observed in a Honeysuckle HF3 FOLIO environment.

Creating this ticket to check whether the same scenario also occurs in the Iris environment.

Specifically, 5 parallel Data Import jobs were submitted at the same time. The jobs progressed for about 30 minutes and have been stuck ever since.

Observed errors in the mod-pubsub audit_message table indicating that some events were not successfully delivered (most likely due to issues with mod-inventory at the time):

Error delivering DI_SRS_MARC_BIB_INSTANCE_HRID_SET event with id 'ad1a0390-7926-4533-b9a7-044f80d170c9' to /inventory/handlers/instances, response status code is 503, Service Unavailable

Messages like the following also appear in the log:

Error delivering DI_SRS_MARC_BIB_INSTANCE_HRID_SET event with id '22d302a2-36c0-4ac0-9618-037ee6a33806' to /inventory/handlers/instances, response status code is 502, Bad Gateway

And this message from mod-source-record-storage:
20:15:00.081 [vert.x-eventloop-thread-0] ERROR tpClientResponseImpl [10600064eqId] io.vertx.core.VertxException: Connection was closed

Spikes in CPU and memory usage were observed for both mod-inventory and mod-pubsub.

After the modules were restarted, attempts to run concurrent imports continued to get stuck.

Environment

None

Potential Workaround

None

Attachments

7

Activity

Ann-Marie Breaux, May 26, 2021 at 11:54 AM

Will focus on MODDATAIMP-419 instead.

Nick Cappadona, May 13, 2021 at 2:55 PM

Just in case my previous comment was missed, please consider closing this issue and focusing efforts instead on MODDATAIMP-419, which is about handling a single import run of 10k+ records that includes a match profile and updates the MARC Bib.

I added results yesterday based on failed tests against the Iris reference environment.

Ann-Marie Breaux, May 13, 2021 at 2:34 PM

Grooming: needs review and a check on Iris; then decide whether any updates are needed.

Ann-Marie Breaux, May 11, 2021 at 5:25 PM

In this week's grooming, let's review this, plus the various bugfixes/changes related to PTF work, and link up the Jiras that we think would make concurrent imports testable in Iris bugfest.

Nick Cappadona, April 29, 2021 at 5:51 PM

Are you asking me to continue the steps as outlined in MODDATIMP-419 on Bugfest? It appears to be down (or "kaput") right now.

I just want to reiterate that these two tickets are related. We prefer to focus everyone's efforts on optimizing Data Import's ability to handle job executions of increasing size (10-15k records in this example; MODDATAIMP-419) instead of having to split the file into multiple files of fewer than 3k records each and submit parallel job executions (MODDATAIMP-424).

Ann-Marie Breaux, April 29, 2021 at 4:39 AM (edited)

Hi, the way PTF tests are shaping up, it looks like Iris will be able to handle 50K records without seizing up. Let's leave this open for now. If you could try testing in Bugfest in the next couple of weeks (preferably toward the end of a US workday), we'll then figure out whether anything else needs doing. We've definitely had success with large files in Bugfest, but not consistently and not while significant additional work was happening in other parts of the system.

I'm going to block this for now, and we'll wait and see if any other work is needed. Does that sound OK with you?

Nick Cappadona, April 29, 2021 at 12:52 AM

Thanks for all your help.

Carole Godfrey, April 27, 2021 at 10:45 PM

Thanks for attaching the files. I agree with making sure that a single import works successfully on Iris, and I agree this can be closed.

Nick Cappadona, April 27, 2021 at 3:25 PM

I've added the files as requested.

Initial Data Import run (SRS MARC Bib and Instance creation using the default DI profile):

Follow-up Data Import run (overlay using our own set of DI profiles):

I also provided the full 10,300 processed records in a single file:

It will be tough to fully recreate the conditions for this test without our DI profiles. I can share them here, but honestly my vote would be to not dwell on the past (Honeysuckle) and instead close this issue so we can focus on standing up a stable Iris environment that can be used for testing. Do you agree?

We don't actually want to split the processed MARC records into multiple files and parallel DI job executions. We prefer to submit a single file, but Honeysuckle can't handle this. Please see our latest results from running our preferred process against Iris Bugfest (incomplete) and snapshot-load (success) in MODDATAIMP-419.

Won't Do

Details

Development Team

Folijet

Release

R2 2021

Affected Institution

Cornell

Created April 23, 2021 at 7:10 PM
Updated October 29, 2021 at 7:24 AM
Resolved May 26, 2021 at 11:54 AM