Import of 5 concurrent jobs becomes stuck
Activity

Ann-Marie Breaux, May 26, 2021 at 11:54 AM
Will focus on MODDATAIMP-419 instead

Nick Cappadona, May 13, 2021 at 2:55 PM
Just in case my previous comment was missed, please consider closing this issue and instead focusing efforts on MODDATIMP-419, which is about handling a single import run of 10k+ records that includes a match profile and updates the MARC Bib.
I added results yesterday based on failed tests against the Iris reference environment.

Ann-Marie Breaux, May 13, 2021 at 2:34 PM
Grooming: needs review and checking on Iris; then decide if any updates are needed

Ann-Marie Breaux, May 11, 2021 at 5:25 PM
In this week's grooming, let's review this, plus the various bugfixes/changes related to PTF work, and link up the Jiras that we think would make concurrent imports testable in Iris bugfest.

Nick Cappadona, April 29, 2021 at 5:51 PM
Are you asking me to continue the steps as outlined in MODDATIMP-419 on Bugfest? It appears to be down (or "kaput") right now.
I just want to reiterate that these two tickets are related. We would prefer to focus everyone's efforts on optimizing Data Import's ability to handle job executions of increasing size (10-15k records in this example; see MODDATIMP-419) instead of having to split the file into multiples of <3k records and submit parallel job executions (MODDATAIMP-424).
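For illustration only, here is a minimal sketch of the splitting workaround mentioned above (breaking a large MARC file into chunks of fewer than 3,000 records so each chunk can be submitted as a separate Data Import job). The pymarc library, chunk size, and output file naming are assumptions for the sketch; they are not part of this ticket.
{code:python}
# Hypothetical sketch: split a binary MARC file into chunks of fewer than
# 3,000 records each, so each part can be submitted as a separate DI job.
from pymarc import MARCReader

CHUNK_SIZE = 3000  # assumed upper bound per file; adjust as needed


def write_chunk(base_path: str, part: int, records) -> str:
    """Write one chunk of records to <base_path>.partNN.mrc and return its name."""
    out_path = f"{base_path}.part{part:02d}.mrc"
    with open(out_path, "wb") as out:
        for record in records:
            out.write(record.as_marc())
    return out_path


def split_marc(path: str, chunk_size: int = CHUNK_SIZE) -> list:
    """Split the MARC file at `path` into files of at most `chunk_size` records."""
    parts, chunk, part_no = [], [], 0
    with open(path, "rb") as fh:
        for record in MARCReader(fh):
            if record is None:  # pymarc yields None for records it cannot parse
                continue
            chunk.append(record)
            if len(chunk) == chunk_size:
                parts.append(write_chunk(path, part_no, chunk))
                chunk, part_no = [], part_no + 1
    if chunk:
        parts.append(write_chunk(path, part_no, chunk))
    return parts


if __name__ == "__main__":
    print(split_marc("processed_records.mrc"))  # hypothetical input file name
{code}
Each resulting .mrc file would then be uploaded as its own Data Import job, which is exactly the extra manual step the reporter would prefer to avoid.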

Ann-Marie Breaux, April 29, 2021 at 4:39 AM (edited)
Hi, the way that PTF tests are shaping up, it looks like Iris will be able to handle 50K without seizing up. Let's leave this open for now; if you could try testing in Bugfest in the next couple of weeks (preferably toward the end of a US workday), then we'll figure out if anything else needs doing. We've definitely had success with large files in Bugfest, but not consistently and not with significant additional work happening in other parts of the system.
I'm going to block this for now, and we'll wait and see if any other work is needed. Does that sound OK to you?

Nick Cappadona, April 29, 2021 at 12:52 AM
Thanks for all your help!

Carole Godfrey, April 27, 2021 at 10:45 PM
Thanks for attaching the files. I agree with making sure that a single import works successfully for Iris, and that this issue can be closed.

Nick Cappadona, April 27, 2021 at 3:25 PM
I've added the files as requested.
Initial Data Import run (SRS MARC Bib and Instance creation using the default DI profile):
Follow-up Data Import run (overlay using our own set of DI profiles):
I also provided the full 10,300 processed records in a single file:
It will be tough to fully recreate the conditions for this test without our DI profiles. I can share them here, but honestly my vote would be not to dwell on the past (Honeysuckle) and instead close this issue so we can focus on standing up a stable Iris environment that can be used for testing. Do you agree?
We don't actually want to split the processed MARC into multiple files and parallel DI job executions. We prefer to submit a single file, but Honeysuckle can't handle this. Please see our latest results running our preferred process against Iris Bugfest (incomplete) and snapshot-load (success) in MODDATAIMP-419.
Details
Assignee: Ruslan Lavrov
Reporter: Carole Godfrey
Priority: P2
Story Points: 3
Development Team: Folijet
Release: R2 2021
Affected Institution: Cornell
Description
This issue is observed in a Honeysuckle HF3 FOLIO environment.
Creating this ticket to confirm that the same scenario does not occur in an Iris environment.
Specifically, 5 parallel Data Import jobs were submitted at that time; the jobs progressed for about 30 minutes and have been stuck ever since.
Errors in the mod-pubsub audit_message table indicate that some events were not successfully delivered (most likely due to issues with inventory at the time):
Error delivering DI_SRS_MARC_BIB_INSTANCE_HRID_SET event with id 'ad1a0390-7926-4533-b9a7-044f80d170c9' to /inventory/handlers/instances, response status code is 503, Service Unavailable
As well as messages in the log:
Error delivering DI_SRS_MARC_BIB_INSTANCE_HRID_SET event with id '22d302a2-36c0-4ac0-9618-037ee6a33806' to /inventory/handlers/instances, response status code is 502, Bad Gateway
And a message from mod-source-record-storage:
20:15:00.081 [vert.x-eventloop-thread-0] ERROR tpClientResponseImpl [10600064eqId] io.vertx.core.VertxException: Connection was closed
Spikes in CPU and memory were observed for both mod-inventory and mod-pubsub.
After restarting the modules, attempts to run concurrent imports continued to get stuck.
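For anyone repeating the check described above, a minimal sketch of querying the mod-pubsub audit_message table for events that were not delivered. The schema name, column names, state value, and connection details are assumptions and may differ between FOLIO releases; verify them against the actual mod-pubsub schema first.
{code:python}
# Hypothetical sketch: list audit_message rows whose delivery appears to have
# failed. The schema (diku_mod_pubsub), the error_message column, and the
# 'REJECTED' state value are assumptions, not confirmed by this ticket.
import psycopg2

conn = psycopg2.connect(
    host="localhost",          # placeholder connection details
    dbname="okapi_modules",
    user="folio_admin",
    password="folio_admin",
)
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT event_id, event_type, state, audit_date, error_message
        FROM diku_mod_pubsub.audit_message
        WHERE state = 'REJECTED'
           OR error_message ILIKE '%error delivering%'
        ORDER BY audit_date DESC
        LIMIT 50
        """
    )
    for row in cur.fetchall():
        print(row)
{code}
A burst of such rows around the time the jobs stalled would line up with the 502/503 delivery errors quoted above.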