Batch Importer (Bib/Acq) (UXPROD-47)

[FOLIO-2656] After a while some of requests fail with network error exception Created: 25/Jun/20  Updated: 10/Jul/20  Resolved: 07/Jul/20

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None
Parent: Batch Importer (Bib/Acq)

Type: Bug Priority: P1
Reporter: Ivan Kryzhanovskyi Assignee: Oleksii Kuzminov
Resolution: Won't Do Votes: 0
Labels: data-import, epam-folijet
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Attachments: Text File mod-data-import.log    
Issue links:
Defines
defines UXPROD-2220 NFR: Data Import (Batch Importer for ... Closed
Relates
relates to UIIN-1176 Bugfest: View source button is greyed... Closed
Sprint: Folijet Sprint 92
Story Points: 0
Development Team: Folijet
Release: Q2 2020
Epic Link: Batch Importer (Bib/Acq)

 Description   

Overview:
Steps to Reproduce:

  1. Log into the https://folio-snapshot.aws.indexdata.com/data-import environment and go to the data-import page.
  2. Try to import an .mrc file.

Expected Results:
From the FE perspective: such requests should complete successfully.

Actual Results:
From the FE perspective: such requests stay in a pending state for a while and then fail with a network error.
URL: for example, at https://folio-snapshot.aws.indexdata.com/data-import, after trying to import a file, the request to https://folio-snapshot-okapi.aws.indexdata.com/data-import/uploadDefinitions does not complete. You can see it in the browser developer tools, Network tab. This is not the only request that fails with a network error.

From the BE perspective: the corresponding modules don't receive those requests.



 Comments   
Comment by Ann-Marie Breaux (Inactive) [ 25/Jun/20 ]

Conversation between Eric Valuk and Ivan yesterday:

Hi @Ivan Kryzhanovskyi: @Eric Valuk from EBSCO DevOps is looking at the BugFest problem that you described on the #development Slack channel. Eric is asking for more background on what you were trying to do - could you provide any additional info? Thank you!
======================================
On the bugfest env some requests are being blocked by CORS policy.
For example, when we try to import a file, after a while we get an error in the console.
It looks like it is related to server configuration.
Could someone assist with the described issue?

Eric Valuk 10:40 AM
It would be very helpful to know when the error happened, or possibly which api is causing issues

Ivan Kryzhanovskyi 10:49 AM
Hi, @ann-marie, @Eric Valuk. A first example: I am trying to import a file at https://bugfest-goldenrod.folio.ebsco.com/data-import . During this action we have a POST request (Request URL: https://okapi-bugfest-goldenrod.folio.ebsco.com/data-import/uploadDefinitions), which stays in pending status for a while and then fails with net::ERR_FAILED, and in the console we can see an error. On other envs this request performs correctly. Are there any steps we can take on the FE side to fix this issue?

Eric Valuk 11:04 AM
I see a request that took 400 seconds made 15 minutes ago
11:04
I assume this was you?
11:04
or someone using the api I guess it could be also
11:09
I see this error
11:09
14:38:37.506 [vert.x-eventloop-thread-0] ERROR efinitionServiceImpl [176758666eqId] Error creating new JobExecution for UploadDefinition with id c27b263c-848d-467e-a8ef-51b87a879e0a

Ivan Kryzhanovskyi 11:10 AM
Not sure that was me. I tried the described flow a few hours earlier. And I must say, this is not the only request with the issue. I found at least one more, when we try to open the action menu on https://bugfest-goldenrod.folio.ebsco.com/inventory/view/728c2378-7e5f-4af5-aebf-2caad8f28dbf?qindex=querySearch&query=source%3DMARC&sort=Title . We have a GET request (Request URL: https://okapi-bugfest-goldenrod.folio.ebsco.com/source-storage/formattedRecords/728c2378-7e5f-4af5-aebf-2caad8f28dbf?identifier=INSTANCE) that has the same issue.

Eric Valuk 11:49 AM
log-insights-data-import-timeout.csv
@timestamp,@logStream,@message
2020-06-24 15:41:50.432,gbf/mod-data-import/nginx/6a301689-7625-497f-8dd7-d2bf05ecde6f,"10.23.36.42 - - [24/Jun/2020:15:41:49 +0000] ""POST /mod-data-import/data-import/uploadDefinitions HTTP/1.1"" 499 0 rt=400.000 uct="""" uht="""" urt=""-"" ""Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36"" ""fs09000000"" ""eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJhYnJlYXV4IiwidXNlcl9pZCI6IjNhYzFjNjM1LTE1NTgtNGQ1Ny1iMTg4LWJkMGQxYzYzMGQzZCIsIm1vZHVsZSI6Im1vZC1kYXRhLWltcG9ydC0xLjEwLjAiLCJleHRyYV9wZXJtaXNzaW9ucyI6WyJTWVMjbW9kLWRhdGEtaW1wb3J0LTEuMTAuMCNcL2RhdGEtaW1wb3J0XC91cGxvYWREZWZpbml0aW9ucyNbUE9TVF0iXSwicmVxdWVzdF9pZCI6IjY4NjE5OVwvZGF0YS1pbXBvcnQiLCJ0ZW5hbnQiOiJmczA5MDAwMDAwIn0.7ofOff-GFRg4WeaiyEYgZ9OZ1ZGEjRgM3gjzxT4RYEI"" ""{\x22fileDefinitions\x22:[

{\x22uiKey\x22:\x22examples21.mrc1592287895160\x22,\x22size\x22:22,\x22name\x22:\x22examples21.mrc\x22}

]}"""
2020-06-24 15:41:49.558,gbf/mod-data-import/6a301689-7625-497f-8dd7-d2bf05e...

Ann-Marie Breaux - Other input from the dev channel:
Wayne Schneider [10:22 AM]
My guess would be that the backend module is unresponsive at the moment (but I have no insight into the bugfest configuration)
Marc Johnson [11:07 AM]
It appears that some requests made in the BugFest environment might be terminated with a 502 Bad Gateway error.
This might be caused by a maximum response time or response size, or it might be something that would manifest in any environment with sufficient data.
See https://folio-org.atlassian.net/browse/UIU-1719 for information on my investigation so far. I guess this could also manifest as a CORS error.

Eric Valuk 12:16 PM
So my investigation so far shows that the request did time out. I included the log file above, which came from the modules around the time of the request made on the call we just had a bit ago.

12:16
I don't have much information about the error beyond the log output.

Comment by Craig McNally [ 25/Jun/20 ]

Just my two cents here... Marc Johnson mentions 502 here, but if you look at the log entry provided in the comment above, it's actually a 499 that's returned, indicating that the client closed the connection... The question becomes, who's the client in this situation? Based on the user agent string it's the browser/stripes. Another potentially important piece of information here is the rt=400.000... indicating that the response was returned after exactly 400 seconds... Is there a 400s timeout in Stripes/Stripes-connect/elsewhere in the UI?
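
The reading of the log entry described above can be sketched in Python. This is only an illustration, not part of the original investigation: the regular expression assumes the log layout seen in the single entry quoted in this ticket, `parse_entry` and `describe` are hypothetical helper names, and treating `rt` as the request duration in seconds is an assumption based on the comment above. The sample line is the quoted entry with the CSV quote doubling removed and the trailing fields cut off.

```python
import re

# Assumed layout, inferred from the one entry quoted in this ticket:
# client - user [time] "METHOD path protocol" status bytes rt=<seconds> ...
LINE_RE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+) rt=(?P<rt>[\d.]+)'
)


def parse_entry(line):
    """Return method, path, status and rt for one log line, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return {
        "method": match.group("method"),
        "path": match.group("path"),
        "status": int(match.group("status")),
        "rt": float(match.group("rt")),
    }


def describe(status):
    """Map the status codes discussed in this ticket to a short explanation."""
    meanings = {
        499: "client closed the connection before a response was sent (nginx-specific code)",
        502: "bad gateway: the proxy got an invalid response from upstream",
        504: "gateway timeout: upstream did not answer in time",
    }
    return meanings.get(status, "not discussed in this ticket")


sample = (
    '10.23.36.42 - - [24/Jun/2020:15:41:49 +0000] '
    '"POST /mod-data-import/data-import/uploadDefinitions HTTP/1.1" '
    '499 0 rt=400.000 uct="" uht="" urt="-"'
)

entry = parse_entry(sample)
print(entry["status"], entry["rt"], describe(entry["status"]))
```

Filtering a larger log export for entries with status 499 and an `rt` near 400 would show whether the 400 second cutoff is consistent across requests.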

Comment by Craig McNally [ 25/Jun/20 ]

Also, is there a reason why the backend would really take that long to respond? I'm guessing that the real issue here is with the backend

Comment by Ann-Marie Breaux (Inactive) [ 25/Jun/20 ]

All good questions Craig McNally. Ivan researched it deeply enough to determine that it's not an issue with any of Folijet's modules, at least as far as we can tell. We're pinning lots of hope on being able to unblock import/export testing on bugfest tomorrow, but right now, there's no clear indication we can even upload a file properly.

Comment by Marc Johnson [ 26/Jun/20 ]

Craig McNally

Just my two cents here... Marc Johnson mentions 502 here

Apologies, I may have caused some confusion. The 502 response was from a specific endpoint (for unknown reasons at this point). I don't know if that specific case is related to this issue.

Comment by Kateryna Senchenko [ 26/Jun/20 ]

Hi everyone,
This issue reproduces on all reference envs at some point during the day, making the whole data-import functionality unavailable (files cannot be uploaded).
Ann-Marie Breaux, would it be OK to raise the priority for this ticket to P1?

CC: Taras Spashchenko, Oleksii Kuzminov, Ivan Kryzhanovskyi

Comment by Oleksii Kuzminov [ 26/Jun/20 ]

Marc Johnson, Craig McNally: Hello. On the backend, no requests are handled. See the attached log file.
In the okapi logs I found several requests that were proxied to mod-auth, and that's it.
The backend and UI codebase for this functionality hasn’t changed since Q4.

Comment by Craig McNally [ 26/Jun/20 ]

Oleksii Kuzminov log snippets showing this would be helpful.

Never mind, I misread your comment; didn't realize you DID provide logs

Comment by Kruthi Vuppala [ 26/Jun/20 ]

FWIW we see this same behavior for data-export when we try to upload a file on both folio-testing and folio-snapshot. We started seeing this behavior today, and verified that it works on the vagrant box and local deployments.

Comment by Ann-Marie Breaux (Inactive) [ 26/Jun/20 ]

Argh. Definitely a P1. If it works in some environments, but not others, that sounds like it might be an environment problem for folio-testing and -snapshot. Who is in the best position to try to resolve this? DevOps? It's sounding like not really the Import or Export team to me. And we're at the end of the workweek for Europe now. Other ideas?

Jakub Skoczen Craig McNally

Comment by Craig McNally [ 29/Jun/20 ]

Kruthi Vuppala were you using the UI for your local testing, or hitting the APIs directly?

Comment by Oleksii Kuzminov [ 29/Jun/20 ]

Hi Craig McNally
I found these logs in mod-data-import on the bugfest env. Maybe they can be helpful.

2020-06-29T15:02:56.250Z 15:02:56.250 [vert.x-eventloop-thread-0] ERROR efinitionServiceImpl [610217410eqId] Error creating new JobExecution for UploadDefinition with id 58db8f38-f31d-4d64-9921-937ad89116bc
2020-06-29T15:02:56.251Z 15:02:56.251 [vert.x-eventloop-thread-0] ERROR ExceptionHelper [610217411eqId] Gateway Timeout
2020-06-29T15:02:56.251Z io.vertx.ext.web.handler.impl.HttpStatusException: Gateway Timeout

Requests from Postman also fail with 504 Gateway Time-out.
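
Reproducing the request outside the browser, as done with Postman above, could look roughly like the following Python sketch. It is not the exact request that was sent: the function names are hypothetical, the payload shape is copied from the request body visible in the nginx log entry earlier in this ticket, the tenant and token values are placeholders, and the use of the `X-Okapi-Tenant` and `X-Okapi-Token` headers assumes a standard Okapi setup.

```python
import json
import urllib.error
import urllib.request


def build_upload_definition_request(okapi_url, tenant, token, file_name, size, ui_key):
    """Build (but do not send) a POST to /data-import/uploadDefinitions."""
    # Payload shape taken from the request body shown in the nginx log entry.
    payload = {"fileDefinitions": [{"uiKey": ui_key, "size": size, "name": file_name}]}
    return urllib.request.Request(
        okapi_url.rstrip("/") + "/data-import/uploadDefinitions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Okapi-Tenant": tenant,  # placeholder value supplied by the caller
            "X-Okapi-Token": token,    # placeholder value supplied by the caller
        },
        method="POST",
    )


def send(request, timeout_seconds=60):
    """Send the request and return the HTTP status, including error statuses such as 504."""
    try:
        with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


request = build_upload_definition_request(
    "https://okapi-bugfest-goldenrod.folio.ebsco.com",
    tenant="TENANT_PLACEHOLDER",
    token="TOKEN_PLACEHOLDER",
    file_name="examples21.mrc",
    size=22,
    ui_key="examples21.mrc1592287895160",
)
print(request.get_method(), request.full_url)
```

Calling `send(request)` with a real tenant and token would return the status code. An explicit client-side timeout makes it easier to tell a hanging backend apart from a browser-side limit.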

Comment by Hongwei Ji [ 29/Jun/20 ]

Oleksii Kuzminov FYI the 504 Gateway error seems to come from srs-manager. I restarted srs-storage and bugfest seems to be working again now.

Comment by Kruthi Vuppala [ 29/Jun/20 ]

Craig McNally Was hitting the backend APIs directly

Comment by Ann-Marie Breaux (Inactive) [ 29/Jun/20 ]

Just tested on Bugfest:

Upload and then import with secret button works. Upload and then importing with a job profile (PubSub) does not work. The file gets stuck.

And now View source is greyed out again, and quickMARC can't open the record for editing any more. Sigh

cc: Hongwei Ji Sobha Duvvuri Oleksii Kuzminov

Comment by Ann-Marie Breaux (Inactive) [ 07/Jul/20 ]

Hi Ivan Kryzhanovskyi This seems to have cleared up on Bugfest and folio-snapshot. Do you think we need to keep this ticket open?

Comment by Oleksii Kuzminov [ 07/Jul/20 ]

Done in scope of MODSOURCE-162 Closed
