[FOLIO-2964] "NPM Install" sometimes dies in Jenkins with network connection problems Created: 15/Jan/21  Updated: 22/Sep/22  Resolved: 22/Sep/22

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Task
Priority: P2
Reporter: Zak Burke
Assignee: Unassigned
Resolution: Won't Do
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Sprint:
Development Team: FOLIO DevOps

 Description   

UI builds occasionally fail in Jenkins during the "NPM Install" step with this error:

There appears to be trouble with your network connection. Retrying...

Specifically, this is a problem for the builds on master that are triggered automatically whenever a PR merges because the failure is silent. From a developer's point of view, the PR merged successfully (indeed, it did) but there is no indication that the subsequent master build, which actually publishes those changes to our NPM repository (thus allowing them to be picked up by the automated reference builds), has failed.

The problem can almost always be overcome by restarting the build, usually just once, but sometimes it takes a few tries. It is neither frequent nor rare, and while it's easy to deal with if you know how, it's a bit of a mystery if you don't. Multiplied across the dozens of UI repositories we maintain, it starts to feel like death by a thousand papercuts.

I started seeing this error around November 2020. It seems to get worse during release weeks when many people are hammering on Jenkins, but it continues to show up even when the Jenkins job queue is empty.
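
The manual "restart the build until it passes" workaround described above could in principle be automated by wrapping the install step in a retry loop. The following is only a minimal sketch of that idea, not something this ticket says was deployed: the function name run_with_retries, the attempt count, the delays, and the choice of "yarn install" as the command are all assumptions made for illustration.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, base_delay=5.0,
                     run=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying with exponential backoff on a non-zero exit code.

    Returns the 1-based number of the attempt that succeeded.
    Raises RuntimeError if every attempt fails.
    The run and sleep parameters exist only so the loop can be tested
    without spawning processes or actually waiting.
    """
    for attempt in range(1, attempts + 1):
        result = run(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"{cmd!r} failed after {attempts} attempts")


if __name__ == "__main__":
    # Hypothetical usage: the real install command in the Jenkins job may differ.
    run_with_retries(["yarn", "install"])
```

A wrapper like this only hides transient failures; it does nothing about the underlying network problem, and it would not by itself make a failed master build any less silent.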



 Comments   
Comment by Zak Burke [ 21/Jan/21 ]

It continues.

Comment by Zak Burke [ 21/Jan/21 ]

Still going

Comment by Zak Burke [ 22/Jan/21 ]

And again

Comment by Zak Burke [ 22/Jan/21 ]

NB: updated the description to indicate why this failure is such a problem:

Specifically, this is a problem for the builds on master that are triggered automatically whenever a PR merges because the failure is silent. From a developer's point of view, the PR merged successfully (indeed, it did) but there is no indication that the subsequent master build, which actually publishes those changes to our NPM repository (thus allowing them to be picked up by the automated reference builds), has failed.

Comment by Zak Burke [ 22/Jan/21 ]

(Also, it's not just stripes-components/not just master builds)

Comment by Zak Burke [ 29/Jan/21 ]

ui-erm-usage and ui-finance.

Comment by Zak Burke [ 08/Feb/21 ]

ui-calendar

Comment by Zak Burke [ 19/Feb/21 ]

ui-requests

Comment by Zak Burke [ 01/Mar/21 ]

ui-developer

Comment by Mike Taylor [ 01/Mar/21 ]

Another manifestation of the same problem prevents new code from being merged at all. For example, I just made a trivial pull request that fixes two lint errors in ui-developer. That PR failed the tests even though ui-developer doesn't even have any damn tests, because the NPM installation required to get ui-developer to the point where we can run yarn test (and see the message "placeholder. no tests implemented") failed. So now I am babysitting Jenkins' repeated attempts to run this process to completion.

Need I say what a monumental waste of time, energy and morale this is? Not just for me, but for every other developer who has to hit Jenkins repeatedly with a big hammer every time they want to fix a trivial lint error or correct a typo?

I don't know what causes this, and I have no idea how to fix it, but I really really really want us to take it seriously. It's a huge drag on productivity, and it affects every front-end developer.

Comment by Mike Taylor [ 01/Mar/21 ]

I've gone ahead and marked this P2. (Not P1 because it doesn't quite actually prevent us from getting UI work done; it just makes that work take longer than it ought to, and imposes a cognitive/temporal tax on being a good citizen.)

Comment by Zak Burke [ 01/Mar/21 ]

FYI from #devops: (a) this can't be reliably reproduced, which makes it an absolute bear to troubleshoot, though we have tried a few things. (b) The other options we can try cost actual cash money, which maybe is OK, but we can't just blindly pursue them; we are bringing this to the attention of the people with the purse strings.

Comment by Zak Burke [ 22/Sep/22 ]

UI builds have moved to GitHub Actions, with a few exceptions, and consequently these failures on Jenkins are no longer frequent enough to be a concern.

Generated at Thu Feb 08 23:24:34 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.