[FOLIO-2360] SPIKE: unit test results are inconsistent Created: 19/Nov/19  Updated: 03/Jun/20  Resolved: 27/Jan/20

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Story Priority: TBD
Reporter: Zak Burke Assignee: Zak Burke
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Duplicate
duplicates FOLIO-2097 Unexpected "timeout exceeded" fails i... Closed
Sprint: Core: F - Sprint 77, Core: F - Sprint 78, Core: F - Sprint 80, Core: F - Sprint 79, Core: F - Sprint 81
Development Team: Prokopovych

 Description   

There are many facets to this:

  • sometimes we see failures in CI that we cannot repeat locally
  • sometimes local results are inconsistent and simply re-running the suite without changes shows different results
  • sometimes a test will fail when run as part of the full suite but it succeeds when run with a .only attached to its describe block


 Comments   
Comment by John Malconian [ 19/Nov/19 ]

I've run unit tests for stripes-smart-components branch stsmacom-251-viewcreatelink on the same Docker build image used by FOLIO CI. This image has recently been updated to include Node 10 and Chrome 78. All tests pass when this build image is run locally on my desktop. There are tests failures when I run this same image on an isolated AWS t3.2xlarge ec2 instance. The build consistently produces between 2-6 errors in this environment. The t3.2xlarge instance contains 32GB of RAM and 8 cores. There is no evidence of any kind of resource contention.

Comment by Peter Murray [ 19/Nov/19 ]

I find it is also true that I can restart CI tests in Jenkins and they will eventually pass. See, for instance: https://github.com/folio-org/stripes-smart-components/pull/663

Comment by Viktor Soroka [ 19/Nov/19 ]

Tests in stripes-smart-components passed not because of a just re-run, the tests code were changed a bit in order to pass. So the issue is still the case. For example, tests in ui-users are still failing.

Comment by Peter Murray [ 19/Nov/19 ]

I didn't make changes to the code in the tests between one CI run and the next. (I'm just making changes to the language files other than `en.json`, which I wouldn't expect would have a variable impact on the CI tests.) In any case, if my insight isn't helpful in this issue, please ignore it.

Comment by Zak Burke [ 19/Nov/19 ]

Viktor Soroka fixed the shoddy tests I wrote in stripes-smart-components. The problem there was a bit of a race condition; there was the potential to use an object before it was properly initialized. For whatever reason, this problem never manifested for any of us when running the tests locally, not even when John Malconian ran them in a local Docker container built from the Jenkins container.

It's quite possible the tests in ui-users are failing for a similar reason but that we just haven't found the source yet.

If I'm being totally honest, Mocha gives you way, way more than enough rope to hang yourself with their hooks implementation, which is the source of all this trouble. Code inside before and beforeEach blocks has the potential to execute before code defined higher up. On the one hand, this makes some sense, and it is written right there on the front page of the manual. On the other, this contravenes every expectation I have about how code executes. It doesn't matter it says "gas on left; brake on the right" right there on the front of your manual; it's a bad design choice.

Comment by Viktor Soroka [ 19/Nov/19 ]

I didn't make changes to the code in the tests between one CI run and the next. (I'm just making changes to the language files other than `en.json`, which I wouldn't expect would have a variable impact on the CI tests.) In any case, if my insight isn't helpful in this issue, please ignore it.

Peter Murray, I just added more context so folks really understand why the PR for translations has really passed rather than just ignore your point.

Comment by Viktor Soroka [ 20/Nov/19 ]

> I've run unit tests for stripes-smart-components branch stsmacom-251-viewcreatelink on the same Docker build image used by FOLIO CI.

John Malconian, could you please try the same for ui-users module on master branch?

Comment by Maxim Didenko [ 20/Nov/19 ]

tests for ui-circluation are also falling.

Comment by John Malconian [ 20/Nov/19 ]

Viktor Soroka I ran two builds of ui-users on the Jenkins build image locally with two different sets of errors during the unit tests.

1st build:

  Permissions modal
    add permission button
      ✔ should be displayed
      ✔ should be 0 permissions
      add permission button click
        permissions modal
          ✔ should be displayed
          ✔ should have proper label
20 11 2019 15:15:30.993:WARN [Chrome 78.0.3904 (Linux 0.0.0)]: Disconnected (0 times)reconnect failed before timeout of 2000ms (transport close)
Chrome 78.0.3904 (Linux 0.0.0) ERROR
  Disconnectedreconnect failed before timeout of 2000ms (transport close)
Chrome 78.0.3904 (Linux 0.0.0) ERROR
  Disconnectedreconnect failed before timeout of 2000ms (transport close)

Finished in 5 mins 16.946 secs / 18.36 secs @ 15:15:31 GMT+0000 (Coordinated Universal Time)


=============================== Coverage summary ===============================
Statements   : Unknown% ( 0/0 )
Branches     : Unknown% ( 0/0 )
Functions    : Unknown% ( 0/0 )
Lines        : Unknown% ( 0/0 )
================================================================================
KarmaCommand failed with exit code 1.

2nd build:

SUMMARY:
✔ 287 tests completed
✖ 6 tests failed

FAILED TESTS:
  Permission set form
    add permission button
      add permission button click
        ✖ "before each" hook for "should be presented"
          Chrome 78.0.3904 (Linux 0.0.0)
        Error: convergent assertion was successful, but exceeded the 2000ms timeout
            at loop (test/bigtest/index.js:692:17)

  Settings refunds
    ✖ "before each" hook for "renders proper amount of columns"
      Chrome 78.0.3904 (Linux 0.0.0)
    Error: convergent assertion was successful, but exceeded the 3000ms timeout
        at loop (test/bigtest/index.js:692:17)

  User Edit Page
    ✖ "before each" hook for "should display validation error"
      Chrome 78.0.3904 (Linux 0.0.0)
    Error: convergent assertion was successful, but exceeded the 5000ms timeout
        at loop (test/bigtest/index.js:692:17)

  User Edit: Proxy/Sponsor
    visiting the edit user page
      ✖ "before each" hook for "relationship status should be inactive"
        Chrome 78.0.3904 (Linux 0.0.0)
      Error: convergence exceeded the 2000ms timeout
          at getElapsedSince (test/bigtest/index.js:742:11)
          at collectExecStats (test/bigtest/index.js:788:16)
          at CustomInteractor.runCallback (test/bigtest/index.js:808:12)
          at test/bigtest/index.js:919:32

  User view
    pane header dropdown menu
      ✖ "before each" hook for "should redirect to instance edit page"
        Chrome 78.0.3904 (Linux 0.0.0)
      Error: convergent assertion was successful, but exceeded the 2000ms timeout
          at loop (test/bigtest/index.js:692:17)

  Status filter
    show all users
      ✖ should be proper amount of users
        Chrome 78.0.3904 (Linux 0.0.0)
      Error: convergent assertion was successful, but exceeded the 2000ms timeout
          at loop (test/bigtest/index.js:7927:17)

TOTAL: 6 FAILED, 287 SUCCESS

=============================== Coverage summary ===============================
Statements   : 75.21% ( 2833/3767 )
Branches     : 66.94% ( 1470/2196 )
Functions    : 69.52% ( 942/1355 )
Lines        : 76.04% ( 2662/3501 )
================================================================================
20 11 2019 15:30:11.002:WARN [launcher]: Chrome was not killed in 2000 ms, sending SIGKILL.
Karma exited with 1
error Command failed with exit code 1.

Comment by Zak Burke [ 21/Nov/19 ]

Viktor Soroka did an interesting experiment in ui-circulation where he rolled back the version of @folio/stripes and, although that generated some errors related to the changed stripes APIs, there were not any "Error: convergent assertion was successful, but exceeded the 2000ms timeout" errors.

Related, I wonder if it is possible to convert those errors to warnings.

Comment by Zak Burke [ 21/Nov/19 ]

Michal Kuklis experimented with monkey-patching BigTest to increase the default timeout from 2 seconds to 10 seconds because so many of these errors are like,

Error: convergent assertion was successful, but exceeded the 2000ms timeout

but all that did was change the message to

Error: convergent assertion was successful, but exceeded the 10000ms timeout

This makes me highly skeptical of whether that is a legit error.

Comment by Zak Burke [ 21/Nov/19 ]

The error Viktor Soroka fixed was

Error: Uncaught TypeError: Cannot read property ‘manifest’ of null (tests/index.js:355930)

which may happen if you call mount on the same component multiple times. The original test executed

ConnectedComponent = connectStripes(SearchAndSort);

only once and then repeatedly called mount(<ConnectedComponent ...>) in each describe block. Moving the connectStripes... call into each describe block containing a mount call resolved the problem.

Comment by John Malconian [ 21/Nov/19 ]

I've re-enabled publishing of yarn.lock for all NPM-based builds. Example: https://jenkins-aws.indexdata.com/job/folio-org/job/ui-inventory/job/UIIN-842/3/

Check navigation on the left to download the file.

Comment by Zak Burke [ 22/Nov/19 ]

Revelation: many of these tests failures are not actually test failures; they are initializations failures. Look carefully:

"before each" hook for "renders proper values after update"
Chrome 78.0.3904 (Linux 0.0.0) Manual charges create new manual feature "before each" hook for "renders proper values after update" FAILED

That’s a timeout in the beforeEach block the “renders proper values…” test is attached to. Over the course of many PRs in ui-users all focused on careful initialization of values before tests, and additional "wait" calls in test-init blocks (ui-users PR #1057, ui-users PR #1058, and ui-users PR #1059) I managed to get things to a stable state.

Current theory: this is basically a sync problem. Either we were interacting with elements that were not properly initialized (because block scoping means nothing to Mocha when it comes to before blocks, which are hauled out of the apparent flow of things) or we were not waiting properly within those before blocks for all the elements we want to interact with, e.g. we needed to wait for a <select> and an <MCL>.

I'll take a look at inventory now and see if similar changes there also result in consistent test runs. :crossed_fingers:

Comment by Zak Burke [ 10/Dec/19 ]

See also the related conversation on FOLIO-2097 where there is a discussion about various improvements and regressions between @bigtest/interactor v0.7 and v0.9.

Comment by Zak Burke [ 27/Jan/20 ]

I'm closing this as a duplicate of FOLIO-2097 Closed . We have identified memory leaks in a variety of libraries, some third-party and some our own, that seem to be the likeliest cause of failing unit tests, and there is more discussion of those leaks on that ticket.

Generated at Thu Feb 08 23:20:05 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.