[FOLIO-3040] Failure adding springway modules to reference environments: Timed out after waiting 300000
Created: 26/Feb/21  Updated: 03/Mar/21  Resolved: 03/Mar/21

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Bug Priority: P2
Reporter: David Crossley Assignee: David Crossley
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Attachments: fst-334-okapi.log.gz, ftt-116-okapi.log, vm-courses-okapi.log.gz
Issue links:
Relates
relates to FOLIO-3030 Install mod-data-export-spring in the... Closed
relates to FOLIO-3031 Install mod-data-export-worker in the... Closed
relates to FOLIO-3025 Add mod-ebsconet to testing/snapshot ... Closed
Sprint: DevOps Sprint 109
Development Team: FOLIO DevOps

 Description   

I tried to add three new "Spring Way" modules to the reference environments. They all failed to deploy in the same way.

I cannot tell whether the problem is with the modules or something wider.

This report concerns adding mod-ebsconet (FOLIO-3025, Closed). Its developers report that it installs successfully on a local Vagrant VM.

See the Jenkins build folio-testing-test/116.

The final Jenkins snippet:

fatal: [10.36.1.201]: FAILED! => {"changed": false, "connection": "close", "content": "Timed out after waiting 300000(ms) for a reply. address: __vertx.reply.42, repliedAddress: http://10.36.1.201:9130/deploy", "content_length": "121", "content_type": "text/plain", "elapsed": 300, "msg": "Status code was 400 and not [200]: HTTP Error 400: Bad Request", "redirected": false, "status": 400, "url": "http://10.36.1.201:9130/_/proxy/tenants/diku/install?deploy=true&tenantParameters=loadSample%3Dtrue%2CloadReference%3Dtrue", "vary": "origin"}
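To make the failing request easier to read, the install URL in the task output above can be decoded with Python's standard urllib.parse. This is only an illustrative sketch, not part of the Jenkins job; the function name decode_install_url is hypothetical. It shows that the call asked Okapi to deploy the modules for tenant diku and pass loadSample=true,loadReference=true as tenant parameters. Note also that the 300000 ms reply timeout is 300 s, matching the "elapsed": 300 in the output.

```python
from urllib.parse import urlsplit, parse_qs

# Install URL copied from the failing Ansible task in the Jenkins output.
INSTALL_URL = (
    "http://10.36.1.201:9130/_/proxy/tenants/diku/install"
    "?deploy=true&tenantParameters=loadSample%3Dtrue%2CloadReference%3Dtrue"
)


def decode_install_url(url: str) -> dict:
    """Split an Okapi install URL into its tenant and decoded query parameters."""
    parts = urlsplit(url)
    # The path looks like /_/proxy/tenants/<tenant>/install
    segments = parts.path.strip("/").split("/")
    tenant = segments[segments.index("tenants") + 1]
    # parse_qs percent-decodes values, so %3D becomes '=' and %2C becomes ','
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    # tenantParameters is a comma-separated list of key=value pairs
    tenant_params = dict(
        pair.split("=", 1)
        for pair in query.get("tenantParameters", "").split(",")
        if pair
    )
    return {
        "tenant": tenant,
        "deploy": query.get("deploy"),
        "tenantParameters": tenant_params,
    }


print(decode_install_url(INSTALL_URL))
```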

Okapi version is 4.6.5.

The complete Okapi log is attached: ftt-116-okapi.log



 Comments   
Comment by Aleksei Prokhorov [ 26/Feb/21 ]

We have made changes in mod-data-export-spring and mod-data-export-worker. Please retry.

Comment by David Crossley [ 27/Feb/21 ]

After the code improvements to mod-data-export-spring and mod-data-export-worker, I tried again today with the folio-testing-test Jenkins build. mod-data-export-spring now starts (FOLIO-3030, Closed). However, mod-data-export-worker still does not start (FOLIO-3031, Closed).

The situation with mod-ebsconet (the original topic of this ticket) is unchanged: it still does not start and shows the same "timeout" issue described here. See FOLIO-3025 (Closed); perhaps it needs fixes similar to those made in mod-data-export-spring.

Comment by David Crossley [ 27/Feb/21 ]

I also tried adding mod-ebsconet in folio-snapshot-test/334, with the same timeout problem. See the Okapi log fst-334-okapi.log.gz.

Comment by David Crossley [ 01/Mar/21 ]

I have found a clue, but it will take me a while to explain it. Back soon.

Comment by David Crossley [ 01/Mar/21 ]

Today I tried to deploy mod-ebsconet on the folio/snapshot-core VM, but got the same "timeout" trouble described in this ticket.

Comment by David Crossley [ 01/Mar/21 ]

Then I tried to deploy a module that we know deploys correctly (mod-courses). There was similar trouble, but it did eventually deploy.

I will try to explain that. The times below are UTC, for the attached log file vm-courses-okapi.log.gz.
VM: folio/snapshot-core version '1.0.0-20210228.5924'
Okapi: okapi-4.7.0

05:44  commence install, enable and deploy
       network is quiet after getting the image
05:50  the deploy command returned with a timeout:
       Timed out after waiting 300000(ms) for a reply. address: __vertx.reply.31, repliedAddress: http://10.0.2.15:9130/deploy
       now there is continued network activity (~40 packets/sec)
06:01  the docker container is up
       network is quiet
       curl -s http://localhost:9130/_/proxy/tenants/diku/modules | jq '.[].id' | grep courses
       (no output)
06:03  commence deploy again
       mod-courses is deployed immediately
       curl -s http://localhost:9130/_/proxy/tenants/diku/modules | jq '.[].id' | grep courses
       curl -w '\n' http://localhost:9130/_/discovery/health
       all okay for mod-courses
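The two curl checks in the timeline can also be scripted, which helps when repeating them across several VMs. This is a sketch under stated assumptions: the tenant modules endpoint returns a JSON array of objects with an "id" field (which the jq '.[].id' filter above relies on), and /_/discovery/health returns an array of entries with "srvcId", "healthStatus" (boolean) and "healthMessage" fields. That health shape is an assumption about Okapi's health descriptor and should be checked against the Okapi version in use. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen

OKAPI = "http://localhost:9130"  # Okapi address used in the timeline above


def fetch_json(path: str):
    """GET a JSON document from Okapi (needs a running Okapi)."""
    with urlopen(OKAPI + path, timeout=10) as resp:
        return json.load(resp)


def enabled_module_ids(modules: list, name: str) -> list:
    """Equivalent of: jq '.[].id' | grep <name>."""
    return [m["id"] for m in modules if name in m.get("id", "")]


def unhealthy(health: list, name: str) -> list:
    """Return health entries for services matching <name> that are not OK.

    Assumes each entry has 'srvcId', 'healthStatus' (bool) and 'healthMessage'.
    """
    return [
        h for h in health
        if name in h.get("srvcId", "") and not h.get("healthStatus", False)
    ]
```

Against a live Okapi, usage would be enabled_module_ids(fetch_json("/_/proxy/tenants/diku/modules"), "courses") and unhealthy(fetch_json("/_/discovery/health"), "courses"). An empty list from the first call corresponds to the "(no output)" step at 06:01.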

Comment by David Crossley [ 01/Mar/21 ]

But trying the same with mod-ebsconet was not successful.
After Okapi returns that "timeout" message, the docker container comes up some time later.
However, retrying the deploy does not succeed as it did for mod-courses.
Okapi gives up completely after 60 retries and deletes the docker container.
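The behaviour described above (keep trying to reach the module, then give up after a fixed number of attempts) is a bounded retry loop. The sketch below is a generic Python illustration of that pattern, not Okapi's implementation; wait_for_port is a hypothetical helper, and the 60-attempt default simply mirrors the figure in the comment. A module listening on the wrong port produces exactly this outcome: every attempt is refused until the retries run out.

```python
import socket
import time


def wait_for_port(host: str, port: int, retries: int = 60, delay: float = 1.0) -> bool:
    """Try to open a TCP connection up to `retries` times.

    Returns True as soon as a connection succeeds, or False if every attempt
    fails (for example 'Connection refused' because nothing listens there).
    """
    for attempt in range(retries):
        try:
            with socket.create_connection((host, port), timeout=delay):
                return True
        except OSError:
            # Refused or timed out: wait before the next attempt,
            # but not after the final one.
            if attempt < retries - 1:
                time.sleep(delay)
    return False
```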

Comment by David Crossley [ 02/Mar/21 ]

Oh really! I re-checked with today's "snapshot-core" VM ("v1.0.0-20210301.5930", okapi-4.7.0) and with various older ones, including "1.0.0-20210228.5924" from my comments yesterday. I cannot reproduce that behaviour when deploying mod-courses: deployment works and its endpoints are reachable. Sorry for the noise.

Comment by David Crossley [ 02/Mar/21 ]

However, mod-ebsconet (FOLIO-3025, Closed) will not deploy on any recent VM with okapi-4.7.0; it shows the "timeout" issues already described.

Note that the folio/testing VM is still out of date (it has okapi-4.6.3). On that VM, mod-ebsconet does appear to deploy.

The module does not yet have any functional endpoint to verify with. However, the Okapi "/_/discovery/health" endpoint reports "Fail: finishConnect(..) failed: Connection refused: /10.0.2.15:9179".

Comment by Jakub Skoczen [ 03/Mar/21 ]

David Crossley: Adam is going to release Okapi 4.7.1 with a potential fix; please give it a go.

Comment by David Crossley [ 03/Mar/21 ]

The cause was that Okapi could not connect to the module port for mod-ebsconet and eventually gave up. A configuration error in the module's code meant that the module was listening on a different port. See FOLIO-3025 (Closed) today.
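Because the root cause was a port mismatch, a quick consistency check can compare the container port that a module's launch descriptor publishes with the port the application is configured to listen on. The sketch assumes the common FOLIO ModuleDescriptor layout, where launchDescriptor.dockerArgs.HostConfig.PortBindings maps a container port such as "8081/tcp" to the host port placeholder "%p" that Okapi fills in at deploy time. The descriptor fragment and the port values in the example are hypothetical and are not taken from mod-ebsconet; the helper names are also hypothetical.

```python
def container_ports(module_descriptor: dict) -> set:
    """Collect container ports published in launchDescriptor PortBindings."""
    bindings = (
        module_descriptor.get("launchDescriptor", {})
        .get("dockerArgs", {})
        .get("HostConfig", {})
        .get("PortBindings", {})
    )
    # Keys look like "8081/tcp"; keep only the numeric port.
    return {int(key.split("/", 1)[0]) for key in bindings}


def port_mismatch(module_descriptor: dict, listening_port: int) -> bool:
    """True if the app's listening port is not among the published ports."""
    ports = container_ports(module_descriptor)
    return bool(ports) and listening_port not in ports


# Hypothetical descriptor fragment; "%p" is the placeholder for the host port.
EXAMPLE_MD = {
    "launchDescriptor": {
        "dockerArgs": {
            "HostConfig": {"PortBindings": {"8081/tcp": [{"HostPort": "%p"}]}}
        }
    }
}

print(port_mismatch(EXAMPLE_MD, 8081))  # False: ports agree
print(port_mismatch(EXAMPLE_MD, 8080))  # True: module listens elsewhere
```

In a check like this, the listening port would come from the module's own configuration (for a Spring Boot module, typically its server.port setting), so a True result flags the kind of error found here before Okapi spends its retries.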

Generated at Thu Feb 08 23:25:09 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.