[FOLIO-1804] Reference environment builds fail due to apparent timeout Created: 13/Feb/19 Updated: 17/Mar/21 Resolved: 15/Feb/19 |
|
| Status: | Closed |
| Project: | FOLIO |
| Components: | None |
| Affects versions: | None |
| Fix versions: | None |
| Type: | Bug | Priority: | P1 |
| Reporter: | Marc Johnson | Assignee: | John Malconian |
| Resolution: | Done | Votes: | 0 |
| Labels: | ci, environments, sprint57 | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original estimate: | Not Specified | ||
| Issue links: |
|
||||||||||||||||||||||||
| Sprint: | |||||||||||||||||||||||||
| Development Team: | Core: Platform | ||||||||||||||||||||||||
| Description |
|
Apologies for the rubbish title, please amend as we know more information. Symptoms Snapshot RUNNING HANDLER [folio-ansible/roles/okapi-deploy-config : Build tenant parameter list] ***
ok: [10.36.1.64] => (item={'value': [{u'name': u'loadSample', u'value': u'true'}], 'key': u'mod-orders-storage'})
ok: [10.36.1.64] => (item={'value': [{u'name': u'loadReference', u'value': u'true'}], 'key': u'mod-inventory-storage'})
RUNNING HANDLER [folio-ansible/roles/okapi-deploy-config : Discard duplicates in tenant parameter list] ***
ok: [10.36.1.64]
RUNNING HANDLER [folio-ansible/roles/okapi-deploy-config : Convert tenant parameter list to query string] ***
ok: [10.36.1.64] => (item={u'name': u'loadSample', u'value': u'true'})
ok: [10.36.1.64] => (item={u'name': u'loadReference', u'value': u'true'})
RUNNING HANDLER [folio-ansible/roles/okapi-deploy-config : Perform module installation] ***
FATAL: command execution failed
hudson.AbortException: Ansible playbook execution failed
at org.jenkinsci.plugins.ansible.AnsiblePlaybookBuilder.perform(AnsiblePlaybookBuilder.java:262)
Testing TASK [folio-ansible/roles/okapi-tenant-deploy : Post install list for deployment and enabling] ***
fatal: [10.36.1.212]: FAILED! => {"changed": false, "content": "", "msg": "Status code was -1 and not [200]: Connection failure: timed out", "redirected": false, "status": -1, "url": "http://10.36.1.212:9130/_/proxy/tenants/diku/install?deploy=true&tenantParameters=loadReference%3Dtrue%2CloadSample%3Dtrue"}
to retry, use: --limit @/home/jenkins/workspace/Automation/folio-testing-backend01/CI/ansible/folio-testing-backend.retry
Environments Affected
It did not seem to affect the folio-snapshot-core build This also affected the Vagrant builds for snapshot and testing backend. The snapshot core backend build failure seems unrelated (but I'm far from sure). Initial Investigation I was unable to discover much from the error messages in the builds except maybe that a long running operation timed out or was aborted. It might happen during the activation of modules that load reference or sample records during tenant activation, based upon this snippet: RUNNING HANDLER [folio-ansible/roles/okapi-deploy-config : Build tenant parameter list] ***
ok: [10.36.1.64] => (item={'value': [{u'name': u'loadSample', u'value': u'true'}], 'key': u'mod-orders-storage'})
ok: [10.36.1.64] => (item={'value': [{u'name': u'loadReference', u'value': u'true'}], 'key': u'mod-inventory-storage'})
RUNNING HANDLER [folio-ansible/roles/okapi-deploy-config : Discard duplicates in tenant parameter list] ***
ok: [10.36.1.64]
RUNNING HANDLER [folio-ansible/roles/okapi-deploy-config : Convert tenant parameter list to query string] ***
ok: [10.36.1.64] => (item={u'name': u'loadSample', u'value': u'true'})
ok: [10.36.1.64] => (item={u'name': u'loadReference', u'value': u'true'})
Folio-snapshot-core seems unaffected. Given that builds differently that might narrow down what could have caused that. I briefly investigated the recent changes to mod-inventory-storage, mod-orders-storage and folio-ansible but didn't spot anything obvious from yesterday that might have caused this. I've repeated the builds to check whether it is transient. |
| Comments |
| Comment by John Malconian [ 13/Feb/19 ] |
|
The last entries in the folio-testing okapi.mod-orders-storage logs: 2019-02-13 12:47:52,106 INFO DockerModuleHandle mod-orders-storage-3.0.0-SNAPSHOT.67 SLF4J: Failed to load class "org.slf4j.impl.StaticMDCBinder". Not sure if this is meaningful or not as it appears the container is running, but I also note that it is not enabled for the tenant: ubuntu@ip-10-36-1-246:~$ curl 10.36.1.246:9130/_/proxy/tenants/diku/modules , { "id" : "mod-calendar-1.3.0-SNAPSHOT.67" }, { "id" : "mod-circulation-14.2.0-SNAPSHOT.219" }, { "id" : "mod-circulation-storage-6.3.0-SNAPSHOT.140" }, { "id" : "mod-codex-ekb-1.1.1-SNAPSHOT.76" }, { "id" : "mod-codex-inventory-1.5.0-SNAPSHOT.67" }, { "id" : "mod-codex-mux-2.4.0-SNAPSHOT.63" }, { "id" : "mod-configuration-5.0.2-SNAPSHOT.50" }, { "id" : "mod-feesfines-15.2.1-SNAPSHOT.22" }, { "id" : "mod-finance-storage-1.0.2-SNAPSHOT.7" }, { "id" : "mod-graphql-1.1.1000199" }, { "id" : "mod-inventory-11.2.0-SNAPSHOT.128" }, { "id" : "mod-inventory-storage-15.1.0-SNAPSHOT.240" }, { "id" : "mod-kb-ebsco-java-2.0.4-SNAPSHOT.89" }, { "id" : "mod-login-4.7.1-SNAPSHOT.44" }, { "id" : "mod-login-saml-1.2.2-SNAPSHOT.32" }, { "id" : "mod-notes-2.3.0-SNAPSHOT.69" }, { "id" : "mod-notify-2.2.0-SNAPSHOT.69" }, { "id" : "mod-password-validator-1.1.0-SNAPSHOT.17" }, { "id" : "mod-permissions-5.4.1-SNAPSHOT.36" }, { "id" : "mod-tags-0.3.0-SNAPSHOT.30" }, { "id" : "mod-user-import-3.1.1-SNAPSHOT.31" }, { "id" : "mod-users-15.3.1-SNAPSHOT.67" }, { "id" : "mod-users-bl-4.3.3-SNAPSHOT.59" }, { "id" : "mod-vendors-1.1.0-SNAPSHOT.41" }, { "id" : "okapi-2.24.4" }] |
| Comment by John Malconian [ 13/Feb/19 ] |
|
I think those mod-orders-storage errors are a red herring. I see them on folio-snapshot-stable as well. However, I see the following message in the mod-oreders-storage logs on folio-snapshot-stable that I DO NOT see on folio-testing: Feb 12, 2019 4:39:06 AM org.z3950.zing.cql.cql2pgjson.CQL2PgJSON loadDbSchema |
| Comment by John Malconian [ 13/Feb/19 ] |
|
This would seem to correlate with the tenant initialization timeout. Adding Aliaksandr Pautau as a watcher. |
| Comment by Piotr Kalashuk [ 13/Feb/19 ] |
|
Hi John Malconian, Marc Johnson We are looking why it might happen. |
| Comment by John Malconian [ 13/Feb/19 ] |
|
Ian Hardy suggested increasing the RAM on the instance based on a similar situation he recently encountered with the vagrant FOLIO image. Attempting a folio-testing build now with 2 x RAM increase. |
| Comment by John Malconian [ 13/Feb/19 ] |
|
Build fails in the same place as before even with double the RAM. |
| Comment by Piotr Kalashuk [ 13/Feb/19 ] |
|
Hi John Malconian, We are preparing PR to resolve the issue in scope of
|
| Comment by Piotr Kalashuk [ 13/Feb/19 ] |
|
The PR to resolve the issue has been created. Please see details in
Sorry for inconvenience |
| Comment by Piotr Kalashuk [ 13/Feb/19 ] |
|
Hi John Malconian, The fix has been merged and new version of the mod-orders-storage snapshot available. Thanks in advance! |
| Comment by John Malconian [ 13/Feb/19 ] |
|
Thanks Piotr Kalashuk. Looks like mod-orders-storage initializes properly now. However, the build still fails due to an unrelated dependency error for mod-oai-pmh. 2019-02-13 18:19:00,915 INFO ProxyContext 565514/proxy RES 400 2663us okapi Incompatible version for module mod-source-record-manager-1.1.0-SNAPSHOT.52 interface source-record-storage. Need 1.0. Have 2.0 2019-02-13 18:19:00,915 ERROR HttpResponse HTTP response code=400 msg=Incompatible version for module mod-source-record-manager-1.1.0-SNAPSHOT.52 interface source-record-storage. Need 1.0. Have 2.0 2019-02-13 18:19:01,242 INFO ProxyContext 368076/proxy REQ 10.36.1.104:50728 supertenant POST /_/proxy/modules okapi-2.24.4 2019-02-13 18:19:01,245 INFO ProxyContext 368076/proxy RES 400 2596us okapi Missing dependency: mod-data-import-1.2.0-SNAPSHOT.33 requires source-record-manager: 1.2 2019-02-13 18:19:01,245 ERROR HttpResponse HTTP response code=400 msg=Missing dependency: mod-data-import-1.2.0-SNAPSHOT.33 requires source-record-manager: 1.2 2019-02-13 18:19:01,553 INFO ProxyContext 667523/proxy REQ 10.36.1.104:50730 supertenant POST /_/proxy/modules okapi-2.24.4 2019-02-13 18:19:01,558 INFO ProxyContext 667523/proxy RES 201 4301us okapi-2.24.4 /_/proxy/modules 2019-02-13 18:19:01,855 INFO ProxyContext 504338/proxy REQ 10.36.1.104:50732 supertenant POST /_/proxy/modules okapi-2.24.4 2019-02-13 18:19:01,858 INFO ProxyContext 504338/proxy RES 400 2482us okapi Incompatible version for module mod-oai-pmh-1.0.2-SNAPSHOT.40 interface source-record-storage. Need 1.0. Have 2.0 2019-02-13 18:19:01,858 ERROR HttpResponse HTTP response code=400 msg=Incompatible version for module mod-oai-pmh-1.0.2-SNAPSHOT.40 interface source-record-storage. Need 1.0. Have 2.0 Looks like the mod-source-record-storage endpoints were changed here: https://github.com/folio-org/mod-source-record-storage/pull/26/commits/20b09be484c8985276d6734a068e7bbef058e55d CC: Kateryna Senchenko |
| Comment by Piotr Kalashuk [ 13/Feb/19 ] |
|
As far as I can see the last build has failed however the error is now different. Please let us know if you think this is caused by orders module |
| Comment by John Malconian [ 13/Feb/19 ] |
|
Wayne Schneider Ian Hardy David Crossley - Disabling the stripes-testing job in Jenkins until the dependency issue above is resolved. |
| Comment by John Malconian [ 13/Feb/19 ] |
|
Piotr Kalashuk No, latest failure not caused by orders modules. Latest failure is caused by a dependency mismatch between mod-source-record-storage, mod-oai-pmh, and mod-data-import. |
| Comment by John Malconian [ 13/Feb/19 ] |
|
Going to attempt a build with a re-arranged module order. I don't think anything requires source-record-storage-2.0. |
| Comment by John Malconian [ 13/Feb/19 ] |
|
Yep. Re-arranging the order slightly so that mod-source-record-manager is loaded before mod-source-record-storage resolves the issue. Going forward an interface update to source-record-storage 2.0 will need to be carefully coordinated for mod-oai-pmh, mod-data-import, and mod-source-record-manager. |
| Comment by John Malconian [ 13/Feb/19 ] |
|
folio-snapshot-latest has been rebuilt successfully. |
| Comment by John Malconian [ 13/Feb/19 ] |
|
Spoke too soon. Module re-arranging the module order in folio-testing does NOT work. mod-source-record-manager, mod-oai-pmh, and mod-data-import will need update their dependencies to require the new interface provided by mod-source-record-storage. |
| Comment by John Malconian [ 14/Feb/19 ] |
|
Conferred with Craig McNally about this and my understanding is that the immediate solution will be to reverse the mod-source-record-storage interface upgrade until interface updates for all dependent modules (mod-data-import, mod-source-record-manager, and mod-oah-pmh) are coordinated. |
| Comment by Craig McNally [ 14/Feb/19 ] |
|
John Malconian Sounds like the EPAM teams have decided to take a slightly different approach.
Once everything is merged the dependency issues should be resolved and folijet will ask for the folio-testing build to be kicked off. The functional changes to mod-oai-pmh have not been made yet. These will be made next week. It's not a rush as I don't think mod-oai-pmh is actively being used/tested in folio-testing. It's unclear if functional changes to the other two modules will also need to be made or if the recently merged/soon to be merged changes address those as well as bumping the interface version. Taras Spashchenko can probably answer that. |
| Comment by Ann-Marie Breaux (Inactive) [ 14/Feb/19 ] |
|
The Data Import/Folijet stories that should resolve the folio-testing build problem are
|
| Comment by Ann-Marie Breaux (Inactive) [ 14/Feb/19 ] |
|
Taras Spashchenko confirmed all should be fine from the MODSOURMAN and MODDATAIMP POV. Asked Wayne Schneider to kick off a rebuild of folio-testing. |
| Comment by Wayne Schneider [ 15/Feb/19 ] |
|
Rebuild completed successfully. Closing issue. |