[FOLIO-1804] Reference environment builds fail due to apparent timeout Created: 13/Feb/19  Updated: 17/Mar/21  Resolved: 15/Feb/19

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Bug Priority: P1
Reporter: Marc Johnson Assignee: John Malconian
Resolution: Done Votes: 0
Labels: ci, environments, sprint57
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Issue links:
Blocks
is blocked by MODORDSTOR-57 Loading reference records fails Closed
is blocked by MODDATAIMP-81 Bring all the module's endpoints to t... Closed
is blocked by MODSOURMAN-61 Bring all the module's endpoints to t... Closed
Relates
relates to MODOAIPMH-66 Upgrade to source-record-storage 2.0 ... Closed
Sprint:
Development Team: Core: Platform

 Description   

Apologies for the rubbish title, please amend as we know more information.

Symptoms

Snapshot

RUNNING HANDLER [folio-ansible/roles/okapi-deploy-config : Build tenant parameter list] ***
ok: [10.36.1.64] => (item={'value': [{u'name': u'loadSample', u'value': u'true'}], 'key': u'mod-orders-storage'})
ok: [10.36.1.64] => (item={'value': [{u'name': u'loadReference', u'value': u'true'}], 'key': u'mod-inventory-storage'})

RUNNING HANDLER [folio-ansible/roles/okapi-deploy-config : Discard duplicates in tenant parameter list] ***
ok: [10.36.1.64]

RUNNING HANDLER [folio-ansible/roles/okapi-deploy-config : Convert tenant parameter list to query string] ***
ok: [10.36.1.64] => (item={u'name': u'loadSample', u'value': u'true'})
ok: [10.36.1.64] => (item={u'name': u'loadReference', u'value': u'true'})

RUNNING HANDLER [folio-ansible/roles/okapi-deploy-config : Perform module installation] ***
FATAL: command execution failed
hudson.AbortException: Ansible playbook execution failed
	at org.jenkinsci.plugins.ansible.AnsiblePlaybookBuilder.perform(AnsiblePlaybookBuilder.java:262)

Testing

TASK [folio-ansible/roles/okapi-tenant-deploy : Post install list for deployment and enabling] ***
fatal: [10.36.1.212]: FAILED! => {"changed": false, "content": "", "msg": "Status code was -1 and not [200]: Connection failure: timed out", "redirected": false, "status": -1, "url": "http://10.36.1.212:9130/_/proxy/tenants/diku/install?deploy=true&tenantParameters=loadReference%3Dtrue%2CloadSample%3Dtrue"}
	to retry, use: --limit @/home/jenkins/workspace/Automation/folio-testing-backend01/CI/ansible/folio-testing-backend.retry

Environments Affected

It did not seem to affect the folio-snapshot-core build

This also affected the Vagrant builds for snapshot and testing backend. The snapshot core backend build failure seems unrelated (but I'm far from sure).

Initial Investigation

I was unable to discover much from the error messages in the builds except maybe that a long running operation timed out or was aborted.

It might happen during the activation of modules that load reference or sample records during tenant activation, based upon this snippet:

RUNNING HANDLER [folio-ansible/roles/okapi-deploy-config : Build tenant parameter list] ***
ok: [10.36.1.64] => (item={'value': [{u'name': u'loadSample', u'value': u'true'}], 'key': u'mod-orders-storage'})
ok: [10.36.1.64] => (item={'value': [{u'name': u'loadReference', u'value': u'true'}], 'key': u'mod-inventory-storage'})

RUNNING HANDLER [folio-ansible/roles/okapi-deploy-config : Discard duplicates in tenant parameter list] ***
ok: [10.36.1.64]

RUNNING HANDLER [folio-ansible/roles/okapi-deploy-config : Convert tenant parameter list to query string] ***
ok: [10.36.1.64] => (item={u'name': u'loadSample', u'value': u'true'})
ok: [10.36.1.64] => (item={u'name': u'loadReference', u'value': u'true'})

Folio-snapshot-core seems unaffected. Given that builds differently that might narrow down what could have caused that.

I briefly investigated the recent changes to mod-inventory-storage, mod-orders-storage and folio-ansible but didn't spot anything obvious from yesterday that might have caused this.

I've repeated the builds to check whether it is transient.



 Comments   
Comment by John Malconian [ 13/Feb/19 ]

The last entries in the folio-testing okapi.mod-orders-storage logs:

2019-02-13 12:47:52,106 INFO DockerModuleHandle mod-orders-storage-3.0.0-SNAPSHOT.67 SLF4J: Failed to load class "org.slf4j.impl.StaticMDCBinder".
2019-02-13 12:47:52,106 INFO DockerModuleHandle mod-orders-storage-3.0.0-SNAPSHOT.67 SLF4J: Defaulting to no-operation MDCAdapter implementation.
2019-02-13 12:47:52,106 INFO DockerModuleHandle mod-orders-storage-3.0.0-SNAPSHOT.67 SLF4J: See http://www.slf4j.org/codes.html#no_static_mdc_binder for further details.

Not sure if this is meaningful or not as it appears the container is running, but I also note that it is not enabled for the tenant:

ubuntu@ip-10-36-1-246:~$ curl 10.36.1.246:9130/_/proxy/tenants/diku/modules
[

{ "id" : "mod-authtoken-2.0.5-SNAPSHOT.42" }

,

{ "id" : "mod-calendar-1.3.0-SNAPSHOT.67" }

,

{ "id" : "mod-circulation-14.2.0-SNAPSHOT.219" }

,

{ "id" : "mod-circulation-storage-6.3.0-SNAPSHOT.140" }

,

{ "id" : "mod-codex-ekb-1.1.1-SNAPSHOT.76" }

,

{ "id" : "mod-codex-inventory-1.5.0-SNAPSHOT.67" }

,

{ "id" : "mod-codex-mux-2.4.0-SNAPSHOT.63" }

,

{ "id" : "mod-configuration-5.0.2-SNAPSHOT.50" }

,

{ "id" : "mod-feesfines-15.2.1-SNAPSHOT.22" }

,

{ "id" : "mod-finance-storage-1.0.2-SNAPSHOT.7" }

,

{ "id" : "mod-graphql-1.1.1000199" }

,

{ "id" : "mod-inventory-11.2.0-SNAPSHOT.128" }

,

{ "id" : "mod-inventory-storage-15.1.0-SNAPSHOT.240" }

,

{ "id" : "mod-kb-ebsco-java-2.0.4-SNAPSHOT.89" }

,

{ "id" : "mod-login-4.7.1-SNAPSHOT.44" }

,

{ "id" : "mod-login-saml-1.2.2-SNAPSHOT.32" }

,

{ "id" : "mod-notes-2.3.0-SNAPSHOT.69" }

,

{ "id" : "mod-notify-2.2.0-SNAPSHOT.69" }

,

{ "id" : "mod-password-validator-1.1.0-SNAPSHOT.17" }

,

{ "id" : "mod-permissions-5.4.1-SNAPSHOT.36" }

,

{ "id" : "mod-tags-0.3.0-SNAPSHOT.30" }

,

{ "id" : "mod-user-import-3.1.1-SNAPSHOT.31" }

,

{ "id" : "mod-users-15.3.1-SNAPSHOT.67" }

,

{ "id" : "mod-users-bl-4.3.3-SNAPSHOT.59" }

,

{ "id" : "mod-vendors-1.1.0-SNAPSHOT.41" }

,

{ "id" : "okapi-2.24.4" }

]

Comment by John Malconian [ 13/Feb/19 ]

I think those mod-orders-storage errors are a red herring. I see them on folio-snapshot-stable as well. However, I see the following message in the mod-oreders-storage logs on folio-snapshot-stable that I DO NOT see on folio-testing:

Feb 12, 2019 4:39:06 AM org.z3950.zing.cql.cql2pgjson.CQL2PgJSON loadDbSchema
INFO: loadDbSchema: Loaded 'templates/db_scripts/schema.json' OK

Comment by John Malconian [ 13/Feb/19 ]

This would seem to correlate with the tenant initialization timeout. Adding Aliaksandr Pautau as a watcher.

Comment by Piotr Kalashuk [ 13/Feb/19 ]

Hi John Malconian, Marc Johnson

We are looking why it might happen.

Comment by John Malconian [ 13/Feb/19 ]

Ian Hardy suggested increasing the RAM on the instance based on a similar situation he recently encountered with the vagrant FOLIO image. Attempting a folio-testing build now with 2 x RAM increase.

Comment by John Malconian [ 13/Feb/19 ]

Build fails in the same place as before even with double the RAM.

Comment by Piotr Kalashuk [ 13/Feb/19 ]

Hi John Malconian,

We are preparing PR to resolve the issue in scope of MODORDSTOR-57 Closed . As a temporal solution the loadSample property can be changed to false for mod-orders-storage.

Comment by Piotr Kalashuk [ 13/Feb/19 ]

The PR to resolve the issue has been created. Please see details in MODORDSTOR-57 Closed .

Sorry for inconvenience

Comment by Piotr Kalashuk [ 13/Feb/19 ]

Hi John Malconian,

The fix has been merged and new version of the mod-orders-storage snapshot available.
Could you please run folio-testing-backend to confirm that the issue is fully resolved?

Thanks in advance!

Comment by John Malconian [ 13/Feb/19 ]

Thanks Piotr Kalashuk. Looks like mod-orders-storage initializes properly now.

However, the build still fails due to an unrelated dependency error for mod-oai-pmh.

2019-02-13 18:19:00,915 INFO  ProxyContext         565514/proxy RES 400 2663us okapi Incompatible version for module mod-source-record-manager-1.1.0-SNAPSHOT.52 interface source-record-storage. Need 1.0. Have 2.0
2019-02-13 18:19:00,915 ERROR HttpResponse         HTTP response code=400 msg=Incompatible version for module mod-source-record-manager-1.1.0-SNAPSHOT.52 interface source-record-storage. Need 1.0. Have 2.0
2019-02-13 18:19:01,242 INFO  ProxyContext         368076/proxy REQ 10.36.1.104:50728 supertenant POST /_/proxy/modules okapi-2.24.4
2019-02-13 18:19:01,245 INFO  ProxyContext         368076/proxy RES 400 2596us okapi Missing dependency: mod-data-import-1.2.0-SNAPSHOT.33 requires source-record-manager: 1.2
2019-02-13 18:19:01,245 ERROR HttpResponse         HTTP response code=400 msg=Missing dependency: mod-data-import-1.2.0-SNAPSHOT.33 requires source-record-manager: 1.2
2019-02-13 18:19:01,553 INFO  ProxyContext         667523/proxy REQ 10.36.1.104:50730 supertenant POST /_/proxy/modules okapi-2.24.4
2019-02-13 18:19:01,558 INFO  ProxyContext         667523/proxy RES 201 4301us okapi-2.24.4 /_/proxy/modules
2019-02-13 18:19:01,855 INFO  ProxyContext         504338/proxy REQ 10.36.1.104:50732 supertenant POST /_/proxy/modules okapi-2.24.4
2019-02-13 18:19:01,858 INFO  ProxyContext         504338/proxy RES 400 2482us okapi Incompatible version for module mod-oai-pmh-1.0.2-SNAPSHOT.40 interface source-record-storage. Need 1.0. Have 2.0
2019-02-13 18:19:01,858 ERROR HttpResponse         HTTP response code=400 msg=Incompatible version for module mod-oai-pmh-1.0.2-SNAPSHOT.40 interface source-record-storage. Need 1.0. Have 2.0

Looks like the mod-source-record-storage endpoints were changed here: https://github.com/folio-org/mod-source-record-storage/pull/26/commits/20b09be484c8985276d6734a068e7bbef058e55d

CC: Kateryna Senchenko
CC: Craig McNally

Comment by Piotr Kalashuk [ 13/Feb/19 ]

As far as I can see the last build has failed however the error is now different. Please let us know if you think this is caused by orders module

Comment by John Malconian [ 13/Feb/19 ]

Wayne Schneider Ian Hardy David Crossley - Disabling the stripes-testing job in Jenkins until the dependency issue above is resolved.

Comment by John Malconian [ 13/Feb/19 ]

Piotr Kalashuk No, latest failure not caused by orders modules. Latest failure is caused by a dependency mismatch between mod-source-record-storage, mod-oai-pmh, and mod-data-import.

Comment by John Malconian [ 13/Feb/19 ]

Going to attempt a build with a re-arranged module order. I don't think anything requires source-record-storage-2.0.

Comment by John Malconian [ 13/Feb/19 ]

Yep. Re-arranging the order slightly so that mod-source-record-manager is loaded before mod-source-record-storage resolves the issue. Going forward an interface update to source-record-storage 2.0 will need to be carefully coordinated for mod-oai-pmh, mod-data-import, and mod-source-record-manager.

Comment by John Malconian [ 13/Feb/19 ]

folio-snapshot-latest has been rebuilt successfully.

Comment by John Malconian [ 13/Feb/19 ]

Spoke too soon. Module re-arranging the module order in folio-testing does NOT work. mod-source-record-manager, mod-oai-pmh, and mod-data-import will need update their dependencies to require the new interface provided by mod-source-record-storage.

Comment by John Malconian [ 14/Feb/19 ]

Conferred with Craig McNally about this and my understanding is that the immediate solution will be to reverse the mod-source-record-storage interface upgrade until interface updates for all dependent modules (mod-data-import, mod-source-record-manager, and mod-oah-pmh) are coordinated.

Comment by Craig McNally [ 14/Feb/19 ]

John Malconian Sounds like the EPAM teams have decided to take a slightly different approach.

  • mod-oai-pmh's dependency version has been bumped to 2.0 and merged.
  • mod-source-record-manager has been updated and changes have been merged.
  • mod-data-import has been updated and changes should be merged soon.

Once everything is merged the dependency issues should be resolved and folijet will ask for the folio-testing build to be kicked off.

The functional changes to mod-oai-pmh have not been made yet. These will be made next week. It's not a rush as I don't think mod-oai-pmh is actively being used/tested in folio-testing.

It's unclear if functional changes to the other two modules will also need to be made or if the recently merged/soon to be merged changes address those as well as bumping the interface version. Taras Spashchenko can probably answer that.

Comment by Ann-Marie Breaux (Inactive) [ 14/Feb/19 ]

The Data Import/Folijet stories that should resolve the folio-testing build problem are MODSOURMAN-61 Closed and MODDATAIMP-81 Closed . These stories are now linked to FOLIO-1804 Closed , so that folks can monitor. As soon as the two Folijet stories are merged, we'll ask for folio-testing to be rebuilt. And hopefully all will be well.

Comment by Ann-Marie Breaux (Inactive) [ 14/Feb/19 ]

Taras Spashchenko confirmed all should be fine from the MODSOURMAN and MODDATAIMP POV. Asked Wayne Schneider to kick off a rebuild of folio-testing.

Comment by Wayne Schneider [ 15/Feb/19 ]

Rebuild completed successfully. Closing issue.

Generated at Thu Feb 08 23:16:02 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.