[MODRS-79] mod-remote-storage unavailable in snapshot environments Created: 01/Feb/21 Updated: 30/Jul/21 Resolved: 30/Jul/21 |
|
| Status: | Closed |
| Project: | mod-remote-storage |
| Components: | None |
| Affects versions: | None |
| Fix versions: | 1.4.0 |
| Type: | Task | Priority: | TBD |
| Reporter: | Marc Johnson | Assignee: | Siarhei Charniak |
| Resolution: | Done | Votes: | 0 |
| Labels: | environments, outage | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original estimate: | Not Specified | ||
| Attachments: |
|
| Issue links: |
|
| Sprint: | Firebird Sprint 119 |
| Story Points: | 3 |
| Development Team: | Firebird |
| Description |
|
Ann-Marie Breaux and Owen Stephens reported this morning that the snapshot environment is unavailable.

Symptoms

The latest build reports the following error:
fatal: [10.36.1.36]: FAILED! => {"changed": false, "connection": "close", "content": "POST request for mod-remote-storage-1.1.0-SNAPSHOT.49 /_/tenant failed with 500: {\"timestamp\":\"2021-02-01T01:44:25.267+00:00\",\"status\":500,\"error\":\"Internal Server Error\",\"message\":\"\",\"path\":\"/_/tenant\"}", "content_length": "203", "content_type": "text/plain", "elapsed": 117, "msg": "Status code was 400 and not [200]: HTTP Error 400: Bad Request", "redirected": false, "status": 400, "url": "http://10.36.1.36:9130/_/proxy/tenants/diku/install?deploy=true&tenantParameters=loadSample%3Dtrue%2CloadReference%3Dtrue", "vary": "origin"}
This suggests that mod-remote-storage fails to be enabled for the diku tenant.

Open Questions

Why does the folio-testing build succeed with the same version of mod-remote-storage?

Other Affected Environments |
| Comments |
| Comment by Marc Johnson [ 01/Feb/21 ] |
|
Viachaslau Khandramai Do you know what might have caused this (it seems to be a similar error to
|
| Comment by David Crossley [ 01/Feb/21 ] |
|
I attached docker logs fs-786-mod-remote-storage.txt from today's folio-snapshot build 786. |
| Comment by Viachaslau Khandramai (Inactive) [ 01/Feb/21 ] |
|
Hi Marc Johnson, investigating now. It looks strange, though, because folio-testing works as expected. |
| Comment by Viachaslau Khandramai (Inactive) [ 01/Feb/21 ] |
|
Could someone please provide the same docker logs for folio-testing? I saw from the logs that the system-user we use for internal operations failed to log in, despite this user being created successfully:

10:20:22.912 [http-nio-8081-exec-1] INFO org.folio.rs.client.EnrichHeadersClient - Sending request to http://10.36.1.79:9130/users?query=username==system-user for tenant: [diku]
10:20:22.924 [http-nio-8081-exec-1] INFO org.folio.rs.client.EnrichHeadersClient - Response status: 200
10:20:22.939 [http-nio-8081-exec-1] INFO org.folio.rs.client.EnrichHeadersClient - Sending request to http://10.36.1.79:9130/users for tenant: [diku]
10:20:22.954 [http-nio-8081-exec-1] INFO org.folio.rs.client.EnrichHeadersClient - Response status: 201
10:20:22.956 [http-nio-8081-exec-1] INFO org.folio.rs.client.EnrichHeadersClient - Sending request to http://10.36.1.79:9130/authn/credentials for tenant: [diku]
10:20:22.986 [http-nio-8081-exec-1] INFO org.folio.rs.client.EnrichHeadersClient - Response status: 201
10:20:22.986 [http-nio-8081-exec-1] INFO org.folio.rs.service.SecurityManagerService - Saved credentials for user: [SystemUserParameters(id=3e47312e-7718-4620-a513-ce7ab1fa2688, username=system-user, password=system-user, okapiToken=null, okapiUrl=http://10.36.1.79:9130, tenantId=diku)]
10:20:23.011 [http-nio-8081-exec-1] INFO org.folio.rs.client.EnrichHeadersClient - Sending request to http://10.36.1.79:9130/perms/users for tenant: [diku]
10:20:23.021 [http-nio-8081-exec-1] INFO org.folio.rs.client.EnrichHeadersClient - Response status: 201
10:20:23.026 [http-nio-8081-exec-1] INFO org.folio.rs.client.EnrichHeadersClient - Sending request to http://10.36.1.79:9130/authn/login for tenant: [diku]
10:20:23.031 [http-nio-8081-exec-1] INFO org.folio.rs.client.EnrichHeadersClient - Response status: 400 |
| Comment by Marc Johnson [ 01/Feb/21 ] |
Agreed. I imagine something is set up slightly differently in the two environments. |
| Comment by Viachaslau Khandramai (Inactive) [ 01/Feb/21 ] |
|
Thanks Former user for the logs! From what I see:

13:28:34.102 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] ---> POST http://authn/login HTTP/1.1
13:28:34.102 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] Content-Length: 51
13:28:34.102 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] Content-Type: application/json
13:28:34.102 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey]
13:28:34.102 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] {"username":"system-user","password":"system-user"}
13:28:34.102 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] ---> END HTTP (51-byte body)
13:28:34.102 [http-nio-8081-exec-1] INFO org.folio.rs.client.EnrichHeadersClient - Sending request to http://10.36.1.186:9130/authn/login for tenant: [diku]
13:28:34.107 [http-nio-8081-exec-1] INFO org.folio.rs.client.EnrichHeadersClient - Response status: 400
13:28:34.107 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] <--- HTTP/1.1 400 Bad Request (5ms)
13:28:34.108 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] content-type: text/plain
13:28:34.108 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] transfer-encoding: chunked
13:28:34.108 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] vary: origin
13:28:34.108 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] x-okapi-trace: POST mod-login-7.3.0-SNAPSHOT.91 http://10.36.1.186:9135/authn/login : 400 2998us
13:28:34.108 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey]
13:28:34.108 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] Missing Okapi token header
13:28:34.108 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] <--- END HTTP (26-byte body)

It looks like the problem comes from the x-okapi-token header being required on the snapshot environment during POST http://authn/login. That seems strange; I don't think we need this header during the login process. Moreover, this works on the testing environment with the same mod-login artifact. |
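To make the failure mode concrete, here is a hypothetical sketch (not mod-remote-storage's actual code) of a client that attaches x-okapi-token only when a token exists; during the very first POST /authn/login no token is available yet, which is why an environment that insists on the header rejects the request with 400:

```python
def build_okapi_headers(tenant, token=None):
    """Build headers for an Okapi-routed request.

    Illustrative helper only: x-okapi-tenant is always sent, but
    x-okapi-token is included only when a token has already been
    obtained. On the bootstrap login there is no token yet.
    """
    headers = {
        "x-okapi-tenant": tenant,
        "Content-Type": "application/json",
    }
    if token is not None:
        headers["x-okapi-token"] = token
    return headers

# First login for the system user: no token is available yet.
first_login = build_okapi_headers("diku")
assert "x-okapi-token" not in first_login
```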
| Comment by John Malconian [ 01/Feb/21 ] |
|
I'm now getting the following error when attempting to build folio-snapshot-load:

fatal: [10.36.1.238]: FAILED! => {"changed": false, "connection": "close", "content": "Unexpected Location header in response for POST /_/tenant", "content_length": "57", "content_type": "text/plain", "elapsed": 30, "msg": "Status code was 500 and not [200]: HTTP Error 500: Internal Server Error", "redirected": false, "status": 500, "url": "http://10.36.1.238:9130/_/proxy/tenants/diku/install?deploy=true&tenantParameters=loadSample%3Dtrue%2CloadReference%3Dtrue", "vary": "origin"}
The Okapi log shows this:

2021-02-01T18:25:46,844 INFO DockerModuleHandle mod-permissions-5.13.0-SNAPSHOT.97 18:25:46 [] [] [] [] INFO PostgresClient Successfully executed CREATE TABLE IF NOT EXISTS diku_mod_permissions.rmb_job ( id UUID PRIMARY KEY, jsonb JSONB NOT NULL );
2021-02-01T18:25:46,895 INFO DockerModuleHandle mod-permissions-5.13.0-SNAPSHOT.97 18:25:46 [] [] [] [] INFO LogUtil org.folio.rest.RestVerticle start invoking postTenant
2021-02-01T18:25:46,906 INFO OkapiClient 139954/proxy;242110/tenant RES 201 898349us okapiClient http://10.36.1.238:9136/_/tenant
2021-02-01T18:25:46,908 INFO TenantManager job complete
2021-02-01T18:25:46,908 WARN TenantManager job failed io.vertx.core.impl.NoStackTraceThrowable: Unexpected Location header in response for POST /_/tenant
2021-02-01T18:25:46,909 WARN ProxyContext Unexpected Location header in response for POST /_/tenant io.vertx.core.impl.NoStackTraceThrowable: Unexpected Location header in response for POST /_/tenant
2021-02-01T18:25:46,909 INFO ProxyContext 139954/proxy RES 500 30679795us okapi Unexpected Location header in response for POST /_/tenant
2021-02-01T18:25:46,910 ERROR HttpResponse HTTP response code=500 msg=Unexpected Location header in response for POST /_/tenant
2021-02-01T18:25:46,911 INFO DockerModuleHandle mod-permissions-5.13.0-SNAPSHOT.97 18:25:46 [] [] [] [] INFO LogUtil 10.36.1.238:40314 POST /_/tenant null HTTP_1_1 201 345 797 tid=diku Created

Also weird:

ubuntu@ip-10-36-1-238:/var/log/folio/okapi$ grep "139954" okapi.log |
| Comment by Adam Dickmeiss [ 01/Feb/21 ] |
|
See
|
| Comment by John Malconian [ 01/Feb/21 ] |
|
After applying the mod-permissions update, we are now looking at a new, unrelated error, this time from mod-source-record-storage:

22:57:10.283 [vert.x-worker-thread-0] WARN ClientUtils [3185eqId] Couldn't resolve server kafka:9092 from bootstrap.servers as DNS resolution failed for kafka
22:57:10.283 [vert.x-worker-thread-1] WARN ClientUtils [3185eqId] Couldn't resolve server kafka:9092 from bootstrap.servers as DNS resolution failed for kafka
22:57:10.289 [vert.x-eventloop-thread-0] ERROR RestVerticle [3191eqId] During verticle deployment, init failed, exiting....... Failed to construct kafka consumer io.vertx.core.impl.NoStackTraceThrowable: Failed to construct kafka consumer
22:57:10.290 [vert.x-eventloop-thread-1] ERROR ertxIsolatedDeployer [3192eqId] Failed in deploying verticle io.vertx.core.impl.NoStackTraceThrowable: Failed to construct kafka consumer io.vertx.core.impl.NoStackTraceThrowable: Failed to construct kafka consumer

This is a fatal error; the module container crashes. |
| Comment by Ian Hardy [ 01/Feb/21 ] |
|
Looks like SRS now needs to talk to Kafka directly. I guess I'll try setting: https://github.com/folio-org/mod-source-record-storage/commit/ef7e23ff9433bf9af115e271a5a65f5705dd25e0#diff-00d3f6d8ea6dc6833c711e102774e1bc41ddc19f13beda8d773b38f6e147a74aR22 Kateryna Senchenko, will you update mod-srs's module descriptor with any new required environment variables? |
| Comment by Ian Hardy [ 01/Feb/21 ] |
|
Or, better yet, use the "KAFKA_HOST" and "KAFKA_PORT" environment variables; these are the variable names that pubsub and mod-inventory-storage are using. |
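For illustration, here is a minimal sketch (Python, with assumed defaults; not the modules' actual code) of how a module might resolve the broker address from the KAFKA_HOST/KAFKA_PORT convention, falling back to the kafka:9092 address that failed DNS resolution in the logs above:

```python
import os

def kafka_bootstrap_servers():
    # Resolve the Kafka broker address from the environment.
    # The "kafka:9092" fallback mirrors the default that failed DNS
    # resolution in the logs above; the variable names follow the
    # pubsub / mod-inventory-storage convention. Illustrative only.
    host = os.environ.get("KAFKA_HOST", "kafka")
    port = os.environ.get("KAFKA_PORT", "9092")
    return f"{host}:{port}"

# With the variables set by the deployment (e.g. derived from ansible):
os.environ["KAFKA_HOST"] = "10.36.1.155"
os.environ["KAFKA_PORT"] = "9092"
print(kafka_bootstrap_servers())  # -> 10.36.1.155:9092
```

If the variables are absent, the function falls back to kafka:9092, which only resolves in environments where a "kafka" hostname actually exists.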
| Comment by Ian Hardy [ 01/Feb/21 ] |
|
-I set FOLIO_KAKFA_HOST on srs to the correct host ("KAFKA_HOST=10.36.1.155", where the IP is derived from ansible), but SRS still gets a fatal error, probably because the kafka host is not set properly.-

My mistake, I set the variable wrong; now that it's set right, the snapshot-load build passed. I'd still like the SRS devs to consider using the environment variables KAFKA_HOST and KAFKA_PORT instead, since pubsub and inventory-storage use those. |
| Comment by Kateryna Senchenko [ 02/Feb/21 ] |
|
Hi everyone, data import now goes directly via Kafka. I created PRs to correct the required env variable names: https://github.com/folio-org/mod-inventory/pull/299 We'll make sure to update the documentation before the release. |
| Comment by Kateryna Senchenko [ 02/Feb/21 ] |
|
Recent logs from folio-snapshot-load indicate that mod-inventory failed to create Kafka consumers due to missing env variables. After Ian Hardy set the FOLIO_KAFKA_HOST and FOLIO_KAFKA_PORT variables, the issues were resolved for mod-data-import, mod-source-record-manager and mod-source-record-storage, but apparently not for mod-inventory. I created the PRs to correct the required variable names, but I'm still not sure why mod-inventory didn't pick up the env variables like the other modules:

02 Feb 2021 02:19:30:247 WARN ClientUtils [] Couldn't resolve server kafka:9092 from bootstrap.servers as DNS resolution failed for kafka
Exception in thread "main" java.util.concurrent.ExecutionException: org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
	at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2022)
	at org.folio.inventory.Launcher.startConsumerVerticles(Launcher.java:84)
	at org.folio.inventory.Launcher.main(Launcher.java:52)
Caused by: org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
	at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:825)
	at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:631)
	at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:612)
	at io.vertx.kafka.client.consumer.KafkaReadStream.create(KafkaReadStream.java:113)
	at io.vertx.kafka.client.consumer.KafkaConsumer.create(KafkaConsumer.java:75)
	at org.folio.kafka.KafkaConsumerWrapper.start(KafkaConsumerWrapper.java:109)
	at org.folio.inventory.DataImportConsumerVerticle.createKafkaConsumerWrapper(DataImportConsumerVerticle.java:119)
	at org.folio.inventory.DataImportConsumerVerticle.lambda$start$0(DataImportConsumerVerticle.java:85)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
	at java.base/java.util.AbstractList$RandomAccessSpliterator.forEachRemaining(AbstractList.java:720)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
	at org.folio.inventory.DataImportConsumerVerticle.start(DataImportConsumerVerticle.java:86)
	at io.vertx.core.impl.DeploymentManager.lambda$doDeploy$9(DeploymentManager.java:556)
	at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:366)
	at io.vertx.core.impl.WorkerContext.lambda$wrapTask$0(WorkerContext.java:35)
	at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.common.config.ConfigException: No resolvable bootstrap urls given in bootstrap.servers
	at org.apache.kafka.clients.ClientUtils.parseAndValidateAddresses(ClientUtils.java:89)
	at org.apache.kafka.clients.ClientUtils.parseAndValidateAddresses(ClientUtils.java:48)
	at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:737)
	... 23 more |
| Comment by Marc Johnson [ 02/Feb/21 ] |
Does this pull request need merging and the environments rebuilding in order for data import to work? |
| Comment by Kateryna Senchenko [ 02/Feb/21 ] |
|
Hi Marc Johnson, It was suggested by Ian Hardy to use KAFKA_HOST and KAFKA_PORT environment variables - the PRs contain that change. But I'm still trying to figure out why mod-inventory didn't pick up FOLIO_KAFKA_HOST and FOLIO_KAFKA_PORT after Ian set these variables as well. |
| Comment by Marc Johnson [ 02/Feb/21 ] |
I suggest we move forward with using the standard environment variables. Ian Hardy I imagine the use of FOLIO_KAFKA_HOST and FOLIO_KAFKA_PORT was an attempt at a temporary fix. |
| Comment by Kateryna Senchenko [ 02/Feb/21 ] |
|
OK, I'll merge the PRs then and we'll see how it goes |
| Comment by Ian Hardy [ 02/Feb/21 ] |
|
Yes, that was temporary. Thanks Kateryna Senchenko for setting the standard vars. I've removed the FOLIO_KAFKA_* settings from our ansible playbook. |
| Comment by Aliaksei Harbuz [ 19/Jul/21 ] |
|
The issue related to system user initialization (missing token header) on the snapshot environment exists not only in the mod-remote-storage module but also in the mod-search module:

Method [public void org.folio.search.integration.KafkaMessageListener.handleEvents(java.util.List<org.apache.kafka.clients.consumer.ConsumerRecord<java.lang.String, org.folio.search.domain.dto.ResourceEventBody>>)] Bean [org.folio.search.integration.KafkaMessageListener@62ccbd95]; nested exception is org.springframework.kafka.listener.ListenerExecutionFailedException: Listener method 'public void org.folio.search.integration.KafkaMessageListener.handleEvents(java.util.List<org.apache.kafka.clients.consumer.ConsumerRecord<java.lang.String, org.folio.search.domain.dto.ResourceEventBody>>)' threw exception; nested exception is feign.FeignException$BadRequest: [400 Bad Request] during [POST] to [http://authn/login] [AuthnClient#getApiKey(UserCredentials)]: [Missing Okapi token header] |
| Comment by Wayne Schneider [ 21/Jul/21 ] |
|
Aliaksei Harbuz It may have to do with the order in which modules are enabled. I note that in the module descriptor for mod-remote-storage, the authtoken interface is not required, so there is no guarantee that it will be present to provide a token when the module is initialized. See also
Before you jump to the conclusion that you actually require this interface, please think hard about whether your module should be able to operate in an unlocked system (i.e., a system without the authtoken interface). Most modules use module permissions to assign appropriate permissions to endpoints that require them, rather than provisioning users and logging them in. Please review the Okapi security guide to get more information about how permissions are used in an Okapi system. |
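For reference, declaring the dependency Wayne describes would amount to adding an entry like the following to the module descriptor's requires array. This is an illustrative fragment in the standard Okapi ModuleDescriptor shape; the rest of mod-remote-storage's actual descriptor is not reproduced here:

```json
{
  "requires": [
    { "id": "authtoken", "version": "2.0" }
  ]
}
```

With this in place, Okapi will not enable the module for a tenant unless some module providing the authtoken interface is enabled first, which constrains the enablement order.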
| Comment by Aliaksei Harbuz [ 23/Jul/21 ] |
|
Hi Wayne Schneider, the module descriptor of mod-remote-storage was updated to add the authtoken interface v2.0, but the issue with system user initialization still exists. Additional logs from Okapi were found at the same time the errors were logged by mod-remote-storage:
01:52:12 [] [] [] [] INFO DockerModuleHandle mod-login-7.5.0-SNAPSHOT.98 01:52:12 [] [] [] [] INFO RestRouting invoking postAuthnLogin
01:52:12 [] [] [] [] INFO DockerModuleHandle mod-login-7.5.0-SNAPSHOT.98 01:52:12 [] [] [] [] ERROR LoginAPI Missing request token
01:52:12 [] [] [] [] INFO DockerModuleHandle mod-login-7.5.0-SNAPSHOT.98 01:52:12 [] [] [] [] INFO LogUtil 10.36.1.66:57604 POST /authn/login null HTTP_1_1 400 26 1 tid=diku Bad Request
Update: according to the Jenkins logs for snapshot:
ok: [10.36.1.66] => (item={'id': 'mod-remote-storage-1.3.0', 'action': 'enable'})
The interface updates may not have been applied (the enabled version is still mod-remote-storage-1.3.0). The changes will be double-checked after the version is fixed and the environment rebuilt. |
| Comment by Aliaksei Harbuz [ 26/Jul/21 ] |
|
The system user initialization issue was fixed by the interface updates:
01:52:06 [] [] [] [] INFO PubSubService Module's subscribers were successfully registered
01:52:06 [] [] [] [] INFO curityManagerService Saved credentials for user: [system-user]
01:52:50 [] [] [] [] INFO KafkaService Restarting kafka consumer to start listening created topics [id: mod-remote-storage-events-listener]
|
| Comment by Siarhei Charniak [ 30/Jul/21 ] |
|
Double-checked the logs:
01:52:35 [] [] [] [] INFO curityManagerService Saved credentials for user: [system-user]
01:52:35 [] [] [] [] INFO PubSubService Module's publishers were successfully registered
01:52:35 [] [] [] [] INFO PubSubService Module's subscribers were successfully registered
01:53:13 [] [] [] [] INFO KafkaService Restarting kafka consumer to start listening created topics [id: mod-remote-storage-events-listener]
System user initialization works as expected; the story can be closed as done. |