[MODRS-79] mod-remote-storage unavailable in snapshot environments Created: 01/Feb/21  Updated: 30/Jul/21  Resolved: 30/Jul/21

Status: Closed
Project: mod-remote-storage
Components: None
Affects versions: None
Fix versions: 1.4.0

Type: Task Priority: TBD
Reporter: Marc Johnson Assignee: Siarhei Charniak
Resolution: Done Votes: 0
Labels: environments, outage
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Attachments: Text File fs-786-mod-remote-storage.txt     File remote-storage-52.log    
Issue links:
Blocks
is blocked by MODPERMS-128 Wrong _tenant version in module descr... Closed
Defines
defines UXPROD-3072 Firebird - R3 Enhancements/Bugfixes/T... Closed
Relates
relates to MSEARCH-83 mod-search breaks if authtoken module... Closed
Sprint: Firebird Sprint 119
Story Points: 3
Development Team: Firebird

 Description   

Ann-Marie Breaux and Owen Stephens reported this morning that the snapshot environment is unavailable.

Symptoms

The latest build reports the following error:

fatal: [10.36.1.36]: FAILED! => {"changed": false, "connection": "close", "content": "POST request for mod-remote-storage-1.1.0-SNAPSHOT.49 /_/tenant failed with 500: {\"timestamp\":\"2021-02-01T01:44:25.267+00:00\",\"status\":500,\"error\":\"Internal Server Error\",\"message\":\"\",\"path\":\"/_/tenant\"}", "content_length": "203", "content_type": "text/plain", "elapsed": 117, "msg": "Status code was 400 and not [200]: HTTP Error 400: Bad Request", "redirected": false, "status": 400, "url": "http://10.36.1.36:9130/_/proxy/tenants/diku/install?deploy=true&tenantParameters=loadSample%3Dtrue%2CloadReference%3Dtrue", "vary": "origin"}

This suggests that mod-remote-storage fails to be enabled for the diku tenant.
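The failing request above is Okapi's per-tenant install endpoint. As a minimal illustrative sketch (the helper name is hypothetical; only the URL shape and parameter names come from the error above), this is how such a request URL is assembled:

```python
from urllib.parse import urlencode

def build_install_url(okapi_url, tenant, load_sample=True, load_reference=True):
    # tenantParameters is itself a comma-separated key=value list,
    # which is then URL-encoded as a single query parameter.
    tenant_params = ",".join([
        f"loadSample={str(load_sample).lower()}",
        f"loadReference={str(load_reference).lower()}",
    ])
    query = urlencode({"deploy": "true", "tenantParameters": tenant_params})
    return f"{okapi_url}/_/proxy/tenants/{tenant}/install?{query}"

url = build_install_url("http://10.36.1.36:9130", "diku")
```

The encoded result matches the `url` field in the Ansible error, which is why the `%3D`/`%2C` escapes appear there.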

Open Questions

Why does the folio-testing build succeed with the same version of mod-remote-storage?

Other Affected Environments



 Comments   
Comment by Marc Johnson [ 01/Feb/21 ]

Viachaslau Khandramai Do you know what might have caused this (it seems to be a similar error to FOLIO-2983 Closed )?

Comment by David Crossley [ 01/Feb/21 ]

I attached the docker logs (fs-786-mod-remote-storage.txt) from today's folio-snapshot build 786.

Comment by Viachaslau Khandramai (Inactive) [ 01/Feb/21 ]

Hi Marc Johnson, investigating now. But it looks strange, because folio-testing works as expected.

Comment by Viachaslau Khandramai (Inactive) [ 01/Feb/21 ]

Could someone please provide the same docker logs for folio-testing? I saw from the logs that the login for the system-user we use for internal operations failed, even though the user was successfully created:

10:20:22.912 [http-nio-8081-exec-1] INFO  org.folio.rs.client.EnrichHeadersClient - Sending request to http://10.36.1.79:9130/users?query=username==system-user for tenant: [diku]
10:20:22.924 [http-nio-8081-exec-1] INFO  org.folio.rs.client.EnrichHeadersClient - Response status: 200
10:20:22.939 [http-nio-8081-exec-1] INFO  org.folio.rs.client.EnrichHeadersClient - Sending request to http://10.36.1.79:9130/users for tenant: [diku]
10:20:22.954 [http-nio-8081-exec-1] INFO  org.folio.rs.client.EnrichHeadersClient - Response status: 201
10:20:22.956 [http-nio-8081-exec-1] INFO  org.folio.rs.client.EnrichHeadersClient - Sending request to http://10.36.1.79:9130/authn/credentials for tenant: [diku]
10:20:22.986 [http-nio-8081-exec-1] INFO  org.folio.rs.client.EnrichHeadersClient - Response status: 201
10:20:22.986 [http-nio-8081-exec-1] INFO  org.folio.rs.service.SecurityManagerService - Saved credentials for user: [SystemUserParameters(id=3e47312e-7718-4620-a513-ce7ab1fa2688, username=system-user, password=system-user, okapiToken=null, okapiUrl=http://10.36.1.79:9130, tenantId=diku)]
10:20:23.011 [http-nio-8081-exec-1] INFO  org.folio.rs.client.EnrichHeadersClient - Sending request to http://10.36.1.79:9130/perms/users for tenant: [diku]
10:20:23.021 [http-nio-8081-exec-1] INFO  org.folio.rs.client.EnrichHeadersClient - Response status: 201
10:20:23.026 [http-nio-8081-exec-1] INFO  org.folio.rs.client.EnrichHeadersClient - Sending request to http://10.36.1.79:9130/authn/login for tenant: [diku]
10:20:23.031 [http-nio-8081-exec-1] INFO  org.folio.rs.client.EnrichHeadersClient - Response status: 400
Comment by Marc Johnson [ 01/Feb/21 ]

Viachaslau Khandramai

But it looks strange, because folio-testing works as expected.

Agreed. I imagine something is set up slightly differently in the two environments.

Comment by Viachaslau Khandramai (Inactive) [ 01/Feb/21 ]

Thanks Former user for the logs!

From what I see:

13:28:34.102 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] ---> POST http://authn/login HTTP/1.1
13:28:34.102 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] Content-Length: 51
13:28:34.102 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] Content-Type: application/json
13:28:34.102 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] 
13:28:34.102 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] {"username":"system-user","password":"system-user"}
13:28:34.102 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] ---> END HTTP (51-byte body)
13:28:34.102 [http-nio-8081-exec-1] INFO  org.folio.rs.client.EnrichHeadersClient - Sending request to http://10.36.1.186:9130/authn/login for tenant: [diku]
13:28:34.107 [http-nio-8081-exec-1] INFO  org.folio.rs.client.EnrichHeadersClient - Response status: 400
13:28:34.107 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] <--- HTTP/1.1 400 Bad Request (5ms)
13:28:34.108 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] content-type: text/plain
13:28:34.108 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] transfer-encoding: chunked
13:28:34.108 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] vary: origin
13:28:34.108 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] x-okapi-trace: POST mod-login-7.3.0-SNAPSHOT.91 http://10.36.1.186:9135/authn/login : 400 2998us
13:28:34.108 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] 
13:28:34.108 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] Missing Okapi token header
13:28:34.108 [http-nio-8081-exec-1] DEBUG org.folio.rs.client.AuthnClient - [AuthnClient#getApiKey] <--- END HTTP (26-byte body)

It looks like the problem comes from the x-okapi-token header being required on the snapshot env during POST http://authn/login. That seems strange: this header shouldn't be needed during the login process. Moreover, login works on the testing env with the same mod-login artifact.
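For context, a sketch of what such a system-user login request looks like (the helper is hypothetical; the header names are the standard Okapi ones). At this point the module has no token yet, which is why requiring x-okapi-token on /authn/login cannot work:

```python
import json

def build_login_request(okapi_url, tenant, username, password):
    # Standard Okapi headers; deliberately no X-Okapi-Token, because the
    # module performs this login precisely to obtain its first token.
    headers = {
        "X-Okapi-Url": okapi_url,
        "X-Okapi-Tenant": tenant,
        "Content-Type": "application/json",
    }
    body = json.dumps({"username": username, "password": password})
    return headers, body

headers, body = build_login_request(
    "http://10.36.1.186:9130", "diku", "system-user", "system-user")
```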

Comment by John Malconian [ 01/Feb/21 ]

I'm now getting the following error when attempting to build folio-snapshot-load:

fatal: [10.36.1.238]: FAILED! => {"changed": false, "connection": "close", "content": "Unexpected Location header in response for POST /_/tenant", "content_length": "57", "content_type": "text/plain", "elapsed": 30, "msg": "Status code was 500 and not [200]: HTTP Error 500: Internal Server Error", "redirected": false, "status": 500, "url": "http://10.36.1.238:9130/_/proxy/tenants/diku/install?deploy=true&tenantParameters=loadSample%3Dtrue%2CloadReference%3Dtrue", "vary": "origin"}

The Okapi log shows this:

2021-02-01T18:25:46,844 INFO  DockerModuleHandle   mod-permissions-5.13.0-SNAPSHOT.97 18:25:46 [] [] [] [] INFO  PostgresClient       Successfully executed  CREATE TABLE IF NOT EXISTS diku_mod_permissions.rmb_job (       id UUID PRIMARY KEY,       jsonb JSONB NOT NULL     );
2021-02-01T18:25:46,895 INFO  DockerModuleHandle   mod-permissions-5.13.0-SNAPSHOT.97 18:25:46 [] [] [] [] INFO  LogUtil              org.folio.rest.RestVerticle start  invoking postTenant
2021-02-01T18:25:46,906 INFO  OkapiClient          139954/proxy;242110/tenant RES 201 898349us okapiClient http://10.36.1.238:9136/_/tenant
2021-02-01T18:25:46,908 INFO  TenantManager        job complete
2021-02-01T18:25:46,908 WARN  TenantManager        job failed
io.vertx.core.impl.NoStackTraceThrowable: Unexpected Location header in response for POST /_/tenant
2021-02-01T18:25:46,909 WARN  ProxyContext         Unexpected Location header in response for POST /_/tenant
io.vertx.core.impl.NoStackTraceThrowable: Unexpected Location header in response for POST /_/tenant
2021-02-01T18:25:46,909 INFO  ProxyContext         139954/proxy RES 500 30679795us okapi Unexpected Location header in response for POST /_/tenant
2021-02-01T18:25:46,910 ERROR HttpResponse         HTTP response code=500 msg=Unexpected Location header in response for POST /_/tenant
2021-02-01T18:25:46,911 INFO  DockerModuleHandle   mod-permissions-5.13.0-SNAPSHOT.97 18:25:46 [] [] [] [] INFO  LogUtil              10.36.1.238:40314 POST /_/tenant null HTTP_1_1 201 345 797 tid=diku Created

Also weird:

ubuntu@ip-10-36-1-238:/var/log/folio/okapi$ grep "139954" okapi.log
Binary file okapi.log matches

Comment by Adam Dickmeiss [ 01/Feb/21 ]

See MODPERMS-128 Closed for a fix for the 2nd error.

Comment by John Malconian [ 01/Feb/21 ]

After applying the mod-permissions update, we are now looking at a new, unrelated error. This time from mod-source-record-storage:

22:57:10.283 [vert.x-worker-thread-0] WARN  ClientUtils          [3185eqId] Couldn't resolve server kafka:9092 from bootstrap.servers as DNS resolution failed for kafka
22:57:10.283 [vert.x-worker-thread-1] WARN  ClientUtils          [3185eqId] Couldn't resolve server kafka:9092 from bootstrap.servers as DNS resolution failed for kafka
22:57:10.289 [vert.x-eventloop-thread-0] ERROR RestVerticle         [3191eqId] During verticle deployment, init failed, exiting....... Failed to construct kafka consumer
io.vertx.core.impl.NoStackTraceThrowable: Failed to construct kafka consumer
22:57:10.290 [vert.x-eventloop-thread-1] ERROR ertxIsolatedDeployer [3192eqId] Failed in deploying verticle
io.vertx.core.impl.NoStackTraceThrowable: Failed to construct kafka consumer
io.vertx.core.impl.NoStackTraceThrowable: Failed to construct kafka consumer

This is a fatal error. The module container crashes.

Comment by Ian Hardy [ 01/Feb/21 ]

Looks like srs now needs to talk to kafka directly. I guess I'll try setting: https://github.com/folio-org/mod-source-record-storage/commit/ef7e23ff9433bf9af115e271a5a65f5705dd25e0#diff-00d3f6d8ea6dc6833c711e102774e1bc41ddc19f13beda8d773b38f6e147a74aR22

Kateryna Senchenko will you update mod-srs's module descriptor with any new required environment variables?

Comment by Ian Hardy [ 01/Feb/21 ]

Or, better yet, use the "KAFKA_PORT" and "KAFKA_HOST" environment variables. These are the variable names that pubsub and mod-inventory-storage are already using.

Comment by Ian Hardy [ 01/Feb/21 ]

-I set FOLIO_KAFKA_HOST on srs to the correct host ("KAFKA_HOST=10.36.1.155", where the IP is derived from ansible), but SRS still gets a fatal error:

Probably because the kafka host is not set properly. -

My mistake, I set the variable wrong; now that it's set right, the snapshot-load build passed. I'd still like the SRS devs to consider using the environment variables KAFKA_HOST and KAFKA_PORT instead, since pubsub and inventory-storage use those.
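The fallback behaviour under discussion can be sketched like this (illustrative only; the modules' actual config loading may differ). Note that with none of these variables set, the default resolves to the kafka:9092 address that failed DNS resolution in the logs above:

```python
import os

def resolve_kafka_bootstrap(env=None):
    # Prefer the conventional KAFKA_HOST/KAFKA_PORT names used by pubsub and
    # mod-inventory-storage; fall back to the temporary FOLIO_KAFKA_* names,
    # then to an in-cluster default that fails DNS resolution on these hosts.
    env = os.environ if env is None else env
    host = env.get("KAFKA_HOST") or env.get("FOLIO_KAFKA_HOST") or "kafka"
    port = env.get("KAFKA_PORT") or env.get("FOLIO_KAFKA_PORT") or "9092"
    return f"{host}:{port}"
```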

Comment by Kateryna Senchenko [ 02/Feb/21 ]

Hi everyone,

Data import now goes directly via Kafka. I created PRs to correct the required env names:

https://github.com/folio-org/mod-inventory/pull/299
https://github.com/folio-org/mod-source-record-storage/pull/290
https://github.com/folio-org/mod-source-record-manager/pull/371
https://github.com/folio-org/mod-data-import/pull/158

We'll make sure to update the documentation before the release

Comment by Kateryna Senchenko [ 02/Feb/21 ]

Recent logs from folio-snapshot-load indicate that mod-inventory failed to create Kafka consumers due to missing ENV variables. After Ian Hardy set the FOLIO_KAFKA_HOST and _PORT variables, it resolved the issues for mod-data-import, mod-source-record-manager, and mod-source-record-storage, but apparently not for mod-inventory. I created the PRs to correct the required variable names, but I'm still not sure why mod-inventory didn't pick up the env variables like the other modules did:

02 Feb 2021 02:19:30:247 WARN  ClientUtils [] Couldn't resolve server kafka:9092 from bootstrap.servers as DNS resolution failed for kafka
Exception in thread "main" java.util.concurrent.ExecutionException: org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
	at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2022)
	at org.folio.inventory.Launcher.startConsumerVerticles(Launcher.java:84)
	at org.folio.inventory.Launcher.main(Launcher.java:52)
Caused by: org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
	at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:825)
	at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:631)
	at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:612)
	at io.vertx.kafka.client.consumer.KafkaReadStream.create(KafkaReadStream.java:113)
	at io.vertx.kafka.client.consumer.KafkaConsumer.create(KafkaConsumer.java:75)
	at org.folio.kafka.KafkaConsumerWrapper.start(KafkaConsumerWrapper.java:109)
	at org.folio.inventory.DataImportConsumerVerticle.createKafkaConsumerWrapper(DataImportConsumerVerticle.java:119)
	at org.folio.inventory.DataImportConsumerVerticle.lambda$start$0(DataImportConsumerVerticle.java:85)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
	at java.base/java.util.AbstractList$RandomAccessSpliterator.forEachRemaining(AbstractList.java:720)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
	at org.folio.inventory.DataImportConsumerVerticle.start(DataImportConsumerVerticle.java:86)
	at io.vertx.core.impl.DeploymentManager.lambda$doDeploy$9(DeploymentManager.java:556)
	at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:366)
	at io.vertx.core.impl.WorkerContext.lambda$wrapTask$0(WorkerContext.java:35)
	at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.common.config.ConfigException: No resolvable bootstrap urls given in bootstrap.servers
	at org.apache.kafka.clients.ClientUtils.parseAndValidateAddresses(ClientUtils.java:89)
	at org.apache.kafka.clients.ClientUtils.parseAndValidateAddresses(ClientUtils.java:48)
	at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:737)
	... 23 more
Comment by Marc Johnson [ 02/Feb/21 ]

Kateryna Senchenko

Recent logs from folio-snapshot-load indicate that mod-inventory failed to create Kafka consumers due to missing ENV variables. ... I created the PRs to correct the required variable names, but I'm still not sure why mod-inventory didn't pick up the env variables like the other modules did

Does this pull request need merging and the environments rebuilding in order for data import to work?

Comment by Kateryna Senchenko [ 02/Feb/21 ]

Hi Marc Johnson,

Ian Hardy suggested using the KAFKA_HOST and KAFKA_PORT environment variables; the PRs contain that change. But I'm still trying to figure out why mod-inventory didn't pick up FOLIO_KAFKA_HOST and FOLIO_KAFKA_PORT after Ian set these variables as well.
We can try to merge the PRs and rebuild folio-snapshot-load though, to check whether this problem is still reproducible.

Comment by Marc Johnson [ 02/Feb/21 ]

Kateryna Senchenko

We can try to merge the PRs and rebuild folio-snapshot-load though

I suggest we move forward with using the standard environment variables.

Ian Hardy I imagine the use of FOLIO_KAFKA_HOST and FOLIO_KAFKA_PORT was an attempt at a temporary fix.

Comment by Kateryna Senchenko [ 02/Feb/21 ]

OK, I'll merge the PRs then and we'll see how it goes

Comment by Ian Hardy [ 02/Feb/21 ]

Yes, that was temporary. Thanks Kateryna Senchenko for setting the standard vars. I've removed the FOLIO_KAFKA_* settings from our ansible playbook.

Comment by Aliaksei Harbuz [ 19/Jul/21 ]

The issue related to system user initialization (missing token header) on the snapshot env exists not only in the mod-remote-storage module but also in the mod-search module:

Method [public void org.folio.search.integration.KafkaMessageListener.handleEvents(java.util.List<org.apache.kafka.clients.consumer.ConsumerRecord<java.lang.String, org.folio.search.domain.dto.ResourceEventBody>>)]
Bean [org.folio.search.integration.KafkaMessageListener@62ccbd95]; nested exception is org.springframework.kafka.listener.ListenerExecutionFailedException: Listener method 'public void org.folio.search.integration.KafkaMessageListener.handleEvents(java.util.List<org.apache.kafka.clients.consumer.ConsumerRecord<java.lang.String, org.folio.search.domain.dto.ResourceEventBody>>)' threw exception; nested exception is feign.FeignException$BadRequest: [400 Bad Request] during [POST] to [http://authn/login] [AuthnClient#getApiKey(UserCredentials)]: [Missing Okapi token header]; nested exception is org.springframework.kafka.listener.ListenerExecutionFailedException: Listener method 'public void org.folio.search.integration.KafkaMessageListener.handleEvents(java.util.List<org.apache.kafka.clients.consumer.ConsumerRecord<java.lang.String, org.folio.search.domain.dto.ResourceEventBody>>)' threw exception; nested exception is feign.FeignException$BadRequest: [400 Bad Request] during [POST] to [http://authn/login] [AuthnClient#getApiKey(UserCredentials)]: [Missing Okapi token header]

Comment by Wayne Schneider [ 21/Jul/21 ]

Aliaksei Harbuz It may have to do with the order in which modules are enabled. I note that in the module descriptor for mod-remote-storage, the authtoken interface is not required, so there is no guarantee that it will be present to provide a token when the module is initialized. See also MSEARCH-83 Closed .

Before you jump to the conclusion that you actually require this interface, please think hard about whether your module should be able to operate in an unlocked system (i.e., a system without the authtoken interface). Most modules use module permissions to assign appropriate permissions to endpoints that require them, rather than provisioning users and logging them in. Please review the Okapi security guide to get more information about how permissions are used in an Okapi system.
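As an illustrative module-descriptor fragment (not mod-remote-storage's actual descriptor; the interface and permission names are made up for the example), the two approaches look like this: a hard `requires` dependency on authtoken, versus `modulePermissions` granted to the handler that needs them so no system-user login is required:

```json
{
  "requires": [
    { "id": "authtoken", "version": "2.0" }
  ],
  "provides": [
    {
      "id": "example-interface",
      "version": "1.0",
      "handlers": [
        {
          "methods": ["POST"],
          "pathPattern": "/example/items",
          "permissionsRequired": ["example.items.item.post"],
          "modulePermissions": ["users.collection.get"]
        }
      ]
    }
  ]
}
```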

Comment by Aliaksei Harbuz [ 23/Jul/21 ]

Hi Wayne Schneider, the module descriptor of mod-remote-storage was updated to require the authtoken interface v2.0, but the issue with system user initialization still exists. Additional logs from okapi were found at the same time the errors were logged by mod-remote-storage:

01:52:12 [] [] [] [] INFO  DockerModuleHandle   mod-login-7.5.0-SNAPSHOT.98 01:52:12 [] [] [] [] INFO  RestRouting          invoking postAuthnLogin
01:52:12 [] [] [] [] INFO  DockerModuleHandle   mod-login-7.5.0-SNAPSHOT.98 01:52:12 [] [] [] [] ERROR LoginAPI             Missing request token
01:52:12 [] [] [] [] INFO  DockerModuleHandle   mod-login-7.5.0-SNAPSHOT.98 01:52:12 [] [] [] [] INFO  LogUtil              10.36.1.66:57604 POST /authn/login null HTTP_1_1 400 26 1 tid=diku Bad Request 

Updates:

According to the Jenkins logs for snapshot:

ok: [10.36.1.66] => (item={'id': 'mod-remote-storage-1.3.0', 'action': 'enable'})

The interface updates may not have been applied. The changes will be double-checked after the version fixes and a rebuild.

Comment by Aliaksei Harbuz [ 26/Jul/21 ]

The system user initialization issue was fixed by the interface updates:

01:52:06 [] [] [] [] INFO  PubSubService        Module's subscribers were successfully registered
01:52:06 [] [] [] [] INFO  curityManagerService Saved credentials for user: [system-user]
01:52:50 [] [] [] [] INFO  KafkaService         Restarting kafka consumer to start listening created topics [id: mod-remote-storage-events-listener]
Comment by Siarhei Charniak [ 30/Jul/21 ]

Double-checked the logs:

01:52:35 [] [] [] [] INFO  curityManagerService Saved credentials for user: [system-user]
01:52:35 [] [] [] [] INFO  PubSubService        Module's publishers were successfully registered
01:52:35 [] [] [] [] INFO  PubSubService        Module's subscribers were successfully registered
01:53:13 [] [] [] [] INFO  KafkaService         Restarting kafka consumer to start listening created topics [id: mod-remote-storage-events-listener]

System user initialization works as expected, the story can be closed as done.

Generated at Thu Feb 08 23:24:48 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.