[FOLIO-3049] Intermittent mod-source-record-storage/mod-pubsub initialization failure Created: 01/Mar/21 Updated: 23/Apr/21 Resolved: 20/Apr/21 |
|
| Status: | Closed |
| Project: | FOLIO |
| Components: | None |
| Affects versions: | None |
| Fix versions: | None |
| Type: | Bug | Priority: | P2 |
| Reporter: | Wayne Schneider | Assignee: | David Crossley |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original estimate: | Not Specified | ||
| Attachments: |
|
||||||||
| Issue links: |
|
||||||||
| Sprint: | DevOps Sprint 109, DevOps Sprint 110, DevOps Sprint 111, DevOps Sprint 112 | ||||||||
| Development Team: | FOLIO DevOps | ||||||||
| Description |
|
In our Jenkins reference builds, we are getting intermittent failures initializing mod-source-record-storage. The build fails with the error:
We are seeing this error condition both in AWS reference builds and in Vagrant box builds. Unfortunately, the error does not occur consistently. Oleksii Kuzminov suggests:
FWIW, in the reference builds, mod-pubsub is deployed first (because it is first in the dependency graph). Perhaps it is not always ready to process requests by the time mod-source-record-storage attempts to register. |
| Comments |
| Comment by David Crossley [ 18/Mar/21 ] |
|
This intermittent behaviour has been has been happening most days on the reference environment daily builds, for just one of the set of envs. Mostly for folio-snapshot-core, but sometimes folio-snapshot-load, and once on folio-testing-core-backend. |
| Comment by David Crossley [ 18/Mar/21 ] |
|
Oh, why did i utter those words. Murphy hit me. This problem happened on folio-snapshot-core during today's normal daily builds. As usual, a re-run was okay. Then later today i was trying to finalise some testing on an unrelated ticket, and it hit me three times in a row on folio-snapshot-test. These with mod-patron-blocks. A following run on folio-testing-test was okay. Then another run on folio-snapshot-test failed (this time with mod-source-record-storage). Give up. |
| Comment by Jakub Skoczen [ 18/Mar/21 ] |
|
Oleksii Kuzminov Since these modules depend on mod-pubsub Okapi will ensure that mod-pubsub is initialized first. Is it possible that mod-pubsub returns from the 'init' call too early, before it is properly initizialized? |
| Comment by Wayne Schneider [ 19/Mar/21 ] |
|
Bumping this to P2 as this is now blocking other work (aside from just being very very frustrating). |
| Comment by David Crossley [ 19/Mar/21 ] |
|
Today the daily reference environment builds with both folio-snapshot-core and folio-snapshot-load (mod-patron-blocks) failed, and re-run was okay for each. |
| Comment by Wayne Schneider [ 23/Mar/21 ] |
|
Unable to reproduce the problem today. There have been some changes since last week in mod-pubsub and presumably other affected modules, as well. Putting this in review (though there is no code to review); I'd be very surprised if the problem just went away, though. |
| Comment by David Crossley [ 24/Mar/21 ] |
|
Ha, Murphy strikes again. I have not noticed any on the reference environments for the past four days. However, there was an out-of-band rebuild earlier today of folio-snapshot-load/852 which failed (this time with mod-patron-blocks). |
| Comment by Wayne Schneider [ 24/Mar/21 ] |
|
Thanks, David Crossley, now we are getting somewhere. This appears to be the relevant part of the pubsub log from when mod-patron-blocks is initializing:
17:53:18 [] [] [] [] INFO ingModuleServiceImpl Event type ITEM_AGED_TO_LOST does not exist, creating a temporary definition
17:53:18 [] [] [] [] INFO ingModuleServiceImpl Event type ITEM_CLAIMED_RETURNED does not exist, creating a temporary definition
17:53:18 [] [] [] [] ERROR DbUtil Rollback transaction
io.vertx.pgclient.PgException: { "message": "insert or update on table \"messaging_module\" violates foreign key constraint \"fk_event_type_id\"", "severity": "ERROR", "code": "23503", "detail": "Key (event_type_id)=(ITEM_CLAIMED_RETURNED) is not present in table \"event_descriptor\".", "file": "ri_triggers.c", "line": "3255", "routine": "ri_ReportViolation", "schema": "pubsub_config", "table": "messaging_module", "constraint": "fk_event_type_id" }
at io.vertx.pgclient.impl.codec.ErrorResponse.toException(ErrorResponse.java:31) ~[mod-pubsub-server-fat.jar:?]
at io.vertx.pgclient.impl.codec.QueryCommandBaseCodec.handleErrorResponse(QueryCommandBaseCodec.java:57) ~[mod-pubsub-server-fat.jar:?]
at io.vertx.pgclient.impl.codec.ExtendedQueryCommandCodec.handleErrorResponse(ExtendedQueryCommandCodec.java:90) ~[mod-pubsub-server-fat.jar:?]
at io.vertx.pgclient.impl.codec.PgDecoder.decodeError(PgDecoder.java:236) ~[mod-pubsub-server-fat.jar:?]
at io.vertx.pgclient.impl.codec.PgDecoder.decodeMessage(PgDecoder.java:122) [mod-pubsub-server-fat.jar:?]
at io.vertx.pgclient.impl.codec.PgDecoder.channelRead(PgDecoder.java:102) [mod-pubsub-server-fat.jar:?]
at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251) [mod-pubsub-server-fat.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [mod-pubsub-server-fat.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [mod-pubsub-server-fat.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [mod-pubsub-server-fat.jar:?]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [mod-pubsub-server-fat.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [mod-pubsub-server-fat.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [mod-pubsub-server-fat.jar:?]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [mod-pubsub-server-fat.jar:?]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [mod-pubsub-server-fat.jar:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) [mod-pubsub-server-fat.jar:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650) [mod-pubsub-server-fat.jar:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576) [mod-pubsub-server-fat.jar:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [mod-pubsub-server-fat.jar:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [mod-pubsub-server-fat.jar:?]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [mod-pubsub-server-fat.jar:?]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [mod-pubsub-server-fat.jar:?]
at java.lang.Thread.run(Thread.java:834) [?:?]
17:53:18 [] [] [] [] ERROR ExceptionHelper { "message": "insert or update on table \"messaging_module\" violates foreign key constraint \"fk_event_type_id\"", "severity": "ERROR", "code": "23503", "detail": "Key (event_type_id)=(ITEM_CLAIMED_RETURNED) is not present in table \"event_descriptor\".", "file": "ri_triggers.c", "line": "3255", "routine": "ri_ReportViolation", "schema": "pubsub_config", "table": "messaging_module", "constraint": "fk_event_type_id" }
17:53:18 [] [] [] [] INFO LogUtil org.folio.rest.RestVerticle start invoking postPubsubEventTypesDeclareSubscriber
17:53:18 [] [] [] [] INFO LogUtil 10.36.1.167:60996 POST /pubsub/event-types/declare/subscriber HTTP_1_1 422 284 20 tid=diku Unprocessable Entity
|
| Comment by Wayne Schneider [ 24/Mar/21 ] |
|
Blocking on
|
| Comment by David Crossley [ 15/Apr/21 ] |
|
I am monitoring daily builds to ensure fixed. |
| Comment by David Crossley [ 19/Apr/21 ] |
|
Not seen any occurrence yet. Will keep monitoring. |
| Comment by David Crossley [ 23/Apr/21 ] |
|
Not seen this behaviour since the previous comment. |