[FOLIO-3073] Health check for mod-password-validator is not working Created: 10/Mar/21 Updated: 24/Mar/21 Resolved: 24/Mar/21 |
|
| Status: | Closed |
| Project: | FOLIO |
| Components: | None |
| Affects versions: | None |
| Fix versions: | None |
| Type: | Task | Priority: | P2 |
| Reporter: | Dima Tkachenko | Assignee: | John Malconian |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original estimate: | Not Specified | ||
| Attachments: |
| Issue links: |
| Sprint: | DevOps Sprint 110 |
| Development Team: | FOLIO DevOps |
| Description |
|
Purpose/Overview: A new health check has been added to mod-password-validator and configured in the Jenkins build. The URL is the standard one: /admin/health. During the Docker image build, the Jenkins job fails with:
+ docker run -d --health-timeout=2s --health-retries=2 '--health-cmd=curl -sS --fail -o /dev/null http://localhost:8081/admin/health || exit 1' --cidfile mod-password-validator:2.0.1-SNAPSHOT.4-4.cid mod-password-validator:2.0.1-SNAPSHOT.4
5cfd75639076747dc6bab45f74f46cc4cf1a8674387ba7133213a657746756dd
[Pipeline] readFile
[Pipeline] sh
+ docker inspect 5cfd75639076747dc6bab45f74f46cc4cf1a8674387ba7133213a657746756dd
+ jq -r '.[].State.Health.Status'
[Pipeline] echo
Current Status: unhealthy
[Pipeline] sh
++ cat mod-password-validator:2.0.1-SNAPSHOT.4-4.cid
+ docker stop 5cfd75639076747dc6bab45f74f46cc4cf1a8674387ba7133213a657746756dd
5cfd75639076747dc6bab45f74f46cc4cf1a8674387ba7133213a657746756dd
The health check endpoint has been verified locally:
Failing Jenkins job: https://jenkins-aws.indexdata.com/job/folio-org/job/mod-password-validator/job/MODPWD-59_Add_admin_healthcheck_endpoint/
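The pipeline step in the log decides pass/fail from the State.Health.Status field that docker inspect reports. Docker sets it to starting, healthy, or unhealthy based on the results of --health-cmd; with --health-retries=2, two consecutive failures mark the container unhealthy. Below is a minimal Python sketch of what the jq -r '.[].State.Health.Status' filter extracts. It assumes the JSON text from docker inspect <id> has already been captured. The function names health_statuses and is_healthy are hypothetical and are not part of the Jenkins pipeline library.

```python
import json


def health_statuses(inspect_output: str) -> list[str]:
    """Mimic `jq -r '.[].State.Health.Status'` over `docker inspect` output."""
    containers = json.loads(inspect_output)  # docker inspect prints a JSON array
    statuses = []
    for container in containers:
        health = (container.get("State") or {}).get("Health")
        # A container started without a health check typically has no "Health"
        # object; jq -r would print "null" for it, so we do the same.
        statuses.append(health["Status"] if health else "null")
    return statuses


def is_healthy(inspect_output: str) -> bool:
    """Pass only if at least one container was inspected and all report "healthy"."""
    statuses = health_statuses(inspect_output)
    return bool(statuses) and all(s == "healthy" for s in statuses)


# Shaped like the failing run above (field values are illustrative).
sample = '[{"State": {"Status": "running", "Health": {"Status": "unhealthy", "FailingStreak": 2}}}]'
print(health_statuses(sample))  # ['unhealthy']
print(is_healthy(sample))       # False
```

Mirroring jq's "null" output keeps the sketch faithful to what the pipeline would see for a container that has no health check configured.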
|
| Comments |
| Comment by Dima Tkachenko [ 12/Mar/21 ] |
|
Could you please prioritize this high in the backlog and take it into the next sprint? A couple of user stories on our side are blocked by this issue. |
| Comment by Khalilah Gambrell [ 17/Mar/21 ] |
|
Jakub Skoczen, is there any chance someone on the DevOps team can take this on in this sprint or the next one? We are blocked by this story and would like to support this for the Iris release. cc: Oleksii Petrenko and Sobha Duvvuri |
| Comment by John Malconian [ 22/Mar/21 ] |
|
I've emulated locally the docker health check that Jenkins performs, and it does fail, as designed. The container exits with an error almost immediately after launch:
docker run -d --health-timeout=3s --health-retries=3 '--health-cmd=curl -sS --fail -o /dev/null http://localhost:8081/admin/health || exit 1' --cidfile mod-password-validator:2.0.1-SNAPSHOT.6-6.cid mod-password-validator:2.0.1-SNAPSHOT.6
./run-java.sh: /usr/verticles/run-env.sh: line 7: DB_HOST: parameter not set
When the docker health check command is executed in CI, it only checks whether the container has started and is running. That's really it; it doesn't evaluate the overall health of the application itself. There are no external databases to connect to in the CI pipeline, so DB_HOST wouldn't be set.
There are a couple of things I question here:
1. Should the container actually exit if an environment variable is not set, or should it run and just log that DB_HOST is not set? Is this the way other storage modules are implemented?
2. If there is a suitable command-line argument that can be passed to the container at runtime to make it ignore any external database configuration, you can set it for the health check by adding 'runArgs' to the buildDocker parameters in the Jenkinsfile:
buildDocker {
  publishMaster = 'yes'
  healthChk = 'yes'
  healthChkCmd = 'curl -sS --fail -o /dev/null http://localhost:8081/admin/health || exit 1'
  runArgs = '-DSomeSuitableCmdLineArguments'
}
If this cannot be resolved with 'runArgs', and you've confirmed that the module should exit with an error when DB_HOST is not set, then the most expedient resolution is to disable this check in the Jenkinsfile. A related question may compound the issue: even if the container runs without DB_HOST or a database, will the /admin/health endpoint still return a 2xx status code? Is this endpoint designed to check the overall health of the application, or only whether it is running? |
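Question 1 and the closing question above can be illustrated with a minimal Python sketch that separates liveness (is the process up?) from readiness (is the database usable?). This sketch shows one possible design under stated assumptions. It does not describe how mod-password-validator is actually implemented. DbConfig, load_db_config, liveness, and readiness are hypothetical names; only the DB_HOST variable comes from the log above.

```python
import logging
import os
from dataclasses import dataclass
from typing import Mapping, Optional

log = logging.getLogger("startup")


@dataclass
class DbConfig:
    host: Optional[str]  # None when DB_HOST is missing or empty


def load_db_config(env: Mapping[str, str] = os.environ) -> DbConfig:
    """Read DB_HOST, logging a warning instead of aborting when it is absent."""
    host = env.get("DB_HOST")
    if not host:
        # Contrast with run-env.sh, which exits with "DB_HOST: parameter not set".
        log.warning("DB_HOST is not set; starting without a database connection")
    return DbConfig(host=host or None)


def liveness() -> int:
    """Hypothetical liveness check: if this code runs, the process is up, so 200."""
    return 200


def readiness(config: DbConfig, db_reachable: bool) -> int:
    """Hypothetical readiness check: 200 only if a DB is configured and reachable."""
    return 200 if config.host and db_reachable else 503


cfg = load_db_config({})  # simulates the CI environment, where DB_HOST is unset
print(liveness())                            # 200
print(readiness(cfg, db_reachable=False))    # 503
```

Under this split, a CI check like the one configured in the Jenkinsfile only needs to know the container is up, so it would target the liveness-style response. A deployment probe that depends on the database would target the readiness one.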
| Comment by Dima Tkachenko [ 23/Mar/21 ] |
|
Thank you for identifying the root cause of the issue; it wasn't obvious from the Jenkins builds. I need to check a couple of things before giving feedback. |
| Comment by Dima Tkachenko [ 24/Mar/21 ] |
|
Update: I've made some changes to the application settings, and the module now starts even when the DB is not available. The health check has also been tuned and now works for the case described above. Closing the ticket. |
| Comment by John Malconian [ 24/Mar/21 ] |
|
Dima Tkachenko Excellent news! Thanks. |