[FOLIO-3073] Health check for mod-password-validator is not working Created: 10/Mar/21  Updated: 24/Mar/21  Resolved: 24/Mar/21

Status: Closed
Project: FOLIO
Components: None
Affects versions: None
Fix versions: None

Type: Task Priority: P2
Reporter: Dima Tkachenko Assignee: John Malconian
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original estimate: Not Specified

Attachments: PNG File image-2021-03-10-18-37-55-471.png    
Issue links:
Blocks
blocks MODPWD-59 Add admin healthcheck endpoint Closed
blocks MODQM-84 Add standard admin healthcheck endpoint Closed
Sprint: DevOps Sprint 110
Development Team: FOLIO DevOps

 Description   

Purpose/Overview:

A new health check has been added to mod-password-validator and configured in the Jenkins build. The URL is the standard one: /admin/health. During the Docker image build, the Jenkins job fails with:

+ docker run -d --health-timeout=2s --health-retries=2 '--health-cmd=curl -sS --fail -o /dev/null http://localhost:8081/admin/health || exit 1' --cidfile mod-password-validator:2.0.1-SNAPSHOT.4-4.cid mod-password-validator:2.0.1-SNAPSHOT.4
5cfd75639076747dc6bab45f74f46cc4cf1a8674387ba7133213a657746756dd
[Pipeline] readFile
[Pipeline] sh
+ docker inspect 5cfd75639076747dc6bab45f74f46cc4cf1a8674387ba7133213a657746756dd
+ jq -r '.[].State.Health.Status'
[Pipeline] echo
Current Status: unhealthy
[Pipeline] sh
++ cat mod-password-validator:2.0.1-SNAPSHOT.4-4.cid
+ docker stop 5cfd75639076747dc6bab45f74f46cc4cf1a8674387ba7133213a657746756dd
5cfd75639076747dc6bab45f74f46cc4cf1a8674387ba7133213a657746756dd

The health check endpoint has been verified locally (see the attached image).

Failing Jenkins job: https://jenkins-aws.indexdata.com/job/folio-org/job/mod-password-validator/job/MODPWD-59_Add_admin_healthcheck_endpoint/

 



 Comments   
Comment by Dima Tkachenko [ 12/Mar/21 ]

Jakub Skoczen

Could you please prioritize this high in the backlog and take it into the next sprint?

A couple of user stories on our side are blocked by this issue.

cc: Khalilah Gambrell

Comment by Khalilah Gambrell [ 17/Mar/21 ]

Jakub Skoczen, is there any chance someone on the DevOps team can take this on in this sprint or the next? We are blocked by this story, and we would like to support this for the Iris release.

cc: Oleksii Petrenko and Sobha Duvvuri

Comment by John Malconian [ 22/Mar/21 ]

I've emulated locally the Docker health check that Jenkins performs, and it does fail as designed. The container exits with an error almost immediately after launch:

docker run -d --health-timeout=3s --health-retries=3 '--health-cmd=curl -sS --fail -o /dev/null http://localhost:8081/admin/health || exit 1' --cidfile mod-password-validator:2.0.1-SNAPSHOT.6-6.cid mod-password-validator:2.0.1-SNAPSHOT.6

./run-java.sh: /usr/verticles/run-env.sh: line 7: DB_HOST: parameter not set

When the Docker health check command is executed in CI, it does a simple check to see if the container has started and is running. That's really it. It doesn't evaluate the overall health of the application itself. There are no external databases to connect to in the CI pipeline, so DB_HOST wouldn't be set.
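
As a rough sketch of the check described above, the polling step could look like the following shell helper. The docker inspect and jq call is taken from the Jenkins log in the description; the function names get_health_status and wait_for_health, the retry count, and the delay are assumptions made for illustration, not the actual pipeline code.

```shell
#!/bin/sh
# Hypothetical helper names; only the "docker inspect | jq" call comes from the Jenkins log.

# Print the Docker health status of a container (starting, healthy, or unhealthy).
get_health_status() {
    docker inspect "$1" | jq -r '.[].State.Health.Status'
}

# Poll until the status is healthy or unhealthy, or until the attempts run out.
# Usage: wait_for_health <container-id> [attempts] [delay-seconds]
wait_for_health() {
    cid=$1
    attempts=${2:-10}
    delay=${3:-1}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        status=$(get_health_status "$cid")
        case $status in
            healthy)   echo "healthy";   return 0 ;;
            unhealthy) echo "unhealthy"; return 1 ;;
        esac
        i=$((i + 1))
        sleep "$delay"
    done
    echo "timeout"
    return 2
}
```

A caller would pass the container ID written by --cidfile and treat a non-zero exit code as a failed check. Note that this only reflects what the health command reports; it says nothing about database connectivity unless the endpoint itself checks it.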

There are a couple of things I question here:

1. Should the container actually exit if an environment variable is not set, or should it run and just log that DB_HOST is not set? Is this the way other storage modules are implemented?

2. If there is a suitable command line argument that can be passed to the container at runtime that will ignore any external database configuration, you can set those command line arguments when running the health check by adding 'runArgs' to the buildDocker parameters in the Jenkinsfile:

    buildDocker {
      publishMaster = 'yes'
      healthChk = 'yes'
      healthChkCmd = 'curl -sS --fail -o /dev/null http://localhost:8081/admin/health || exit 1'
      runArgs = '-DSomeSuitableCmdLineArguments'
    }

If this cannot be resolved with 'runArgs' and you've confirmed that the module's behavior should be to exit with an error if DB_HOST is not set, then the most expedient resolution is to disable this check in the Jenkinsfile.
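
To make question 1 concrete, here is a minimal sketch of the two startup behaviors. It is not the module's actual run-env.sh; the function names and message texts are hypothetical. The "parameter not set" error above may come from something like set -u or a ${DB_HOST?} expansion in that script, but that is an assumption.

```shell
#!/bin/sh
# Hypothetical sketch; not the contents of the module's run-env.sh.

# Strict: fail when DB_HOST is unset or empty, so the caller can abort startup.
require_db_host() {
    if [ -z "${DB_HOST:-}" ]; then
        echo "ERROR: DB_HOST is not set" >&2
        return 1
    fi
    echo "$DB_HOST"
}

# Lenient: log a warning and carry on without a database host.
optional_db_host() {
    if [ -z "${DB_HOST:-}" ]; then
        echo "WARN: DB_HOST is not set, starting without a database" >&2
        return 0
    fi
    echo "$DB_HOST"
}
```

With the strict variant the container stops before the application starts, so a CI health check without a database can never pass. With the lenient variant the process stays up, and whether the check passes then depends on what /admin/health reports.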

Possibly compounding this issue is another question: even if the container runs without DB_HOST or a database, will the /admin/health endpoint still return a 2xx status code? Is this endpoint designed to actually check the overall health of the application or just to check if it is running?
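
One way to answer this empirically is to start the container without DB_HOST and look at the status code the endpoint returns. The sketch below assumes the module listens on port 8081, as in the health check command above; the helper names is_2xx and probe_health are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting the health endpoint's response code.

# Return success when the given HTTP status code is in the 2xx range.
is_2xx() {
    case $1 in
        2[0-9][0-9]) return 0 ;;
        *)           return 1 ;;
    esac
}

# Print the HTTP status code returned by a URL (curl prints 000 if no response was received).
probe_health() {
    curl -sS -o /dev/null -w '%{http_code}' "$1"
}
```

For example, code=$(probe_health http://localhost:8081/admin/health) followed by is_2xx "$code" shows whether the existing --fail based health command would pass, and the printed code helps tell a liveness-only endpoint apart from one that reports on the database.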

Comment by Dima Tkachenko [ 23/Mar/21 ]

John Malconian

Thank you for providing the root cause of the issue. It wasn't obvious from the Jenkins builds.

I have to check a couple of things before giving feedback.

Comment by Dima Tkachenko [ 24/Mar/21 ]

Update: I've made some changes to the application settings, and the module now starts even when the DB is not available. The health check has also been tuned and now works for the aforementioned case.

Closing the ticket.

Comment by John Malconian [ 24/Mar/21 ]

Dima Tkachenko Excellent news! Thanks.

Generated at Thu Feb 08 23:25:24 UTC 2024 using Jira 1001.0.0-SNAPSHOT#100246-sha1:7a5c50119eb0633d306e14180817ddef5e80c75d.