Method used for health check by loadbalancer

Hi
We are moving our project into a new load-balanced test environment, and the “loadbalancer guys” are asking for a Camunda “health check” they can call to see if a node is up and running. I could not find a dedicated health check in the API, so I’m thinking of using http://camunda.server.url:port/engine-rest/engine
If Camunda responds with a 200, does this also mean the Camunda database is up and running and Camunda is operational? Does anybody have another, better solution?
Thanks
/Martin

Maybe just asking for the engine would do the trick

<server>/engine-rest/engine

https://docs.camunda.org/manual/7.9/reference/rest/engine/get-names/

Hi Niall
Yes, that was my original thought. Do you know if this automatically means that the database is also available (i.e. it is queried by this call), or does the call get its answer from config read on startup?
BR
/Martin

If this doesn’t come back with a 200 you can be sure that either the node is down or the database is down.

Excellent. Thanks for the clarification.
/Martin

@JWorks So it turns out that there is a much better solution (better in the sense that the one I gave you doesn’t actually work as I expected).

I’m going to suggest 2 other calls that will always hit the DB

  1. Get the incident count
  • This would be good because it would return any incidents that exist.
  2. Case count
  • This would be good (if you don’t use CMMN) because it would be hitting an empty table and so would be a very cheap call.

Ok, thanks. We will use the case instance count instead then, as we do not use CMMN.
/Martin


I hate to bump an old thread but I need a suggestion or a feature request.

My team has Camunda in AWS with a load balancer in front. I was using the /engine-rest/engine resource for health checking that Camunda is up & running, but per the discussion above, that does not signify that Camunda has connectivity to the database. I can’t use any of the count calls as we plan on using BPMN, CMMN, and DMN and I don’t want to have an expensive query run.

I thought I could use /engine-rest/{USER}/profile, but that means that net-new instances would never pass the health check because no users would exist (without ad hoc SQL inserts).

Are there any other resources that are very light on database queries but can be used as a health check? If not, can a feature request be created to add such a resource?

Much appreciated,
Cody

Hi @codygulley,

the request Niall has proposed is enough to check if the engine is running. If you get a different response than 200, then the system has a problem, independent of processes, decisions or cases.

Hope this helps, Ingo

Hi @Ingo_Richtsmeier,

Thanks for the reply. It is my understanding, based on the initial conversation, that /engine does not query the database and that /case-instance/count should only be used if we are not using CMMN (which we are). My main concern is the unnecessary database load caused by using these calls for a health check every minute. These two endpoints are the same two suggested by @Niall above; however, there is no alternative for someone who is using CMMN.

Seems like there should be a specific service built for health checking, or a different service that might not be as intensive on the database. What would be your suggestion?

Thanks!

Hi @codygulley,

what do you want to check for the healthcheck?

GET /engine-rest/incident/count will tell you that

  • the engine runs
  • the engine can query the database
  • the response contains a reasonable number if you run processes
  • the call runs an indexed query in the database with low runtime and resource consumption (it won’t hurt anything)
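
As a rough illustration of how a probe could use this call, here is a minimal Python sketch. It assumes the REST API is reachable without authentication and that the count endpoint answers with a JSON body of the form {"count": <number>}. The function names is_healthy and probe are made up for this example, and the base URL passed to probe is whatever your installation uses.

```python
import json
import urllib.error
import urllib.request


def is_healthy(status_code, body):
    """Return True if the response looks like a valid incident count."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        return False
    # Assumed response shape: {"count": <non-negative integer>}
    count = payload.get("count") if isinstance(payload, dict) else None
    return isinstance(count, int) and not isinstance(count, bool) and count >= 0


def probe(base_url, timeout=5.0):
    """Call GET <base_url>/incident/count and report whether the node is healthy."""
    url = base_url.rstrip("/") + "/incident/count"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_healthy(response.status, response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, non-2xx status, or an undecodable body
        # all count as unhealthy.
        return False
```

Calling probe("http://camunda.server.url:port/engine-rest"), with the real host and port filled in, would then return True only when the engine answered and the count query succeeded.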

Do you have any other special needs?

Cheers, Ingo

Hi @Ingo_Richtsmeier ,
Sorry for such a late response to this. It’s been quite the year.

I’ve been using the GET /engine-rest/engine resource because the load balancer we use (AWS Classic ELB) does not support any authentication, and my company requires REST authentication to be in place.

Ideally we need the following in a health check service:

  • A GET request
  • No authentication
  • Returns a 200 response
  • Checks database connectivity
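
To make these four requirements concrete, here is a minimal Python sketch of what such a service could look like. It is only an illustration of the shape of the endpoint, not an existing Camunda feature: the /health path, port 8081, and the names check_database, make_handler, serve and connect_example are all hypothetical, and the in-memory SQLite connection is a stand-in for a connection to the real Camunda database.

```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer


def connect_example():
    """Stand-in connection factory; replace with your real database driver."""
    return sqlite3.connect(":memory:")


def check_database(connect):
    """Run a trivial query and return (http_status, body_dict)."""
    try:
        connection = connect()
        try:
            cursor = connection.cursor()
            cursor.execute("SELECT 1")
            cursor.fetchone()
        finally:
            connection.close()
    except Exception as error:  # any failure means the node is not healthy
        return 503, {"status": "DOWN", "error": type(error).__name__}
    return 200, {"status": "UP"}


def make_handler(connect):
    """Build a handler that serves GET /health without authentication."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/health":
                self.send_error(404)
                return
            status, body = check_database(connect)
            payload = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return HealthHandler


def serve(connect, port=8081):
    """Block and serve the health endpoint on the given port."""
    HTTPServer(("", port), make_handler(connect)).serve_forever()
```

Running serve(connect_example) would answer GET /health with a 200 while the query succeeds and with a 503 otherwise, which is the behaviour a load balancer health check can act on.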

Would it be possible to have a feature request in the backlog for such a service? It seems to me that we wouldn’t be the only enterprise wanting this.

Much appreciated,
Cody

Hi @codygulley,

yes, you can add any feature requests to the Camunda JIRA: https://jira.camunda.com/projects/CAM/issues/CAM-11237?filter=addedrecently.

If you provide an implementation as a pull request via https://github.com/camunda/camunda-bpm-platform, the chance that it will be adopted by the product team increases significantly.

If you are an enterprise customer, you can use the enterprise support: https://docs.camunda.org/enterprise/support/ to discuss your issue.

Hope this helps, Ingo

If you are using the Spring Boot version of Camunda, you could use Spring Boot Actuator, which runs a simple query (“SELECT 1”) on the database and also doesn’t require authentication.

Request: GET http://localhost:8080/actuator/health
Sample Response:
{
  "status": "UP",
  "components": {
    "db": {
      "status": "UP",
      "details": {
        "database": "Microsoft SQL Server",
        "result": 1,
        "validationQuery": "SELECT 1"
      }
    },
    "diskSpace": {
      "status": "UP",
      "details": {
        "total": 499963174912,
        "free": 208040579072,
        "threshold": 10485760
      }
    },
    "jobExecutor": {
      "status": "UP",
      "details": {
        "jobExecutor": {
          "name": "JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor]",
          "lockOwner": "c1fcb20e-135d-47e2-9b26-f3a7e55ebaec",
          "lockTimeInMillis": 300000,
          "maxJobsPerAcquisition": 3,
          "waitTimeInMillis": 5000,
          "processEngineNames": [
            "default"
          ]
        }
      }
    },
    "ping": {
      "status": "UP"
    },
    "processEngine": {
      "status": "UP",
      "details": {
        "name": "default"
      }
    }
  }
}
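
If a monitoring script needs to look inside this response rather than only at the HTTP status, the check could be sketched roughly as follows in Python. The function name node_is_up is hypothetical, and the component names "db" and "processEngine" are taken from the sample above and may differ in your setup.

```python
import json


def node_is_up(health_json, required=("db", "processEngine")):
    """Return True only if the overall status and each required component are UP."""
    try:
        health = json.loads(health_json)
    except (ValueError, TypeError):
        return False
    if not isinstance(health, dict) or health.get("status") != "UP":
        return False
    components = health.get("components")
    if not isinstance(components, dict):
        return False
    # Every required component must be present and report UP.
    return all(
        isinstance(components.get(name), dict)
        and components[name].get("status") == "UP"
        for name in required
    )
```

Requiring the "db" component explicitly means the node is only treated as healthy when the database check itself succeeded.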

Let me know if that works or if you have a different way of getting this done. Thanks!

@gopikrishnan, what is the yaml configuration for the actuator api call response?

You would need to add spring-boot-starter-actuator to your project dependencies.
And in the yaml, you just need to include:

management.endpoint.health.show-details: always

@gopikrishnan thanks for the details. Would you mind sharing the complete yaml configuration for reference?