Job executor not picking up job

I am running camunda-bpm-tomcat 7.10.0 on the standard Docker distribution. I have a process which is triggered via the REST endpoint "start process instance by key".
Now I am trying to delete all process instances that have a specific business key, asynchronously.
Reference: Delete Async (POST) | docs.camunda.org
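
The request I am sending looks roughly like this (a sketch from memory; the businessKey value, the deleteReason, and the /engine-rest base path are placeholders for my actual setup):

POST /engine-rest/process-instance/delete
{
  "processInstanceQuery": {
    "businessKey": "my-business-key"
  },
  "deleteReason": "bulk cleanup"
}
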
This REST call returns a 200 response:
{
  "id": "07cea81f-721e-11e9-b695-964c4b946d65",
  "type": "instance-deletion",
  "totalJobs": 37694,
  "jobsCreated": 0,
  "batchJobsPerSeed": 100,
  "invocationsPerBatchJob": 1,
  "seedJobDefinitionId": "07cea820-721e-11e9-b695-964c4b946d65",
  "monitorJobDefinitionId": "07cea821-721e-11e9-b695-964c4b946d65",
  "batchJobDefinitionId": "07cea822-721e-11e9-b695-964c4b946d65",
  "suspended": false,
  "tenantId": null,
  "createUserId": null
}
But this batch job never gets picked up by the job executor.
The job executor is running, according to the logs, but the logs contain nothing related to this batch job.
Can anyone please help me figure out what I am missing here?
Deletion works fine when I delete a single process instance by ID.

@akkujain93,

It appears that your query is valid, as it matches nearly 38K jobs. (That's a good number of jobs!) :)

Have you checked the status of the batch using this endpoint: https://docs.camunda.org/manual/7.10/reference/rest/batch/get/?
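
For example (assuming the default /engine-rest base path):

GET /engine-rest/batch/07cea81f-721e-11e9-b695-964c4b946d65

There is also GET /engine-rest/batch/statistics, which reports remainingJobs, completedJobs and failedJobs per batch; that would tell you whether any of the 37,694 jobs have even been attempted.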

-Ryan

{
  "id": "07cea81f-721e-11e9-b695-964c4b946d65",
  "type": "instance-deletion",
  "totalJobs": 37694,
  "jobsCreated": 37694,
  "batchJobsPerSeed": 100,
  "invocationsPerBatchJob": 1,
  "seedJobDefinitionId": "07cea820-721e-11e9-b695-964c4b946d65",
  "monitorJobDefinitionId": "07cea821-721e-11e9-b695-964c4b946d65",
  "batchJobDefinitionId": "07cea822-721e-11e9-b695-964c4b946d65",
  "suspended": false,
  "tenantId": null,
  "createUserId": null
}

This is the response from the GET endpoint; no instances have been deleted yet.
Could this be because all my instances are waiting for an external API to respond?
I am calling an external service from the process.

@akkujain93,

When a job is created, an entry for that job is placed in the ACT_RU_JOB table. If you look at those records, you'll see some useful information, including LOCK_EXP_TIME_ and LOCK_OWNER_. If those are populated, the job is running; if not, it is not. There is also a field named EXCEPTION_MSG_, which tells you whether the job ran and failed (an exception message appears in that case). It would appear that your Job Executor either isn't running or that some other problem exists, because it should be successfully deleting those process instances.
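
If you would rather check this through the Java API than query the table directly, something along these lines should work (a sketch; batchJobDefinitionId is the String value from your batch response, log is any logger, and note that the lock columns themselves are only visible in the table, since the public Job interface does not expose them):

import java.util.List;
import org.camunda.bpm.engine.ManagementService;
import org.camunda.bpm.engine.ProcessEngines;
import org.camunda.bpm.engine.runtime.Job;

ManagementService managementService =
        ProcessEngines.getDefaultProcessEngine().getManagementService();

// List the batch's execution jobs and check their retry/exception state.
List<Job> batchJobs = managementService.createJobQuery()
        .jobDefinitionId(batchJobDefinitionId)
        .list();
for (Job job : batchJobs) {
    // getExceptionMessage() maps to EXCEPTION_MSG_ in ACT_RU_JOB
    log.info("Job " + job.getId() + ": retries=" + job.getRetries()
            + ", exception=" + job.getExceptionMessage());
}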

Note that tests on my side confirm that the “Delete Async (POST)” endpoint that you referenced will delete running instances, i.e. instances for which there is an associated, running thread. This is at least true if the running thread was started by the Job Executor.

-Ryan

These fields are empty for my job.
Can you please let me know how these fields get populated?
There is no exception in EXCEPTION_MSG_ either.
I am only sending the POST request, nothing else.

@akkujain93,

When the Job Executor picks up a job, it locks it so that no other Job Executor threads pick up that same job. If the job runs to completion, it is removed from the ACT_RU_JOB table; if it fails, an exception message appears. Thus, if there is no lock and no exception message after some reasonable period of time, the job has never been attempted. If all of your jobs show no locks and no exception messages, the Job Executor is not attempting to pick them up, which in turn almost certainly means that it is either not running or misconfigured.

Thus, more investigation is required; I would start with the status and configuration of your Job Executor.

-Ryan

Can I enable some logging that would give me more insight into this?
If yes, how?

@akkujain93,

Information regarding job acquisition, jobs executed successfully and job failures is automatically captured in the ACT_RU_METER_LOG table. (By default, these metrics are gathered every 15 minutes.) For more information, you can navigate to the “Metrics” page under “User Guide”/“Process Engine” within Camunda’s online docs.
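
If it helps, those metrics can also be read programmatically (a sketch; Metrics.JOB_SUCCESSFUL is one of several job-related metric names, alongside e.g. Metrics.JOB_ACQUIRED_SUCCESS and Metrics.JOB_FAILED):

import org.camunda.bpm.engine.ManagementService;
import org.camunda.bpm.engine.ProcessEngines;
import org.camunda.bpm.engine.management.Metrics;

ManagementService managementService =
        ProcessEngines.getDefaultProcessEngine().getManagementService();

// Sum the "job executed successfully" metric over all reported intervals.
long successfulJobs = managementService.createMetricsQuery()
        .name(Metrics.JOB_SUCCESSFUL)
        .sum();
log.info("Jobs executed successfully: " + successfulJobs);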

Similarly, you can retrieve information about the Job Executor itself at runtime by executing the following code, for example within a JavaDelegate:

import org.camunda.bpm.engine.ProcessEngine;
import org.camunda.bpm.engine.ProcessEngines;
import org.camunda.bpm.engine.impl.cfg.ProcessEngineConfigurationImpl;
import org.camunda.bpm.engine.impl.jobexecutor.JobExecutor;

// Obtain the JobExecutor through the engine configuration (internal API).
ProcessEngine defaultEngine = ProcessEngines.getDefaultProcessEngine();
JobExecutor jobExecutor = ((ProcessEngineConfigurationImpl) defaultEngine.getProcessEngineConfiguration())
        .getJobExecutor();

log.info("Job executor active? " + jobExecutor.isActive());
log.info("Wait time: " + jobExecutor.getWaitTimeInMillis());
log.info("Max jobs per acquisition: " + jobExecutor.getMaxJobsPerAcquisition());

(In the examples above, "log" is an implementation of java.util.logging.Logger.)

Good luck!

-Ryan

The camundaTaskExecutor is running, as per the logs:
2019-05-23 23:26:39.327 INFO 54936 --- [ main] org.camunda.bpm.spring.boot : STARTER-SB040 Setting up jobExecutor with corePoolSize=3, maxPoolSize:10
2019-05-23 23:26:39.329 INFO 54936 --- [ main] o.s.s.concurrent.ThreadPoolTaskExecutor : Initializing ExecutorService 'camundaTaskExecutor'.

My history configuration is also working, and that is handled by the job executor too.
It is only the async batch delete that is not working.
Let me try one more process with a delegate and so on.
I will post my findings.

Hi @akkujain93,

Maybe this post helps you further: Service orchestration with subprocesses / How to trigger job acquisition explicitly

Cheers, Ingo

Trying this.

This was happening because I have a heterogeneous cluster that is configured as deployment-aware.
So if I restart a node, all the in-memory registration data is gone.
The job executor then never fetches the job, because the deployment ID persisted during the async call does not match any deployment registered in memory.
It works fine in a homogeneous cluster setup, or when deployment-aware is set to false.
For a heterogeneous setup, I have to register all the deployments again during server startup; a sketch of what that could look like follows below.
For that I raised a question -
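
For reference, the startup registration I am describing could look roughly like this (a sketch using the Camunda Spring Boot starter's PostDeployEvent; the class and method names are my own, and the approach assumes a deployment-aware job executor):

import org.camunda.bpm.engine.ManagementService;
import org.camunda.bpm.engine.ProcessEngine;
import org.camunda.bpm.engine.repository.Deployment;
import org.camunda.bpm.spring.boot.starter.event.PostDeployEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

@Component
public class DeploymentRegistration {

    // Re-register every known deployment with the job executor on startup,
    // so a deployment-aware executor also acquires jobs for deployments
    // created on other nodes or before a restart.
    @EventListener
    public void onPostDeploy(PostDeployEvent event) {
        ProcessEngine engine = event.getProcessEngine();
        ManagementService managementService = engine.getManagementService();
        for (Deployment deployment : engine.getRepositoryService()
                .createDeploymentQuery().list()) {
            managementService.registerDeploymentForJobExecutor(deployment.getId());
        }
    }
}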