Timers not firing after server restart

Hello, the other day I found many timers that didn’t fire at their due date (September 10th). Yesterday I tested those same processes, changing only the timer’s value. I tried 1 minute, 1 hour and 5 hours, and all of them worked perfectly.

So I thought maybe the server restart was messing things up. So today I launched the processes with a 1-hour timer again and restarted the server (Tomcat 7) afterwards. One hour later I see all the processes still waiting on the timer and all the jobs in the database with their due dates in the past. Then I relaunched the processes with 1-minute timers, expecting them to work because I wasn’t restarting the server again, but they got stuck as well.

I have no clue what may be happening. Any help is appreciated.

My configuration of the jobExecutor in bpm-platform.xml:

  <job-executor>
    <job-acquisition name="default" />
  </job-executor>
  <process-engine name="default">
    <job-acquisition>default</job-acquisition>
    <configuration>org.camunda.bpm.engine.impl.cfg.StandaloneProcessEngineConfiguration</configuration>
    <datasource>java:jdbc/ProcessEngine</datasource>
    <properties>
      <property name="history">full</property>
      <property name="databaseSchemaUpdate">true</property>
      <property name="authorizationEnabled">true</property>
      <property name="jobExecutorDeploymentAware">true</property>
      <property name="jobExecutorActivate">true</property>
    </properties>
  </process-engine>

My data source connection (postgresql):

<Resource name="jdbc/ProcessEngine"
          auth="Container"
          type="javax.sql.DataSource"
          factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
          uniqueResourceName="process-engine"
          driverClassName="org.postgresql.Driver"
          url="jdbc:postgresql://****/camunda"
          username="camunda"
          password="****"
          maxActive="10"
          minIdle="5"
          validationQuery="select 1"
          removeAbandoned="true"
          removeAbandonedTimeout="120"/>

A JSON example of one of the unexecuted jobs:

{
    "id": "0fef5782-9d1f-11e7-a457-02000af94315",
    "jobDefinitionId": "da105c07-9c88-11e7-99dc-02000af94315",
    "processInstanceId": "0b014924-9d1f-11e7-a457-02000af94315",
    "processDefinitionId": "Process_1060:1:da105c06-9c88-11e7-99dc-02000af94315",
    "processDefinitionKey": "Process_1060",
    "executionId": "0feac36c-9d1f-11e7-a457-02000af94315",
    "exceptionMessage": null,
    "retries": 3,
    "dueDate": "2017-09-19T12:43:59",
    "suspended": false,
    "priority": 0,
    "tenantId": "****"
}

Hi,

after the timers fire, are you routing to a service task which is using a connector to make a remote procedure call? If so, check that the remote end points are not blocking etc.

regards

Rob

Hello, the rest of the process is fine: it connects to a web service and sends some emails, and it has worked correctly in many of my tests. The job simply isn’t executing; I just checked today’s logs. 21 processes should have fired their timers around 4:30 AM. The log shows the server restart at 1 AM, and the next trace is from 9 AM. Both the engine and the endpoint it connects to after the timer are on the same server, and both catch all exceptions and log them. I also attach execution listeners at the start and end of every element and log those executions so I can trace the process when something goes wrong.
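For reference, the listeners are wired onto every element roughly like this (simplified; the element id and com.example.LoggingListener are placeholders for my actual names):

```xml
<serviceTask id="callWebService" name="Call web service">
  <extensionElements>
    <!-- log entry and exit of every element -->
    <camunda:executionListener event="start" class="com.example.LoggingListener" />
    <camunda:executionListener event="end" class="com.example.LoggingListener" />
  </extensionElements>
</serviceTask>
```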

The jobs are still there, untouched:

{
    "id": "5b2f0418-9d58-11e7-8913-02000af94315",
    "jobDefinitionId": "2867232a-9d58-11e7-8913-02000af94315",
    "processInstanceId": "5b00c802-9d58-11e7-8913-02000af94315",
    "processDefinitionId": "Process_1063:1:28672329-9d58-11e7-8913-02000af94315",
    "processDefinitionKey": "Process_1063",
    "executionId": "5b2eb5f4-9d58-11e7-8913-02000af94315",
    "exceptionMessage": null,
    "retries": 3,
    "dueDate": "2017-09-20T04:34:07",
    "suspended": false,
    "priority": 0,
    "tenantId": "****"
}

Regards,
Gonzalo

Hi Gonzalo,

How do you deploy your processes? Via Java API or REST API? Do you use process applications or not?

Cheers,
Thorben

Hi Thorben, I deploy using REST API and I don’t use process applications. It is a shared server.

Regards,
Gonzalo

Then you should disable the jobExecutorDeploymentAware flag in the platform configuration. This flag makes sense when processes require external resources, such as classes provided by process applications: the flag, together with the registrations managed by the ManagementService, prevents jobs from being executed when those resources are not available. See Service Task with Async Continuation Never Executes for a detailed discussion of this.
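With the bpm-platform.xml you posted, that means changing a single property:

```xml
<!-- in the <properties> section of your <process-engine> -->
<property name="jobExecutorDeploymentAware">false</property>
```

After changing it, restart the server so the job executor picks up the overdue jobs.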

3 Likes

@thorben what do you think about adding some text about this to the Docker container README and the general install instructions?

@StephenOTT, I agree that this should be documented better, but I’m not sure where.

I would think that:

https://docs.camunda.org/manual/7.7/installation/

and

are the places to do it.

For the installation pages, a header and a small chunk of text could be placed in each of the relevant pages related to shared process engine deployments.
It would also be good to come up with a keyword-focused description of the issue so it can be picked up more easily by a Google search.

I changed the jobExecutorDeploymentAware value to false and restarted the server. At that point about 200 processes that were stuck (processes I had launched for testing) moved forward, and another 500 were rescheduled for either tonight or a week from now. I guess it is like a restart of the timers? I have some timers with a value of 5 hours and others with a value of 1 week.
The retries count is still at 3 for every job.

Besides the funny behaviour, I think it is working properly now. Thanks, guys! I really appreciate the help.

Yours faithfully,
Gonzalo