Timer Boundary Events of active process instances are suspended when process def is suspended?

I am seeing a behavior that I cannot seem to find in the docs:

Given an active process instance with a task that has a non-blocking timer, when the process definition is suspended, the process instance stays active, but the boundary timer gets suspended.

Is this expected? If so, are there docs explaining this?

Doesn’t suspending a process definition suspend all its instances?

@fml2 as per the API, see: https://docs.camunda.org/manual/7.12/reference/rest/process-definition/put-activate-suspend-by-key/

You can suspend a definition without suspending its active instances.

@Niall do you have insight on this?

I am thinking it is related to this: https://jira.camunda.com/browse/CAM-3986 ?

But there do not appear to be any docs covering this behavior: https://github.com/camunda/camunda-docs-manual/blob/master/content/user-guide/process-engine/process-engine-concepts.md#suspend-process-definitions

@Niall if this behavior is expected, then consider:

"If you have a deployed definition, and you want to stop new process instances from being created for that definition, and you want to let the current active instances complete, how does one go about making this happen?"

If it’s not clearly documented, I’d try to figure it out from the code. There lies the truth!

It is not about the code in this case; it is about the underlying implementation decisions and how to use this in ops.

Hi Folks,
I’ll look into this and see if I can get an answer for you.

-Niall

@Niall any update on this? Was there some direction on how to deal with the above scenario?

I’ve asked some of the folks internally to look into it.
We’ll see what they come up with.

@Niall any updates on this?

To Recap:

  1. When you suspend a process definition (PD), no new process instances (PIs) can be created for that PD.
  2. When you suspend a PD, the timer and job definitions within the PD are also suspended.
  3. When you suspend a PD, the active PIs for that PD are not necessarily suspended (suspending them is optional), so the PIs continue to execute.
  4. The PIs that continue to execute may have one or more timers or jobs. So how can those timers and jobs continue to execute if suspending the PD has also suspended their timer and job definitions?
  5. When a PD is suspended together with all of its PIs, and you then un-suspend a PI, the timer and job definitions stay suspended, so proper execution of the PI appears to be impossible.
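
To make the five points above concrete, here is a small toy model in plain Java. It is not Camunda engine code: every class and method name in it is made up for illustration, and the rules it encodes are only the ones described in this thread (in particular, that a newly created job takes its initial suspension state from its job definition).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the suspension rules from the recap. NOT Camunda engine code;
// every class and method name here is made up for illustration.
class SuspensionModel {

    static class Job {
        boolean suspended;
        Job(boolean suspended) { this.suspended = suspended; }
    }

    static class ProcessInstance {
        boolean suspended;
        final List<Job> jobs = new ArrayList<>();
    }

    static class ProcessDefinition {
        boolean suspended;
        boolean jobDefinitionsSuspended;
        final List<ProcessInstance> instances = new ArrayList<>();

        // Point 1: a suspended definition cannot start new instances.
        ProcessInstance start() {
            if (suspended) {
                throw new IllegalStateException("process definition is suspended");
            }
            ProcessInstance pi = new ProcessInstance();
            instances.add(pi);
            return pi;
        }

        // Points 2 and 3: job definitions are always suspended with the
        // definition; instances and their existing jobs only if requested.
        void suspend(boolean includeInstances) {
            suspended = true;
            jobDefinitionsSuspended = true;
            if (includeInstances) {
                for (ProcessInstance pi : instances) {
                    pi.suspended = true;
                    for (Job job : pi.jobs) job.suspended = true;
                }
            }
        }

        // Points 4 and 5: a job created later takes its initial state
        // from the job definition, not from the instance.
        Job createTimerJob(ProcessInstance pi) {
            Job job = new Job(jobDefinitionsSuspended);
            pi.jobs.add(job);
            return job;
        }
    }

    public static void main(String[] args) {
        ProcessDefinition pd = new ProcessDefinition();
        ProcessInstance pi = pd.start();
        pd.suspend(false);                  // keep the instance running
        Job timer = pd.createTimerJob(pi);  // e.g. after a user task completes
        System.out.println("instance suspended: " + pi.suspended);
        System.out.println("new timer job suspended: " + timer.suspended);
    }
}
```

In this model the instance stays active while the timer job created after the suspension starts out suspended, which is the mismatch points 4 and 5 are asking about.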

@Niall

Hi @StephenOTT,

please have a look at the test class mentioned in this post: Suspending process definition suspends job definitions too.

If all your combinations are asserted by a test, then everything is clear. If anything is open and not covered by a test, then you can try to add a test setup with assertions for this and have a look at the result.

Hope this helps, Ingo

@Ingo_Richtsmeier I am not sure I understand the context of your tests/assertions comments. Are you saying my logic above is the expected behaviour of the engine? If so, what is the direction on the ability to “disable a process def so that users cannot start it, but allow the existing instances to be completed”?

Hi @StephenOTT,

no, I haven’t checked all tests, as the test class is about 2500 lines. So, a lot of tests are already built and you can assume that they define the behavior of the engine.

My thought was that it may be a good starting point to check your combinations if they are already covered by the tests. And try the missing combinations by adding the tests for them. If they run well, you can create a pull request and if the product team accepts them, they will be fixed (maybe forever…).

Hope this helps, Ingo

@Ingo_Richtsmeier: created a unit test to demo the bug / problem / missing consideration from what we can tell from the code…

Consider a process such as:
[BPMN diagram: timer1]

def "If job has been created and process definition is suspended then it should not suspend the current job"() {

    when:"Start the process 'process_timer'"
    def processInstance = runtimeService().startProcessInstanceByKey("process_timer")
    def processInstanceId = processInstance.getProcessInstanceId()

    and:"Complete task UT1"
    complete(task(processInstance))

    then: "Job TIMER_JOB is not suspended"
    JobQuery jobQuery = managementService().createJobQuery().processDefinitionKey("process_timer").processInstanceId(processInstanceId).timers()
    List<Job> jobs = jobQuery.list()
    assert jobs.size() == 1
    assert jobs[0].suspended == false

    then: "Suspend the process definition without suspending the related process instances"
    repositoryService().suspendProcessDefinitionByKey("process_timer", false, null)

    and:"Job TIMER_JOB is not suspended"
    assert jobQuery.list()[0].suspended == false

}

def "Suspending the process definition without suspending the process instances should suspend all jobs of those instances"() {

    when:"Start the process 'process_timer'"
    def processInstance = runtimeService().startProcessInstanceByKey("process_timer")
    def processInstanceId = processInstance.getProcessInstanceId()

    then:"Should not have any job"
    JobQuery jobQuery = managementService().createJobQuery().processDefinitionKey("process_timer").processInstanceId(processInstanceId).timers()
    List<Job> jobs = jobQuery.list()
    assert jobs.size() == 0

    then: "Suspend the process definition without suspending the related instances"
    repositoryService().suspendProcessDefinitionByKey("process_timer", false, null)

    and:"Complete task UT1"
    complete(task(processInstance))

    then: "Job should be suspended"
    assert jobQuery.list().size() == 1
    assert jobQuery.list()[0].suspended == true

}

def "Suspending the process definition and its process instances and then reactivating an instance should not reactivate jobs related to the instance"() {

    when:"Start the process"
    def processInstance = runtimeService().startProcessInstanceByKey("process_timer")
    def processInstanceId = processInstance.getProcessInstanceId()

    then:"Should not have any job"
    JobQuery jobQuery = managementService().createJobQuery().processDefinitionKey("process_timer").processInstanceId(processInstanceId).timers()
    List<Job> jobs = jobQuery.list()
    assert jobs.size() == 0

    then: "Suspend the process definition and related instances"
    repositoryService().suspendProcessDefinitionByKey("process_timer", true, null)

    then:"Reactivate instance"
    runtimeService().activateProcessInstanceById(processInstanceId)

    and:"Complete task UT1"
    complete(task(processInstance))

    then:"Job should still be suspended"
    assert jobQuery.list()[0].suspended == true

}

def "[FIX] Suspending the process definition without suspending related instances should not impact jobs of those instances"() {

    when:"Start the process"
    def processInstance = runtimeService().startProcessInstanceByKey("process_timer")
    def processInstanceId = processInstance.getProcessInstanceId()

    then:"Should not have any job"
    JobQuery jobQuery = managementService().createJobQuery().processDefinitionKey("process_timer").processInstanceId(processInstanceId).timers()
    List<Job> jobs = jobQuery.list()
    assert jobs.size() == 0

    then: "Suspend process definition without suspending process instances"
    repositoryService().suspendProcessDefinitionByKey("process_timer", false, null)

    then:"[FIX] activate all job definitions related to the process definition"
    // false because if some instances are already suspended we don't want to reactivate their jobs
    managementService().activateJobDefinitionByProcessDefinitionId(processInstance.processDefinitionId, false)

    and:"Complete task UT1"
    complete(task(processInstance))

    then:"job of the instance should be active"
    assert jobQuery.list()[0].suspended == false
}

The workaround is to reactivate the job definitions…

It would seem there is missing logic (and a cleanup job) in Camunda to keep certain job definitions for specific process definitions active until the remaining process instances are completed. Otherwise it would be impossible for the timer to be executed in the above BPMN: when the user task is completed, the timer job is created in a suspended state, because the process definition is suspended and therefore the job definition is suspended…

The additional problem is that once the remaining process instances complete, the job definitions remain active. So you have to implement some sort of check to clean up these zombie job definitions.
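
A rough sketch of what such a check could look like, in plain Java. The `Engine` interface and `FakeEngine` class are hypothetical stand-ins so the example runs without an engine; they are not Camunda types. Against a real engine, my assumption is that the three methods would map to `repositoryService.createProcessDefinitionQuery().suspended()`, `runtimeService.createProcessInstanceQuery().processDefinitionId(id).count()` and `managementService.suspendJobDefinitionByProcessDefinitionId(id, false)`, but please verify that against your Camunda version.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of a periodic sweep for "zombie" job definitions.
class JobDefinitionSweeper {

    // Hypothetical facade over the engine services; NOT a Camunda interface.
    interface Engine {
        List<String> suspendedProcessDefinitionIds();
        long runningInstanceCount(String processDefinitionId);
        void suspendJobDefinitions(String processDefinitionId);
    }

    // Re-suspends the job definitions of every suspended process definition
    // that has no running instances left; returns the ids it touched.
    static List<String> sweep(Engine engine) {
        List<String> swept = new ArrayList<>();
        for (String id : engine.suspendedProcessDefinitionIds()) {
            if (engine.runningInstanceCount(id) == 0) {
                engine.suspendJobDefinitions(id);
                swept.add(id);
            }
        }
        return swept;
    }

    // In-memory stand-in so the sketch runs without an engine.
    static class FakeEngine implements Engine {
        final Map<String, Long> running = new LinkedHashMap<>();
        final List<String> resuspended = new ArrayList<>();

        public List<String> suspendedProcessDefinitionIds() {
            return new ArrayList<>(running.keySet());
        }

        public long runningInstanceCount(String id) {
            return running.getOrDefault(id, 0L);
        }

        public void suspendJobDefinitions(String id) {
            resuspended.add(id);
        }
    }

    public static void main(String[] args) {
        FakeEngine engine = new FakeEngine();
        engine.running.put("process_timer:1", 0L); // all instances finished
        engine.running.put("process_timer:2", 3L); // still has instances
        System.out.println("re-suspended: " + sweep(engine));
    }
}
```

Since a suspended process definition cannot start new instances, the running-instance count for it can only go down, so checking the count and then suspending should not race with new starts. The sweep would have to run on a schedule or be triggered when instances end.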