Multi-instance External Tasks are getting stuck


#1

Hi, I am struggling with Multi-instance External Tasks. I created the simplest possible BPMN to show the problem.


The task is an External Task, its Loop Cardinality is a fixed number (e.g. 100), and it has the following settings:

  • Multi Instance Async Before - true

  • Multi Instance Async After - true

  • Multi Instance Exclusive - true

  • Async Before - true

  • Async After - true

  • Exclusive - true

Then I have an External Task poller which, for the sake of simplicity, polls tasks synchronously and then logs and completes each one in a loop.

@Scheduled(
        fixedDelayString = "${externaltask.worker.poll.rate}"
)
public void poll() {
    List<LockedExternalTask> tasks = externalTaskService.fetchAndLock(10, externalTaskConfiguration.getWorkerId())
            .topic(topic.getName(), externalTaskConfiguration.getDefaultLockDuration())
            .execute();

    tasks.forEach(task ->  {
                try {
                    log.info("-----------------------------------Executing task: {}", task.getId());
                } catch (Exception e) {
                    log.error("failed to process external task - {}", task.getId(), e);
                }
                externalTaskService.complete(task.getId(), task.getWorkerId(), task.getVariables());
                System.out.println("------------------------------------------------completed: " + task.getId());
            }
    );
}

I am getting the following exception:

org.camunda.bpm.engine.OptimisticLockingException: ENGINE-03005 Execution of 'UPDATE VariableInstanceEntity[99e059c5-26a0-11e8-b347-aa5d7001bc63]' failed. Entity was updated by another transaction concurrently.
at org.camunda.bpm.engine.impl.db.EnginePersistenceLogger.concurrentUpdateDbEntityException(EnginePersistenceLogger.java:130)
at org.camunda.bpm.engine.impl.db.entitymanager.DbEntityManager.handleOptimisticLockingException(DbEntityManager.java:406)
at org.camunda.bpm.engine.impl.db.entitymanager.DbEntityManager.checkFlushResults(DbEntityManager.java:365)
at org.camunda.bpm.engine.impl.db.entitymanager.DbEntityManager.flushDbOperations(DbEntityManager.java:345)

In Camunda Cockpit I can see that nrOfActiveInstances is still not 0 after quite a long time, even though all tasks were polled and completed.

  • nrOfInstances - 100
  • nrOfCompletedInstances - 64
  • nrOfActiveInstances - 36

And the process stays stuck in this state forever…

I have also tried the handleFailure method in ExternalTaskService, but the situation didn’t change much.

Any help, hint, suggestion will be much appreciated. Thanks in advance.


#2

Hi @gohar.gasparyan,

I tried to reproduce the issue in a unit test but it works as expected.

Do you know which variable is modified concurrently?
How long do you lock an external task?
Do you see errors when calling complete(...)?

Best regards,
Philipp


#3

I’d assume that

externalTaskService.complete(task.getId(), task.getWorkerId(), task.getVariables());

can lead to the optimistic locking exception.

If external task A reports back, e.g.
"variables":
{"aVariable": {"value": "aStringValue"} }
and external task B reports back:
"variables":
{"aVariable": {"value": "aDifferentStringValue"} }

can this lead to optimistic locking exceptions, since Camunda tries to merge the results for the “owning” process?

So is it possible/advisable at all to return variables with the same name in the complete(...) call?


#4

Hi @mase,

An OptimisticLockingException (OLE) can always happen when a process instance is updated by multiple transactions. So if you call complete(...) concurrently, an OLE may occur.

If the variable is local (in the scope of the execution) then it’s OK. Otherwise, each new call overwrites the variable, which may not be what you want.

Does this help you?

Best regards,
Philipp


#5

We had the same issue (multi-instance subprocess stuck) and found that the cause was that we were overwriting the loop control variables. Instead of calling complete with task.getVariables() as argument, try providing only those variables that you want to change.

-Hans
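
To illustrate the suggestion above, here is a minimal sketch of a helper that keeps only the variables a worker actually wants to write back, so that loop control variables such as nrOfInstances are never sent to complete(...). The class name CompletionVariables, the method pick, and the variable name "result" are hypothetical and only serve the example; they are not part of the Camunda API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Camunda: selects the variables to send back on completion.
public class CompletionVariables {

    // Returns a new map containing only the entries whose names are in namesToReturn.
    // Everything else (for example the multi-instance loop control variables) is dropped.
    public static Map<String, Object> pick(Map<String, Object> fetched, Set<String> namesToReturn) {
        Map<String, Object> selected = new HashMap<>();
        for (Map.Entry<String, Object> entry : fetched.entrySet()) {
            if (namesToReturn.contains(entry.getKey())) {
                selected.put(entry.getKey(), entry.getValue());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Simulated content of task.getVariables() for one multi-instance task.
        Map<String, Object> fetched = new HashMap<>();
        fetched.put("nrOfInstances", 100);
        fetched.put("nrOfCompletedInstances", 64);
        fetched.put("nrOfActiveInstances", 36);
        fetched.put("result", "done"); // "result" is an assumed business variable

        Map<String, Object> toSend = pick(fetched, Collections.singleton("result"));
        System.out.println(toSend); // prints {result=done}
    }
}
```

In the poller from the first post, the call would then look something like externalTaskService.complete(task.getId(), task.getWorkerId(), CompletionVariables.pick(task.getVariables(), namesToReturn)), where namesToReturn is whatever set of variable names the worker really changes.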