Exception during batch execution

Hi guys,

I got the following exception during batch execution with Camunda.

org.camunda.bpm.engine.OptimisticLockingException: ENGINE-03005 Execution of 'DELETE MessageEntity[5753afca-0d01-11e7-8ca6-005056a16e97]' failed. Entity was updated by another transaction concurrently.
	at org.camunda.bpm.engine.impl.db.EnginePersistenceLogger.concurrentUpdateDbEntityException(EnginePersistenceLogger.java:125)
	at org.camunda.bpm.engine.impl.db.entitymanager.DbEntityManager.handleOptimisticLockingException(DbEntityManager.java:323)
	at org.camunda.bpm.engine.impl.db.entitymanager.DbEntityManager.flushDbOperationManager(DbEntityManager.java:295)
	at org.camunda.bpm.engine.impl.db.entitymanager.DbEntityManager.flush(DbEntityManager.java:278)
	at org.camunda.bpm.engine.impl.interceptor.CommandContext.flushSessions(CommandContext.java:247)
	at org.camunda.bpm.engine.impl.interceptor.CommandContext.close(CommandContext.java:176)
	at org.camunda.bpm.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:113)
	at org.camunda.bpm.engine.spring.SpringTransactionInterceptor$1.doInTransaction(SpringTransactionInterceptor.java:42)
	at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133)
	at org.camunda.bpm.engine.spring.SpringTransactionInterceptor.execute(SpringTransactionInterceptor.java:40)
	at org.camunda.bpm.engine.impl.interceptor.ProcessApplicationContextInterceptor.execute(ProcessApplicationContextInterceptor.java:66)
	at org.camunda.bpm.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:30)
	at org.camunda.bpm.engine.impl.jobexecutor.ExecuteJobHelper.executeJob(ExecuteJobHelper.java:35)
	at org.camunda.bpm.engine.impl.jobexecutor.ExecuteJobHelper.executeJob(ExecuteJobHelper.java:28)
	at org.camunda.bpm.engine.impl.jobexecutor.ExecuteJobsRunnable.executeJob(ExecuteJobsRunnable.java:82)
	at org.camunda.bpm.engine.impl.jobexecutor.ExecuteJobsRunnable.run(ExecuteJobsRunnable.java:56)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Some notes on my infrastructure:

I have a homogeneous cluster with 4 nodes.
All nodes run an embedded process engine with a job executor.
All nodes use the same database.
All job executors are configured to use only 1 thread.
Camunda 7.6.0 is used.

I'm not sure how this OptimisticLockingException can occur when trying to delete a MessageEntity (job).
The message entity seems to be the batch execution job that is currently being executed.

If job locking works correctly, then with the described configuration only one thread should execute a given job at any point in time.
So such an exception should not be possible, or am I wrong?
The invocationsPerBatchJob parameter is 50.
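
For readers unfamiliar with the mechanism, here is a minimal sketch of how a revision-checked DELETE can fail. It is a simplified in-memory model, not Camunda's code: the class `OptimisticDeleteSketch`, the exception `StaleEntityException` and the job id `job-1` are all hypothetical names. The only assumption taken from the engine is the general idea that a flush treats "zero rows affected" as a concurrent modification.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified in-memory model of a revision-checked DELETE; not Camunda code. */
final class OptimisticDeleteSketch {

    /** Hypothetical stand-in for the engine's OptimisticLockingException. */
    static final class StaleEntityException extends RuntimeException {
        StaleEntityException(String message) {
            super(message);
        }
    }

    /** Maps job id to its current revision, standing in for the job table. */
    private final Map<String, Integer> jobTable = new HashMap<>();

    void insert(String jobId) {
        jobTable.put(jobId, 1);
    }

    /** Models "DELETE ... WHERE id = ? AND revision = ?" and returns the affected row count. */
    int deleteIfRevisionMatches(String jobId, int expectedRevision) {
        Integer current = jobTable.get(jobId);
        if (current == null || current != expectedRevision) {
            return 0; // the row is gone or was changed by another transaction
        }
        jobTable.remove(jobId);
        return 1;
    }

    /** Models the flush: zero affected rows is reported as a concurrent update. */
    void flushDelete(String jobId, int expectedRevision) {
        if (deleteIfRevisionMatches(jobId, expectedRevision) == 0) {
            throw new StaleEntityException(
                "DELETE of " + jobId + " failed. Entity was updated by another transaction concurrently.");
        }
    }

    public static void main(String[] args) {
        OptimisticDeleteSketch db = new OptimisticDeleteSketch();
        db.insert("job-1");
        int revisionSeenByA = 1; // both executions loaded the job at revision 1
        int revisionSeenByB = 1;

        db.flushDelete("job-1", revisionSeenByA);
        System.out.println("first delete succeeded");
        try {
            db.flushDelete("job-1", revisionSeenByB);
        } catch (StaleEntityException e) {
            System.out.println("second delete failed: " + e.getMessage());
        }
    }
}
```

In this model the second delete can only fail if some other execution touched the same row first, which is exactly why the exception is surprising in a setup where each job should have a single owner.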

Maybe you have some hints.

Best regards,

Markus

Hi @Markus,

There is a bug in the implementation, https://app.camunda.com/jira/browse/CAM-7519; unfortunately I cannot tell you when it will be fixed. In general, the exception means that the entity was already deleted by another job executor.

Cheers,
Askar

Hi @aakhmerov

Thanks for the reply.
I am not sure whether you misunderstood me.

Is there a general bug in the Camunda batch concept that causes execution jobs of batches to be executed on different nodes,
although they are locked by one node in the database?
Or does this bug only concern delete-instance batches?

It seems to me that the ticket concerns a deploymentAware cluster.
My cluster is homogeneous, and the problem is that an OptimisticLockingException occurs although the job should only be executed by one thread.
So I do not understand why this exception occurs.

Best regards,

Markus Hens

Hi @Markus,

There is a general missing feature/bug: if your job executor is deployment aware, batch operations, except for migration, do not split their jobs into deployment groups. I am not sure this is your case, but your exception means that while one job executor tried to delete some entities, another one had already deleted them. In general this is not a problem for you; you can ignore this stack trace, since the entity gets deleted in any case.

Does that help?
Askar.

Hi @aakhmerov

My job executor is NOT deployment aware.

I will describe my problem in a bit more detail.

We use Camunda batch to execute expressions. For that we implemented a simple service which registers a batch. The job handler for the execution jobs only executes the expression.
That works fine.
In this concrete example the expression does not do anything with processes, so the Camunda schema is not involved.
But of course the job has to be deleted once it has finished. This is managed completely by Camunda batch.

Deleting the job results in an OptimisticLockingException, which leads to a re-execution of the job. The expression executed by the job handler sends emails, and since sending an email cannot be rolled back with the transaction, the emails are sent more than once. That is why I have to analyse this.
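
Independent of the root cause, a non-transactional side effect such as sending an email can be guarded against re-execution. The following is only a sketch of that idea: `IdempotentSendSketch`, `sendOnce` and the `mailSender` callback are hypothetical names, and the in-memory set stands in for shared storage (for example a table with a unique key in the domain database), which would be needed in a 4-node cluster.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Sketch of an idempotency guard around a side effect that cannot be rolled back. */
final class IdempotentSendSketch {

    /** In-memory stand-in; a real cluster would need storage shared by all nodes. */
    private final Set<String> alreadySent = ConcurrentHashMap.newKeySet();

    /** Hypothetical callback that performs the actual send for one entity id. */
    private final Consumer<String> mailSender;

    IdempotentSendSketch(Consumer<String> mailSender) {
        this.mailSender = mailSender;
    }

    /** Sends at most once per entity id; returns false if the id was already handled. */
    boolean sendOnce(String entityId) {
        if (!alreadySent.add(entityId)) {
            return false; // a previous execution of the job already sent this one
        }
        try {
            mailSender.accept(entityId);
        } catch (RuntimeException e) {
            alreadySent.remove(entityId); // allow a retry if the send itself failed
            throw e;
        }
        return true;
    }

    public static void main(String[] args) {
        IdempotentSendSketch guard =
            new IdempotentSendSketch(id -> System.out.println("sending mail for " + id));
        guard.sendOnce("entity-1"); // sends
        guard.sendOnce("entity-1"); // skipped, simulating the re-executed job
    }
}
```

This does not explain the exception, but it would keep a retried job from mailing the same recipient twice.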

As I understand it, this exception can only occur if another thread has already deleted the MessageEntity.
And that is the problem: every node is single-threaded, and job locking should ensure that no other node executes the job while it is locked.

So I do not understand why this exception is thrown.
Is it a bit clearer now?

Best regards,

Markus

Hi @Markus,

are there sub processes among the processes that are scheduled for deletion?

Cheers,
Askar.

Hi @aakhmerov

I do not use process instance deletion.
So no, there are no sub processes.

The batch has a seed job, a monitor job and N execution jobs.
The deletion of ONE execution job fails after it has completed and done its work.

So it seems to me that it is executed more than once.

Hi @Markus,

I am a bit lost, can we start from the beginning here? :) Which batch is getting executed?

Cheers,
Askar

Hi @aakhmerov

Of course we can :)

We implemented our own feature based on Camunda Batch.
This feature loads a list of entityIds from our domain database and executes a JUEL expression for each of them.

So we do not use one of the batches that ship with Camunda, but the general batch concept to split the work into execution jobs.

When such a batch is executed, the expression itself executes fine.

I think that when an ‘execution job’ completes, Camunda tries to delete it from the database (maybe done by the JobExecutor?). This is where the OptimisticLockingException occurs.
So it seems to me that another thread has also executed the same job.
BUT: the JobExecutor is configured to be single-threaded. And in my opinion the lock in ACT_RU_JOB should prevent other nodes from executing the job.
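
One scenario worth ruling out, and it is only a hypothesis that nothing in this thread confirms, is that a job lock with an expiration time stops protecting the job once the execution runs longer than the lock time. The sketch below models that with a hypothetical `JobLockSketch` class; the fields mirror the idea of a lock owner and lock expiration stored with the job, and the 300000 ms lock time is an illustrative value, not a measured one.

```java
/** Simplified model of a time-limited job lock; not Camunda code. */
final class JobLockSketch {

    private String lockOwner;         // which node currently holds the lock, if any
    private long lockExpiresAtMillis; // point in time after which the lock no longer counts

    /** A node may take the job if it is unlocked or if the existing lock has expired. */
    boolean tryAcquire(String nodeId, long nowMillis, long lockTimeMillis) {
        boolean free = lockOwner == null || lockExpiresAtMillis <= nowMillis;
        if (!free) {
            return false;
        }
        lockOwner = nodeId;
        lockExpiresAtMillis = nowMillis + lockTimeMillis;
        return true;
    }

    String lockOwner() {
        return lockOwner;
    }

    public static void main(String[] args) {
        long lockTimeMillis = 300_000L; // illustrative lock time of 5 minutes
        JobLockSketch job = new JobLockSketch();

        // node-1 locks the job at t=0 and starts a long-running execution
        System.out.println("node-1 at 0 ms: " + job.tryAcquire("node-1", 0L, lockTimeMillis));
        // while the lock is valid, node-2 cannot take the job
        System.out.println("node-2 at 60000 ms: " + job.tryAcquire("node-2", 60_000L, lockTimeMillis));
        // node-1 is still busy, but its lock has expired, so node-2 takes the same job
        System.out.println("node-2 at 400000 ms: " + job.tryAcquire("node-2", 400_000L, lockTimeMillis));
    }
}
```

If something like this were happening, two nodes would finish the same job and the slower one would fail when deleting it. Comparing the duration of one execution job (50 invocations, each sending an email) with the configured lock time would show whether this applies.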

Is it a bit clearer now?

Hi @Markus,

Yes, now I understand. The job is deleted after execution if there was no incident. Could you enable debug logging so that we can better understand what is happening?
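
A minimal sketch of raising the log level programmatically, under two assumptions that should be checked against the actual setup: that the engine's log output ends up in java.util.logging (where FINE corresponds to debug), and that the logger categories listed below are the relevant ones for job execution and persistence. The class name `EngineDebugLoggingSketch` is hypothetical; with Logback or Log4j the same categories would be set in that framework's own configuration instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Sketch: enable debug-level output for selected logger categories via java.util.logging. */
final class EngineDebugLoggingSketch {

    /** Assumed category names; verify them against the engine's logging documentation. */
    static final String[] CATEGORIES = {
        "org.camunda.bpm.engine.jobexecutor",
        "org.camunda.bpm.engine.persistence",
        "org.camunda.bpm.engine.cmd"
    };

    /** Strong references, so the configured loggers are not garbage-collected. */
    private static final List<Logger> CONFIGURED = new ArrayList<>();

    static void enableDebug() {
        for (String category : CATEGORIES) {
            Logger logger = Logger.getLogger(category);
            logger.setLevel(Level.FINE);

            // The default console handler only prints INFO and above,
            // so attach a handler that lets FINE records through.
            ConsoleHandler handler = new ConsoleHandler();
            handler.setLevel(Level.FINE);
            logger.addHandler(handler);
            logger.setUseParentHandlers(false); // avoid printing records twice

            CONFIGURED.add(logger);
        }
    }

    public static void main(String[] args) {
        enableDebug();
        Logger.getLogger(CATEGORIES[0]).fine("debug logging enabled for the job executor category");
    }
}
```

With output at this level, the log of each node should show which node acquired and executed the failing job, which is the information needed to tell whether it really ran twice.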

Cheers,
Askar

Hi @aakhmerov

Yes, I will enable debug logging and try to reproduce the problem.
If I get any new information I will post it in this thread.

Thanks

Markus