ENGINE-16006 BPMN Stack Trace

I’m using Camunda with Spring Boot in a Kubernetes environment with 3 pods, and PostgreSQL as the database.

I have a BPMN workflow. During execution, the process engine threw the ENGINE-16006 BPMN exception below. However, the flow seemed to continue and finish without any real impact.

I couldn’t find any related discussion of the ENGINE-16006 BPMN error code.

Is this something I should worry about, or can I just ignore the error?

ENGINE-16006 BPMN Stack Trace:
Task_1q1er1j (transition-notify-listener-take, ScopeExecution[27ef6b53-b9c1-11ea-b4bb-7a312e676758], pa=migrationApplication)
Task_1q1er1j, name=Export Data from Mongo

o.c.b.e.e.NullValueException: Cannot find execution with id ‘1331c9d2-b9c4-11ea-b4bb-7a312e676758’ referenced from job ‘MessageEntity[repeat=null, id=576d7b6d-b9c5-11ea-b4bb-7a312e676758, revision=2, duedate=null, lockOwner=0f48ccab-e461-40b9-ad72-5555de076fee, lockExpirationTime=Mon Jun 29 05:09:56 GMT 2020, executionId=1331c9d2-b9c4-11ea-b4bb-7a312e676758, processInstanceId=12fdc238-b9c4-11ea-b4bb-7a312e676758, isExclusive=true, retries=1, jobHandlerType=async-continuation, jobHandlerConfiguration=transition-notify-listener-take$SequenceFlow_0gps8w9, exceptionByteArray=null, exceptionByteArrayId=null, exceptionMessage=null, deploymentId=7b35d6bf-ab84-11ea-8023-d2037c48cdc9]’: execution is null
at s.r.GeneratedConstructorAccessor249.newInstance(Unknown Source)
at s.r.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at j.l.r.Constructor.newInstance(Constructor.java:423)
at o.c.b.e.i.u.EnsureUtil.generateException(EnsureUtil.java:381)
at o.c.b.e.i.u.EnsureUtil.ensureNotNull(EnsureUtil.java:55)
at o.c.b.e.i.u.EnsureUtil.ensureNotNull(EnsureUtil.java:50)
at o.c.b.e.i.p.e.JobEntity.execute(JobEntity.java:120)
at o.c.b.e.i.c.ExecuteJobsCmd.execute(ExecuteJobsCmd.java:109)
at o.c.b.e.i.c.ExecuteJobsCmd.execute(ExecuteJobsCmd.java:42)
at o.c.b.e.i.i.CommandExecutorImpl.execute(CommandExecutorImpl.java:28)
at o.c.b.e.i.i.CommandContextInterceptor.execute(CommandContextInterceptor.java:107)
at o.c.b.e.s.SpringTransactionInterceptor$1.doInTransaction(SpringTransactionInterceptor.java:46)
at o.s.t.s.TransactionTemplate.execute(TransactionTemplate.java:140)
at o.c.b.e.s.SpringTransactionInterceptor.execute(SpringTransactionInterceptor.java:44)
at o.c.b.e.i.i.ProcessApplicationContextInterceptor.execute(ProcessApplicationContextInterceptor.java:70)
at o.c.b.e.i.i.LogInterceptor.execute(LogInterceptor.java:33)
at o.c.b.e.i.j.ExecuteJobHelper.executeJob(ExecuteJobHelper.java:51)
at o.c.b.e.i.j.ExecuteJobHelper.executeJob(ExecuteJobHelper.java:44)
at o.c.b.e.i.j.Execu…

Hi,

It seems to me that some job (an async continuation) is being executed by more than one of your pods. Maybe the lock duration of the job executor is not long enough, so another pod’s job executor picks up and tries to execute the same job. Once a job is executed, the corresponding execution is removed from the runtime database (ACT_RU_EXECUTION), which is why the second execution attempt can no longer find it and throws this exception.

You will need to find the task associated with this. Maybe go into the ACT_HI_JOB_LOG table and run a query (WHERE EXECUTION_ID_ = '1331c9d2-b9c4-11ea-b4bb-7a312e676758'). Then I’d take a look at how long the task took to complete, by subtracting the timestamp of the record with JOB_STATE_ 0 (job created) from the one with JOB_STATE_ 2 (job successful); if there were no retries in between, you should have exactly two records.
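For example, here is a minimal sketch of the same check through Camunda’s Java history API (it assumes an injected HistoryService, a history level that writes the job log, which yours evidently does, and the execution id taken from your log; the class and method below are only illustrative):

import java.time.Duration;
import java.util.List;

import org.camunda.bpm.engine.HistoryService;
import org.camunda.bpm.engine.history.HistoricJobLog;

public class JobLogInspector {

  private final HistoryService historyService;

  public JobLogInspector(HistoryService historyService) {
    this.historyService = historyService;
  }

  public void printJobDuration(String executionId) {
    // All historic job log entries for this execution, oldest first
    List<HistoricJobLog> logs = historyService.createHistoricJobLogQuery()
        .executionIdIn(executionId)
        .orderByTimestamp().asc()
        .list();

    for (HistoricJobLog log : logs) {
      String state = log.isCreationLog() ? "CREATED"
          : log.isSuccessLog() ? "SUCCESSFUL"
          : log.isFailureLog() ? "FAILED"
          : "DELETED";
      System.out.printf("%s  state=%s  retries=%d%n", log.getTimestamp(), state, log.getJobRetries());
    }

    if (logs.size() >= 2) {
      // Gap between the first and last entries for this execution (creation to final state)
      Duration gap = Duration.between(
          logs.get(0).getTimestamp().toInstant(),
          logs.get(logs.size() - 1).getTimestamp().toInstant());
      System.out.println("Job took roughly: " + gap);
    }
  }
}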

If this is happening, either make the task run faster or increase the lock time. The locking configuration property is camunda.bpm.job-execution.lock-time-in-millis. More info here: Process Engine Configuration | docs.camunda.org
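If you prefer Java configuration over the application property, here is a minimal sketch that raises the lock time by registering a ProcessEnginePlugin bean (it assumes the Camunda Spring Boot starter picks up such beans; the 20-minute value is only an illustration, roughly equivalent to setting camunda.bpm.job-execution.lock-time-in-millis=1200000):

import org.camunda.bpm.engine.impl.cfg.AbstractProcessEnginePlugin;
import org.camunda.bpm.engine.impl.cfg.ProcessEngineConfigurationImpl;
import org.camunda.bpm.engine.impl.cfg.ProcessEnginePlugin;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class JobExecutorLockTimeConfig {

  @Bean
  public ProcessEnginePlugin lockTimePlugin() {
    return new AbstractProcessEnginePlugin() {
      @Override
      public void postInit(ProcessEngineConfigurationImpl configuration) {
        // Raise the job lock from the 5-minute default (300000 ms) to 20 minutes
        configuration.getJobExecutor().setLockTimeInMillis(20 * 60 * 1000);
      }
    };
  }
}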

It shouldn’t be a problem if the task execution has no side effects (e.g. you call an idempotent API). But let’s say you are calling an API that does an HTTP POST which creates a record somewhere else; in that case the duplicate execution means you POST against the API twice, possibly creating two records in the other application.
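If the target API supports it, one possible workaround is to send a client-generated idempotency key derived from the Camunda execution, so a re-executed job repeats the same logical request instead of creating a second record. The endpoint, header name, and key scheme in this sketch are assumptions, not something Camunda provides:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ExportClient {

  private final HttpClient http = HttpClient.newHttpClient();

  // Using the Camunda execution id as the key means a re-executed job sends the same
  // key again, which an idempotency-aware API can use to deduplicate the POST.
  public void createRecord(String executionId, String jsonBody) throws Exception {
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://example.org/records"))   // hypothetical endpoint
        .header("Content-Type", "application/json")
        .header("Idempotency-Key", executionId)           // hypothetical header; depends on the API
        .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
        .build();

    HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
    if (response.statusCode() >= 300) {
      throw new RuntimeException("Export failed: HTTP " + response.statusCode());
    }
  }
}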

Regards.

Hi Lucas,

Thank you for your reply.

I checked the database. There were many records when I queried with WHERE EXECUTION_ID_ = '1331c9d2-b9c4-11ea-b4bb-7a312e676758'. Does that mean there were many retries?

The default value for .lock-time-in-millis is 300000 ms, which is 5 minutes. If the job takes longer than 5 minutes, the lock expires and another job executor picks it up again, so we might see this exception?

I can try extending the lock timeout. Comparing the smallest and the largest timestamps, the gap is around 15 minutes, so I’ll probably need to set a value larger than 15 minutes to avoid this issue?

Best,

Arsene Lee

I’ve attached the CSV file.
data-1594244000426.csv (47.1 KB)