Transaction timeout on a long-running task occurs before lockTimeInMillis expires

I’m trying to understand how the job executor works with lockTimeInMillis by creating a long-running service task (an SSH task that executes a Python script which sleeps for 420 s).

Below are the configurations:

For <subsystem xmlns="urn:jboss:domain:transactions:1.5">
<coordinator-environment default-timeout="900"/>

For <job-executor>
<property name="lockTimeInMillis">
    60000
</property>
<property name="waitTimeInMillis">
    5000
</property>
<property name="maxJobsPerAcquisition">
    3
</property>

When I execute the BPMN diagram, I see that the task is retried many times before failing with a transaction timeout. The transaction timeout happened exactly after 7 minutes, i.e. after my service task had finished executing. I expected the task not to fail, since lockTimeInMillis is 10 minutes and default-timeout is 900 s (15 minutes). What is the reason behind this behavior? Am I missing some configuration here?

2019-03-08 20:37:25,157 INFO [com.test.ssh] (job-executor-tp-threads - 21) Executing SSH task…
2019-03-08 20:38:40,197 INFO [com.test.ssh] (job-executor-tp-threads - 22) Executing SSH task…
2019-03-08 20:40:40,180 INFO [com.test.ssh] (job-executor-tp-threads - 23) Executing SSH task…
2019-03-08 20:42:25,142 WARN [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff83a0a226:122c8e84:5c802307:886c7 in state RUN
2019-03-08 20:42:25,143 WARN [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012095: Abort of action id 0:ffff83a0a226:122c8e84:5c802307:886c7 invoked while multiple threads active within it.
2019-03-08 20:42:25,143 WARN [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012108: CheckedAction::check - atomic action 0:ffff83a0a226:122c8e84:5c802307:886c7 aborting with 1 threads active!
2019-03-08 20:42:25,144 WARN [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012121: TransactionReaper::doCancellations worker Thread[Transaction Reaper Worker 0,5,main] successfully canceled TX 0:ffff83a0a226:122c8e84:5c802307:886c7
2019-03-08 20:42:40,182 INFO [com.test.ssh] (job-executor-tp-threads - 24) Executing SSH task…
2019-03-08 20:43:40,169 WARN [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff83a0a226:122c8e84:5c802307:88818 in state RUN
2019-03-08 20:43:40,170 WARN [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012095: Abort of action id 0:ffff83a0a226:122c8e84:5c802307:88818 invoked while multiple threads active within it.
2019-03-08 20:43:40,170 WARN [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012108: CheckedAction::check - atomic action 0:ffff83a0a226:122c8e84:5c802307:88818 aborting with 1 threads active!
2019-03-08 20:43:40,172 WARN [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012121: TransactionReaper::doCancellations worker Thread[Transaction Reaper Worker 0,5,main] successfully canceled TX 0:ffff83a0a226:122c8e84:5c802307:88818
2019-03-08 20:44:25,374 INFO [com.test.ssh] (job-executor-tp-threads - 21) Closing the input stream
2019-03-08 20:44:25,374 INFO [com.test.ssh] (job-executor-tp-threads - 21) closeConnection():: Exiting the channel and closing the session…
2019-03-08 20:44:25,414 ERROR [org.hornetq.ra] (job-executor-tp-threads - 21) HQ154002: Could not create session: javax.resource.ResourceException: IJ000460: Error checking for a transaction
at org.jboss.jca.core.connectionmanager.tx.TxConnectionManagerImpl.getManagedConnection(TxConnectionManagerImpl.java:362)
at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:499)
at org.hornetq.ra.HornetQRASessionFactoryImpl.allocateConnection(HornetQRASessionFactoryImpl.java:832)
at org.hornetq.ra.HornetQRASessionFactoryImpl.createSession(HornetQRASessionFactoryImpl.java:465)
Caused by: javax.resource.ResourceException: IJ000459: Transaction is not active: tx=TransactionImple < ac, BasicAction: 0:ffff83a0a226:122c8e84:5c802307:886c7 status: ActionStatus.ABORTED >
at org.jboss.jca.core.connectionmanager.tx.TxConnectionManagerImpl.getManagedConnection(TxConnectionManagerImpl.java:352)
… 210 more
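The timestamps in this log can be compared directly. The small Java sketch below is an illustration only: the class and method names are made up, and it assumes that each "Executing SSH task…" line marks the start of one job execution, that TX ...886c7 belongs to thread 21 (thread 21's IJ000459 error names it), and that TX ...88818 belongs to thread 22.

```java
import java.time.Duration;
import java.time.LocalTime;

// Elapsed times between timestamps copied from the log above. All lines are
// from the same day, so only the time-of-day part is parsed.
class LogTimeline {

    // Whole seconds (rounded) between two "HH:mm:ss,SSS" log timestamps.
    static long roundedSecondsBetween(String start, String end) {
        // The log uses a comma before the milliseconds; LocalTime.parse expects a dot.
        LocalTime from = LocalTime.parse(start.replace(',', '.'));
        LocalTime to = LocalTime.parse(end.replace(',', '.'));
        return Math.round(Duration.between(from, to).toMillis() / 1000.0);
    }

    public static void main(String[] args) {
        // Thread 21: "Executing SSH task" -> "Closing the input stream".
        System.out.println("thread 21 ran for:      "
                + roundedSecondsBetween("20:37:25,157", "20:44:25,374") + " s");
        // Start of thread 21 -> reaper cancels TX ...886c7 (named in thread 21's error).
        System.out.println("reaper cancelled 886c7: "
                + roundedSecondsBetween("20:37:25,157", "20:42:25,144") + " s after start");
        // Start of thread 22 -> reaper cancels TX ...88818 (assumed to be thread 22's).
        System.out.println("reaper cancelled 88818: "
                + roundedSecondsBetween("20:38:40,197", "20:43:40,172") + " s after start");
        // Gaps between successive "Executing SSH task" lines (threads 21 to 24).
        System.out.println("re-execution gaps:      "
                + roundedSecondsBetween("20:37:25,157", "20:38:40,197") + " s, "
                + roundedSecondsBetween("20:38:40,197", "20:40:40,180") + " s, "
                + roundedSecondsBetween("20:40:40,180", "20:42:40,182") + " s");
    }
}
```

For these timestamps it reports 420 s for thread 21, about 300 s between each job start and the reaper cancelling its transaction, and gaps of 75 s, 120 s and 120 s between re-executions. Under the assumptions above, this would be consistent with a 300 s transaction timeout being in effect rather than the configured 900 s (300 s also matches the 5-minute WildFly timeout mentioned later in this thread). It would also be consistent with the 60 s lock expiring while the first execution is still running. The log alone does not prove either point.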

Hi @bpmlearner,

If you want to set the lock time to 10 minutes, you should configure 600000 (the value is in milliseconds).
You configured 1 minute :slight_smile:
The default lock time of the engine is 300000, which is 5 minutes.

Regards,
Dominik
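One way to avoid this kind of unit slip is to convert every limit to the same unit before comparing it with the task duration. The following is a minimal sketch with hypothetical helper names; the numbers are the ones from this thread.

```java
import java.util.concurrent.TimeUnit;

// lockTimeInMillis is given in milliseconds, while the JBoss default-timeout is
// given in seconds. These helpers put both on the same scale.
class TimeoutCheck {

    static double millisToMinutes(long millis) {
        return millis / 60_000.0;
    }

    static long minutesToMillis(long minutes) {
        return TimeUnit.MINUTES.toMillis(minutes);
    }

    // True if a job running for taskSeconds is still busy when its lock expires,
    // so another job-executor thread may acquire and start it again.
    static boolean outlivesLock(long taskSeconds, long lockTimeInMillis) {
        return TimeUnit.SECONDS.toMillis(taskSeconds) > lockTimeInMillis;
    }

    public static void main(String[] args) {
        long taskSeconds = 420; // the Python script sleeps for 420 s

        long configured = 60_000;            // value in the original configuration
        long intended = minutesToMillis(10); // what 10 minutes actually is
        long engineDefault = 300_000;        // engine default quoted in the answer

        System.out.println("configured: " + millisToMinutes(configured)
                + " min, task outlives lock: " + outlivesLock(taskSeconds, configured));
        System.out.println("intended:   " + intended
                + " ms, task outlives lock: " + outlivesLock(taskSeconds, intended));
        System.out.println("default:    " + millisToMinutes(engineDefault)
                + " min, task outlives lock: " + outlivesLock(taskSeconds, engineDefault));
    }
}
```

With 60000 the 420 s task outlives its 1-minute lock, which fits the repeated "Executing SSH task…" lines. With 600000 it does not, and with the 300000 default it would again.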

@dominikh Thank you, that was a good catch. I increased it to 10 minutes (from 1 minute) and now the task runs to completion.

I changed to the configuration below to understand the link between default-timeout in the JBoss transactions subsystem and lockTimeInMillis of the job executor.

<coordinator-environment default-timeout="300"/>
with
<property name="lockTimeInMillis"> 600000 </property>

That is: the task runs for 7 minutes (420 s in my Python script), lockTimeInMillis is 10 minutes (600000), and the coordinator-environment default-timeout is 5 minutes (300 s). I expected a JBoss transaction timeout after 5 minutes, but JBoss did not complain about any transaction failure and the execution ran to completion without any issue.

Why did I not get a transaction timeout? Also, this post says that it is safer to configure lockTimeInMillis lower than the transaction timeout.

@dominikh, could you please answer my question if you have any input on that?

Hi @bpmlearner,

From my point of view, JBoss should mark the job's transaction for rollback after 5 minutes.
But the job won't be interrupted; instead, the commit should fail later.
That is why it is safer to configure lockTimeInMillis lower than the default-timeout: in theory, the commit fails if the transaction took longer than the default-timeout. But I don't know whether your JBoss configuration is correct; I am not a JBoss expert…

Regards,
Dominik
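The behavior described in this answer (the timeout does not stop the running work, it only makes a later commit fail) can be pictured with a tiny model. This is a hypothetical sketch using simulated seconds. It is not the JTA or Narayana API, and it models the expected behavior rather than explaining why the second run above completed without an error. In the first log, the failure surfaced in a similar place: thread 21 kept running after the reaper aborted its transaction and only failed when it tried to open a JMS session (HQ154002).

```java
// Hypothetical model, not the JBoss/Narayana API: the timeout does not stop
// the work, it only decides whether the commit at the end can succeed.
class ModelTransaction {
    private final long timeoutSeconds;
    private long elapsedSeconds;
    private boolean rolledBack;

    ModelTransaction(long timeoutSeconds) {
        this.timeoutSeconds = timeoutSeconds;
    }

    // The business work keeps running; passing the timeout only flips a flag,
    // much like a reaper marking the transaction in the background.
    void doWork(long seconds) {
        elapsedSeconds += seconds;
        if (elapsedSeconds > timeoutSeconds) {
            rolledBack = true;
        }
    }

    // The failure only becomes visible when the transaction tries to commit.
    void commit() {
        if (rolledBack) {
            throw new IllegalStateException(
                    "transaction exceeded " + timeoutSeconds + " s, commit refused");
        }
    }

    static boolean commitSucceeds(long timeoutSeconds, long workSeconds) {
        ModelTransaction tx = new ModelTransaction(timeoutSeconds);
        tx.doWork(workSeconds);
        try {
            tx.commit();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 420 s task from the question against the two timeouts tried.
        System.out.println("timeout 300 s, task 420 s, commit ok: " + commitSucceeds(300, 420));
        System.out.println("timeout 900 s, task 420 s, commit ok: " + commitSucceeds(900, 420));
    }
}
```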

Hi @dominikh,

How should one implement long-running process steps in Camunda? If I configure lockTimeInMillis lower than the transaction timeout of 5 minutes in WildFly, my process step can only last 5 minutes before it fails because of transaction timeouts. I am trying to perform data operations that take several hours. What is the best way to design such a process in Camunda? Should I use external tasks for these requirements?

kind regards,
Jens

Hi @Jens_Stahl ,

You should definitely not do it synchronously.
External tasks fit your requirements very well. An external worker can fetch a task and lock it. The lock can then be extended every x minutes; this way the engine can detect whether a worker is still active or has failed. If the operations take that long, you can also think about splitting them into separate parts and parallelizing them across multiple external tasks.

regards
Dominik
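The fetch-and-lock plus lock-extension idea can be illustrated with an in-memory model. This is a hypothetical sketch: the class, method names and simulated clock are made up for illustration and are not the Camunda external task client API. It only shows why regular extensions keep other workers away, and why a crashed worker's task becomes available again once its lock runs out.

```java
// Hypothetical in-memory model of an external task lock; class and method
// names are illustrative and are not the Camunda external task client API.
class ExternalTaskLock {
    private String owner;
    private long lockExpiresAt; // simulated clock, in seconds

    // A worker may lock the task if nobody holds it or the previous lock expired.
    synchronized boolean fetchAndLock(String workerId, long now, long lockDuration) {
        if (owner == null || now >= lockExpiresAt) {
            owner = workerId;
            lockExpiresAt = now + lockDuration;
            return true;
        }
        return false;
    }

    // Only the current owner may extend, and only while its lock is still valid.
    synchronized boolean extendLock(String workerId, long now, long newDuration) {
        if (workerId.equals(owner) && now < lockExpiresAt) {
            lockExpiresAt = now + newDuration;
            return true;
        }
        return false;
    }

    synchronized String owner() {
        return owner;
    }

    public static void main(String[] args) {
        ExternalTaskLock lock = new ExternalTaskLock();
        long lockDuration = 600; // each lock or extension lasts 10 minutes

        lock.fetchAndLock("worker-A", 0, lockDuration);
        // worker-A extends every 5 minutes while a 2-hour operation runs
        for (long t = 300; t < 7200; t += 300) {
            lock.extendLock("worker-A", t, lockDuration);
        }
        // last extension at t = 6900, so the lock is held until t = 7500
        System.out.println("worker-B at 7200: " + lock.fetchAndLock("worker-B", 7200, lockDuration));
        // if worker-A has died and stopped extending, worker-B takes over at 7500
        System.out.println("worker-B at 7500: " + lock.fetchAndLock("worker-B", 7500, lockDuration));
    }
}
```

The lock duration only needs to cover the interval between two extensions, not the whole multi-hour operation.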


Thanks @dominikh! Even though this changes our architecture a lot, it makes sense to use external tasks then.
“you should definitely don’t do it synchronously”: I also see the option
<bpmn:serviceTask camunda:asyncBefore="true"> for a service task.
But I guess this probably does not solve the transaction timeout problem?

Hi @Jens_Stahl
For such long tasks, it is good not to block engine threads unnecessarily, so I would really recommend doing this with external tasks. If it suits your architecture, you could also consider a send/receive pattern, modeled with a ServiceTask and a ReceiveTask. This can be helpful if you use an event broker such as Kafka.
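The send/receive pattern can be pictured as follows: the ServiceTask only publishes a request and returns, the long-running work happens outside the engine, and the process continues when a reply carrying the same correlation key reaches the ReceiveTask. The sketch below is hypothetical: an in-memory queue stands in for a broker such as Kafka, a map stands in for waiting process instances, and none of the names are Camunda or Kafka APIs.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical in-memory model of the send/receive pattern; the queue stands in
// for a broker such as Kafka and the map for process instances waiting at a
// ReceiveTask. None of these names are Camunda or Kafka APIs.
class SendReceiveModel {

    static final class Message {
        final String correlationKey;
        final String payload;

        Message(String correlationKey, String payload) {
            this.correlationKey = correlationKey;
            this.payload = payload;
        }
    }

    final Queue<Message> requests = new ArrayDeque<>(); // "request" topic
    final Map<String, String> waiting = new HashMap<>(); // correlation key -> state

    // ServiceTask: publish the request and return at once, freeing the engine thread.
    void sendRequest(String correlationKey, String payload) {
        requests.add(new Message(correlationKey, payload));
        waiting.put(correlationKey, "WAITING_AT_RECEIVE_TASK");
    }

    // External worker: processes the next request at its own pace (hours, if needed).
    Message processNext() {
        Message request = requests.poll();
        return request == null ? null : new Message(request.correlationKey, "done: " + request.payload);
    }

    // ReceiveTask: the reply is matched to the waiting instance by its key.
    boolean correlate(Message reply) {
        if (!"WAITING_AT_RECEIVE_TASK".equals(waiting.get(reply.correlationKey))) {
            return false; // nothing waiting for this key (or already continued)
        }
        waiting.put(reply.correlationKey, "CONTINUED");
        return true;
    }

    public static void main(String[] args) {
        SendReceiveModel model = new SendReceiveModel();
        model.sendRequest("job-42", "import data");
        System.out.println("after send:  " + model.waiting.get("job-42"));
        Message reply = model.processNext();
        model.correlate(reply);
        System.out.println("after reply: " + model.waiting.get("job-42"));
    }
}
```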

asyncBefore does not help you here; it controls a different behavior of the engine.
The engine works in transactions, from wait state to wait state. By default, not every element is its own transaction. Some elements are transaction boundaries by default (Message, UserTask, Timer, Conditional, …): all elements where the engine has to “wait” for something.

For example, if 10 service tasks are connected in series, the engine processes all of them in one transaction because it does not have to wait for anything (this does not apply to external tasks). This means that if the last one fails, it has to start again from the beginning. With asyncBefore / asyncAfter you can add such boundaries manually. There are also some best practices that I would highly recommend reading: Camunda Best Practices :slight_smile:
Transactions in Processes | docs.camunda.org
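The effect of asyncBefore / asyncAfter on a chain of service tasks can be illustrated with a small model that groups steps into transactions and reports which steps are redone when one fails. This is a hypothetical sketch that ignores everything except the async flags. The class and method names are made up and are not part of the Camunda API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: groups a chain of service tasks into transactions.
// A new transaction starts before a step marked asyncBefore and after a step
// marked asyncAfter; a failure rolls back to the start of its transaction.
class TransactionBoundaries {

    static final class Step {
        final String name;
        final boolean asyncBefore;
        final boolean asyncAfter;

        Step(String name, boolean asyncBefore, boolean asyncAfter) {
            this.name = name;
            this.asyncBefore = asyncBefore;
            this.asyncAfter = asyncAfter;
        }
    }

    static List<List<String>> partition(List<Step> steps) {
        List<List<String>> transactions = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (Step step : steps) {
            if (step.asyncBefore && !current.isEmpty()) { // boundary before this step
                transactions.add(current);
                current = new ArrayList<>();
            }
            current.add(step.name);
            if (step.asyncAfter) { // boundary after this step
                transactions.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            transactions.add(current);
        }
        return transactions;
    }

    // Steps executed again when failedStep throws: from the start of its transaction.
    static List<String> redoneOnFailure(List<Step> steps, String failedStep) {
        for (List<String> tx : partition(steps)) {
            int index = tx.indexOf(failedStep);
            if (index >= 0) {
                return tx.subList(0, index + 1);
            }
        }
        return List.of();
    }

    // task1..taskN in series; the step at asyncBeforeIndex (1-based) gets asyncBefore.
    static List<Step> chain(int length, int asyncBeforeIndex) {
        List<Step> steps = new ArrayList<>();
        for (int i = 1; i <= length; i++) {
            steps.add(new Step("task" + i, i == asyncBeforeIndex, false));
        }
        return steps;
    }

    public static void main(String[] args) {
        List<Step> plain = chain(10, -1); // no async flags at all
        System.out.println(partition(plain).size() + " transaction(s), failure in task10 redoes "
                + redoneOnFailure(plain, "task10"));
        List<Step> split = chain(10, 6); // asyncBefore on task6
        System.out.println(partition(split).size() + " transaction(s), failure in task10 redoes "
                + redoneOnFailure(split, "task10"));
    }
}
```

Without async flags, a failure in the tenth task repeats all ten; with asyncBefore on the sixth task, only tasks 6 to 10 are repeated. Note that neither case shortens the time a single long-running task spends inside its own transaction, which is why asyncBefore does not address the timeout itself.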


Great answer, thank you!