How to use the @Transactional annotation in Spring Boot with an embedded Camunda engine

Hi,

I am setting up a Spring Boot application with an embedded Camunda process engine and want to store my application state in the same database as the Camunda tables, so that I can use a single atomic transaction for the updates to the Camunda tables and to my application state, keeping the application state and the process state consistent.

What I notice is that when the database is under some load, the two models are no longer in sync. The Camunda process progresses faster than my application logic, leading to inconsistent states because mandatory data is not yet available in the application state.
So it seems they use different transactions for their updates.

I suppose the solution lies in setting up a shared platform transaction manager, but I am not sure.

Can you share the best practices for setting up correct transactional behaviour over a single datasource for both the Camunda updates and my application's state updates?

Moreover, should I annotate all my JavaDelegate.execute methods with @Transactional, as well as the REST endpoints that use the runtimeService to interact with the process model, or is this not necessary?

Hope you can share the best practices for this. I found some posts on using two distinct datasources, but that is a different setup: I want to use one datasource and one transaction to control the persistence operations from Camunda as well as from my own application.
That seems to keep things simple and consistent, but it requires some additional configuration, and I would like to know how to set it up.

Best regards,
Patrick

Hi @Patrick,

A good starting point for further information is here: Transactions in Processes | docs.camunda.org.

It includes links to the Spring transaction integration: Spring Transaction Integration | docs.camunda.org
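Translated to Java configuration, the wiring from that guide looks roughly like this (untested sketch; with the camunda-bpm-spring-boot-starter most of this is auto-configured for you, but the point is that the engine and your own persistence code share the same DataSource and the same PlatformTransactionManager):

```java
import javax.sql.DataSource;

import org.camunda.bpm.engine.spring.ProcessEngineFactoryBean;
import org.camunda.bpm.engine.spring.SpringProcessEngineConfiguration;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class CamundaTransactionConfig {

  @Bean
  public SpringProcessEngineConfiguration processEngineConfiguration(
      DataSource dataSource,                           // the single shared DataSource
      PlatformTransactionManager transactionManager) { // e.g. the auto-configured JpaTransactionManager

    SpringProcessEngineConfiguration config = new SpringProcessEngineConfiguration();
    config.setDataSource(dataSource);
    config.setTransactionManager(transactionManager);  // engine joins Spring-managed transactions
    config.setDatabaseSchemaUpdate("true");
    config.setJobExecutorActivate(true);
    return config;
  }

  @Bean
  public ProcessEngineFactoryBean processEngine(SpringProcessEngineConfiguration configuration) {
    // wraps the configuration so Spring manages the engine lifecycle
    ProcessEngineFactoryBean factoryBean = new ProcessEngineFactoryBean();
    factoryBean.setProcessEngineConfiguration(configuration);
    return factoryBean;
  }
}
```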

Hope this helps, Ingo

Hi @Ingo_Richtsmeier ,
Thanks for your reply.

I have been looking into this further, and following the hints in the referenced documents does not get me much further, since I am using JPA to manage my application state. The JpaTransactionManager auto-configured by Spring Boot therefore suits my setup better.

I have also seen that the delegate is executed inside a transaction: setting the propagation level to NEVER raises an error. Therefore I would expect that all other calls are part of that same transaction and that no additional @Transactional annotations are necessary for calls initiated by the JavaDelegate.
In my REST controllers or other alternative entry points, I should of course place a @Transactional annotation.
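To illustrate what I mean by the entry points, a REST endpoint would look roughly like this (simplified sketch; the repository, process key, and method names are placeholders, not my real code):

```java
import org.camunda.bpm.engine.RuntimeService;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class OrderProcessController {

  private final RuntimeService runtimeService;
  private final OrderRepository orderRepository; // placeholder for a Spring Data JPA repository

  public OrderProcessController(RuntimeService runtimeService, OrderRepository orderRepository) {
    this.runtimeService = runtimeService;
    this.orderRepository = orderRepository;
  }

  @PostMapping("/orders/{id}/start")
  @Transactional // one transaction around the JPA update and the engine call
  public void start(@PathVariable long id) {
    orderRepository.markStarted(id); // placeholder application-state update
    runtimeService.startProcessInstanceByKey("monthEndProcess", String.valueOf(id));
  }
}

interface OrderRepository { void markStarted(long id); } // placeholder
```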

Still, I sometimes see this inconsistent behaviour.

What may be messing things up is that I am throwing a BpmnError in response to a RequestNotPermitted exception from a RateLimiter. That would, however, not explain the inconsistencies I witnessed.

Do you have any clues where to look further?

Best regards,
Patrick

Hi @Ingo_Richtsmeier ,
I have an update after experimenting over the weekend.

What I noticed is that my PostgreSQL server is hitting 100% IO usage. Connections drop, and in some cases rollbacks are interrupted by an exception. I think this is what caused the database to end up in an inconsistent state.

What I tried is to minimize the number of savepoints and I set the history level to NONE to decrease the IO rate.
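For reference, I set the history level via the starter's configuration (assuming the camunda-bpm-spring-boot-starter property name) in application.yaml:

```yaml
camunda:
  bpm:
    history-level: none   # disables history writes; was "full" before
```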

That seems to have solved the issue. The IO metrics are more reasonable now and I have not seen an error since. I must admit that switching history back to FULL does not reproduce the same behaviour, but IO still peaks at 90%, which I consider too high as well.

On the Spring integration with my JPA classes I am still not sure what is necessary. Perhaps it is not needed, but to be sure I think I will annotate the JPA database access with @Transactional and propagation MANDATORY.
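Something along these lines (simplified sketch, placeholder names):

```java
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ApplicationStateService {

  private final OrderStateRepository orderStateRepository; // placeholder JPA repository

  public ApplicationStateService(OrderStateRepository orderStateRepository) {
    this.orderStateRepository = orderStateRepository;
  }

  // MANDATORY does not open a new transaction; it fails fast if the caller
  // (the engine's command context or a @Transactional REST endpoint) did not
  // already start one, which is exactly the guarantee I want to verify.
  @Transactional(propagation = Propagation.MANDATORY)
  public void recordResult(long orderId, String result) {
    orderStateRepository.updateResult(orderId, result); // placeholder method
  }
}

interface OrderStateRepository { void updateResult(long orderId, String result); } // placeholder
```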

The issue is that we start this particular process in bulk (it is part of the end-of-month processing, and we start about 20,000 instances of this process from a given point in time) and that the REST services that are invoked cannot handle a lot of load, so I need to throttle the calls. The external task pattern is most suited for this situation but comes with its own issues, so I have now implemented a rate limiter on the outbound calls; if the rate is exceeded, I throw a BpmnError and wait a minute before retrying. I run a setup with at most 4x10 job executors and an (assumed) limit of 4 calls per second to a given service.
So quite some administration is going on to deal with all the rejected calls. That, in combination with maintaining history, causes the PostgreSQL server to reach 100% IO.
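The pattern I am describing is roughly this (simplified sketch; class names are placeholders and the 4-per-second limit is the assumed QoS of the downstream service):

```java
import java.time.Duration;

import io.github.resilience4j.ratelimiter.RateLimiter;
import io.github.resilience4j.ratelimiter.RateLimiterConfig;
import io.github.resilience4j.ratelimiter.RequestNotPermitted;
import org.camunda.bpm.engine.delegate.BpmnError;
import org.camunda.bpm.engine.delegate.DelegateExecution;
import org.camunda.bpm.engine.delegate.JavaDelegate;

public class CallThrottledServiceDelegate implements JavaDelegate {

  // roughly 4 permits per second, matching the assumed limit of the REST service
  private final RateLimiter rateLimiter = RateLimiter.of("downstream-service",
      RateLimiterConfig.custom()
          .limitForPeriod(4)
          .limitRefreshPeriod(Duration.ofSeconds(1))
          .timeoutDuration(Duration.ZERO) // fail immediately instead of blocking a job executor thread
          .build());

  @Override
  public void execute(DelegateExecution execution) {
    try {
      rateLimiter.executeRunnable(this::callRestService);
    } catch (RequestNotPermitted e) {
      // caught by an error boundary event in the model, which waits a minute and retries
      throw new BpmnError("RATE_LIMIT_EXCEEDED", e.getMessage());
    }
  }

  private void callRestService() {
    // actual outbound REST call omitted
  }
}
```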

I will post another item on dealing with external task workers, to see if that would be a viable solution as well.

Best regards,
Patrick

Hi @Patrick,

What issues do you face with the external task pattern here?

Regarding the retry after a minute: this can be done by the engine itself.

If you use a Java delegate implementation, just throw a RuntimeException in your delegate code and mark the task with async before in the modeler. You can then add a retry time cycle in the modeler to control when the task will be executed again: The Job Executor | docs.camunda.org
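A minimal example of what I mean (sketch; the retry time cycle R6/PT1M is just an example value you set on the service task in the modeler):

```java
import org.camunda.bpm.engine.delegate.DelegateExecution;
import org.camunda.bpm.engine.delegate.JavaDelegate;

public class CallServiceDelegate implements JavaDelegate {

  @Override
  public void execute(DelegateExecution execution) {
    // The service task is marked "async before" in the modeler and has a retry
    // time cycle such as R6/PT1M (retry up to 6 times, 1 minute apart).
    boolean accepted = tryCallRestService(); // placeholder for the real call
    if (!accepted) {
      // The failed job is rolled back and retried later by the job executor.
      throw new RuntimeException("REST service did not accept the call, retry later");
    }
  }

  private boolean tryCallRestService() {
    return true; // placeholder
  }
}
```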

In an external service task, respond with handleFailure and provide a number of retries and a retry timeout with a reasonable amount of time before the task should be fetched again: External Tasks | docs.camunda.org
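With the Java external task client this looks roughly like this (sketch; the topic name, URL, and retry values are examples):

```java
import org.camunda.bpm.client.ExternalTaskClient;

public class ThrottledServiceWorker {

  public static void main(String[] args) {
    ExternalTaskClient client = ExternalTaskClient.create()
        .baseUrl("http://localhost:8080/engine-rest") // example engine REST endpoint
        .asyncResponseTimeout(20_000)
        .build();

    client.subscribe("call-throttled-service") // example topic
        .lockDuration(60_000)
        .handler((externalTask, externalTaskService) -> {
          try {
            // outbound REST call omitted
            externalTaskService.complete(externalTask);
          } catch (Exception e) {
            Integer remaining = externalTask.getRetries();
            int retries = remaining == null ? 5 : remaining - 1;
            // retryTimeout of 60 000 ms: the task is only fetched again after a minute
            externalTaskService.handleFailure(externalTask, e.getMessage(), null, retries, 60_000L);
          }
        })
        .open();
  }
}
```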

Hope this helps, Ingo

Hi @Ingo_Richtsmeier ,
Thanks again :slight_smile:

Regarding the external task pattern. See: https://forum.camunda.io/t/best-practices-for-updating-central-state-from-external-task-workers/31286
Though related in my use case, it seemed like a different topic to me, which is why I posted it as a separate item.

As for the second part: I know that it could also be done with the regular retry cycle, but I hoped to make the retries visible, so that once we use Optimize we can visualize the impact of the QoS of the REST service. That may help in getting this REST service scaled to a higher throughput.
I did not realize that the current setup causes more database interaction, so in that sense it may also be helpful to switch to that approach.

What would be really great is if I could configure the maximum number of concurrent executions of an activity across the cluster. To my knowledge that is not possible. I also studied the code, such as the "selectNextJobsToExecute" query, to see if I could propose an extension there, but it seems hard at first sight because I could not find a notion of per-activity configuration in the schema.

Doing this at process level would be easier to accomplish, but then it would throttle the whole process, whereas the purpose is to throttle a single activity to match the QoS of the external service invoked from that activity.

Do you know if this is currently supported (and I just missed it) or something that is being worked on for a subsequent release?

Patrick

IMO, by configuring the job executor properly, you can also avoid peaks in the processing.

This would make the process model and its execution unnecessarily complicated IMO. I'd search for another solution. The fact that you have to resort to such tricks is a general architecture smell for me.

Hello @fml2 ,
Thanks for your contribution.

I am curious how to configure the job executor to avoid peaks in the processing. I hope you can elaborate on that.
What I can imagine is setting the number of job executor threads to a low value to match the weakest REST service involved. That would, however, also limit the whole process, whereas I would rather apply this limitation only to the activity that invokes this REST service.
However, if it is possible to configure the job executor such that a given activity has no more than N instances running at the same time, I will be happy to use that approach.

For the second part, I agree it is a smell. Independent of the process design and implementation, the issue is that we need to call a REST service about 20,000 times, while this service is scaled for a much lower load.
So it would be better to have time decoupling between the process and this REST endpoint, for example by placing asynchronous messaging in between. There the requests could be buffered and processed at the pace of the supplying service. This could be done using, for example, JMS, Kafka, or external task workers.
However, that would make the overall architecture more complex, and it raises another question: how to provide the response back to the process. When using an external messaging provider (JMS, Kafka, etc.) I would have to clutter my process design with handling the responses of those services, and when using external task workers there is also the question of how to store the updates to the application state (that part I posted in another item).

Generally, my opinion is that in such cases you should actually time-decouple the consumer and the provider, but I am hoping there is a simpler way to control the service invocations from my process, such as a job executor configuration I am currently not aware of.

Best regards,
Patrick

Sorry, there is no magic here. I actually meant what you described, i.e. configuring the thread pool of the job executor. IMO it is not possible to configure the job executor at the level of an individual activity or job.
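The knobs I have in mind are the engine-wide thread pool and acquisition settings, for example (assuming the Spring Boot starter property names; note this throttles the whole engine, not one activity):

```yaml
camunda:
  bpm:
    job-execution:
      core-pool-size: 3          # fewer worker threads -> fewer jobs executed concurrently
      max-pool-size: 5
      queue-capacity: 3
      max-jobs-per-acquisition: 3
```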

As for the transaction management: IIUC, Camunda just uses the Spring infrastructure for this. That is, if you have just one underlying DB, then Spring should correctly drive the transactions, even if you use JPA for the DB access (Camunda uses another framework for its own persistence).

To take the load off the service, you could, for example, put the 20,000 orders into a DB table and start a process that works through them at a moderate speed, e.g. by putting a timer event with a delay into the process loop. Or, as you wrote, use external tasks. But if this were the only place where external tasks are used, then I would probably not go down this path.