Hi,
I have a process engine (version 7.4) with a job executor running on PostgreSQL, and a custom application that uses the engine to generate additional processes from a BPMN XML template. This lets users schedule pre-configured reports to be generated and sent to them at a specific time each day.
Some time ago, we ran into a race condition in the process generation logic that resulted in an OutOfMemoryError because the DeploymentCache filled up. The race condition was a new deployment of a changed generated process combined with a process definition activation in the same transaction, which fails on optimistic locking in an endless loop.
As a symptom, existing scheduled processes using a timer start event tried to start but often failed and ran into an OOM themselves. This seems to have resulted in an additional timer being inserted into the database at timer fire time without the old one being deleted. The effect seems to have been made worse by retries and by lock expiration: in the history I can see several process instances started 5 minutes apart, which matches the configured job lock expiration time.
After searching a bit, I found the following bug entry, which supposedly fixed that issue:
https://app.camunda.com/jira/browse/CAM-2797
However, in my case it didn’t, for some reason. I also took a deeper look at the timer-start-event handling code but found no obvious race condition there that would explain the symptoms.
As a last resort, I wanted to make sure at the database schema level that a situation like this simply cannot occur, by introducing an appropriate partial unique index on the “act_ru_job” table. However, because of the hard-coded statement re-ordering in the DbOperationManager, the index is violated: the new timer entry is inserted before the old one is deleted.
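To illustrate the ordering problem, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for PostgreSQL (both support partial unique indexes and check them per statement). The reduced table layout, the index name `uq_timer_start_job`, the choice of indexed column, and the helper `reschedule` are assumptions made for illustration only; they are not my actual schema change and not engine code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Heavily reduced stand-in for act_ru_job (assumed subset of columns).
conn.execute("""
    CREATE TABLE act_ru_job (
        id_           TEXT PRIMARY KEY,
        handler_type_ TEXT,
        handler_cfg_  TEXT
    )
""")

# Hypothetical partial unique index: at most one timer start event job
# per handler configuration. Other job types are not constrained.
conn.execute("""
    CREATE UNIQUE INDEX uq_timer_start_job
    ON act_ru_job (handler_cfg_)
    WHERE handler_type_ = 'timer-start-event'
""")


def reschedule(old_id, new_id, cfg, insert_first):
    """Replace the timer job old_id with new_id in one transaction.

    Returns True if the transaction committed, False if the unique
    index rejected it (in which case everything is rolled back).
    """
    insert = (
        "INSERT INTO act_ru_job (id_, handler_type_, handler_cfg_) "
        "VALUES (?, 'timer-start-event', ?)"
    )
    delete = "DELETE FROM act_ru_job WHERE id_ = ?"
    try:
        with conn:  # commits on success, rolls back on exception
            if insert_first:
                conn.execute(insert, (new_id, cfg))
                conn.execute(delete, (old_id,))
            else:
                conn.execute(delete, (old_id,))
                conn.execute(insert, (new_id, cfg))
        return True
    except sqlite3.IntegrityError:
        return False


# One existing timer for a generated process (hypothetical key).
with conn:
    conn.execute(
        "INSERT INTO act_ru_job VALUES ('job-1', 'timer-start-event', 'dailyReport')"
    )

# Insert before delete: the index is checked at the INSERT and rejects it.
insert_first_ok = reschedule("job-1", "job-2", "dailyReport", insert_first=True)

# Delete before insert: the same logical change goes through.
delete_first_ok = reschedule("job-1", "job-2", "dailyReport", insert_first=False)

print(insert_first_ok, delete_first_ok)
```

In this sketch, only the delete-then-insert order is compatible with the index, even though both orders leave the table in the same final state.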
Of course, an OOM is a serious issue and there is not much we can do inside the JVM in such a case, which is why I want to ensure consistency at the database level.