Mixed heterogeneous and homogeneous setup


#1

Hi, we currently have a homogeneous setup: one workflow api, run as two replicas accessing a shared database. This api houses the process engine for all workflows. What we want is for each api to run its own process engine instead of the single centralized one, while still running multiple replicas of each. So it’s a combined heterogeneous and homogeneous setup.

Is this possible?
What is the impact of setting deployment aware on a setup with two (or more) replicas?

The setup with a shared database between all the apis would be:
api-x-1: on startup deploy x’s workflows setting deployment aware
api-x-2: on startup deploy x’s workflows setting deployment aware

api-y-1: on startup deploy y’s workflows setting deployment aware
api-y-2: on startup deploy y’s workflows setting deployment aware

Thanks,
Andrew


#2

It seems to me that all you’d need is separate databases for each API “cluster”. There are no problems doing this (we do it) unless you don’t have the system resources to support it (space, memory, I/O, etc.).

When you configure the datasource for the process engine, just create two and point the process engine to the correct datasource.


#3

Thanks, yes, that is an option. But it clashes with our other requirement that we have one view of all the running workflows. As far as I understand, the setup you described would result in a web app/cockpit for each api cluster, and there wouldn’t be an overall view.


#4

Another option could be to use the same database but with different schemas, e.g.

apiXConfiguration.setDatabaseSchema("X");
apiXConfiguration.setDatabaseTablePrefix("X.");

From what I’ve read about the multi-tenant setup, the web app allows for this and is able to switch between the tenants in the same app.


#5

If I understand what you’re doing right now, you’ve got a single database, but you’re using deployment awareness to keep execution of specific workflows on specific servers. If you are disciplined in maintaining that deployment strategy, then there’s no reason you can’t run many different process engines against a single back-end database server.

So, for example, you can have servers A, C, and D running workflows 1, 4, and 6, while servers B, E, F, and G run only workflows 2, 3, and 5. You could still go to any server and see all the processes on servers A through G, but they would only be executing on the servers to which they were actually deployed.

If you wanted to know which processes were running on which servers, or to confirm that they are running, you can examine the ID of the process. This is a time-based UUID where the last element represents the MAC address of the server where the process was started.

Example: 77489fc1-b799-407f-91fd-ed758ff9187a

That last element, “ed758ff9187a”, is a MAC address which in theory can be tied to a specific server. You have to be careful, however, as you can only reliably resolve these MAC addresses from a server on the same subnet as the other server. Routers, switches, etc. can assign a different MAC address.
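
To make that check concrete, here is a minimal sketch using only java.util.UUID. The class and method names (UuidNodeInspector, nodeHex) are made up for illustration, and the second ID in main is a hypothetical version-1 value, not one taken from a real engine. Only version-1 (time-based) UUIDs carry a node field, so it is worth checking the version before reading the last element as a MAC address. The sample ID above, for instance, parses as version 4, so it yields no node here.

```java
import java.util.Optional;
import java.util.UUID;

/** Illustrative helper (hypothetical name): reads the node field of a time-based UUID. */
final class UuidNodeInspector {

    /**
     * Returns the 48-bit node field as 12 lowercase hex digits,
     * or an empty Optional if the UUID is not version 1 (time-based).
     */
    static Optional<String> nodeHex(String id) {
        UUID uuid = UUID.fromString(id);
        if (uuid.version() != 1) {
            // UUID.node() is only defined for version-1 UUIDs.
            return Optional.empty();
        }
        return Optional.of(String.format("%012x", uuid.node()));
    }

    public static void main(String[] args) {
        // Sample ID from the post: version 4, so there is no node field to read.
        System.out.println(nodeHex("77489fc1-b799-407f-91fd-ed758ff9187a"));
        // Hypothetical version-1 ID, used here only to show the extraction.
        System.out.println(nodeHex("c9a646d3-9c61-11ee-8c90-0242ac120002"));
    }
}
```

Even for a version-1 ID, the node field is whatever the ID generator used as its node identifier, which may or may not be the MAC address of a physical interface on the server.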

I hope this helps.


#6

I’ve set each api to be deployment aware. I’m seeing some behaviour that I don’t understand.

api-X: deploys workflows X1, X2
api-Y: deploys workflows Y1, Y2

Via api-Y I’m able to start a process X1. Does deployment aware only apply to the job executor? I.e., it will only run jobs that belong to deployments it owns, but you can still correlate/start processes across deployments?


#7

Your understanding is correct. Whenever you call an API, it runs synchronously until the next wait state (async continuation, user task, …) is reached. In case of asynchronous continuation and deployment-aware job execution, this is effectively the point of handoff to the “right” cluster node.
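
The handoff can be pictured with a small toy model in plain Java. This is only a sketch of the behaviour described above, not Camunda code: SharedDb, Node, Job, startProcess and executeDueJobs are all hypothetical names, and the model assumes the process has an asynchronous continuation right at its start.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/** Toy model only; none of these types are Camunda classes. */
final class DeploymentAwareModel {

    /** A pending job, tagged with the deployment it belongs to. */
    static final class Job {
        final String deployment;
        final String processInstance;

        Job(String deployment, String processInstance) {
            this.deployment = deployment;
            this.processInstance = processInstance;
        }
    }

    /** Stands in for the job table that all nodes share. */
    static final class SharedDb {
        final List<Job> jobs = new ArrayList<>();
    }

    /** One api replica with its own engine and deployment-aware job executor. */
    static final class Node {
        final String name;
        final SharedDb db;
        final Set<String> registeredDeployments;

        Node(String name, SharedDb db, String... deployments) {
            this.name = name;
            this.db = db;
            this.registeredDeployments = new HashSet<>(Arrays.asList(deployments));
        }

        /**
         * API call: accepted on any node, whatever the deployment. It runs up to
         * the first wait state, modelled here as writing a job to the shared table.
         */
        void startProcess(String deployment, String processInstance) {
            db.jobs.add(new Job(deployment, processInstance));
        }

        /** Job executor: picks up only jobs whose deployment is registered on this node. */
        int executeDueJobs() {
            int executed = 0;
            Iterator<Job> it = db.jobs.iterator();
            while (it.hasNext()) {
                Job job = it.next();
                if (registeredDeployments.contains(job.deployment)) {
                    it.remove();
                    executed++;
                }
            }
            return executed;
        }
    }

    public static void main(String[] args) {
        SharedDb db = new SharedDb();
        Node apiX = new Node("api-X", db, "X");
        Node apiY = new Node("api-Y", db, "Y");

        apiY.startProcess("X", "X1-instance");     // start succeeds on the "wrong" node
        System.out.println(apiY.executeDueJobs()); // api-Y leaves the X job alone
        System.out.println(apiX.executeDueJobs()); // api-X picks it up
    }
}
```

In this model, starting X1 through api-Y succeeds, but the resulting job stays in the shared table until an api-X node's executor acquires it.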


#8

My understanding is that a particular process engine will only run a process declared as “deployment aware” if the process itself was actually deployed to that server via whatever mechanism you use (e.g. copying a war file to a deployment directory). Even though you might see a process listed in a server’s Cockpit, that particular server may not be able to execute any tasks for it.

The purpose of this is important. If your process requires external resources such as Java classes or Groovy scripts, then the server must have these first (or at the same time) for its tasks to execute on that specific Camunda instance (not cluster). If you are using a deployment mechanism where BPMN code is sent to the server without being packaged with the necessary external resources, this mechanism lets you “prepare” the server beforehand, so that when the BPMN code is deployed, everything is ready and the instance can start processing tasks for it.

Correlation is a database operation, so while the engines are still technically sharing the same database, finding a specific process would require that you also know the server on which it may be run. I don’t really know, as we do not permit the use of Camunda correlation because it’s so costly. I suppose that if you send a message, the intelligence of the various process engines would figure out which one is actually running the process and only “route” the message to it.