Sponsor wanted: Camunda Custom Batch Extension

Hi,

After our previous question in Custom batch job using camunda-batch, it took a lot of work to get our custom batch implementation running. We had to write a lot of code at the entity level, which was time-consuming and complex, and it required deep knowledge of the batch API.

Furthermore, we have a lot more cases where we want to use Camunda Batch, so I had the idea of building a community extension for this. The extension should make it easier to get started with Camunda Batch.

I have a lot more ideas, like making it possible to provide a data collector class instead of the data itself, or providing a way to automatically trigger the creation of a new batch with a timer job.

I already have a runnable implementation, but it is not on GitHub yet. I will push it to GitHub in about two weeks (I am on holiday until then), but I already want to promote it.

So, is there anybody who wants to sponsor me to make it an official extension? :slight_smile:


Here are some examples of how to use the extension:

Creating the Batch itself

In the minimal setting (with default batch values and configuration from the Context):

 final Batch batch = CustomBatchBuilder.of(data) // list of objects which should be processed
        .jobHandler(testCustomBatchJobHandler)
        .create();

or with more configuration:

 final Batch batch = CustomBatchBuilder.of(data)
        .configuration(configuration)
        .jobHandler(testCustomBatchJobHandler)
        .jobsPerSeed(10)
        .invocationsPerBatchJob(5)
        .create(configuration.getCommandExecutorTxRequired());

The builder takes care of:

  • Creating the batch (entity)
  • Creating the seed and monitor jobs
  • Saving the BatchConfiguration data, regardless of the data type
  • Saving you from dealing with configuration bytes

The JobHandler

I also created an abstract job handler which you just have to extend with your business logic.
The abstract job handler takes care of:

  • Creating jobs and saving the configuration to the byte array table
  • Reading data from the byte array table for each batch job
  • Cleaning up jobs and configuration

In the end, with the abstract class, my job handler looks like this:

public class TestCustomBatchJobHandler extends CustomBatchJobHandler<String> {

  @Override
  public void execute(List<String> data, CommandContext commandContext) {
      logger.debug("Work on data: {}", data);
  }

  @Override
  public String getType() {
      return "test-type";
  }
}
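To illustrate what `invocationsPerBatchJob(5)` from the builder example means for this handler, here is a small stand-alone sketch (all names are hypothetical, not part of the extension's API): the batch conceptually splits the data list into chunks of that size, and `execute(...)` is then called once per chunk. The real extension does this inside the engine; this only simulates the chunking in plain Java.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchChunkingSketch {

    // Simplified stand-in for CustomBatchJobHandler<String>
    interface Handler {
        void execute(List<String> chunk);
    }

    // Split the full data list into per-job chunks of size invocationsPerBatchJob
    static List<List<String>> chunk(List<String> data, int invocationsPerBatchJob) {
        final List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < data.size(); i += invocationsPerBatchJob) {
            chunks.add(data.subList(i, Math.min(i + invocationsPerBatchJob, data.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        final List<String> data = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        final Handler handler = c -> System.out.println("Work on data: " + c);
        for (List<String> c : chunk(data, 5)) {
            handler.execute(c); // one batch job execution per chunk
        }
    }
}
```

With seven items and `invocationsPerBatchJob(5)`, the handler is invoked twice: once with five items and once with the remaining two.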

Best regards,
Patrick


Hi Patrick,

This sounds great. I would like to support you with the extension and guide you through the process of publishing it.

Cheers,
Askar


Hi Patrick

A few days ago I raised a question about using your Camunda batch extension to implement a batch order process pattern.
Please see Approach to batch processing.

It looks good, but it requires the list to already be created.
What I'd really like is a way to build the list and, at some point, submit / create the batch.

I can, of course, do this with the existing in-memory approach, with some extra logic to handle a crash so the list can be recreated, etc.

It would be really nice if the extension supported an item-add command, as this would remove the need for the extra logic in the flow.

From reading your roadmap this may already be on your mind, but I'm looking for suggestions on how to extend the plugin for the above.

Ref: https://github.com/camunda/camunda-bpm-custom-batch
Regards
Tom.

Hi Tom,

I already use the extension at my current customer for use cases like yours, so I think using camunda batch would be a good choice.

Just so I understand you correctly: in the first step (e.g. the first delegate) you just want to add batch data,
and later, in a second step / delegate, you want to create the batch?

And you don’t want to store this information in a process variable?

Best regards,
Patrick

Hi Patrick

Great to know the extension is already in use by customers.

Yes, the first delegate takes, say, an order id contained in a process variable and adds it to the list as the orders arrive. Camunda batch would then be triggered to send the batches to a service.

Probably goes without saying that data loss is unacceptable.
My concern is that if a Camunda instance crashed before the second delegate, I'd lose the in-memory list. Then I'd need to either run a reconciliation or possibly build a correlation operation into the flow for all submitted orders, so that I could take appropriate action to resubmit the unprocessed orders.

If I were adding each order id individually, I assume it would be inserted into the DB and would survive an instance crash. I have not looked into the costs associated with serialization into the batch job in the DB.

I'm still capturing NFRs, but I'm looking at potentially pushing tens of thousands of instances through Camunda. I can scale across many Camunda instances, but the DB is still a point of contention.

I appreciate your time and guidance.

Regards
Tom.

Hi Tom,

OK, so you have a lot of order processes, each of which should put one order id into the item list.
And this list should then be processed by the batch, where the batch itself is created e.g. daily (as in your example)?

I’m not 100% sure what you mean with the in-memory list.

We have a similar scenario, and we solved it by creating our own event table. Each order id is stored as an event in this table, and after some amount of time a process runs which collects all events and creates a Camunda batch out of them. With this pattern you have no problems with concurrent modification if there are lots of orders, and it's also fail-safe if a Camunda instance crashes. (We additionally have a dedicated Camunda cluster which just works on this batch stuff.)
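The event-table pattern could be sketched roughly like this in plain Java, with an in-memory queue standing in for the database event table (all class and method names here are hypothetical, chosen for illustration). In production the events would live in a real DB table, and `collectForBatch()` would end with something like `CustomBatchBuilder.of(drained)...create()` instead of just returning the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

public class EventTableSketch {

    // Stand-in for the event table: order ids written by many order processes
    private final ConcurrentLinkedQueue<String> eventTable = new ConcurrentLinkedQueue<>();

    // Called from each order process (e.g. a delegate): just record the event
    public void recordOrderEvent(String orderId) {
        eventTable.add(orderId);
    }

    // Called periodically (e.g. from a timer-triggered process): drain all
    // pending events and hand them to the batch in one go, so individual
    // order processes never modify the batch data list concurrently
    public List<String> collectForBatch() {
        final List<String> drained = new ArrayList<>();
        String orderId;
        while ((orderId = eventTable.poll()) != null) {
            drained.add(orderId);
        }
        return drained; // in production: CustomBatchBuilder.of(drained)...
    }

    public static void main(String[] args) {
        final EventTableSketch sketch = new EventTableSketch();
        sketch.recordOrderEvent("order-1");
        sketch.recordOrderEvent("order-2");
        System.out.println("Batch data: " + sketch.collectForBatch());
    }
}
```

The key design point is that writers only append events, while a single periodic collector drains them and creates the batch, which is what avoids the serialize/deserialize contention on the batch data list.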

Of course it would also be possible to extend the Camunda batch extension so that you could first "prepare" a batch which could then be filled by different processes. But I'm afraid this could cause problems if there are a lot of (concurrent) modifications, because the batch data list is just a serialized Java list in ACT_GE_BYTEARRAY, and I would always have to deserialize, modify, and serialize the list to add an item. Currently I see no better way to handle this within the extension itself.

Regards,
Patrick

Thanks Patrick.
I finally understood this from a concurrent thread on the batch pattern; please see Approach to batch processing.

There's no need for an event table, as I can query the active process instances.
There's also no need to serialize / deserialize with this approach.

Thank you again.

Regards
Tom.
