JSON History Provider (use case: Elasticsearch indexing)


#1

Hey all

Lately we have been working on further enhancing the ability to generate near-realtime history event data into an index such as Elasticsearch for time series, graphing, dashboards, etc.

In the past we have always done this through the REST API as a pull into a DB. This has several issues because of the lack of ability to track which events have been updated since the last pull.

So we started working on a new Camunda plugin that provides a custom history provider for exporting history events as JSON.

Specifically we export events into a rolling JSON file, and the plugin implements the composite history provider so the current DB events are still generated.
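For those curious how such wiring can look, here is a minimal sketch built on Camunda's CompositeDbHistoryEventHandler. The JsonHistoryProviderPlugin and JsonFileHistoryEventHandler class names and the handler's internals are ours for illustration, not the actual plugin:

    import java.util.List;

    import org.camunda.bpm.engine.impl.cfg.AbstractProcessEnginePlugin;
    import org.camunda.bpm.engine.impl.cfg.ProcessEngineConfigurationImpl;
    import org.camunda.bpm.engine.impl.history.event.HistoryEvent;
    import org.camunda.bpm.engine.impl.history.handler.CompositeDbHistoryEventHandler;
    import org.camunda.bpm.engine.impl.history.handler.HistoryEventHandler;

    public class JsonHistoryProviderPlugin extends AbstractProcessEnginePlugin {

        // Populated from the <property> elements in the plugin configuration
        protected String directory = ".";
        protected int maxEventsPerFile = 1000;

        @Override
        public void preInit(ProcessEngineConfigurationImpl config) {
            // CompositeDbHistoryEventHandler keeps the regular DB history
            // handler active and additionally forwards every event to the
            // JSON handler, so DB history keeps working as before.
            config.setHistoryEventHandler(
                new CompositeDbHistoryEventHandler(
                    new JsonFileHistoryEventHandler(directory, maxEventsPerFile)));
        }

        public void setDirectory(String directory) {
            this.directory = directory;
        }

        public void setMaxEventsPerFile(int maxEventsPerFile) {
            this.maxEventsPerFile = maxEventsPerFile;
        }
    }

    // Hypothetical handler: serializes each history event as one JSON line
    // into a rolling file under `directory`.
    class JsonFileHistoryEventHandler implements HistoryEventHandler {

        private final String directory;
        private final int maxEventsPerFile;

        JsonFileHistoryEventHandler(String directory, int maxEventsPerFile) {
            this.directory = directory;
            this.maxEventsPerFile = maxEventsPerFile;
        }

        @Override
        public void handleEvent(HistoryEvent event) {
            // serialize `event` to JSON and append it to the current file,
            // rolling over to a new file after maxEventsPerFile lines
        }

        @Override
        public void handleEvents(List<HistoryEvent> events) {
            for (HistoryEvent event : events) {
                handleEvent(event);
            }
        }
    }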

Config of the plugin looks like this:

            <property name="directory" value="."/>
            <property name="maxEventsPerFile" value="1000"/>
            <property name="logEventsToConsole" value="false"/>
            <property name="nullSubstitute" value="-"/>

If the nullSubstitute property is omitted, then regular null values are used.

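For illustration, a single line in the exported file could look roughly like this (the fields shown are illustrative and vary by event type); note the nullSubstitute "-" standing in for a null deleteReason:

    {"class": "HistoricTaskInstanceEventEntity", "eventType": "complete", "processDefinitionKey": "invoice", "processInstanceId": "a3f9...", "taskId": "81c2...", "assignee": "demo", "deleteReason": "-", "durationInMillis": 4500}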

This data is then picked up by Filebeat and sent into Logstash for processing, and from there into Elasticsearch.

Filebeat monitors the JSON file(s) and provides ~"guaranteed delivery" of each file and each JSON object in the file(s) to Logstash. Logstash then cleans up the data, performs some specific logic on which indexes different events go into and when to apply inserts or updates for ES documents, and handles any specific conversions for process variables.

For example: JSON process variables are converted into actual JSON, so further transformations may be applied. Or: Filebeat ignores fields that are null, but in our use case we want to import all fields even if they are null. So the Camunda plugin replaces null values with a "-", and Logstash transforms those "-" values back into an actual null (a sketch of the plugin side follows below).
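On the plugin side the substitution itself is trivial; a minimal sketch (the class and method names are ours, not the actual plugin's):

    import java.util.Map;

    public class NullSubstitution {

        // Replace null values before serializing to JSON so the fields are
        // kept; Logstash later maps the substitute string back to a real null.
        static void substituteNulls(Map<String, Object> eventFields, String nullSubstitute) {
            eventFields.replaceAll((key, value) -> value == null ? nullSubstitute : value);
        }
    }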

You can then look at the data in Kibana or access any sort of data through ES's REST API.

There are currently the ES Cockpit plugin and engine plugin, but we found the engine plugin to be a very complex implementation, so ours was designed to be very simple: JSON events are serialized and can be sent to another channel (JSON file on disk/volume, Kafka, Rabbit, etc.).

So my question is whether others have interest in this. Does anyone have interesting use cases or similar needs?


#2

Additional updates with further configuration and data massaging for Kibana usage.


#3

Update:

We have added support for event updates so that a single document represents the current lifecycle state of the EventType.

This provides the ability to manage all events as single documents or to have merged/aggregated documents/events :+1:
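One way this merging can work, sketched below: Camunda emits the start/update/end lifecycle events of a given entity with the same event id, so using that id as the Elasticsearch document id makes later events update the existing document instead of creating new ones. The helper class here is ours, for illustration:

    import org.camunda.bpm.engine.impl.history.event.HistoryEvent;

    public class DocumentIds {

        // Lifecycle events of the same entity share an id, so keying ES
        // documents by it merges start/update/end into a single document.
        static String documentId(HistoryEvent event) {
            return event.getClass().getSimpleName() + ":" + event.getId();
        }
    }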


#4

Sample of variable data and process instance data:

The data for the pie chart, bar chart, and metrics is being pulled from variable data.


#5

For reference, this is the processing pipeline: Camunda plugin → rolling JSON file → Filebeat → Logstash → Elasticsearch → Kibana.


#6

Another use case enhancement: Machine Learning :tada:

Using the X-Pack unsupervised machine learning features.

With this there are a bunch of interesting use cases:

  1. Per-activity durations (if you are running scripts/delegates in your activity listeners you can monitor their performance)
  2. Watch for specific text in variables such as HTTP responses that might have extra data that is worrisome or unusual (using the rare analysis function)
  3. Just generate User Task performance (as shown in the image below). But you can take the same principle and apply it to multiple activity use cases.
  4. Duration of timers/jobs. If jobs start to execute at unusual times (maybe a timer is set to wait 1h but you have job fatigue)
  5. Jobs being executed by specific nodes while other nodes are not being used (could indicate an issue)
  6. Number of jobs being processed
  7. Number of instances being processed
  8. Automated processes that are taking longer than normal (maybe due to increasing durations of tasks)
  9. With some structuring of JSON variables we could also generate a lot of unique reporting data for business operations :wink:
  10. You get the idea.

If you have other ideas, please share.

In the example below, the data was loaded into Camunda using https://github.com/camunda-consulting/camunda-util-demo-data-generator with a one-user-task process. This was just a POC to get a feel for the capabilities and memory consumption of the Elastic ML features.

What you are seeing in the graph is 30-minute time buckets of User Tasks at the Completed stage, graphing the durationInMills field. The analysis monitors for unusual Max values (https://www.elastic.co/guide/en/x-pack/current/ml-metric-functions.html#ml-metric-max).
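For anyone who wants to try something similar, an X-Pack ML job along roughly these lines should reproduce that setup. The job name and time field are illustrative, and you should check the exact request syntax against your X-Pack version:

    PUT _xpack/ml/anomaly_detectors/usertask-durations
    {
      "analysis_config": {
        "bucket_span": "30m",
        "detectors": [
          {
            "function": "max",
            "field_name": "durationInMills",
            "detector_description": "unusually high max task duration"
          }
        ]
      },
      "data_description": {
        "time_field": "@timestamp"
      }
    }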

For those interested, here are some other interesting use cases:

  1. unusual process or activity activations
  2. unusual process variable values
  3. http response codes that are unusual
  4. size of variables
  5. Camunda Incident types
  6. use log analysis
  7. User Task Reassignment
  8. User Task Delegation
  9. Number of User Tasks generated for a process
  10. For Multi-Instance: Large number of task instances being generated
  11. Business Metric reporting: Location of Processes, keywords, location categorization, etc
  12. Process Pausing/Suspension
  13. SLA durations being very long
  14. Messages being received from external systems
  15. External Task usage, locks, extended locks, failed locks, etc
  16. etc.

You can run the ML as a real-time analysis on a continual stream, and then set up alerts so that you get alerted about anomalies: you could even set it up to generate a process instance (Start Instance or new Message Start Event) to investigate the anomaly :wink: