Error on async after

Hi all,

I’m currently seeing an intermittent error that I haven’t been able to reproduce outside of our full process diagram. We get the error below, which stalls the process for a while (we have seen stalls of between 30 minutes and 1 hour). The process then continues with no errors and without any external intervention.

java.lang.NullPointerException: null
at org.camunda.bpm.engine.impl.pvm.runtime.LegacyBehavior.isAsync(LegacyBehavior.java:537) ~[camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.pvm.runtime.LegacyBehavior.repairMultiInstanceAsyncJob(LegacyBehavior.java:566) ~[camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.jobexecutor.AsyncContinuationJobHandler.execute(AsyncContinuationJobHandler.java:63) ~[camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.jobexecutor.AsyncContinuationJobHandler.execute(AsyncContinuationJobHandler.java:36) ~[camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.persistence.entity.JobEntity.execute(JobEntity.java:132) ~[camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.cmd.ExecuteJobsCmd.execute(ExecuteJobsCmd.java:99) ~[camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.cmd.ExecuteJobsCmd.execute(ExecuteJobsCmd.java:36) ~[camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.interceptor.CommandExecutorImpl.execute(CommandExecutorImpl.java:24) ~[camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:104) [camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.spring.SpringTransactionInterceptor$1.doInTransaction(SpringTransactionInterceptor.java:42) [camunda-engine-spring-7.9.0.jar!/:7.9.0]
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:140) [spring-tx-5.0.6.RELEASE.jar!/:5.0.6.RELEASE]
at org.camunda.bpm.engine.spring.SpringTransactionInterceptor.execute(SpringTransactionInterceptor.java:40) [camunda-engine-spring-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.interceptor.ProcessApplicationContextInterceptor.execute(ProcessApplicationContextInterceptor.java:66) [camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:30) [camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.jobexecutor.ExecuteJobHelper.executeJob(ExecuteJobHelper.java:36) [camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.jobexecutor.ExecuteJobHelper.executeJob(ExecuteJobHelper.java:29) [camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.jobexecutor.ExecuteJobsRunnable.executeJob(ExecuteJobsRunnable.java:88) [camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.jobexecutor.ExecuteJobsRunnable.run(ExecuteJobsRunnable.java:57) [camunda-engine-7.9.0.jar!/:7.9.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]

Below is a stripped-down version of the process; the error occurs on the Ack Received message event. (All of the purple elements have async after set on them.)

Any ideas on why this would be happening?

Thanks,

Matt

Hi @matt,

I don’t know why this NPE happens. Can you provide your example as a failing test case? It would help us analyze the issue.

If this issue blocks you, you could replace the AsyncContinuationJobHandler with a custom job handler.

Best regards,
Philipp

Hi @Philipp_Ossler,

Thanks for the reply, we will try to create an example test case, but we are unsure of the scenario in which this error happens at the moment.

Hi @matt

I am facing the same exception. Any luck with a solution?

For me, the issue happens only for a few process instances out of 1000, and it happens when I start workflow instances in bulk.

Hi @Philipp_Ossler, I am facing the exact same exception. It happens randomly at two or three particular service tasks after a sub-process. I am using Camunda 7.9.0.

Hi @gumang, we have tried a few things and overall I think matters have improved. It’s hard to tell for sure, but I don’t think we have seen this issue for a while now.
One of the things we did was to always make call activities async after.
The other was to override the AsyncContinuationJobHandler:

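import org.camunda.bpm.engine.impl.interceptor.CommandContext;
import org.camunda.bpm.engine.impl.jobexecutor.AsyncContinuationJobHandler;
import org.camunda.bpm.engine.impl.persistence.entity.ExecutionEntity;
import org.camunda.bpm.engine.impl.pvm.runtime.LegacyBehavior;
// Assumed from the LoggerFactory and CacheBuilder usage below: SLF4J and Guava.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class CustomAsyncContinuationJobHandler extends AsyncContinuationJobHandler {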

  private static final Logger LOG = LoggerFactory.getLogger(CustomAsyncContinuationJobHandler.class);

  // Keys that have already been retried once. The cache is bounded in size to avoid memory leaks.
  private final Cache<String, Boolean> retriedExecutionId = CacheBuilder.newBuilder().maximumSize(50).build();

  @Override
  public void execute(AsyncContinuationConfiguration configuration, ExecutionEntity execution, CommandContext commandContext, String tenantId) {
    String retryKey = execution.getId() + "_" + execution.getActivityId();
    try {
      super.execute(configuration, execution, commandContext, tenantId);
      
      if (retriedExecutionId.getIfPresent(retryKey) != null) {
        retriedExecutionId.invalidate(retryKey);
        LOG.warn("Successful retry of Legacy Behaviour activity '{}' and execution '{}'",
            execution.getActivityId(),
            execution.getId());
      }
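    } catch (NullPointerException npe) {
      // If the NPE was thrown from LegacyBehavior, we want to log it and retry the job.
      // The stack trace can be empty (the JVM may omit it for exceptions thrown very often), so check its length first.
      StackTraceElement[] stackTrace = npe.getStackTrace();
      if (stackTrace.length > 0 && stackTrace[0].getClassName().equals(LegacyBehavior.class.getName())) {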
        
        if (retriedExecutionId.getIfPresent(retryKey) == null) {
          LOG.warn("Legacy Behaviour Exception caught, retrying. Running Async Job on activity '{}' and execution '{}'", 
              execution.getActivityId(),
              execution.getId());
          retriedExecutionId.put(retryKey, Boolean.TRUE);
          throw new RetriableJobException(npe.getMessage());
          
        } else {
          LOG.warn("Legacy Behaviour Exception caught after retry. Running Async Job on activity '{}' and execution '{}'",
              execution.getActivityId(),
              execution.getId());
        }
      }

      throw npe;
    }
  }
}
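
A minimal sketch of the retry-once bookkeeping used in the handler above, written with only the JDK so it can be run without Guava or Camunda. `RetryOnceTracker`, `markForRetry`, and `clear` are hypothetical names that do not appear in the original handler. A size-bounded `LinkedHashMap` stands in for the Guava cache here, so it evicts the oldest inserted key rather than following Guava's eviction policy.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: remembers which keys were already retried, bounded in size. */
public class RetryOnceTracker {

  private final Map<String, Boolean> retried;

  public RetryOnceTracker(final int maxEntries) {
    // Insertion-ordered map that drops its oldest entry once the bound is exceeded.
    // Wrapped in synchronizedMap because job executor threads share one handler instance.
    this.retried = Collections.synchronizedMap(
        new LinkedHashMap<String, Boolean>() {
          private static final long serialVersionUID = 1L;

          @Override
          protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
            return size() > maxEntries;
          }
        });
  }

  /** Returns true the first time a key is seen (a retry is allowed), false afterwards. */
  public boolean markForRetry(String key) {
    return retried.putIfAbsent(key, Boolean.TRUE) == null;
  }

  /** Forgets the key after a successful run; returns true if it had been retried before. */
  public boolean clear(String key) {
    return retried.remove(key) != null;
  }
}
```

In the handler, `markForRetry(retryKey)` would correspond to the `getIfPresent(...) == null` check followed by `put(...)`, and `clear(retryKey)` to the `invalidate(...)` call after a successful execution.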

And a custom job retry command, so that a RetriableJobException does not decrement the job’s retries:

import org.camunda.bpm.engine.impl.cmd.DefaultJobRetryCmd;

public class CustomJobRetryCmd extends DefaultJobRetryCmd {
  
  public CustomJobRetryCmd(String jobId, Throwable exception) {
    super(jobId, exception);
  }

  @Override
  protected boolean shouldDecrementRetriesFor(Throwable t) {
    return super.shouldDecrementRetriesFor(t) && !(t instanceof RetriableJobException);
  }

}

And a custom FailedJobCommandFactory:

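import org.camunda.bpm.engine.impl.interceptor.Command;
import org.camunda.bpm.engine.impl.jobexecutor.FailedJobCommandFactory;

public class CustomFailedJobCommandFactory implements FailedJobCommandFactory {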

  @Override
  public Command<Object> getCommand(String jobId, Throwable exception) {
    return new CustomJobRetryCmd(jobId, exception);
  }

}

And to set it up on the process engine configuration:

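config.setFailedJobCommandFactory(new CustomFailedJobCommandFactory());
// Note: getCustomJobHandlers() may return null if no custom job handlers have been configured yet; initialize the list first in that case.
config.getCustomJobHandlers().add(new CustomAsyncContinuationJobHandler());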

RetriableJobException is an exception class we defined ourselves.
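
The definition of that exception is not shown in the thread, so the following is only a guess at a minimal version. It assumes an unchecked exception, since the overridden `execute` method declares no checked exceptions, and a `String` constructor, since the handler calls `new RetriableJobException(npe.getMessage())`. The second constructor is an addition for illustration.

```java
/**
 * Hypothetical marker exception: signals that a failed job should be retried
 * without using up one of its retries (see shouldDecrementRetriesFor above).
 */
public class RetriableJobException extends RuntimeException {

  private static final long serialVersionUID = 1L;

  public RetriableJobException(String message) {
    super(message);
  }

  // Not used in the thread: keeps the original exception and its stack trace.
  public RetriableJobException(String message, Throwable cause) {
    super(message, cause);
  }
}
```

Because the NPE in the stack trace above carries no message (`java.lang.NullPointerException: null`), passing the NPE as the cause may make the retried failure easier to trace in the logs.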

Hope that helps

Thanks @matt, I will try this

This issue is very weird… I have implemented a custom AsyncContinuationJobHandler too. Still no luck; the issue pops up randomly.
@matt @Philipp_Ossler, could you please help?

Hello @gumang,

Could you please create a minimal example that reproduces the issue (a JUnit test or a Spring Boot project)?

Best regards,
Yana

It’s tough to create an example for this. I have shared the BPMNs and other details in the link below.

One more observation: for the executions where these async exceptions are seen, act_id in the ACT_RU_EXECUTION table is empty, while act_inst_id is populated.
Maybe if we figure out when this happens, it can give us a clue.
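
One way to look for affected rows while the problem is occurring is a plain JDBC query against the runtime tables. This is only a sketch: `EmptyActivityIdFinder` is a hypothetical class, the caller is assumed to supply a `Connection` to the engine database, and "empty" is assumed to mean SQL NULL. The query joins async continuation jobs in ACT_RU_JOB to their execution and keeps those whose ACT_ID_ is NULL. Whether every such row is actually faulty is not established in this thread.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic: lists async continuation jobs whose execution row has no activity id. */
public class EmptyActivityIdFinder {

  public static final String QUERY =
      "SELECT J.ID_, J.EXECUTION_ID_, E.ACT_INST_ID_ "
          + "FROM ACT_RU_JOB J "
          + "JOIN ACT_RU_EXECUTION E ON E.ID_ = J.EXECUTION_ID_ "
          + "WHERE J.HANDLER_TYPE_ = ? AND E.ACT_ID_ IS NULL";

  /** Returns one description line per matching job; read-only, does not modify any row. */
  public static List<String> find(Connection connection) throws SQLException {
    List<String> rows = new ArrayList<>();
    try (PreparedStatement stmt = connection.prepareStatement(QUERY)) {
      // "async-continuation" is the handler type of AsyncContinuationJobHandler.
      stmt.setString(1, "async-continuation");
      try (ResultSet rs = stmt.executeQuery()) {
        while (rs.next()) {
          rows.add("job=" + rs.getString(1)
              + " execution=" + rs.getString(2)
              + " actInstId=" + rs.getString(3));
        }
      }
    }
    return rows;
  }
}
```

Running this periodically and comparing the returned job ids with the jobs that log the NullPointerException might help narrow down when the activity id goes missing.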