Error on async after

matt · November 2, 2018, 10:24am

Hi all,

I’m currently seeing an error that occurs only intermittently, and I haven’t been able to reproduce it outside of our full flow diagram. Essentially, we get the error below, which causes the process to stop for a while (we have seen stops of between 30 minutes and 1 hour). The process then seems to continue with no errors and without any external intervention to get it going again.

java.lang.NullPointerException: null
at org.camunda.bpm.engine.impl.pvm.runtime.LegacyBehavior.isAsync(LegacyBehavior.java:537) ~[camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.pvm.runtime.LegacyBehavior.repairMultiInstanceAsyncJob(LegacyBehavior.java:566) ~[camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.jobexecutor.AsyncContinuationJobHandler.execute(AsyncContinuationJobHandler.java:63) ~[camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.jobexecutor.AsyncContinuationJobHandler.execute(AsyncContinuationJobHandler.java:36) ~[camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.persistence.entity.JobEntity.execute(JobEntity.java:132) ~[camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.cmd.ExecuteJobsCmd.execute(ExecuteJobsCmd.java:99) ~[camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.cmd.ExecuteJobsCmd.execute(ExecuteJobsCmd.java:36) ~[camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.interceptor.CommandExecutorImpl.execute(CommandExecutorImpl.java:24) ~[camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:104) [camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.spring.SpringTransactionInterceptor$1.doInTransaction(SpringTransactionInterceptor.java:42) [camunda-engine-spring-7.9.0.jar!/:7.9.0]
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:140) [spring-tx-5.0.6.RELEASE.jar!/:5.0.6.RELEASE]
at org.camunda.bpm.engine.spring.SpringTransactionInterceptor.execute(SpringTransactionInterceptor.java:40) [camunda-engine-spring-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.interceptor.ProcessApplicationContextInterceptor.execute(ProcessApplicationContextInterceptor.java:66) [camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:30) [camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.jobexecutor.ExecuteJobHelper.executeJob(ExecuteJobHelper.java:36) [camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.jobexecutor.ExecuteJobHelper.executeJob(ExecuteJobHelper.java:29) [camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.jobexecutor.ExecuteJobsRunnable.executeJob(ExecuteJobsRunnable.java:88) [camunda-engine-7.9.0.jar!/:7.9.0]
at org.camunda.bpm.engine.impl.jobexecutor.ExecuteJobsRunnable.run(ExecuteJobsRunnable.java:57) [camunda-engine-7.9.0.jar!/:7.9.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]

Below is a stripped-down version of the process; the error occurs on the Ack Received message event. (All of the purple elements have async after set on them.)

Any ideas on why this would be happening?

Thanks,

Matt

Philipp_Ossler · November 12, 2018, 8:48am

Hi @matt,

I don’t know why this NPE happens. Can you provide your example as a failing test case? It would help to analyze the issue.

If this issue blocks you, you could replace the AsyncContinuationJobHandler job handler with a custom one.

Best regards,
Philipp

matt · November 12, 2018, 9:07am

Hi @Philipp_Ossler,

Thanks for the reply, we will try to create an example test case, but we are unsure of the scenario in which this error happens at the moment.

gumang · January 15, 2019, 7:17am

Hi @matt

I am facing the same exception. Any luck with a solution?

For me, the issue affects only some processes out of 1000, and it happens when I start workflow instances in bulk.

gumang · August 12, 2019, 6:20am

Hi @Philipp_Ossler, I am facing the exact same exception. It happens randomly at two or three particular service tasks after a sub-process. I am using Camunda 7.9.0.

matt · August 12, 2019, 7:10am

Hi @gumang, we have tried a few things and overall I think they have improved matters. It’s hard to tell for sure, but I don’t think we have seen this issue for a while now.
One of the things we did was to always make call activities async after.
The other was to override the AsyncContinuationJobHandler:

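import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import org.camunda.bpm.engine.impl.interceptor.CommandContext;
import org.camunda.bpm.engine.impl.jobexecutor.AsyncContinuationJobHandler;
import org.camunda.bpm.engine.impl.persistence.entity.ExecutionEntity;
import org.camunda.bpm.engine.impl.pvm.runtime.LegacyBehavior;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CustomAsyncContinuationJobHandler extends AsyncContinuationJobHandler {

  private static final Logger LOG = LoggerFactory.getLogger(CustomAsyncContinuationJobHandler.class);

  // Remembers which execution/activity pairs have already been retried once.
  // Bounded in size to avoid memory leaks.
  private final Cache<String, Boolean> retriedExecutionId = CacheBuilder.newBuilder().maximumSize(50).build();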


  @Override
  public void execute(AsyncContinuationConfiguration configuration, ExecutionEntity execution, CommandContext commandContext, String tenantId) {
    String retryKey = execution.getId() + "_" + execution.getActivityId();
    try {
      super.execute(configuration, execution, commandContext, tenantId);
      
      if (retriedExecutionId.getIfPresent(retryKey) != null) {
        retriedExecutionId.invalidate(retryKey);
        LOG.warn("Successful retry of Legacy Behaviour activity '{}' and execution '{}'",
            execution.getActivityId(),
            execution.getId());
      }
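    } catch (NullPointerException npe) {
      // If the NPE originates in LegacyBehavior, we want to log it and retry.
      // The stack trace can be empty (HotSpot may omit it for frequently thrown NPEs), so check its length first.
      StackTraceElement[] trace = npe.getStackTrace();
      if (trace.length > 0 && trace[0].getClassName().equals(LegacyBehavior.class.getName())) {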
        
        if (retriedExecutionId.getIfPresent(retryKey) == null) {
          LOG.warn("Legacy Behaviour Exception caught, retrying. Running Async Job on activity '{}' and execution '{}'", 
              execution.getActivityId(),
              execution.getId());
          retriedExecutionId.put(retryKey, Boolean.TRUE);
          throw new RetriableJobException(npe.getMessage());
          
        } else {
          LOG.warn("Legacy Behaviour Exception caught after retry. Running Async Job on activity '{}' and execution '{}'",
              execution.getActivityId(),
              execution.getId());
        }
      }

      throw npe;
    }
  }
}
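
The handler above decides whether to retry by looking at the class name of the top stack frame. That check can be tried out on its own, without the engine. The following is only a sketch: `StackOrigin`, `thrownFrom` and `Faulty` are hypothetical names used for illustration and are not part of Camunda. It also covers the case where the stack trace is empty, which HotSpot can produce for NPEs that are thrown very often from the same place.

```java
public class StackOrigin {

  /** Returns true if the top frame of the throwable's stack trace belongs to the given class. */
  public static boolean thrownFrom(Throwable t, Class<?> origin) {
    StackTraceElement[] trace = t.getStackTrace();
    // Guard against an empty stack trace before reading the first frame.
    return trace.length > 0 && trace[0].getClassName().equals(origin.getName());
  }

  /** Hypothetical stand-in for the class that throws the NPE (LegacyBehavior in the thread). */
  public static class Faulty {
    public static int length(String s) {
      return s.length(); // throws NullPointerException when s is null
    }
  }

  public static void main(String[] args) {
    try {
      Faulty.length(null);
    } catch (NullPointerException npe) {
      System.out.println(thrownFrom(npe, Faulty.class));      // top frame is in Faulty
      System.out.println(thrownFrom(npe, StackOrigin.class)); // top frame is not in StackOrigin
    }

    // Simulate an exception whose stack trace has been omitted.
    NullPointerException noTrace = new NullPointerException();
    noTrace.setStackTrace(new StackTraceElement[0]);
    System.out.println(thrownFrom(noTrace, Faulty.class));
  }
}
```

Keeping the check in a small helper like this makes it possible to unit test the retry decision separately from the job handler.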

And a custom job retry cmd

public class CustomJobRetryCmd extends DefaultJobRetryCmd {
  
  public CustomJobRetryCmd(String jobId, Throwable exception) {
    super(jobId, exception);
  }

  @Override
  protected boolean shouldDecrementRetriesFor(Throwable t) {
    return super.shouldDecrementRetriesFor(t) && !(t instanceof RetriableJobException);
  }

}

And a custom FailedJobCommandFactory

public class CustomFailedJobCommandFactory implements FailedJobCommandFactory {

  @Override
  public Command<Object> getCommand(String jobId, Throwable exception) {
    return new CustomJobRetryCmd(jobId, exception);
  }

}

And to set it up

config.setFailedJobCommandFactory(new CustomFailedJobCommandFactory());
config.getCustomJobHandlers().add(new CustomAsyncContinuationJobHandler());

The RetriableJobException is an exception class we defined ourselves.
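
The thread does not show RetriableJobException itself, so the following is only a guess at a minimal version. It assumes an unchecked exception, since the overridden `execute` method does not declare any checked exceptions. The second constructor is an addition that is not used in the code above.

```java
/**
 * Marker exception for failures that should be retried without using up a job retry.
 * Assumed shape only; the original class is not shown in the thread.
 */
public class RetriableJobException extends RuntimeException {

  private static final long serialVersionUID = 1L;

  public RetriableJobException(String message) {
    super(message);
  }

  // Keeps the original exception, and with it the original stack trace, as the cause.
  public RetriableJobException(String message, Throwable cause) {
    super(message, cause);
  }
}
```

Passing the caught NPE as the cause could be useful here, because the message of an NPE is often null and the cause preserves the original stack trace in the job's failure log.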

Hope that helps

gumang · August 13, 2019, 5:43am

Thanks @matt, I will try this

gumang · May 31, 2020, 5:32am

This issue is very weird… I have implemented a custom AsyncContinuationJobHandler too. Still no luck; the issue pops up randomly.
@matt @Philipp_Ossler, could you please help?

Yana · June 17, 2020, 3:05pm

Hello @gumang,

Could you please create a minimal example that reproduces the issue (JUnit or Spring Boot project)?

Best regards,
Yana

gumang · June 18, 2020, 4:17am

It’s tough to create an example for this. I have shared the BPMNs and other details in the link below.

gumang · June 18, 2020, 4:19am

One more observation: when these exceptions occur at the async points, act_id in the ACT_RU_EXECUTION table is empty for the affected executions, while act_inst_id is populated.
Maybe if we figure out when this happens, it will give us a clue.
