Issue with LDAP connectivity after upgrading to 7.10


#1

Hi Folks,

I just want to check whether anyone has had similar issues. After switching from 7.9 to 7.10, we have ongoing issues with users being kicked out of Tasklist with the following error:

image

However, we have found that it does not happen in all environments. We were able to trace the issue back to a specific LDAP server.

So, did anything like the LDAP timeout change from 7.9 to 7.10? In 7.9 we did not observe these issues with LDAP.

Best regards,
Ilya


#2

Hi Ilya,
I see similar behaviour in a test environment using a custom identity plugin. On refreshing the task list, the error disappears. Thus it's not restricted to LDAP. It's as if there is a race condition between authentication and the tasklist UI. It does not occur on the native DB identity service…

Until your report, I was thinking it was my custom plugin…

regards

Rob


#3

Hi Rob,

No issues with native DB authentication here either. However, the issue does not disappear on refresh. Also, after logging in, users don’t have a fully qualified name in the upper right corner, only the login near the home button.

We conducted an experiment earlier today: we switched 4 developers from a “slow” LDAP server to a more powerful one. For two developers the problem was fixed, and for the other two it was not. I am a bit lost.

Best regards,
Ilya


#4

Hi @Webcyberrob!

We did additional analysis of this issue today. The issue is solely with the tasklist application. If you log in through cockpit or the welcome screen, you are able to log in to tasklist afterwards. However, users cannot log in directly to tasklist. I believe that tasklist has an aggressive tolerance limit / latency threshold configured in 7.10.

Could anyone from the Camunda team please comment on:

  • How to change the LDAP response latency threshold in the web apps, especially tasklist?
  • What has changed in LDAP config since 7.9?

Thank you in advance

Best regards,
Ilya


#5

Hi Ilya,

We haven’t changed the LDAP configuration since 7.9.
When the problem occurs, could you please collect some traces:

  • console log from dev tools
  • complete stack trace of the exception, if there is one in the server log file

Best regards,
Yana


#6

Hi Yana,

Further to Ilya’s comments, I have attached:

Stack trace from server log:
localhost.2019-01-28.log.txt (27.3 KB)

Console log from dev tools:
camunda-dv.pfizer.com-1548686668257.log (30.3 KB)

Do these provide any information that might explain the cause of the issue?

We only face this issue on one development server which is located in a different region from the Camunda host. Other servers which are located in the same region as the Camunda host do not have this issue. All of the Camunda servers are otherwise configured in the same way for LDAP basic authentication.
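To check whether the cross-region hop is a factor, the network latency from each Camunda host to its LDAP server could be compared. Below is a minimal sketch, not something we ran as part of this report. The function names are made up for illustration, and it only times the TCP connection handshake, not an actual LDAP bind or search, so it gives a lower bound on what the web apps would experience.

```python
import socket
import time


def tcp_connect_latency_ms(host, port, timeout=5.0):
    """Return the time in milliseconds needed to open a TCP connection.

    This only measures the connection handshake, not the time the LDAP
    server needs to answer a bind or search request.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def summarize(samples):
    """Return (min, median, max) of a non-empty list of latency samples."""
    ordered = sorted(samples)
    middle = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[middle]
    else:
        median = (ordered[middle - 1] + ordered[middle]) / 2
    return ordered[0], median, ordered[-1]
```

Calling `tcp_connect_latency_ms` a handful of times from each host against its own LDAP server (host name and port depend on the environment) and passing the results to `summarize` would show whether the affected server sees noticeably higher latency than the others.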

Kind regards,
Anthony


#7

Hello guys,

I checked the provided traces and I have an assumption about what the problem could be.
In the console log file I found only two requests for the task list:

Line 1: deps.js?bust=7.10.0:92721 POST https://camunda-dv.pfizer.com:8443/camunda/api/engine/engine/default/filter/4180017e-f875-11e8-9de9-029163f3453c/list?firstResult=0&maxResults=15 500
Line 619: deps.js?bust=7.10.0:92721 POST https://camunda-dv.pfizer.com:8443/camunda/api/engine/engine/default/filter/4180017e-f875-11e8-9de9-029163f3453c/list?firstResult=0&maxResults=15 401
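For anyone who wants to pull the same information out of their own console log, here is a rough sketch of how failed requests like the two above could be extracted. It assumes that each entry has the shape `<source> <METHOD> <url> <status>` as in the lines quoted here. The function names are my own and are not part of any Camunda tooling.

```python
import re
from collections import Counter

# Matches console entries of the form "<source> <METHOD> <url> <status>".
REQUEST_LINE = re.compile(
    r"(?P<method>GET|POST|PUT|DELETE|OPTIONS)\s+"
    r"(?P<url>https?://\S+)\s+(?P<status>\d{3})\b"
)


def failed_requests(console_lines):
    """Yield (line_number, method, url, status) for every 4xx/5xx request."""
    for number, line in enumerate(console_lines, start=1):
        match = REQUEST_LINE.search(line)
        if match and int(match["status"]) >= 400:
            yield number, match["method"], match["url"], int(match["status"])


def status_counts(console_lines):
    """Count failed requests per (url without query string, status) pair."""
    counts = Counter()
    for _, _, url, status in failed_requests(console_lines):
        path = url.split("?", 1)[0]
        counts[(path, status)] += 1
    return counts
```

Feeding it the lines of the exported console log (for example via `open(path).read().splitlines()`) lists every failed request with its line number, which makes it easy to see whether only the filter list call is affected.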

The only exceptions which are in the server log file are:

java.lang.IllegalStateException: Cannot create a session after the response has been committed
    at org.apache.catalina.connector.Request.doGetSession(Request.java:2974)
    at org.apache.catalina.connector.Request.getSession(Request.java:2416)
    at org.apache.catalina.connector.RequestFacade.getSession(RequestFacade.java:908)
    at org.apache.catalina.connector.RequestFacade.getSession(RequestFacade.java:920)
    at org.camunda.bpm.webapp.impl.security.auth.AuthenticationFilter.doFilter(AuthenticationFilter.java:67)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:199)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:490)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)
    at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:668)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343)
    at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:408)
    at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
    at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:770)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1415)
    at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)

This seems to be a known issue related to an invalid CSRF token: https://app.camunda.com/jira/browse/CAM-9589
However, I am only guessing, because the console log file has no timestamps that could show whether the 500 response is related to one of the IllegalStateExceptions we see in the server log file.
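If timestamps were available on both sides, the correlation itself would be simple. Here is a minimal sketch of the idea, assuming the failed request times and the exception times have already been parsed into `datetime` values. The function name and the two-second tolerance are arbitrary choices for illustration, and browser and server clocks may not be in sync, so the tolerance would need adjusting.

```python
from datetime import datetime, timedelta


def correlate(request_times, exception_times, tolerance=timedelta(seconds=2)):
    """Pair each failed request with server exceptions logged close to it.

    Returns a list of (request_time, [exception_time, ...]) tuples. An empty
    inner list means no exception was logged within the tolerance window.
    """
    pairs = []
    for request_time in request_times:
        nearby = [
            exception_time
            for exception_time in exception_times
            if abs(exception_time - request_time) <= tolerance
        ]
        pairs.append((request_time, nearby))
    return pairs
```

A 500 response that consistently has an IllegalStateException within the window would support the assumption above. A 500 with no nearby exception would point somewhere else.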

Best regards,
Yana