[Keycloak] Opening Tasklist crashes Camunda Run platform

Hi,

I am running

Camunda Platform Run 7.14
on Kubernetes
with Keycloak Plugin 2.0.0

We started creating User Tasks in our process model, so we are now using the Tasklist in production for the first time. But when we open the Tasklist, the list remains empty with an endlessly spinning “Loading” wheel. At some point an error occurs, stating that the request has been terminated. Possible causes: the network is offline, Origin is not allowed by Access-Control-Allow-Origin, the page is being unloaded, etc.

From the moment we try to view the task list, the UI slowly breaks. First, the menu bar at the top of the UI disappears.

This is all weird. The main difference between the dev setup and the prod setup is user authorization: the dev setup has none, while prod uses Keycloak.

Would I maybe need to configure some authorization in the Admin area? Or is it maybe something we are missing in the Keycloak setup? We regularly get logs like this, but up to now everything still worked fine:

2021-07-12T13:50:38.236270864Z 2021-07-12 13:50:38.235 ERROR 11 --- [io-8080-exec-13] org.camunda.bpm.extension.keycloak       : KEYCLOAK-01012 TOKEN refresh failed: 400 Bad Request: [{"error":"invalid_grant","error_description":"Invalid refresh token"}]

Do you have a suggestion for what I could check or validate? I have no lead on this one.

Regards,
Markus

This is a timely post for me as I’m also trying to put Keycloak in front of Camunda Platform for authorization and access control. You are much farther along than I am :slight_smile:

That being said, it does look like there is a misconfiguration in the Keycloak token refresh setup. Have you gone through the Keycloak documentation on token exchange yet to see if it is of any help?

Best Regards,
dg


By now I upgraded to:

Camunda Platform Run 7.15.0
Keycloak Identity Provider Plugin 2.1.0

I am no longer sure that the refresh token issue has anything to do with this. I can now reproduce the error:
everything is fine until I claim my first task. The UI does not respond to the claim, but in the database I can verify that I am the assignee. Camunda Cockpit also shows me as the assignee.

Also in Task Authorization, I can see that I have ALL rights to the task.

But I cannot work on the task (it is an embedded form). From that time on, opening the task list crashes the database access after 30 seconds:

2021-07-14T19:17:46.097265933Z org.springframework.transaction.CannotCreateTransactionException: Could not open JDBC Connection for transaction; nested exception is java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms.
2021-07-14T19:17:46.097270122Z 	at org.springframework.jdbc.datasource.DataSourceTransactionManager.doBegin(DataSourceTransactionManager.java:309) ~[spring-jdbc-5.3.4.jar!/:5.3.4]
2021-07-14T19:17:46.097274053Z 	at org.springframework.transaction.support.AbstractPlatformTransactionManager.startTransaction(AbstractPlatformTransactionManager.java:400) ~[spring-tx-5.3.4.jar!/:5.3.4]
2021-07-14T19:17:46.097277851Z 	at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:373) ~[spring-tx-5.3.4.jar!/:5.3.4]
2021-07-14T19:17:46.097281325Z 	at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:137) ~[spring-tx-5.3.4.jar!/:5.3.4]
2021-07-14T19:17:46.097284797Z 	at org.camunda.bpm.engine.spring.SpringTransactionInterceptor.execute(SpringTransactionInterceptor.java:70) ~[camunda-engine-spring-7.15.0.jar!/:7.15.0]
2021-07-14T19:17:46.097288112Z 	at org.camunda.bpm.engine.impl.interceptor.ProcessApplicationContextInterceptor.execute(ProcessApplicationContextInterceptor.java:70) ~[camunda-engine-7.15.0.jar!/:7.15.0]
2021-07-14T19:17:46.097291475Z 	at org.camunda.bpm.engine.impl.interceptor.CommandCounterInterceptor.execute(CommandCounterInterceptor.java:35) ~[camunda-engine-7.15.0.jar!/:7.15.0]
2021-07-14T19:17:46.097294805Z 	at org.camunda.bpm.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:33) ~[camunda-engine-7.15.0.jar!/:7.15.0]
2021-07-14T19:17:46.097310237Z 	at org.camunda.bpm.engine.impl.IdentityServiceImpl.isReadOnly(IdentityServiceImpl.java:85) ~[camunda-engine-7.15.0.jar!/:7.15.0]
2021-07-14T19:17:46.097314066Z 	at org.camunda.bpm.webapp.impl.engine.ProcessEnginesFilter.needsInitialUser(ProcessEnginesFilter.java:247) ~[camunda-webapp-7.15.0-classes.jar:7.15.0]
2021-07-14T19:17:46.097318850Z 	at org.camunda.bpm.webapp.impl.engine.ProcessEnginesFilter.serveIndexPage(ProcessEnginesFilter.java:175) ~[camunda-webapp-7.15.0-classes.jar:7.15.0]
2021-07-14T19:17:46.097322545Z 	at org.camunda.bpm.webapp.impl.engine.ProcessEnginesFilter.applyFilter(ProcessEnginesFilter.java:127) ~[camunda-webapp-7.15.0-classes.jar:7.15.0]
2021-07-14T19:17:46.097325836Z 	at org.camunda.bpm.webapp.impl.filter.AbstractTemplateFilter.doFilter(AbstractTemplateFilter.java:58) ~[camunda-webapp-7.15.0-classes.jar:7.15.0]
2021-07-14T19:17:46.097329121Z 	at org.camunda.bpm.spring.boot.starter.webapp.filter.LazyDelegateFilter.doFilter(LazyDelegateFilter.java:60) ~[camunda-bpm-spring-boot-starter-webapp-core-7.15.0.jar:7.15.0]

After a few minutes, Camunda Platform recovers, but as soon as I open the task list, the DB connection crashes again.

If I remove the assignee from the task (either through direct DB access or Cockpit), I can open the complete task list again. But as soon as I claim a task, the weirdness starts all over.

And this only happens when I have the Keycloak plugin installed. It works fine without it. This is so confusing :confused:

Regards,
Markus

Cross posting the corresponding issue on GitHub.

We have now spent several days on this with the help of @VonDerBeck. We cannot narrow down whether it is an infrastructure issue, Camunda, Keycloak, the Keycloak plugin, or whether we somehow have an unusual LDAP structure.

Suspicious

Things we find curious:

[Camunda] Big group-query

Either the Keycloak plugin or Camunda triggers a huge group query when opening the task list, querying basically every available group, as this log line shows:

2021-07-21 12:51:16.970 DEBUG 9 --- [nio-8080-exec-8] org.camunda.bpm.extension.keycloak       : KEYCLOAK-01050 Keycloak group query results:

The query is huge, because we have roughly 800 groups.

We cannot imagine why all of them have to be queried, even though there are only 2 tasks available :confused:
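
For anyone who wants to see the same group-query output: the KEYCLOAK-01050 line above is written at DEBUG level by the logger org.camunda.bpm.extension.keycloak. A minimal sketch, assuming your default.yml or production.yml accepts the standard Spring Boot logging properties (which Camunda Run, being a Spring Boot application, should):

```yaml
# Assumption: standard Spring Boot logging configuration is picked up from default.yml / production.yml
logging:
  level:
    # logger name taken from the log lines quoted in this thread
    org.camunda.bpm.extension.keycloak: DEBUG
```

With roughly 800 groups this produces a lot of output, so it is probably best to switch it back to INFO after the investigation.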

[Keycloak] Querying all users via REST is slow, too

Using the Camunda client ID to get all users from the Keycloak REST API is very slow, too, so the queries themselves are in fact slow. But after days we still cannot figure out which configuration is responsible, or at least uncommon. :disappointed:

Tweaks :hammer_and_wrench:

In case somebody runs into a similar problem, these actions showed improvements:

[Camunda] Enhance Hikari config :point_left:

:exclamation: This only delays the crashes, since the additional DB connections might eventually time out as well :exclamation:

Camunda seems to keep a DB transaction open for the duration of its Keycloak queries. When those queries are slow, the blocked DB connections exhaust the pool and block all database access. Adding the following to your default.yml or production.yml helps:

spring.datasource:
    # see https://github.com/brettwooldridge/HikariCP#gear-configuration-knobs-baby
    hikari:
      connectionTimeout: 30000  # ms to wait for a connection from the pool
      idleTimeout: 600000       # ms an idle connection may stay in the pool
      maxLifetime: 900000       # ms maximum lifetime of a pooled connection
      # keepaliveTime requires HikariCP 4.0 or later
      keepaliveTime: 60000

      # see https://github.com/brettwooldridge/HikariCP#gear-configuration-knobs-baby (minimumIdle)
      # and https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing
      minimumIdle: 10
      maximumPoolSize: 20
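
Since the pool settings above only buy time, it may also help to find out which code path holds on to the connections. HikariCP has a leakDetectionThreshold setting that logs a warning with a stack trace when a connection has been out of the pool for longer than the given time. A sketch, not something we tested in this setup; the 20 seconds are an illustrative value chosen to fire before the 30 second connection timeout:

```yaml
spring.datasource:
  hikari:
    # Illustrative value (assumption): warn when a connection is held longer than 20 s.
    # HikariCP expects milliseconds; 0 disables leak detection.
    leakDetectionThreshold: 20000
```

If the logged stack traces point into the Keycloak plugin while the Tasklist is loading, that would support the theory that connections stay checked out for the duration of the Keycloak queries.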

[Camunda] Keycloak Plugin >= 2.1.1 required :point_left:

This improves performance of UI view “Tasklist” and "Admin → List all Users"

In our experiments we always needed both of the following properties. Because caching is only available from version 2.1.1 onwards, you need the current snapshot in order to set cacheEnabled:

# Camunda Keycloak Identity Provider Plugin
plugin.identity.keycloak:
  cacheEnabled: true
  authorizationCheckEnabled: false

As of this posting, version 2.1.1 is only available as a snapshot.
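
The cache can presumably be tuned further. The following is only a sketch: maxCacheSize and cacheExpirationTimeoutMin are the property names as I understand them from the plugin's README, and the values are illustrative, so please verify both against the version you actually install:

```yaml
# Camunda Keycloak Identity Provider Plugin
plugin.identity.keycloak:
  cacheEnabled: true
  # Assumption: property names as documented in the plugin README; values are illustrative
  maxCacheSize: 500              # maximum number of cached query results
  cacheExpirationTimeoutMin: 15  # minutes until a cached entry expires
```

A longer expiration reduces the number of slow Keycloak queries, at the price of user and group changes showing up later in Camunda.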

[Keycloak] Limit amount of users in User Federation :point_left:

This improves performance of UI view “Tasklist” and "Admin → List all Users"

I left the company realm and created a realm specifically for Camunda. I narrowed down the LDAP tree that gets synced to only those people who might possibly work with Camunda. Now I have only 125 users and 440 groups (the groups are nested recursively). That improved query performance.

[Keycloak] More performance for Keycloak DB 🤷

We increased the cluster resources for the Keycloak Postgres DB to 1 CPU and 2 GiB. But we did not see any effect.

Thanks so much for your investigations and especially for sharing your findings.

I’d like to add one thing to your findings: the Keycloak Identity Provider Plugin never queries all users or all groups configured in Keycloak. There is a maxResultSize parameter with a default value of 100, exactly to prevent side effects of such queries.
I have seen a Keycloak installation with more than 10,000 users where the user query still returned in acceptable time. I suspect that the cause is to be found in Keycloak itself and its configuration.

However, this is a good motivation to make the new cache feature release-ready. The current build process is unfortunately still having problems (and the resulting SNAPSHOT artifact in Nexus is corrupt), but that will hopefully be solved soon.
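
For reference, the maxResultSize parameter mentioned above can be set explicitly next to the other plugin properties. A minimal sketch, with an illustrative value rather than a recommendation:

```yaml
# Camunda Keycloak Identity Provider Plugin
plugin.identity.keycloak:
  # Illustrative value (assumption): upper bound on results returned by a single Keycloak query
  maxResultSize: 100
```

Lowering it should make each query cheaper, but users or groups beyond the limit will then not show up in the result lists.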


This is a really informative read - thanks @Noordsestern and @VonDerBeck :slight_smile:
I’ll try to find someone who might be able to look into some of the points you mentioned.