Sorry for getting back to you so late; I’ll try to answer sooner next time.
The process, let’s call it A, is made up of one service task and receiving message events. Some of the message events are connected via an event-based gateway. The process will be expanded in the future, but for now it contains only these elements.
Another process, B, sends messages to process A from time to time to advance it. For now, the communication is unidirectional, i.e. only from B to A.
We have checked the process definitions, and to the best of our knowledge both processes are well defined. Process B has been running for quite some time, and process A is quite simple and easy to check.
We implement our services with Java classes or use external tasks with microservices.
Our history level is FULL.
We run Camunda 7.12. Our setup encompasses two instances, a production and a staging system, each living in its own Docker container. We use nginx as a reverse proxy and PostgreSQL databases. Every night we shut down both systems, copy the Camunda database from production to staging, and reboot the systems. Due to our suboptimal setup, we also have to reboot the systems whenever we deploy new processes.
Meanwhile, we got hold of some catalina logs. Unfortunately, due to our suboptimal setup, we lost all older logs, so we can only work with some logs from the incident… We are working with our IT department to rectify that, but… well…
I attached the logs to this post.
Now it looks to me as if the engine is trying to do a batch operation to insert 5 execution entities, one of the 5 inserts fails, and the transaction is rolled back. And somehow this deleted the process instances. But I’m totally lost as to how.
Interestingly, the referenced task “Task_13rld24” is a sub-process of process B, namely the one that sends messages to process A, the process whose instances have gone missing. We can’t find the task instances of Task_13rld24 in the history database, nor the execution environments of these tasks.
So there might be exceptions here, but in the end we are not even sure whether these exceptions are related to our missing process instances. And if they are, how?
So far, we have disconnected the sending message events in process B and have stopped using process A completely, as we don’t understand what is happening.
catalina logs mod.txt (89.7 KB)