OOM kills on a number of system containers

We’ve been seeing a number of Qovery-managed services being OOM-killed in our cluster lately, such as the HPA and VPA, among others, which is concerning. Here are our events from the past couple of days:

Process `cainjector` (pid: 34703) triggered an OOM kill on itself. The process had reached 253584 pages in size.

This OOM kill was invoked by a cgroup, containerID: ea47d570c68ab45d6c63a987c68dd02e815bda1cb5b2fdeda0d93e2842257bcf.

Process `cluster-autosca` (pid: 2859099) triggered an OOM kill on itself. The process had reached 78984 pages in size.

This OOM kill was invoked by a cgroup, containerID: 1201bcd3911c43061591e8dc5cc9bda36d10006c6e5f0e9d76d0b53b8a0de555.

Process `updater` (pid: 4031477) triggered an OOM kill on itself. The process had reached 55025 pages in size.

This OOM kill was invoked by a cgroup, containerID: 18171e8e7e17277173d9150fc369a82bba1b52b0006549084e2d8f74c8e0e9c7.

Process `recommender` (pid: 106332) triggered an OOM kill on itself. The process had reached 44689 pages in size.

This OOM kill was invoked by a cgroup, containerID: 9a08c2879617508157789a04d9f7c9181d15a89580a96607b131b9cadc1b31cb.
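For reference, the kernel reports these sizes in pages, not bytes. Assuming the standard 4 KiB page size on x86-64 (an assumption — check `getconf PAGE_SIZE` on your nodes), the totals above work out to roughly:

```python
PAGE_SIZE = 4096  # bytes; standard x86-64 page size (assumption, verify on your nodes)

# Page counts taken from the OOM events above
events = {
    "cainjector": 253584,
    "cluster-autosca": 78984,
    "updater": 55025,
    "recommender": 44689,
}

for name, pages in events.items():
    mib = pages * PAGE_SIZE / (1024 * 1024)
    print(f"{name}: ~{mib:.0f} MiB")
```

So `cainjector` was close to 1 GiB when it was killed, while the others were in the 175–310 MiB range.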

Do we need to scale these up? Is it safe for us to modify them directly, or can that be done through the UI? I don’t see any relevant-looking settings.

Kyle

Hello @Kyle_Flavin,

Just so you know, we are working on improving this. Most of those components are under the VPA umbrella and will scale up if needed, so there shouldn’t be any issue on that front; a service might get killed occasionally, but it will come back with more resources.
A bit of downtime for those containers is OK and shouldn’t lead to any further issues once they’re back.

Cheers

Got it. Thanks @bchastanier