OOM kills on a number of system containers

We’ve been seeing a number of Qovery-managed services being OOM-killed in our cluster lately, such as the HPA and VPA, among others, which is concerning. Here are our events from the past couple of days:

Process `cainjector` (pid: 34703) triggered an OOM kill on itself. The process had reached 253584 pages in size.

This OOM kill was invoked by a cgroup, containerID: ea47d570c68ab45d6c63a987c68dd02e815bda1cb5b2fdeda0d93e2842257bcf.

Process `cluster-autosca` (pid: 2859099) triggered an OOM kill on itself. The process had reached 78984 pages in size.

This OOM kill was invoked by a cgroup, containerID: 1201bcd3911c43061591e8dc5cc9bda36d10006c6e5f0e9d76d0b53b8a0de555.

Process `updater` (pid: 4031477) triggered an OOM kill on itself. The process had reached 55025 pages in size.

This OOM kill was invoked by a cgroup, containerID: 18171e8e7e17277173d9150fc369a82bba1b52b0006549084e2d8f74c8e0e9c7.

Process `recommender` (pid: 106332) triggered an OOM kill on itself. The process had reached 44689 pages in size.

This OOM kill was invoked by a cgroup, containerID: 9a08c2879617508157789a04d9f7c9181d15a89580a96607b131b9cadc1b31cb.
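For reference, the kernel reports these sizes in pages, not bytes. Assuming the standard 4 KiB page size on x86-64 (an assumption — check `getconf PAGE_SIZE` on your nodes), the totals above work out to roughly:

```python
PAGE_SIZE = 4096  # bytes; standard x86-64 page size (assumption, verify on your nodes)

# Page counts taken from the OOM events above
events = {
    "cainjector": 253584,
    "cluster-autosca": 78984,
    "updater": 55025,
    "recommender": 44689,
}

for name, pages in events.items():
    mib = pages * PAGE_SIZE / (1024 * 1024)
    print(f"{name}: ~{mib:.0f} MiB")
```

So `cainjector` was close to 1 GiB when it was killed, while the others were in the 175–310 MiB range.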

Do we need to scale these up? Is it safe for us to modify them directly, or can that be done through the UI? I don’t see any relevant-looking settings.

Kyle

Hello @Kyle_Flavin,

Just so you know, we are working on improving this. Most of those components are under the VPA umbrella and will scale up if needed, so there shouldn’t be any issue on that front; a service might get killed occasionally, but it will come back with more resources.
A bit of downtime for those containers is OK and shouldn’t lead to any further issues once they’re back.

Cheers

Got it. Thanks @bchastanier