Random 502 Bad Gateway

bchastanier · March 28, 2024, 2:51pm

Thanks for the heads-up.
It’s very interesting.

To validate that this fixes the issue, we can manually edit the nginx config map on your cluster and set these two fields:

retry-non-idempotent: "true"
proxy-next-upstream: "error timeout http_502"
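
As a rough sketch, the two fields would sit under data in the ingress-nginx controller ConfigMap, something like the following. The ConfigMap name and namespace shown here are assumptions for illustration and may differ on a Qovery-managed cluster, so check which ConfigMap your controller actually reads before editing it.

```yaml
# Minimal sketch of the ingress-nginx controller ConfigMap with both fields set.
# metadata.name and metadata.namespace are ASSUMED values, not taken from this
# thread; replace them with the ones used by your controller.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # assumed name
  namespace: ingress-nginx         # assumed namespace
data:
  # Allow nginx to retry non-idempotent requests against another upstream.
  retry-non-idempotent: "true"
  # Conditions under which a request is passed to the next upstream server.
  proxy-next-upstream: "error timeout http_502"
```

Both values are strings, so they need to stay quoted. Keep in mind that retrying non-idempotent requests can cause a POST to be processed twice if the first attempt reached the application before the connection was reset.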

Once these are set on your cluster, you shouldn’t redeploy the cluster, otherwise it will erase those configs.

We can let it run for a couple of hours or days so you can check whether it solves your issue.

If it works, we can then add those settings to the product directly so you can customize them.

How does that sound?

Cheers

Mathieu_Haage · March 28, 2024, 8:58pm

Thank you for this suggestion, it sounds great!
I can’t wait to see if these settings solve the problem.
Letting it run for a couple of days is more appropriate, as some scripts are run only once a day.

If these parameters solve the problem, it will indeed be very useful to be able to customize them.

bchastanier · March 28, 2024, 9:00pm

OK! Just let me know when you want me to override those settings on your staging cluster.

Mathieu_Haage · March 28, 2024, 9:03pm

As soon as possible.

I just found more precise information (easier now that I know what I’m looking for):

Starting in nginx 1.9.13, non-idempotent requests (PUT, POST, etc.) are not retried by default.

Now I understand why GET requests respond with a 200 and return data when the “recv() failed” error happens, while POST requests respond with a 502: a failed GET is retried, but a failed POST is not.

This seems to be the best lead in a week; at last it makes sense.

bchastanier · March 28, 2024, 10:02pm

I’ve just updated your staging cluster’s nginx config with the values mentioned above.

I also locked your cluster so that no cluster updates (whether initiated by Qovery or by you) will be possible during the test. We will remove the lock afterwards.

Let me know how it goes

Mathieu_Haage · March 28, 2024, 10:35pm

Thank you. Now we wait.
I’ll keep you up to date.

Mathieu_Haage · March 29, 2024, 10:56am

Hi @bchastanier,

Sadly, the new settings didn’t change anything, so you can revert to the previous config now.

I’ll be off for 2 weeks, so I won’t work too much on this problem. We have a retry mechanism that handles the error for now.
I’ll pick it up when I get back.

While I’m away I’ll try to figure out what could be causing the error “recv() failed - connection reset by peer”. Maybe there’s some Node.js/NestJS-specific behavior I don’t know about.

Once again, a big thank you for your help.

Cheers

bchastanier · March 29, 2024, 10:58am

OK!

Yes, it now seems that the issue comes from your application.

Again, let me know once you’ve found the solution.

system · April 5, 2024, 10:58am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
