Description
Current Behavior
When Actix is hosted behind Traefik or Envoy and a POST request arrives with invalid data or invalid headers, Actix does not close the channel from Traefik/Envoy's perspective, but it stops polling it. As a result, the next request that comes in never gets an answer from Actix.
Using curl directly against Actix with the same incorrect body does not seem to reproduce the issue.
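One way to probe this without a load balancer is to send two requests back to back on a single keep-alive connection and see what happens to the second one. The sketch below uses only the standard library. It assumes that connection reuse is what distinguishes the proxies from curl, which this report has not confirmed; `build_post`, `probe`, and `Outcome` are hypothetical helper names, and the address, path, and payloads are placeholders to adapt to the reproduction repo.

```rust
use std::io::{self, Read, Write};
use std::net::TcpStream;
use std::time::Duration;

/// What happened after one request was written to the connection.
#[derive(Debug, PartialEq)]
enum Outcome {
    Replied,  // at least one byte came back
    Closed,   // peer closed the connection (read returned 0)
    TimedOut, // connection stayed open but nothing arrived in time
}

/// Build a raw HTTP/1.1 POST that asks the server to keep the connection open.
fn build_post(host: &str, path: &str, body: &str) -> String {
    format!(
        "POST {} HTTP/1.1\r\nHost: {}\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: keep-alive\r\n\r\n{}",
        path,
        host,
        body.len(),
        body
    )
}

/// Perform a single read and classify the result.
/// Simplification: one read is assumed to drain a small response.
fn read_outcome(stream: &mut TcpStream) -> io::Result<Outcome> {
    let mut buf = [0u8; 4096];
    match stream.read(&mut buf) {
        Ok(0) => Ok(Outcome::Closed),
        Ok(_) => Ok(Outcome::Replied),
        // Read timeouts surface as WouldBlock on Unix and TimedOut on Windows.
        Err(e) if matches!(e.kind(), io::ErrorKind::WouldBlock | io::ErrorKind::TimedOut) => {
            Ok(Outcome::TimedOut)
        }
        Err(e) => Err(e),
    }
}

/// Send `first`, then `second`, on the same TCP connection and report both outcomes.
fn probe(addr: &str, first: &str, second: &str, timeout: Duration) -> io::Result<(Outcome, Outcome)> {
    let mut stream = TcpStream::connect(addr)?;
    stream.set_read_timeout(Some(timeout))?;

    stream.write_all(first.as_bytes())?;
    let a = read_outcome(&mut stream)?;

    stream.write_all(second.as_bytes())?;
    let b = read_outcome(&mut stream)?;

    Ok((a, b))
}
```

For example, calling `probe("127.0.0.1:8080", &build_post("localhost", "/", "{not json"), &build_post("localhost", "/", "{}"), Duration::from_secs(5))` against the Actix port (the address here is a placeholder) would show whether the second request is answered, rejected by a closed connection, or left hanging. A `TimedOut` result for the second request would match the behaviour described above; any other result suggests the proxies differ from curl in some way beyond connection reuse.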
Expected Behavior
I would not expect a previous request to cause new requests to time out, even if there's a load balancer in front of Actix.
Possible Solution
Our workaround is to put nginx in our Docker container to interface with Traefik/Envoy and then forward requests to Actix using proxy_pass. This has removed our 499s, but it adds more CPU load to the nodes running the Docker container.
Steps to Reproduce (for bugs)
https://github.com/chriswk/actix-web-lb-connection
To reproduce:
Follow the README. I've narrowed the Actix application down to the absolute minimum here: no extra middlewares, no custom code. This is what makes me think the problem is deep in Actix internals.
Context
We first noticed this problem when deploying our Actix app to AWS App Runner (which uses Envoy for load balancing). All metrics on our side looked fine, but our customer experienced client timeouts. Later we saw the same issue when deploying the same app inside our k8s cluster, which uses Traefik 2.11.
Using HAProxy or nginx, or connecting directly to the Actix app, does not seem to surface the same problem.
As an aside, I could not reproduce it with the same minimal example using axum instead of actix-web, no matter which load balancer I put it behind. But since the app we're actually running has grown rather large, I would really like to not have to migrate.
Your Environment
- Rust version: 1.88 (though we've experienced this on 1.85 and 1.81 as well, in both arm and amd64 flavours)
- Actix Web Version: 4.11 (though reproducible in 4.8 and 4.9 as well)
- Traefik: 3.5 / 2.11 (both exhibit same behaviour)
- Amazon App Runner (uses Envoy behind the scenes)