Skip to content

Client Connection timeouts when running behind loadbalancers #3715

@chriswk

Description

@chriswk

Current Behavior

When hosting Actix behind Traefik or Envoy, if a POST request is made with invalid data, or invalid headers, Actix does not close the channel from Traefik/Envoy's perspective, but stops polling it, so next request that comes in will never get an answer from Actix.

Using curl directly against actix with the same incorrect body, does not seem to reproduce the issue.

Expected Behavior

I would not expect a previous request to cause new requests to experience timeouts, even if there's a loadbalancer in front of Actix

Possible Solution

Our workaround is putting nginx in our docker container to interface with traefik/envoy and then forward requests to actix using proxy_pass, this has removed our 499s, but adds more cpu load to the nodes running the docker container.

Steps to Reproduce (for bugs)

https://github.com/chriswk/actix-web-lb-connection
To reproduce:
Follow the readme, I've narrowed the actix application down to the absolute minimum here, no extra middlewares, no custom code. This is what makes me think this is deep in actix internals.

Context

We first noticed this problem when deploying our actix app to an AWS App runner (Uses envoy for loadbalancing). All metrics on our side looked fine, but our customer experienced client timeouts. Later we saw the same issue when deploying the same app inside our k8s cluster which uses Traefik 2.11.

Using haproxy or nginx or connecting directly to the actix app does not seem to surface the same problem.

As an extra aside here, I could not reproduce it with the same minimum example, just using axum instead of actix-web, not matter which load-balancer I put it behind. But since the app we're actually running has grown rather large, I would really like to not have to migrate.

Your Environment

  • Rust version: 1.88 (though we've experienced this on 1.85 and 1.81 as well, in both arm and amd64 flavours)
  • Actix Web Version: 4.11 (though reproducible in 4.8 and 4.9 as well)
  • Traefik: 3.5 / 2.11 (both exhibit same behaviour)
  • Amazon App Runner (uses envoy behind the scenes)

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-httpproject: actix-httpC-bugCategory: bug

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions