
Conversation

@AlliBalliBaba
Contributor

This PR is inspired by issues like #1821 and #1796

It's not uncommon for servers to experience latency spikes due to external resources getting slow. This can potentially cause all resources to be consumed waiting for these slow requests to complete. In thread-pool based servers this usually comes in the form of all threads getting stuck and the CPU going idle. While there are techniques to avoid this, it's not always trivial or apparent how to do so.

This PR will automatically detect slow requests (based on path) and prevent them from dispatching to a selected number of threads. While this will not get the slow requests unstuck, it will allow low-latency requests to still be responsive, greatly increasing the server's resilience (in theory).

wip
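
For illustration, here is a minimal, hypothetical sketch of that mechanism (the names, thresholds, and smoothing factor are made up for the example, not the PR's actual API): track a per-path moving average of request latency and keep paths that look slow off a reserved set of "low-latency" threads.

```go
// Hypothetical sketch: per-path latency tracking with an exponential moving
// average, used to decide whether a path may be dispatched to the reserved
// low-latency threads.
package main

import (
	"fmt"
	"sync"
	"time"
)

type latencyTracker struct {
	mu   sync.Mutex
	ewma map[string]time.Duration
}

func newLatencyTracker() *latencyTracker {
	return &latencyTracker{ewma: make(map[string]time.Duration)}
}

// observe folds a finished request's duration into the path's moving average.
func (t *latencyTracker) observe(path string, d time.Duration) {
	t.mu.Lock()
	defer t.mu.Unlock()
	prev, ok := t.ewma[path]
	if !ok {
		t.ewma[path] = d
		return
	}
	// 0.2 smoothing factor: recent requests weigh in quickly, spikes decay.
	t.ewma[path] = time.Duration(0.8*float64(prev) + 0.2*float64(d))
}

// isSlow reports whether a path's average latency exceeds the threshold.
func (t *latencyTracker) isSlow(path string, threshold time.Duration) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.ewma[path] > threshold
}

func main() {
	tracker := newLatencyTracker()
	tracker.observe("/report", 15*time.Second)
	tracker.observe("/healthz", 2*time.Millisecond)

	// A dispatcher could consult the tracker before handing a request to one
	// of the reserved low-latency threads; slow paths wait for the regular pool.
	for _, path := range []string{"/report", "/healthz"} {
		if tracker.isSlow(path, 500*time.Millisecond) {
			fmt.Println(path, "-> regular pool only")
		} else {
			fmt.Println(path, "-> may also use the reserved low-latency threads")
		}
	}
}
```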

@AlliBalliBaba changed the title from "feat: latency tracking with max_threads" to "feat: low latency threads" on Sep 11, 2025
@henderkes
Contributor

I think it's a really cool solution, but I'm not sure if it really falls within FrankenPHP's responsibility rather than the user's. Does FrankenPHP currently diverge from Caddy + FPM when it comes to scenarios like this?

It's a bit like selling a solution to things a user shouldn't be doing in the first place.

@dunglas
Member

dunglas commented Sep 12, 2025

I think it's a smart solution.

@AlliBalliBaba
Contributor Author

I'm actually not sure if FPM does something similar.

This isn't really a solution to slow requests, nor do I think that FrankenPHP can provide one for every situation.

It's more about making the server resilient if worst comes to worst. The moment you realize that a service can randomly time out on you, it's usually already too late.

The logic still needs some refining, though.

@henderkes
Contributor

I can think of an alternative approach that Swoole uses, where every IO call is hooked and the coroutine is suspended until the IO returns. That way a user could have 1,000,000 requests waiting on an HTTP response or a file, but the webserver would be able to count those "threads" as free and handle new requests with them.

I'm not sure how feasible that is to achieve without hooking into the php runtime through an extension, though.
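
As a rough illustration of that coroutine model (a conceptual sketch, not Swoole or FrankenPHP code): Go's own scheduler already works this way, parking a goroutine that blocks on IO and reusing the OS thread for other work, so thousands of "blocked" requests don't pin thousands of threads.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

func main() {
	fmt.Println("OS threads available:", runtime.GOMAXPROCS(0))

	var wg sync.WaitGroup
	start := time.Now()
	for i := 0; i < 10000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Stands in for a slow HTTP call or file read: the goroutine is
			// parked while the OS thread goes on to serve other goroutines.
			time.Sleep(2 * time.Second)
		}()
	}
	wg.Wait()
	fmt.Println("10000 'blocked' requests finished in", time.Since(start))
}
```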

@withinboredom
Member

Interesting solution! I'm not sure this is an actual problem though. With thread autoscaling and the fact that Caddy will hold requests until a thread is actually available, I'm not sure we're actually solving a real problem (in both those issues, afaict, thread autoscaling was not being utilized).

@AlliBalliBaba
Contributor Author

@henderkes what you are talking about are async workers, which I also plan to add separately, once async is native to PHP (waiting for this). Async runtimes are a bit more restrictive for the application, but can handle slow IO more efficiently.

@withinboredom autoscaling only works until all threads are consumed; this would keep the server going even afterwards.

@withinboredom
Member

If all threads are consumed, should it keep serving traffic? I feel like this would just mask issues. For example, it may continue to respond to /healthz saying everything is fine when in fact, nearly every request going to /pay/me/money is getting hung.

Something like this probably requires a ton of observability and monitoring tools that will be application-specific. The proper solution is probably to kick whatever slow thing to a queue instead of trying to handle it.

@AlliBalliBaba
Contributor Author

I agree that slow things belong on queues, but I disagree that this would reduce observability. Following that logic, every minor issue with any route should make the server crash and the health check fail, which is something most people probably don't want.

You might even argue this improves observability, since only the routes that are actually getting stuck will fail, making it easier to debug.

@withinboredom
Member

I appreciate your take on this, no one wants the whole server to crash on a single failure. I think our difference is in how we want the system to surface issues and what is most actionable for ops to handle.

My worry is that this creates a false-negative: you could have 90% of your business logic stalled out, but all your dashboards show perfect health. This risks hiding real issues from the people who need to act on them. We need to make sure we surface metrics that show requests being partitioned and document what they mean and how to handle them. Otherwise, ops teams will be flying blind.

As things are now, as performance starts to degrade, health checks will also start to get more latency. This is a signal that things aren’t going well. With this change, health checks probably will not degrade, so we need to monitor these new metrics as well and be able to alert on them somehow.

Currently, at some point, all threads will be used and health checks will fail, removing the instance from a load balancer pool. This will prevent the instance from serving new requests but allow in-flight requests to still be (eventually) completed. This is pretty standard ops in things like Kubernetes or other control planes. Adding this feature would make this kind of thing nearly impossible and invisible to control planes, preventing degraded servers from being rotated out.

In other words, I think this is a Band-Aid on applications. There are proper ways to solve it that I think we both agree on: queues.

If you do have slow requests that queues can’t handle, it is pretty straightforward to create a pool that handles just these slow routes without affecting any others. In which case, these can be monitored separately with their own set of alerts.

@AlliBalliBaba
Contributor Author

AlliBalliBaba commented Sep 13, 2025

Kubernetes will usually scale based on different predefined metrics. The health check timing out will just cause containers to restart over and over (bad time, been there). This PR will keep the server healthy and just time out the slow endpoints, similar to a connection pool that's overburdened.

I'm fine with making it opt-in though (maybe in combination with max_wait_time?)

> If you do have slow requests that queues can’t handle, it is pretty straightforward to create a pool that handles just these slow routes without affecting any others. In which case, these can be monitored separately with their own set of alerts.

Splitting the pool is what this PR attempts to do, but automatically.

@withinboredom
Member

> The health check timing out will just cause containers to restart over and over (bad time, been there).

Hmmm. I’m not sure what you mean. k8s has both liveness and readiness probes. Liveness probes will restart the container, but readiness will just remove it from the ingress. Basically, the pattern is to set lower readiness timeouts than liveness ones, to give a container a way to get into a healthy state on its own without killing it (and potentially losing work).

> This PR will keep the server healthy and just time out the slow endpoints, similar to a connection pool that's overburdened.

I’m still not convinced that it is safe to do so. There would need to be lots of visibility into what it is doing and why. And we can already do this either at the infrastructure level (such as k8s) or just add a route to the Caddyfile and handle the routes differently. Maybe I’m just missing a key use case, but I’m not convinced it actually solves a problem that can’t be solved in other, more visible ways.

@henderkes
Contributor

It's up to interpretation whether to think of this as solving an error or masking it. I don't see an issue with the approach; necessary metrics could be given to the user so they can identify the issue if it occurs. I just don't really think it's our responsibility to work around a user error.

@AlliBalliBaba
Contributor Author

> I’m still not convinced that it is safe to do so. There would need to be lots of visibility into what it is doing and why. And we can already do this either at the infrastructure level (such as k8s) or just add a route to the Caddyfile and handle the routes differently. Maybe I’m just missing a key use case, but I’m not convinced it actually solves a problem that can’t be solved in other, more visible ways.

The use case would be the one of hanging requests, where an endpoint receiving a slow 15s request just 2% of the time will eventually hang up the server. With this PR the server will not hang and will handle the ~7x more requests that would otherwise have been lost. Generally, it's about resilience towards single slow endpoints.
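
As a back-of-envelope check of the ~7x figure (assuming the fast requests average around 50 ms, which is not stated above): the mean time a thread is occupied per request becomes 0.98 × 0.05 s + 0.02 × 15 s ≈ 0.35 s instead of 0.05 s, so once every thread is serving this mix, the pool's throughput drops by roughly a factor of seven.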

> It's up to interpretation whether to think of this as solving an error or masking it. I don't see an issue with the approach; necessary metrics could be given to the user so they can identify the issue if it occurs. I just don't really think it's our responsibility to work around a user error.

The error might also lie with an external service the user doesn't necessarily have control over. I think resilience is always well appreciated, though not strictly our responsibility.

@withinboredom
Member

> The use case would be the one of hanging requests, where an endpoint receiving a slow 15s request just 2% of the time will eventually hang up the server.

Isn’t that almost the same as thread autoscaling then? Why would you want to use this over thread autoscaling and what happens if you mix this PR with it?

@AlliBalliBaba
Contributor Author

AlliBalliBaba commented Sep 14, 2025

Yeah, it's similar to autoscaling, just that autoscaling will only help for a short while. Currently the PR will mark the last 20% of autoscaled threads as 'low latency' and split the thread pool once it reaches that point.
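
A tiny, hypothetical illustration of that split (the 20% matches the figure mentioned above; the function and its arguments are made up for the example, not the PR's actual code):

```go
// Once 80% of the autoscaled threads are busy, the remaining 20% only accept
// paths that are not currently flagged as slow.
package main

import "fmt"

func mayDispatch(activeThreads, maxThreads int, pathIsSlow bool) bool {
	lowLatencyReserve := maxThreads / 5 // last 20% of threads
	splitPoint := maxThreads - lowLatencyReserve
	if activeThreads < splitPoint {
		return true // below the split: every path may dispatch
	}
	// Past the split: only paths with healthy latency get the reserved threads.
	return !pathIsSlow
}

func main() {
	fmt.Println(mayDispatch(70, 100, true))  // true: pool not yet split
	fmt.Println(mayDispatch(85, 100, true))  // false: slow path kept off the reserve
	fmt.Println(mayDispatch(85, 100, false)) // true: fast path stays responsive
}
```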

@EdmondDantes

> @henderkes what you are talking about are async workers, which I also plan to add separately, once async is native to PHP (waiting for this). Async runtimes are a bit more restrictive for the application, but can handle slow IO more efficiently.

Hi. Sorry if this is a bit off-topic, but since optimization issues are being discussed here, let me respond.

I spent a little time analyzing the question, but I didn't find a technical way to combine the C and Go event loop mechanisms. It seems almost or completely impossible, because Go has unique mechanisms, including stack allocation and so on, which likely make genuine interaction between C coroutines and Go coroutines impossible.

I would be glad if this conclusion turns out to be wrong, but it’s something to keep in mind.

@withinboredom
Member

> I would be glad if this conclusion turns out to be wrong, but it’s something to keep in mind.

I did a bit of research back in the Fiber Caused Epic Crash days (#46). It is possible to lock goroutines to a specific thread -- which is needed because TSRM uses thread-local storage. If we can change TSRM to use an index instead of thread-local storage (plus a few other places), and then keep all globals in global scope, we can actually use goroutines however we want. It only gets tricky because it would be a Go->C->Go violation (re-entering Go from C is a kiss of death and against the CGO rules -- for now anyway), but that's potentially something that could be worked around through channels.
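
For reference, the thread-pinning mechanism mentioned here is Go's runtime.LockOSThread; a minimal sketch (not FrankenPHP code) of what it guarantees:

```go
// LockOSThread ties the calling goroutine to its current OS thread, which
// matters when C code (like TSRM) relies on thread-local storage.
package main

import (
	"fmt"
	"runtime"
)

func main() {
	done := make(chan struct{})
	go func() {
		runtime.LockOSThread() // this goroutine now always runs on the same OS thread
		defer runtime.UnlockOSThread()
		// Any cgo calls made here see a stable thread identity, so
		// thread-local state set up by C code stays reachable.
		fmt.Println("pinned goroutine running")
		close(done)
	}()
	<-done
}
```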

I have a partially completed patch for TSRM to allow this, that I'm aiming to have done by the end of the year. It doesn't run, yet, but hopefully it will be somewhat runnable soon so I can start experimenting with it.

@AlliBalliBaba
Contributor Author

Go changing stacks is an issue we have yet to solve. In their first version, async workers would probably still run in their own threads, either with a separate event loop in C or by pushing events through Go's scheduler into the Go threads (with a potential context switch), whichever turns out to be more efficient (probably libuv for now).

Running directly on Go threads requires some changes in php-src, as @withinboredom mentioned, and is a far-off goal. But it might open up some other optimization opportunities related to parallelism.
