
Conversation

@AlliBalliBaba
Contributor

This PR is inspired by issues like #1821 and #1796

It's not uncommon for servers to experience latency spikes due to external resources getting slow. This can potentially cause all resources to be consumed waiting for these slow requests to complete. In thread-pool based servers this usually comes in the form of all threads getting stuck and the CPU going idle. While there are techniques to avoid this, it's not always trivial or apparent how to do so.

This PR will automatically detect slow requests (based on path) and prevent them from dispatching to a selected number of threads. While this will not get the slow requests unstuck, it will allow low-latency requests to still be responsive, greatly increasing the server's resilience (in theory).

wip
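
For illustration, here is a minimal, hypothetical sketch of that mechanism (the names, thresholds, and smoothing factor are made up for the example, not the PR's actual API): track a per-path moving average of request latency and keep paths that look slow off a reserved set of "low-latency" threads.

```go
// Hypothetical sketch: per-path latency tracking with an exponential moving
// average, used to decide whether a path may be dispatched to the reserved
// low-latency threads.
package main

import (
	"fmt"
	"sync"
	"time"
)

type latencyTracker struct {
	mu   sync.Mutex
	ewma map[string]time.Duration
}

func newLatencyTracker() *latencyTracker {
	return &latencyTracker{ewma: make(map[string]time.Duration)}
}

// observe folds a finished request's duration into the path's moving average.
func (t *latencyTracker) observe(path string, d time.Duration) {
	t.mu.Lock()
	defer t.mu.Unlock()
	prev, ok := t.ewma[path]
	if !ok {
		t.ewma[path] = d
		return
	}
	// 0.2 smoothing factor: recent requests weigh in quickly, spikes decay.
	t.ewma[path] = time.Duration(0.8*float64(prev) + 0.2*float64(d))
}

// isSlow reports whether a path's average latency exceeds the threshold.
func (t *latencyTracker) isSlow(path string, threshold time.Duration) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.ewma[path] > threshold
}

func main() {
	tracker := newLatencyTracker()
	tracker.observe("/report", 15*time.Second)
	tracker.observe("/healthz", 2*time.Millisecond)

	// A dispatcher could consult the tracker before handing a request to one
	// of the reserved low-latency threads; slow paths wait for the regular pool.
	for _, path := range []string{"/report", "/healthz"} {
		if tracker.isSlow(path, 500*time.Millisecond) {
			fmt.Println(path, "-> regular pool only")
		} else {
			fmt.Println(path, "-> may also use the reserved low-latency threads")
		}
	}
}
```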

@AlliBalliBaba changed the title from "feat: latency tracking with max_threads" to "feat: low latency threads" on Sep 11, 2025
@henderkes
Contributor

I think it's a really cool solution, but I'm not sure if it really falls within FrankenPHP's responsibility rather than the user's. Does FrankenPHP currently diverge from Caddy + FPM when it comes to scenarios like this?

It's a bit like selling a solution to things a user shouldn't be doing in the first place.

@dunglas
Member

dunglas commented Sep 12, 2025

I think it's a smart solution.

@AlliBalliBaba
Contributor Author

I'm actually not sure if FPM does something similar.

This isn't really a solution to slow requests, nor do I think that FrankenPHP can provide one for every situation.

It's more about making the server resilient if worst comes to worst. The moment you realize that a service can randomly time out on you, it's usually already too late.

The logic still needs some refining, though.

@henderkes
Contributor

I can think of an alternative approach that Swoole uses, where every IO call is hooked and the coroutine is suspended until the IO returns. That way a user could have 1,000,000 requests waiting on an HTTP response or a file, but the webserver would be able to count those "threads" as free and handle new requests with them.

I'm not sure how feasible that is to achieve without hooking into the php runtime through an extension, though.
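
As a rough illustration of that coroutine model (a conceptual sketch, not Swoole or FrankenPHP code): Go's own scheduler already works this way, parking a goroutine that blocks on IO and reusing the OS thread for other work, so thousands of "blocked" requests don't pin thousands of threads.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

func main() {
	fmt.Println("OS threads available:", runtime.GOMAXPROCS(0))

	var wg sync.WaitGroup
	start := time.Now()
	for i := 0; i < 10000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Stands in for a slow HTTP call or file read: the goroutine is
			// parked while the OS thread goes on to serve other goroutines.
			time.Sleep(2 * time.Second)
		}()
	}
	wg.Wait()
	fmt.Println("10000 'blocked' requests finished in", time.Since(start))
}
```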

@withinboredom
Member

Interesting solution! I'm not sure this is an actual problem though. With thread autoscaling and the fact that Caddy will hold requests until a thread is actually available, I'm not sure we're actually solving a real problem (in both those issues, afaict, thread autoscaling was not being utilized).

@AlliBalliBaba
Contributor Author

@henderkes what you are talking about are async workers, which I also plan to add separately, once async is native to PHP (waiting for this). Async runtimes are a bit more restrictive for the application, but can handle slow IO more efficiently.

@withinboredom autoscaling only works until all threads are consumed; this would keep the server going even afterwards.

@withinboredom
Member

If all threads are consumed, should it keep serving traffic? I feel like this would just mask issues. For example, it may continue to respond to /healthz saying everything is fine when in fact, nearly every request going to /pay/me/money is getting hung.

Something like this probably requires a ton of observability and monitoring tools that will be application-specific. The proper solution is probably to kick whatever slow thing to a queue instead of trying to handle it.

@AlliBalliBaba
Contributor Author

I agree that slow things belong on queues, but I disagree that this would reduce observability. Following that logic, every minor issue with any route should make the server crash and the health check fail, which is something most people probably don't want.

You might even argue this improves observability, since only the routes that are actually getting stuck will fail, making it easier to debug.

@withinboredom
Member

I appreciate your take on this, no one wants the whole server to crash on a single failure. I think our difference is in how we want the system to surface issues and what is most actionable for ops to handle.

My worry is that this creates a false-negative: you could have 90% of your business logic stalled out, but all your dashboards show perfect health. This risks hiding real issues from the people who need to act on them. We need to make sure we surface metrics that show requests being partitioned and document what they mean and how to handle them. Otherwise, ops teams will be flying blind.

As things are now, as performance starts to degrade, health checks will also start to get more latency. This is a signal that things aren’t going well. With this change, health checks probably will not degrade, so we need to monitor these new metrics as well and be able to alert on them somehow.

Currently, at some point, all threads will be used and health checks will fail, removing the instance from a load balancer pool. This will prevent the instance from serving new requests but allow in-flight requests to still be (eventually) completed. This is pretty standard ops in things like Kubernetes or other control planes. Adding this feature would make this kind of thing nearly impossible and invisible to control planes, preventing degraded servers from being rotated out.

In other words, I think this is a Band-Aid on applications. There are proper ways to solve it that I think we both agree on: queues.

If you do have slow requests that queues can’t handle, it is pretty straightforward to create a pool that handles just these slow routes without affecting any others. In which case, these can be monitored separately with their own set of alerts.

@AlliBalliBaba
Contributor Author

AlliBalliBaba commented Sep 13, 2025

Kubernetes will usually scale based on different predefined metrics. The health check timing out will just cause containers to restart over and over (bad time, been there). This PR will keep the server healthy and just time out the slow endpoints, similar to a connection pool that's overburdened.

I'm fine with making it opt-in though (maybe in combination with max_wait_time?)

> If you do have slow requests that queues can’t handle, it is pretty straightforward to create a pool that handles just these slow routes without affecting any others. In which case, these can be monitored separately with their own set of alerts.

Splitting the pool is what this PR attempts to do, but automatically.

@withinboredom
Member

> The health check timing out will just cause containers to restart over and over (bad time, been there).

Hmmm. I’m not sure what you mean. k8s has both liveness and readiness probes. Liveness probes will restart the container, but readiness will just remove it from the ingress. Basically, the pattern is to set lower readiness timeouts than liveness ones, to give a container a way to get into a healthy state on its own without killing it (and potentially losing work).

> This PR will keep the server healthy and just time out the slow endpoints, similar to a connection pool that's overburdened.

I’m still not convinced that it is safe to do so. There would need to be lots of visibility into what it is doing and why. And we can already do this either at the infrastructure level (such as k8s) or just add a route to the Caddyfile and handle the routes differently. Maybe I’m just missing a key use case, but I’m not convinced it actually solves a problem that can’t be solved in other, more visible ways.

@henderkes
Contributor

It's up to interpretation whether to think of this as solving an error or masking it. I don't see an issue with the approach; necessary metrics could be given to the user so they can identify the issue if it occurs. I just don't really think it's our responsibility to work around a user error.

@AlliBalliBaba
Contributor Author

> I’m still not convinced that it is safe to do so. There would need to be lots of visibility into what it is doing and why. And we can already do this either at the infrastructure level (such as k8s) or just add a route to the Caddyfile and handle the routes differently. Maybe I’m just missing a key use case, but I’m not convinced it actually solves a problem that can’t be solved in other, more visible ways.

The use case would be the one of hanging requests, where an endpoint receiving a slow 15s request just 2% of the time will eventually hang up the server. With this PR the server will not hang and will handle the ~7x more requests that would otherwise have been lost. Generally, it's about resilience towards single slow endpoints.
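
As a back-of-envelope check of the ~7x figure (assuming the fast requests average around 50 ms, which is not stated above): the mean time a thread is occupied per request becomes 0.98 × 0.05 s + 0.02 × 15 s ≈ 0.35 s instead of 0.05 s, so once every thread is serving this mix, the pool's throughput drops by roughly a factor of seven.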

> It's up to interpretation whether to think of this as solving an error or masking it. I don't see an issue with the approach; necessary metrics could be given to the user so they can identify the issue if it occurs. I just don't really think it's our responsibility to work around a user error.

The error might also lie with an external service the user doesn't necessarily have control over. I think resilience is always well appreciated, though not strictly our responsibility.

@withinboredom
Member

> The use case would be the one of hanging requests, where an endpoint receiving a slow 15s request just 2% of the time will eventually hang up the server.

Isn’t that almost the same as thread autoscaling then? Why would you want to use this over thread autoscaling and what happens if you mix this PR with it?

@AlliBalliBaba
Contributor Author

AlliBalliBaba commented Sep 14, 2025

Yeah, it's similar to autoscaling, just that autoscaling will only help for a short while. Currently the PR will mark the last 20% of autoscaled threads as 'low latency' and split the thread pool once it reaches that point.
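
A tiny, hypothetical illustration of that split (the 20% matches the figure mentioned above; the function and its arguments are made up for the example, not the PR's actual code):

```go
// Once 80% of the autoscaled threads are busy, the remaining 20% only accept
// paths that are not currently flagged as slow.
package main

import "fmt"

func mayDispatch(activeThreads, maxThreads int, pathIsSlow bool) bool {
	lowLatencyReserve := maxThreads / 5 // last 20% of threads
	splitPoint := maxThreads - lowLatencyReserve
	if activeThreads < splitPoint {
		return true // below the split: every path may dispatch
	}
	// Past the split: only paths with healthy latency get the reserved threads.
	return !pathIsSlow
}

func main() {
	fmt.Println(mayDispatch(70, 100, true))  // true: pool not yet split
	fmt.Println(mayDispatch(85, 100, true))  // false: slow path kept off the reserve
	fmt.Println(mayDispatch(85, 100, false)) // true: fast path stays responsive
}
```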

@EdmondDantes

> @henderkes what you are talking about are async workers, which I also plan to add separately, once async is native to PHP (waiting for this). Async runtimes are a bit more restrictive for the application, but can handle slow IO more efficiently.

Hi. Sorry if this is a bit off-topic, but since optimization issues are being discussed here, let me respond.

I spent a little time analyzing the question, but I didn't find a technical way to combine the C and Go event loop mechanisms. It seems almost or completely impossible, because Go has unique mechanisms, including stack allocation and so on, which likely make genuine interaction between C coroutines and Go coroutines impossible.

I would be glad if this conclusion turns out to be wrong, but it’s something to keep in mind.

@withinboredom
Member

> I would be glad if this conclusion turns out to be wrong, but it’s something to keep in mind.

I did a bit of research back in the Fiber Caused Epic Crash days (#46). It is possible to lock goroutines to a specific thread -- which is needed because TSRM uses thread-local storage. If we can change TSRM to use an index instead of thread-local storage (plus a few other places), and then keep all globals in global scope, we can actually use goroutines however we want. It only gets tricky because it would be a Go->C->Go violation (re-entering Go from C is a kiss of death and against the CGO rules -- for now anyway), but that's potentially something that could be worked around through channels.
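
For reference, the thread-pinning mechanism mentioned here is Go's runtime.LockOSThread; a minimal sketch (not FrankenPHP code) of what it guarantees:

```go
// LockOSThread ties the calling goroutine to its current OS thread, which
// matters when C code (like TSRM) relies on thread-local storage.
package main

import (
	"fmt"
	"runtime"
)

func main() {
	done := make(chan struct{})
	go func() {
		runtime.LockOSThread() // this goroutine now always runs on the same OS thread
		defer runtime.UnlockOSThread()
		// Any cgo calls made here see a stable thread identity, so
		// thread-local state set up by C code stays reachable.
		fmt.Println("pinned goroutine running")
		close(done)
	}()
	<-done
}
```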

I have a partially completed patch for TSRM to allow this, that I'm aiming to have done by the end of the year. It doesn't run, yet, but hopefully it will be somewhat runnable soon so I can start experimenting with it.

@AlliBalliBaba
Contributor Author

Go changing stacks is an issue we have yet to solve. In their first version, async workers would probably still run in their own threads, either with a separate event loop in C or by pushing events through Go's scheduler into the Go threads (with a potential context switch), whichever turns out to be more efficient (probably libuv for now).

Running directly on Go threads requires some changes in php-src, as @withinboredom mentioned, and is a far-off goal. But it might open up some other optimization opportunities related to parallelism.
