feat: low latency threads #1871
base: main
Conversation
I think it's a really cool solution, but I'm not sure it really falls within FrankenPHP's responsibility rather than the user's. Does FrankenPHP currently diverge from Caddy + FPM in scenarios like this? It's a bit like selling a solution to things a user shouldn't be doing in the first place.
I think it's a smart solution.
I'm actually not sure if FPM does something similar. This isn't really a solution to slow requests, nor do I think that FrankenPHP can provide one for every situation. It's more about making the server resilient if worst comes to worst. The moment you realize that a service can randomly time out on you, it's usually already too late. The logic still needs some refining, though.
I can think of an alternative approach, though I'm not sure how feasible it is to achieve without hooking into the PHP runtime through an extension.
Interesting solution! I'm not sure this is an actual problem though. With thread autoscaling and the fact that Caddy will hold requests until a thread is actually available, I'm not sure we're solving a real problem (in both of those issues, afaict, thread autoscaling was not being utilized).
@henderkes what you are talking about are async workers, which I also plan to add separately once async is native to PHP (waiting for this). Async runtimes are a bit more restrictive on the application, but can handle slow IO more efficiently. @withinboredom autoscaling only works until all threads are consumed; this would keep the server going even afterwards.
If all threads are consumed, should it keep serving traffic? I feel like this would just mask issues. For example, it may continue to respond to health checks while real requests are stuck. Something like this probably requires a ton of observability and monitoring tools that will be application-specific. The proper solution is probably to kick whatever slow thing to a queue instead of trying to handle it.
I agree that slow things belong on queues, but I disagree that this would reduce observability. Following that logic, every minor issue with any route should crash the server and fail the health check, which is something most people probably don't want. You might even argue this improves observability, since only the routes that are actually getting stuck will fail, making it easier to debug.
I appreciate your take on this; no one wants the whole server to crash on a single failure. I think our difference is in how we want the system to surface issues and what is most actionable for ops to handle.

My worry is that this creates a false negative: you could have 90% of your business logic stalled out, but all your dashboards show perfect health. This risks hiding real issues from the people who need to act on them. We need to make sure we surface metrics that show requests being partitioned and document what they mean and how to handle them. Otherwise, ops teams will be flying blind.

As things are now, as performance starts to degrade, health checks will also start to get more latency. This is a signal that things aren't going well. With this change, health checks probably will not degrade, so we need to monitor these new metrics as well and be able to alert on them somehow.

Currently, at some point, all threads will be used and health checks will fail, removing the instance from a load balancer pool. This will prevent the instance from serving new requests but allow in-flight requests to still be (eventually) completed. This is pretty standard ops in things like Kubernetes or other control planes. Adding this feature would make this kind of thing nearly impossible and invisible to control planes, preventing degraded servers from being rotated out.

In other words, I think this is a Band-Aid on applications. There are proper ways to solve it that I think we both agree on: queues. If you do have slow requests that queues can't handle, it is pretty straightforward to create a pool that handles just these slow routes without affecting any others, in which case these can be monitored separately with their own set of alerts.
Kubernetes will usually scale based on different predefined metrics. The health check timing out will just cause containers to restart over and over (bad time, been there). This PR will keep the server healthy and just time out the slow endpoints, similar to an overburdened connection pool. I'm fine with making it opt-in though (maybe in combination with
Splitting the pool is what this PR attempts to do, but automatically.
Hmmm. I'm not sure what you mean. k8s has both liveness and readiness probes. Liveness probes will restart the container, but readiness will just remove it from the ingress. Basically, the pattern is to set lower readiness timeouts than liveness to give a container a way to get into a healthy state on its own without killing it (and potentially losing work).
I'm still not convinced that it is safe to do so. There would need to be lots of visibility into what it is doing and why. And we can already do this either at the infrastructure level (such as k8s) or just add a route to the Caddyfile and handle the routes differently. Maybe I'm just missing a key use case, but I'm not convinced it actually solves a problem that can't be solved in other, more visible ways.
It's up to interpretation whether to think of this as solving an error or masking it. I don't see an issue with the approach; the necessary metrics could be given to the user so they can identify the issue if it occurs. I just don't really think it's our responsibility to work around a user error.
The use case would be that of hanging requests, where an endpoint receiving a slow 15s request just 2% of the time will eventually hang up the server. With this PR the server will not hang and will handle the ~7x more requests that would otherwise have been lost. Generally, it's about resilience towards single slow endpoints.
The error might also lie with an external service the user doesn't necessarily have control over. I think resilience is always well appreciated, though not strictly our responsibility.
Isn't that almost the same as thread autoscaling then? Why would you want to use this over thread autoscaling, and what happens if you mix this PR with it?
Yeah, it's similar to autoscaling, just that autoscaling will only help for a short while. Currently the PR will mark the last 20% of autoscaled threads as 'low latency' and split the thread pool once it reaches that point.
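For illustration, here is a minimal sketch of such a split, assuming the 20% share mentioned above; the function and names are hypothetical and not the PR's actual code:

```go
package main

import "fmt"

// lowLatencyShare is the assumed fraction of autoscaled threads reserved for
// low-latency requests (the 20% figure from the comment above).
const lowLatencyShare = 0.2

// splitThreadPool returns how many threads stay general-purpose and how many
// are reserved as low-latency once the pool is fully scaled.
func splitThreadPool(maxThreads int) (regular, lowLatency int) {
	lowLatency = int(float64(maxThreads) * lowLatencyShare)
	if lowLatency < 1 {
		lowLatency = 1
	}
	return maxThreads - lowLatency, lowLatency
}

func main() {
	regular, lowLatency := splitThreadPool(40)
	fmt.Printf("regular: %d, low latency: %d\n", regular, lowLatency) // regular: 32, low latency: 8
}
```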
Hi. Sorry if this is a bit off-topic, but since optimization issues are being discussed here, let me respond. I spent a little time analyzing the question, but I didn't find a technical way to combine the C and Go event loop mechanisms. It seems almost or completely impossible, because Go has unique mechanisms, including stack allocation and so on, which likely make genuine interaction between C coroutines and Go coroutines impossible. I would be glad if this conclusion turns out to be wrong, but it's something to keep in mind.
I did a bit of research back in the Fiber Caused Epic Crash days (#46). It is possible to lock goroutines to a specific thread -- which is needed because TSRM uses thread-local storage. If we can break TSRM away from thread-local storage and use an index instead (plus a few other places), and then keep all globals in global scope, we can actually use goroutines however we want. It only gets tricky because it would be a Go->C->Go violation (re-entering Go from C is a kiss of death and against the CGO rules -- for now anyway), but that's potentially something that could be worked around through channels. I have a partially completed patch for TSRM to allow this that I'm aiming to have done by the end of the year. It doesn't run yet, but hopefully it will be somewhat runnable soon so I can start experimenting with it.
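For reference, locking a goroutine to its OS thread is done with Go's `runtime.LockOSThread`; the helper below is a minimal, hypothetical sketch of how a dedicated PHP thread could be pinned, not FrankenPHP's actual code:

```go
package main

import (
	"fmt"
	"runtime"
)

// runOnDedicatedThread pins a new goroutine to its OS thread before running
// work, so that thread-local state (such as TSRM globals touched through cgo)
// stays on one thread for the goroutine's lifetime.
func runOnDedicatedThread(work func()) {
	go func() {
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		work() // in FrankenPHP this would be a cgo call into the PHP SAPI
	}()
}

func main() {
	done := make(chan struct{})
	runOnDedicatedThread(func() {
		fmt.Println("running on a locked OS thread")
		close(done)
	})
	<-done
}
```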
Go changing stacks is an issue we have yet to solve. In its first version, async workers would probably still run in their own threads, either with a separate event loop in C or by pushing events to Go's scheduler and into the Go threads (with a potential context switch), whichever turns out to be more efficient (probably libuv for now). Running directly on Go threads requires some changes in
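To make the "pushing events to Go's scheduler" option concrete, here is a rough sketch under my own assumptions (all names are hypothetical, and this is not the planned implementation): dedicated PHP threads hand events to Go through a channel, so Go code is never re-entered from C directly.

```go
package main

import (
	"fmt"
	"sync"
)

// phpEvent stands in for whatever a PHP worker thread would report back.
type phpEvent struct {
	threadIndex int
	payload     string
}

func main() {
	events := make(chan phpEvent, 64)

	var wg sync.WaitGroup
	wg.Add(1)
	// Consumer on the Go side: the scheduler is free to run this wherever it
	// likes, since Go is never re-entered from C directly.
	go func() {
		defer wg.Done()
		for ev := range events {
			fmt.Printf("thread %d: %s\n", ev.threadIndex, ev.payload)
		}
	}()

	// Producer: in a real setup this would be a goroutine returning from a
	// cgo call on a dedicated PHP thread; here it is simulated inline.
	events <- phpEvent{threadIndex: 0, payload: "request finished"}
	close(events)
	wg.Wait()
}
```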
This PR is inspired by issues like #1821 and #1796
It's not uncommon for servers to experience latency spikes because external resources get slow. This can potentially cause all resources to be consumed waiting for these slow requests to complete. In thread-pool based servers this usually comes in the form of all threads getting stuck while the CPU goes idle. While there are techniques to avoid this, it's not always trivial or apparent how to do so.
This PR will automatically detect slow requests (based on path) and prevent them from being dispatched to a selected number of threads. While this will not get the slow requests unstuck, it will allow low-latency requests to stay responsive, greatly increasing the server's resilience (in theory).
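As an illustration of that idea, here is a minimal sketch under my own assumptions (the `slowPathTracker` type, the 5-second threshold, and the helper names are hypothetical, not the code in this PR): paths recently observed to be slow become ineligible for the reserved low-latency threads.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// slowThreshold is an assumed cutoff for considering a path "slow".
const slowThreshold = 5 * time.Second

// slowPathTracker remembers which paths have recently been slow. A real
// implementation would also expire entries and expose metrics.
type slowPathTracker struct {
	mu    sync.RWMutex
	paths map[string]time.Time // path -> last time it was observed being slow
}

func newSlowPathTracker() *slowPathTracker {
	return &slowPathTracker{paths: make(map[string]time.Time)}
}

// observe records the duration of a finished request for the given path.
func (t *slowPathTracker) observe(path string, d time.Duration) {
	if d < slowThreshold {
		return
	}
	t.mu.Lock()
	t.paths[path] = time.Now()
	t.mu.Unlock()
}

// allowOnLowLatencyThread reports whether a request to path may be dispatched
// to one of the reserved low-latency threads.
func (t *slowPathTracker) allowOnLowLatencyThread(path string) bool {
	t.mu.RLock()
	_, slow := t.paths[path]
	t.mu.RUnlock()
	return !slow
}

func main() {
	tracker := newSlowPathTracker()
	tracker.observe("/report", 15*time.Second)
	tracker.observe("/health", 20*time.Millisecond)
	fmt.Println(tracker.allowOnLowLatencyThread("/report")) // false
	fmt.Println(tracker.allowOnLowLatencyThread("/health")) // true
}
```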
wip