-
Notifications
You must be signed in to change notification settings - Fork 18k
runtime/debug.SetCrashOutput + GOGC=off + GOMEMLIMIT can lead to OOM kills #73490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Funny, I was just thinking this morning about how the crashmonitor causes all init functions to run twice, once in the parent and once in the child, causing both the startup latency and memory allocation costs of the application to be doubled. I was wondering whether we already have any means of profiling the initialization: what I had in mind was an environment variable that would cause all init functions to report their wall time to stderr; such a mechanism could also report whether any go statements were executed. This would make it easy to track down bloated initializers and make them lazy (using sync.Once) instead. Also, I was wondering whether the crashmonitor child process's use of the current directory after the parent process has terminated might be the cause of #73481 on Windows, and that if so, the remedy is for the child to os.Chdir("/") like a UNIX daemon process. I agree that a change to the documentation and example is the appropriate fix. |
Which fix do you prefer? Not inherit the env of the parent at all, or stripping the GOMEMLIMIT (and maybe GOGC) env vars? Anyway, I think even with this improvement, the nature of One thing we're considering on our end is embedding a fat cosmopolitan binary to use as the crash monitor. This would be written to a |
Just to be as precise as possible, it's not just passing FWIW, inheriting environment variables causes problems in a variety of ways, not just with Perhaps we should recommend I am also in support of updating the example for |
Yes! Sorry for omitting this important (!) detail. This program was indeed operating with
Sounds good. But considering what you wrote above, wouldn't it be better to not propagate any environment variables in the example? |
Yup, sorry, I support that, too. :) |
Change https://go.dev/cl/667775 mentions this issue: |
Thanks; I had a vague feeling it existed already, just without the goroutine part.
I agree, it does require an unsatisfactory degree of control over the whole executable. However, depending on what you want to do, the crashmonitor (child) process currently must use the same executable as the target (parent) process because it needs access to the correct symbol table and pclntab (runtime.CallersFrames and runtime.Func.FileLine). At least, that's what the gopls crashmonitor does. If the runtime were responsible for computing all this information and emitting it as JSON, the child process could be almost anything. We've talked about doing this (#65761), but it's a fiddly project. |
Related Code Changes (Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.) |
We recently noticed some OOM kills, and after a fun debugging adventure involving core dumps, we concluded that the issue was caused by our usage of the SetCrashOutput API which mirrors the example from the documentation.
In our case the parent process is subject to a
GOMEMLIMIT
env variable, and the child process inherited it. This in combination with someinit
functions that were spawning background goroutine activity caused the child process to grow its heap without the GC kicking in due to the high memory limit inherited from the parent process. Eventually this lead to the child+parent exceeding the memory limit of the container, resulting in an OOM kill.Suggestion: Let's update the docs to either strip the memory limit env var or avoid passing any environment variables from the parent process (remove the
os.Environ()
). I can send a patch if there is consensus behind either of those two approaches.cc @mknyszek
Edit: The parent process was also setting
GOGC=off
. That's what actually caused the OOM.The text was updated successfully, but these errors were encountered: