Description
We have deployed JEG (Jupyter Enterprise Gateway) on a Kubernetes cluster and are using it to run the spark_python_kubernetes and python_kubernetes kernels. The JEG server pod runs out of memory every 6-10 hours. We have set the pod's memory limit to 4 GB.
This mainly happens when a Notebook UI client tries to re-establish a WebSocket connection to a previously running kernel, i.e. the Notebook UI still thinks the kernel exists, but JEG no longer knows about it. This occurs when JEG is restarted and loses context about the previously running kernels: the notebook is unaware of the restart and keeps trying to connect to what we believe is the existing kernel session.
Based on our testing, we think the _register_session method in the notebook kernel handler is causing the leak: it creates a new session object each time the notebook hits the /api/kernels/<>/channels API. Note that JEG returns a 404 response to the notebook, but the notebook does not stop retrying.
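To make the suspected pattern concrete, here is a minimal Python sketch. It is a hypothetical, simplified model, not the actual notebook or JEG code: the class names, the open_sessions registry, the per-session buffer and the simulate_reconnects helper are all illustrative assumptions. It contrasts a handler that stores per-session state before checking whether the kernel exists (so the 404 path leaves the entry behind) with one that validates the kernel first.

```python
"""Hypothetical model of the suspected leak; not the actual notebook/JEG code."""


class LeakyChannelsHandler:
    """Registers per-session state before checking that the kernel exists."""

    # Shared across all handler instances, like a class-level session registry.
    open_sessions = {}

    def __init__(self, known_kernels):
        self.known_kernels = known_kernels

    def connect(self, kernel_id, session_id):
        key = f"{kernel_id}:{session_id}"
        # State is stored first ...
        self.open_sessions[key] = {"kernel_id": kernel_id, "buffer": bytearray(1024)}
        if kernel_id not in self.known_kernels:
            # ... and the 404 path returns without removing it. Because the
            # WebSocket is never opened, no close() hook runs to clean it up.
            return 404
        return 101  # 101 Switching Protocols: WebSocket upgrade accepted


class FixedChannelsHandler:
    """Validates the kernel first and only then registers the session."""

    open_sessions = {}

    def __init__(self, known_kernels):
        self.known_kernels = known_kernels

    def connect(self, kernel_id, session_id):
        if kernel_id not in self.known_kernels:
            return 404  # reject before allocating any per-session state
        key = f"{kernel_id}:{session_id}"
        self.open_sessions[key] = {"kernel_id": kernel_id, "buffer": bytearray(1024)}
        return 101

    def close(self, kernel_id, session_id):
        # Drop the entry when an accepted connection is closed.
        self.open_sessions.pop(f"{kernel_id}:{session_id}", None)


def simulate_reconnects(handler_cls, attempts, known_kernels):
    """Replay a client that keeps retrying a kernel the gateway no longer knows."""
    handler_cls.open_sessions.clear()  # start from an empty registry
    statuses = []
    for i in range(attempts):
        handler = handler_cls(known_kernels)  # a fresh handler per request
        # Assumption: each retry carries a distinct session id.
        statuses.append(handler.connect("stale-kernel", f"session-{i}"))
    return statuses, len(handler_cls.open_sessions)


if __name__ == "__main__":
    known = set()  # gateway restarted: it remembers no kernels
    for cls in (LeakyChannelsHandler, FixedChannelsHandler):
        statuses, retained = simulate_reconnects(cls, 1000, known)
        print(cls.__name__, set(statuses), retained)
```

Against a gateway that remembers no kernels, both variants answer every retry with 404, but only the leaky one retains 1000 registry entries. This relies on the assumption that each retry uses a distinct session id; if the ids repeated, a dict keyed this way would overwrite entries instead of growing.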
More details about the issue are available here: jupyter/notebook#6244
Tagging @Vishwajeet0510 from our team, who is working on this issue.
@kevin-bates: have you seen this behaviour before?
Screenshots / Logs
Environment
Enterprise Gateway version: v2.1.0
Notebook version: v6.0.3
Others: Artillery 1.7.9