Parallel printing to stderr #700

Closed
stefdoerr opened this issue Nov 2, 2015 · 13 comments · Fixed by ipython/ipykernel#85

Comments

@stefdoerr

Hello, I am using joblib.Parallel to parallelize my computations. Parallel can report the progress of the calculations to stderr. This works fine in the IPython console, where the progress is updated every few seconds, but in the Jupyter notebook the progress is only printed once the computation has completed.

If you want to try it:

import time
from math import sqrt
import joblib

def my_sqrt(x):
    time.sleep(0.1)
    return sqrt(x)

joblib.Parallel(n_jobs=2, verbose=11)(joblib.delayed(my_sqrt)(i**2) for i in range(50))

I talked with the joblib devs and they said it must be a problem in the Notebook, since stderr should be printed directly.
It also works fine in the Notebook if you set n_jobs=1, which just runs everything serially, so it seems like the Notebook doesn't update stderr when the work runs in parallel workers.

I would appreciate any insight :)

@GaelVaroquaux
Contributor

As a data point, I have added explicit flushing of stderr in a local copy of joblib (as suggested by @ellisonbg in scikit-learn/scikit-learn#5811), but it didn't solve the problem.

Technically, stderr shouldn't be buffered (according to the POSIX spec), so in theory flushing shouldn't make a difference.
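
For concreteness, the flushing change amounted to something like this (a minimal sketch; print_progress is an illustrative name, not joblib's actual internals):

import sys

def print_progress(msg):
    # Write the progress line to stderr and flush explicitly.
    # POSIX says stderr is unbuffered, so the flush should be
    # redundant -- and indeed it did not help here.
    sys.stderr.write(msg + "\n")
    sys.stderr.flush()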

@Carreau
Member

Carreau commented Nov 17, 2015

IPython console

The author means plain IPython. Using $ ipython console (yes, this is a thing) I can reproduce this, so it is likely something deeper in the internals of the protocol than something notebook-specific.

@GaelVaroquaux
Contributor

As I looked a bit at these problems a long time ago, when I was working on the IPython Qt console, I have a hunch that this is due to the way we capture the output of processes. It's probably not trivial to fix (sorry :$). I suspect that some Unix-specific machinery (i.e. not Python-level, such as redirecting the stream to a named pipe) might help on the Unix side, and maybe some Windows-specific equivalent too (which I don't understand well). The good news is that solving such a problem is certainly possible, as something like Travis CI captures output and redirects it to the web without any problem.
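
On the Unix side, the fd-level redirection idea looks roughly like this (a sketch under assumptions; capture_fd and the echoing behaviour are illustrative, not how the kernel actually implements it):

import os
import threading

def capture_fd(fd):
    # Replace a file descriptor (1 = stdout, 2 = stderr) with the write
    # end of a pipe. Forked children inherit the descriptor, so their
    # output lands in the pipe too and can be read back and forwarded.
    read_end, write_end = os.pipe()
    saved = os.dup(fd)          # keep the original stream around
    os.dup2(write_end, fd)      # writes to fd now go into the pipe

    def forward():
        while True:
            data = os.read(read_end, 4096)
            if not data:        # write end closed
                break
            os.write(saved, data)  # here we just echo to the saved stream

    threading.Thread(target=forward, daemon=True).start()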

@amueller

hm... @minrk might have some insights?

@minrk
Member

minrk commented Nov 18, 2015

There is some funkiness in how we capture output from forked processes that proves to be quite fragile. I will try to investigate what's going wrong in this case.

@minrk minrk added this to the Not notebook (need to be migrated) milestone Dec 16, 2015
@minrk
Member

minrk commented Dec 16, 2015

Once I finish it, this should be fixed by ipython/ipykernel#85

@GaelVaroquaux
Contributor

GaelVaroquaux commented Jan 19, 2016 via email

@stefdoerr
Author

Awesome! Thanks guys!

@neighthan

When I run @stefdoerr's example, it now correctly prints during the computation. However, if I print from within the function myself, instead of relying only on joblib's progress logging, the output still doesn't appear. E.g.:

from joblib import Parallel, delayed

def f(x):
    print(f"x = {x}")
    return x

Parallel(n_jobs=2)(delayed(f)(x) for x in range(5))

doesn't print anything. However, if I set n_jobs=1, then it prints properly. Is it expected that this should be supported now as well or no?

@scottgigante

I'll echo this: printing to stdout or stderr from within forked processes is hidden from the Jupyter notebook output and goes instead to the notebook backend's terminal (the same place you can write to manually with os.write -- I think Jupyter monkey-patches sys.stdout/sys.stderr but doesn't patch os.write).

Neither of the following prints anything to the notebook output:

from joblib import Parallel, delayed
import sys

def f(x, stream):
    stream = getattr(sys, stream)
    print(f"x = {x}", file=stream)
    stream.flush()
    return x

Parallel(n_jobs=2)(delayed(f)(x, stream="stdout") for x in range(5))
Parallel(n_jobs=2)(delayed(f)(x, stream="stderr") for x in range(5))

but both of these work:

Parallel(n_jobs=2, backend='multiprocessing')(delayed(f)(x, stream="stdout") for x in range(5))
Parallel(n_jobs=2, backend='multiprocessing')(delayed(f)(x, stream="stderr") for x in range(5))
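
The monkey-patching difference is easy to see directly. In a kernel, sys.stdout is a Python-level replacement object, while os.write talks to the real file descriptor, so (absent fd-level capture) the two lines below can land in different places (a minimal sketch):

import os

# sys.stdout is the monkey-patched, ZMQ-backed stream, so this shows
# up in the notebook output:
print("via print / sys.stdout")

# os.write bypasses the Python-level object and writes to the real
# file descriptor 1, so this goes to the terminal running the kernel:
os.write(1, b"via os.write(1, ...)\n")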

@GaelVaroquaux
Contributor

Cc @ogrisel @tomMoral : is this due to loky?

@tomMoral

tomMoral commented Feb 8, 2020

It seems to me that this works with multiprocessing only because it uses fork to start the processes. In that case, the monkey-patched stdout/stderr are preserved, and you get the prints where you expect them.

If the start method for Process is set to forkserver or spawn, I don't think this will work, and it should behave the same way as loky.

But it is very hard to come up with a good solution to propagate the monkey patching into the child processes. This is a similar problem to the propagation of warning filters and globals. In that case, a potential solution could be to modify sys.executable to point toward a Jupyter kernel designed for child processes, with the correct monkey patching.
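
A sketch of the start-method difference, written to be run as a standalone script on a Unix machine (the behaviour noted in the comments is what this thread describes, not a guarantee for every setup):

import multiprocessing as mp

def say_hello():
    print("hello from the child")  # goes to whatever sys.stdout is in the child

if __name__ == "__main__":
    # "fork": the child is a copy of the parent process, so inside a
    # kernel it would inherit the monkey-patched sys.stdout and the
    # print would reach the notebook.
    ctx = mp.get_context("fork")
    p = ctx.Process(target=say_hello)
    p.start()
    p.join()

    # "spawn" (or "forkserver"): the child is a fresh interpreter with
    # a plain sys.stdout wired to the real file descriptor, so inside a
    # kernel the print would not reach the notebook.
    ctx = mp.get_context("spawn")
    p = ctx.Process(target=say_hello)
    p.start()
    p.join()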

@zfortier

zfortier commented Feb 17, 2020

I believe this is the same issue discussed here: https://stackoverflow.com/questions/55955330/printed-output-not-displayed-when-using-joblib-in-jupyter-notebook
I think my response is accurate, but I don't want to misrepresent anything. Thanks everyone!

Edit:
To further support what @tomMoral said, using backend="threading" produces the expected output too (see the sketch below).
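
A self-contained version of that check (a minimal sketch mirroring the earlier examples):

from joblib import Parallel, delayed

def f(x):
    print(f"x = {x}")
    return x

# Threads share the parent's monkey-patched sys.stdout/sys.stderr,
# so the prints appear in the notebook output as expected:
Parallel(n_jobs=2, backend="threading")(delayed(f)(x) for x in range(5))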

Edit2:
related: ipython/ipykernel#402

github-actions bot locked as resolved and limited conversation to collaborators Mar 28, 2021