question about expected speedup when using parallelization via joblib #169

@nguyentr17

Hi,

I came across your repository while searching for ways to train multiple NN models simultaneously on a single GPU. My model is pretty small (just a 1-layer MLP), and each model uses only about 260 MB of VRAM. However, when I use joblib to train multiple models at the same time, they do start at the same time (according to the log), but the total training time is still the same as when training the models sequentially. Do you happen to have any tips, quick insights, or things to look at for this? I know this is not directly an issue with your package, but I would really appreciate any help.

My code is like this:

    from joblib import Parallel, delayed, parallel_backend

    # process_latent_pair trains one NN model per (iid, tid) pair
    with parallel_backend('loky', n_jobs=-1):
        parallel = Parallel(n_jobs=-1)
        parallel(
            delayed(process_latent_pair)(mi_estimator, iid, tid, cfg, exp_name, args, DEVICE)
            for iid in range(13)
            for tid in range(13)
        )

My environment:

python 3.9.19
torch==2.4.1
joblib==1.4.2
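
To check whether the joblib workers overlap at all, independent of the GPU, a minimal sketch like the one below could be timed. It is not my actual training code: `dummy_task` and `timed_run` are hypothetical names, and `time.sleep` stands in for one training run.

```python
import time

from joblib import Parallel, delayed


def dummy_task(task_id, seconds=0.2):
    """Hypothetical stand-in for one training run: waits, then returns its id."""
    time.sleep(seconds)
    return task_id


def timed_run(n_tasks, n_jobs, backend="loky"):
    """Run n_tasks dummy tasks with n_jobs workers; return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = Parallel(n_jobs=n_jobs, backend=backend)(
        delayed(dummy_task)(i) for i in range(n_tasks)
    )
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # n_jobs=1 runs the tasks one after another; n_jobs=4 uses four workers.
    _, sequential = timed_run(n_tasks=8, n_jobs=1)
    _, parallel = timed_run(n_tasks=8, n_jobs=4)
    print(f"sequential: {sequential:.2f}s, parallel: {parallel:.2f}s")
```

If the dummy tasks overlap here but the real training runs do not, that would suggest the bottleneck is somewhere other than joblib itself, for example on the GPU side.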
