Replies: 1 comment 1 reply
Been experiencing the same issue. |
Hi, I have some experience with Slurm and with Optuna + joblib launchers for HP sweeps. In my current setup, however, I would like to start the main sweeping process on the login machine and have it submit jobs to the cluster in batches.

I have a training script that works fine if I just `sbatch` it to the cluster, but when I attempt to do something like this:

(I pruned a bunch of parameters)

and then:

`bash hp_sweep.sh`

the script fails after a few seconds, and there is no way to even access the stack trace because `HYDRA_FULL_ERROR` doesn't get passed (as you can see, I tried every possible way I know).

I installed the submitit launcher with

`python -m pip install 'git+https://github.com/facebookresearch/hydra.git#egg=hydra-submitit-launcher&subdirectory=plugins/hydra_submitit_launcher'`

since the latest version is not released on PyPI. Is there any way to fix the stack trace issue?
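One avenue worth trying, assuming your installed launcher version supports the `setup` field (a list of shell commands injected into the generated sbatch script) and `srun_args`, is exporting the variable from the launcher config itself rather than from the login shell. A sketch, not a verified fix:

```yaml
# Hydra config override (sketch; field names assume a recent
# hydra-submitit-launcher that exposes `setup`)
hydra:
  launcher:
    setup:
      - export HYDRA_FULL_ERROR=1
```

Equivalently on the command line: `python train.py -m 'hydra.launcher.setup=[export HYDRA_FULL_ERROR=1]'`. This only helps if the launcher actually honors `setup`, which (per the EDIT below) may be part of the same problem.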
EDIT: I got to the core of the problem. It wasn't actually an issue in my code; `submitit` for some reason can't decide which environment to use:

`RuntimeError: Could not figure out which environment the job is running in. Known environments: slurm, local, debug.`

I tried to hack around this by specifying `_TEST_CLUSTER_` in both `setup` and `srun_args`, but this boils down to the same problem as with passing `HYDRA_FULL_ERROR`: those values get ignored. There are multiple issues regarding this specific problem, but I don't see any clear resolution. @omry
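A quick way to debug this class of error is to check which scheduler variables are actually visible inside the failing job, since submitit's environment detection relies on job-side environment variables. A minimal illustrative sketch (this is not submitit's actual detection code, and the variable names other than Slurm's standard `SLURM_JOB_ID` are assumptions):

```python
import os

def guess_submitit_env(environ=None) -> str:
    """Rough sketch of the kind of check a launcher performs when deciding
    whether a process is running inside a Slurm job, a local submitit job,
    or neither.  SLURM_JOB_ID is set by Slurm inside srun/sbatch jobs;
    SUBMITIT_LOCAL_JOB_ID is an assumed marker for submitit's local executor."""
    environ = os.environ if environ is None else environ
    if "SLURM_JOB_ID" in environ:
        return "slurm"
    if "SUBMITIT_LOCAL_JOB_ID" in environ:
        return "local"
    return "unknown"

if __name__ == "__main__":
    # Run this at the top of the launched job script to see what it inherits.
    print(guess_submitit_env())
    print({k: v for k, v in os.environ.items() if k.startswith("SLURM")})
```

If the launched job prints no `SLURM*` variables at all, the generated sbatch wrapper is not propagating the environment to the training process, which would explain both the detection failure and `HYDRA_FULL_ERROR` being ignored.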