This repository was archived by the owner on Oct 10, 2019. It is now read-only.
13 changes: 11 additions & 2 deletions src/scripts/slurm_submit.sh
@@ -85,8 +85,17 @@ bls_add_job_wrapper
###############################################################

datenow=`date +%Y%m%d`
jobID=`${slurm_binpath}/sbatch $bls_tmp_file` # actual submission
retcode=$?
retry=0
MAX_RETRY=3
Member

Could we add this as a config variable slurm_max_submit_retries here, defaulting to 0, and reference it via ${slurm_max_submit_retries}?

Author

OK
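
A minimal sketch of what the requested setting could look like, assuming the value arrives as an ordinary shell variable. Only the name slurm_max_submit_retries and the default of 0 come from the comment above; the helper resolve_max_retries is hypothetical, and how the real script loads its configuration is not shown in this diff.

```shell
#!/bin/bash
# resolve_max_retries VALUE
# Hypothetical helper: prints VALUE if it is a non-negative integer,
# otherwise prints 0 (the default requested in the review).
resolve_max_retries() {
    local value=${1:-0}
    case "$value" in
        ''|*[!0-9]*) value=0 ;;   # empty or non-numeric: fall back to 0
    esac
    echo "$value"
}

# Assumption: the config, if any, has already set slurm_max_submit_retries.
slurm_max_submit_retries=$(resolve_max_retries "${slurm_max_submit_retries:-}")
echo "submit retries allowed: ${slurm_max_submit_retries}"
```

With a default of 0 the submission behaves as it did before the change unless an administrator opts in to retries.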

until [ $retry -eq $MAX_RETRY ] ; do
Member

This looks like the number of attempts to submit is 3 but the number of retries is actually 2, right?

Author

The first attempt runs with retry=0; once retry reaches 3, the until condition becomes true and the loop exits. That should be three tries, right?

Member

Yeah, that's 3 tries total but only 2 retries, so we should either call the variable MAX_TRIES or bump the initial value of retry.
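
To make the tries-versus-retries distinction concrete, here is a sketch of a loop in which the limit really counts retries, so a limit of N means one initial attempt plus up to N more. The function submit_with_retries and the global attempts are hypothetical names, and false stands in for a failing sbatch call so the counting can be checked without Slurm.

```shell
#!/bin/bash
# submit_with_retries MAX_RETRIES COMMAND [ARGS...]
# Runs COMMAND once, then up to MAX_RETRIES more times while it keeps failing.
# Sets the global "attempts" to the number of times COMMAND was run and
# returns the exit status of the last run.
submit_with_retries() {
    local max_retries=$1
    shift
    local retry=0
    local retcode
    attempts=0
    while : ; do
        "$@"
        retcode=$?
        attempts=$((attempts + 1))
        # Stop on success, or when the retry budget is used up.
        if [ "$retcode" -eq 0 ] || [ "$retry" -ge "$max_retries" ]; then
            break
        fi
        retry=$((retry + 1))
    done
    return "$retcode"
}

# 3 retries allowed: the command runs 4 times in total before giving up.
submit_with_retries 3 false || echo "submission failed after $attempts attempts"
```

Under this convention a limit of 0 still submits once, which fits a retry setting that defaults to 0.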

jobID=$(${slurm_binpath}/sbatch $bls_tmp_file)
retcode=$?
if [ "$retcode" == "0" ] ; then
break
fi
retry=$[$retry+1]
sleep 10
Member

Could we make the sleep back off exponentially?

Author

Can do.
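
One way the exponential backoff could be sketched is to double the delay on each retry and cap it. The helper backoff_delay is hypothetical; the base of 10 seconds mirrors the fixed sleep 10 in the diff, and the 300-second cap is an arbitrary illustrative value, not something the review specifies.

```shell
#!/bin/bash
# backoff_delay RETRY [BASE] [CAP]
# Prints the delay in seconds before retry number RETRY (0-based):
# BASE * 2^RETRY, limited to CAP.
backoff_delay() {
    local retry=$1
    local base=${2:-10}    # assumption: start from the 10 s used in the diff
    local cap=${3:-300}    # assumption: illustrative upper bound
    local delay=$(( base * (1 << retry) ))
    if [ "$delay" -gt "$cap" ]; then
        delay=$cap
    fi
    echo "$delay"
}

# Show the schedule instead of actually sleeping.
for retry in 0 1 2 3; do
    echo "retry $retry: would sleep $(backoff_delay "$retry")s"
done
```

In the submission loop the fixed sleep would then become something like sleep "$(backoff_delay "$retry")", evaluated before retry is incremented.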

done

if [ "$retcode" != "0" ] ; then
rm -f $bls_tmp_file