New feature
Occasionally, you can hit the Azure Batch job limit when running a pipeline, leading to this error:
Error executing process > 'NFCORE_VIRALRECON:ILLUMINA:VARIANTS_IVAR:IVAR_VARIANTS_TO_VCF (SAMPLE2_PE)'
Caused by:
Status code 409, "{
"odata.metadata":"https://name.region.batch.azure.com/$metadata#Microsoft.Azure.Batch.Protocol.Entities.Container.errors/@Element","code":"ActiveJobAndScheduleQuotaReached","message":{
"lang":"en-US","value":"Active job and job schedule quota for the account has been reached.\nRequestId:d28008e3-4018-41cc-90ff-6d96a7952ea3\nTime:2024-12-05T02:43:12.5743776Z"
}
}"
We could catch this error and just wait for the job quota to clear. This could be configurable as an azure.batch config option.
Usage scenario
Set azure.batch.behaviourOnJobLimit = 'retry' and job submission will be retried with a backoff until the run is cancelled. Alternatively, set it to 'error' to keep the current behaviour and raise an error.
Raise a warning to the console so the user is aware.
This would allow pipelines to continue, albeit stuck in a queue. As jobs are cleared, the pipelines will continue.
Suggest implementation
tbc
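Until an implementation is agreed, here is a rough sketch of the retry behaviour described in the usage scenario. It is written in Python for illustration only and is not Nextflow code. `BatchError`, `submit_with_job_limit_policy`, `create_job`, `base_delay` and `max_delay` are all hypothetical names, and the doubling backoff is an assumption rather than a decided policy. Only the 409 status, the `ActiveJobAndScheduleQuotaReached` code and the 'retry' / 'error' values come from this issue.

```python
import time

# Error code taken from the Azure Batch response quoted in this issue.
QUOTA_CODE = "ActiveJobAndScheduleQuotaReached"


class BatchError(Exception):
    """Hypothetical stand-in for an error raised by a Batch client."""

    def __init__(self, status, code):
        super().__init__(f"Status code {status}: {code}")
        self.status = status
        self.code = code


def submit_with_job_limit_policy(create_job, behaviour="retry",
                                 base_delay=1.0, max_delay=60.0,
                                 sleep=time.sleep, warn=print):
    """Call create_job(), retrying while the active job quota is reached.

    behaviour="retry": warn, wait, and try again with a doubling delay
                       (capped at max_delay) until the caller cancels.
    behaviour="error": re-raise immediately, i.e. the current behaviour.
    Any other error is always re-raised.
    """
    delay = base_delay
    while True:
        try:
            return create_job()
        except BatchError as err:
            quota_hit = err.status == 409 and err.code == QUOTA_CODE
            if not quota_hit or behaviour != "retry":
                raise
            # Tell the user why the pipeline appears to be stuck.
            warn(f"Azure Batch active job quota reached; retrying in {delay:.0f}s")
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

The `sleep` and `warn` parameters are injected only so the loop can be exercised without real waiting; an actual implementation would presumably hook into Nextflow's existing retry and logging facilities instead.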
Related: #4792