Wait for Azure Batch job quota to clear before submitting a new job #5575

Open
@adamrtalbot

New feature

Occasionally, a pipeline can hit the Azure Batch active job quota, which fails the run with this error:

Error executing process > 'NFCORE_VIRALRECON:ILLUMINA:VARIANTS_IVAR:IVAR_VARIANTS_TO_VCF (SAMPLE2_PE)'

Caused by:
  Status code 409, "{
    "odata.metadata":"https://name.region.batch.azure.com/$metadata#Microsoft.Azure.Batch.Protocol.Entities.Container.errors/@Element","code":"ActiveJobAndScheduleQuotaReached","message":{
      "lang":"en-US","value":"Active job and job schedule quota for the account has been reached.\nRequestId:d28008e3-4018-41cc-90ff-6d96a7952ea3\nTime:2024-12-05T02:43:12.5743776Z"
    }
  }"

We could catch this error and wait for the job quota to clear instead of failing. The behaviour could be controlled by an azure.batch config option.
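To make "catch this error" concrete, here is a minimal sketch of how the quota conflict could be told apart from other 409 responses. It is written in Python purely for illustration and is not Nextflow's code. The function name `is_job_quota_error` is hypothetical; the status code and the `code` value are taken from the error shown above.

```python
import json

# Error code as it appears in the 409 response body quoted in this issue.
QUOTA_ERROR_CODE = "ActiveJobAndScheduleQuotaReached"


def is_job_quota_error(status_code: int, body: str) -> bool:
    """Return True only for the active job quota conflict (hypothetical helper)."""
    if status_code != 409:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        # Body is not valid JSON, so it cannot be classified as a quota error.
        return False
    return isinstance(payload, dict) and payload.get("code") == QUOTA_ERROR_CODE
```

Matching on the `code` field rather than on the message text keeps other 409 conflicts out of the retry path.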

Usage scenario

Set azure.batch.behaviourOnJobLimit = 'retry' and Nextflow will retry the job submission with a backoff until cancelled. Alternatively, set it to 'error' to keep the current behaviour and raise an error.

Raise a warning to the console so the user is aware.

This would allow pipelines to keep running, albeit waiting in a queue. As active jobs are cleared, submission resumes and the pipelines continue.
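The two proposed modes could behave roughly as sketched below. This is an illustration in Python under stated assumptions, not a proposed implementation: `JobQuotaReached`, `submit_with_quota_policy`, the delay values, and the doubling backoff are all hypothetical, and `behaviour` stands in for the proposed `azure.batch.behaviourOnJobLimit` option, which does not exist yet.

```python
import logging
import time

log = logging.getLogger("azure-batch-sketch")


class JobQuotaReached(Exception):
    """Hypothetical stand-in for the 409 ActiveJobAndScheduleQuotaReached error."""


def submit_with_quota_policy(submit, behaviour="retry",
                             initial_delay=5.0, max_delay=300.0,
                             sleep=time.sleep):
    """Call submit(); on a quota error either re-raise or wait and try again."""
    if behaviour not in ("retry", "error"):
        raise ValueError(f"unknown behaviour: {behaviour!r}")
    delay = initial_delay
    while True:  # no retry cap: loops until submit() succeeds or the run is cancelled
        try:
            return submit()
        except JobQuotaReached:
            if behaviour == "error":
                raise  # current behaviour: surface the error immediately
            # Warn so the user knows the pipeline is queued, not hung.
            log.warning("Azure Batch job quota reached; retrying in %.0f s", delay)
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff with a ceiling
```

The `sleep` parameter is injectable only so the loop can be exercised without real waiting. Capping the delay keeps a long-queued pipeline from backing off so far that it misses freed quota.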

Suggested implementation

tbc

Related: #4792
