env parameter in DDPJobDefinition doesn't pass env variables to Ray #408

Open
@sutaakar

Describe the Bug

I want to submit a Ray job with environment variables specified; however, the provided environment variables aren't passed to Ray.

The SDK documentation specifies that DDPJobDefinition has an env property. I tried passing environment variables there:

jobdef = DDPJobDefinition(
    name="mnisttest",
    script="mnist.py",
    scheduler_args={"requirements": "requirements.txt"},
    env={"PIP_INDEX_URL": "http://some-hostname/root/pypi/+simple/",
         "PIP_TRUSTED_HOST": "some-hostname"}
)
job = jobdef.submit(cluster)

However, the submitted job didn't contain the environment variables I passed.

Is this the correct way to pass environment variables using the SDK?
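
For comparison, here is a minimal sketch of how the same values could be handed to Ray directly through its job submission client, which accepts an `env_vars` entry in `runtime_env`. This bypasses DDPJobDefinition entirely and says nothing about how the SDK handles `env` internally; `build_runtime_env` and `submit_directly` are hypothetical helper names, and the dashboard address passed to `submit_directly` would be a placeholder for your own cluster.

```python
def build_runtime_env(requirements, env):
    """Build a Ray runtime_env dict from pip packages and environment variables."""
    runtime_env = {}
    if requirements:
        runtime_env["pip"] = list(requirements)
    if env:
        # Ray expects string keys and string values under "env_vars".
        runtime_env["env_vars"] = {str(k): str(v) for k, v in env.items()}
    return runtime_env


def submit_directly(dashboard_url, entrypoint, runtime_env):
    """Submit through Ray's own client (requires the `ray` package)."""
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(dashboard_url)
    return client.submit_job(entrypoint=entrypoint, runtime_env=runtime_env)


# Build the runtime_env offline; nothing is submitted here.
runtime_env = build_runtime_env(
    ["pytorch_lightning==1.5.10", "torchvision==0.12.0"],
    {
        "PIP_INDEX_URL": "http://some-hostname/root/pypi/+simple/",
        "PIP_TRUSTED_HOST": "some-hostname",
    },
)
print(runtime_env)
```

Calling `submit_directly("http://<dashboard_hostname>", "python mnist.py", runtime_env)` would then send the job, assuming the dashboard is reachable from where the code runs.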

Codeflare Stack Component Versions

Please specify the component versions in which you have encountered this bug.

Codeflare SDK: 0.12.1
Ray image: quay.io/project-codeflare/ray:latest-py39-cu118

Steps to Reproduce the Bug

  1. Start ODH with the default science notebook.
  2. Import the SDK Git repo into the notebook.
  3. Open 2_basic_jobs.ipynb.
  4. Add an env entry to the job definition:
jobdef = DDPJobDefinition(
    name="mnisttest",
    script="mnist.py",
    # script="mnist_disconnected.py", # training script for disconnected environment
    scheduler_args={"requirements": "requirements.txt"},
    env={"PIP_INDEX_URL": "http://some-hostname/root/pypi/+simple/",
         "PIP_TRUSTED_HOST": "some-hostname"}
)
job = jobdef.submit(cluster)
  5. Run the notebook until the job is submitted.
  6. Query the Ray REST API to get the submitted job definition, e.g. curl -X GET -i 'http://<dashboard_hostname>/api/jobs/'
  7. Check the response: the env variables are missing from the submitted job.
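
The last two steps can also be scripted. The sketch below assumes the `/api/jobs/` endpoint returns a JSON array of job records shaped like the one under Expected Behavior; `fetch_jobs` and `env_vars_by_submission` are hypothetical helper names, and the sample record is made up for the offline demonstration.

```python
import json
import urllib.request


def fetch_jobs(dashboard_url):
    """GET <dashboard_url>/api/jobs/ and return the parsed JSON body."""
    url = f"{dashboard_url.rstrip('/')}/api/jobs/"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def env_vars_by_submission(jobs):
    """Map each submission_id to the env_vars found in its runtime_env."""
    result = {}
    for job in jobs:
        runtime_env = job.get("runtime_env") or {}
        result[job.get("submission_id")] = runtime_env.get("env_vars") or {}
    return result


# Offline example: a record with pip packages but no env_vars,
# which is the shape this issue reports.
sample = [
    {
        "submission_id": "raysubmit_example",
        "runtime_env": {"pip": {"packages": ["torchvision==0.12.0"]}},
    }
]
summary = env_vars_by_submission(sample)
print(summary)
```

Against a live cluster, `env_vars_by_submission(fetch_jobs("http://<dashboard_hostname>"))` would give the same summary for the real jobs.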

What Have You Already Tried to Debug the Issue?

N/A

Expected Behavior

The submitted job contains the environment variables, for example:

{
  "type": "SUBMISSION",
  "job_id": null,
  "submission_id": "raysubmit_qtYVHfiyC7VhAPN7",
  "driver_info": null,
  "status": "FAILED",
  "entrypoint": "python /home/ray/jobs/mnist.py",
  "message": "Job entrypoint command failed with exit code 2, last available logs (truncated to 20,000 chars):\npython: can't open file '/home/ray/jobs/mnist.py': [Errno 2] No such file or directory\n",
  "error_type": null,
  "start_time": 1700576474095,
  "end_time": 1700576476706,
  "metadata": null,
  "runtime_env": {
    "pip": {
      "packages": ["pytorch_lightning==1.5.10", "ray_lightning", "torchmetrics==0.9.1", "torchvision==0.12.0"],
      "pip_check": false
    },
    "env_vars": {
      "PIP_INDEX_URL": "http://some-hostname/root/pypi/+simple/",
      "PIP_TRUSTED_HOST": "some-hostname"
    }
  },
  "driver_agent_http_address": "http://10.129.3.14:52365",
  "driver_node_id": "c3af4445c3cabfdc2291fb2fd6393da5850717eb3fd2aaeda3abe5f8"
}
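
The expectation above can be expressed as a small check. This is only a sketch: `missing_env_vars` is a hypothetical helper, and the job records are trimmed to the fields that matter here.

```python
def missing_env_vars(job, expected):
    """Return the expected env vars that are absent or different in a job record."""
    actual = (job.get("runtime_env") or {}).get("env_vars") or {}
    return {key: value for key, value in expected.items() if actual.get(key) != value}


expected = {
    "PIP_INDEX_URL": "http://some-hostname/root/pypi/+simple/",
    "PIP_TRUSTED_HOST": "some-hostname",
}

# Trimmed version of the expected record: env_vars are present.
expected_job = {"runtime_env": {"pip": {"pip_check": False}, "env_vars": dict(expected)}}
# Trimmed version of what the issue describes: env_vars are missing.
observed_job = {"runtime_env": {"pip": {"pip_check": False}}}

print(missing_env_vars(expected_job, expected))
print(missing_env_vars(observed_job, expected))
```

An empty result means the job carries every expected variable; a non-empty result lists what was dropped.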

Screenshots, Console Output, Logs, etc.

Affected Releases

SDK 0.12.1

Additional Context

Add as applicable and when known:

  • OS: 1) MacOS, 2) Linux, 3) Windows: [1 - 3]
  • OS Version: [e.g. RedHat Linux X.Y.Z, MacOS Monterey, ...]
  • Browser (UI issues): 1) Chrome, 2) Safari, 3) Firefox, 4) Other (describe): [1 - 4 + description?]
  • Browser Version (UI issues): [e.g. Firefox 97.0]
  • Cloud: 1) AWS, 2) IBM Cloud, 3) Other (describe), or 4) on-premise: [1 - 4 + description?]
  • Kubernetes: 1) OpenShift, 2) Other K8s [1 - 2 + description]
  • OpenShift or K8s version: [e.g. 1.23.1]
  • Other relevant info

Add any other information you think might be useful here.
