GPU enabled pipeline behaviour when the Cluster doesn't have enough GPU #11497
Unanswered
rajendra-avesha
asked this question in
Q&A
Replies: 0 comments
I have a DAG pipeline with two components: one requires 1 CPU, and the other requires 1 CPU and 1 GPU. I ran the pipeline from the Kubeflow UI.
I noticed that it did not create the gpu-dag-pipeline-5nfmx-container-impl-xxxxx pods for either component. A snapshot of the pods follows:
kubectl get pods -n kubeflow-user-example-com --watch
NAME                                                        READY   STATUS            RESTARTS   AGE
ml-pipeline-ui-artifact-7779f6ddc8-xdsjb                    2/2     Running           0          6d23h
ml-pipeline-visualizationserver-777747b89b-pzl2r            2/2     Running           0          6d23h
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877         0/2     Pending           0          0s
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877         0/2     Pending           0          0s
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877         0/2     Init:0/1          0          1s
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877         0/2     Init:0/1          0          1s
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877         0/2     PodInitializing   0          2s
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877         2/2     Running           0          3s
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877         1/2     NotReady          0          4s
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877         1/2     NotReady          0          4s
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877         0/2     Completed         0          5s
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877         0/2     Completed         0          6s
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877         0/2     Completed         0          6s
gpu-dag-pipeline-5nfmx-system-container-driver-1052282141   0/2     Pending           0          1s
gpu-dag-pipeline-5nfmx-system-container-driver-1052282141   0/2     Pending           0          1s
gpu-dag-pipeline-5nfmx-system-container-driver-1052282141   0/2     Init:0/1          0          1s
gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877         0/2     Completed         0          11s
gpu-dag-pipeline-5nfmx-system-container-driver-1052282141   0/2     Init:0/1          0          1s
gpu-dag-pipeline-5nfmx-system-container-driver-1052282141   0/2     PodInitializing   0          2s
gpu-dag-pipeline-5nfmx-system-container-driver-1052282141   2/2     Running           0          3s
gpu-dag-pipeline-5nfmx-system-container-driver-1052282141   1/2     NotReady          0          4s
gpu-dag-pipeline-5nfmx-system-container-driver-1052282141   1/2     NotReady          0          4s
gpu-dag-pipeline-5nfmx-system-container-driver-1052282141   0/2     Completed         0          5s
gpu-dag-pipeline-5nfmx-system-container-driver-1052282141   0/2     Completed         0          6s
gpu-dag-pipeline-5nfmx-system-container-driver-1052282141   0/2     Completed         0          6s
gpu-dag-pipeline-5nfmx-system-container-driver-4177297792   0/2     Pending           0          0s
gpu-dag-pipeline-5nfmx-system-container-driver-4177297792   0/2     Pending           0          0s
gpu-dag-pipeline-5nfmx-system-container-driver-4177297792   0/2     Init:0/1          0          0s
gpu-dag-pipeline-5nfmx-system-container-driver-1052282141   0/2     Completed         0          11s
gpu-dag-pipeline-5nfmx-system-container-driver-4177297792   0/2     Init:0/1          0          0s
gpu-dag-pipeline-5nfmx-system-container-driver-4177297792   0/2     PodInitializing   0          1s
gpu-dag-pipeline-5nfmx-system-container-driver-4177297792   2/2     Running           0          2s
gpu-dag-pipeline-5nfmx-system-container-driver-4177297792   1/2     NotReady          0          3s
gpu-dag-pipeline-5nfmx-system-container-driver-4177297792   1/2     NotReady          0          3s
gpu-dag-pipeline-5nfmx-system-container-driver-4177297792   0/2     Completed         0          4s
gpu-dag-pipeline-5nfmx-system-container-driver-4177297792   0/2     Completed         0          5s
gpu-dag-pipeline-5nfmx-system-container-driver-4177297792   0/2     Completed         0          5s
gpu-dag-pipeline-5nfmx-system-container-driver-4177297792   0/2     Completed         0          10s
The workflow CR details are as follows:
`kubectl describe workflow gpu-dag-pipeline-5nfmx -n kubeflow-user-example-com
Name: gpu-dag-pipeline-5nfmx
Namespace: kubeflow-user-example-com
Labels: pipeline/persistedFinalState=true
pipeline/runid=7f190382-7cfc-49da-b4e3-9d3b90580e71
workflows.argoproj.io/completed=true
workflows.argoproj.io/phase=Succeeded
Annotations: pipelines.kubeflow.org/components-comp-preprocess-gpu-stage:
{"executorLabel":"exec-preprocess-gpu-stage","outputDefinitions":{"parameters":{"Output":{"parameterType":"STRING"}}}}
pipelines.kubeflow.org/components-comp-process-gpu-stage:
{"executorLabel":"exec-process-gpu-stage","inputDefinitions":{"parameters":{"input_data":{"parameterType":"STRING"}}},"outputDefinitions":...
pipelines.kubeflow.org/components-root:
{"dag":{"tasks":{"preprocess-gpu-stage":{"cachingOptions":{"enableCache":true},"componentRef":{"name":"comp-preprocess-gpu-stage"},"taskIn...
pipelines.kubeflow.org/implementations-comp-preprocess-gpu-stage:
{"args":["--executor_input","{{$}}","--function_to_execute","preprocess_gpu_stage"],"command":["sh","-c","\nif ! [ -x "$(command -v pip)...
pipelines.kubeflow.org/implementations-comp-process-gpu-stage:
{"args":["--executor_input","{{$}}","--function_to_execute","process_gpu_stage"],"command":["sh","-c","\nif ! [ -x "$(command -v pip)" ]...
pipelines.kubeflow.org/run_name: Run of new-gpu-dag (fa3b9)
workflows.argoproj.io/pod-name-format: v2
API Version: argoproj.io/v1alpha1
Kind: Workflow
Metadata:
Creation Timestamp: 2025-01-02T11:53:52Z
Generate Name: gpu-dag-pipeline-
Generation: 5
Resource Version: 7028722
UID: c772cafa-97ec-4784-ae13-f1a5a9f649bd
Spec:
Arguments:
Entrypoint: entrypoint
Pod Metadata:
Annotations:
pipelines.kubeflow.org/v2_component: true
Labels:
pipeline/runid: 7f190382-7cfc-49da-b4e3-9d3b90580e71
pipelines.kubeflow.org/v2_component: true
Service Account Name: default-editor
Templates:
Container:
Args:
--type
CONTAINER
--pipeline_name
gpu-dag-pipeline
--run_id
7f190382-7cfc-49da-b4e3-9d3b90580e71
--dag_execution_id
{{inputs.parameters.parent-dag-id}}
--component
{{inputs.parameters.component}}
--task
{{inputs.parameters.task}}
--container
{{inputs.parameters.container}}
--iteration_index
{{inputs.parameters.iteration-index}}
--cached_decision_path
{{outputs.parameters.cached-decision.path}}
--pod_spec_patch_path
{{outputs.parameters.pod-spec-patch.path}}
--condition_path
{{outputs.parameters.condition.path}}
--kubernetes_config
{{inputs.parameters.kubernetes-config}}
Command:
driver
Image: gcr.io/ml-pipeline/kfp-driver@sha256:3c0665cd36aa87e4359a4c8b6271dcba5bdd817815cd0496ed12eb5dde5fd2ec
Name:
Resources:
Limits:
Cpu: 500m
Memory: 512Mi
Requests:
Cpu: 100m
Memory: 64Mi
Inputs:
Parameters:
Name: component
Name: task
Name: container
Name: parent-dag-id
Default: -1
Name: iteration-index
Default:
Name: kubernetes-config
Metadata:
Annotations:
sidecar.istio.io/inject: false
Name: system-container-driver
Outputs:
Parameters:
Name: pod-spec-patch
Value From:
Default:
Path: /tmp/outputs/pod-spec-patch
Default: false
Name: cached-decision
Value From:
Default: false
Path: /tmp/outputs/cached-decision
Name: condition
Value From:
Default: true
Path: /tmp/outputs/condition
Dag:
Tasks:
Arguments:
Parameters:
Name: pod-spec-patch
Value: {{inputs.parameters.pod-spec-patch}}
Name: executor
Template: system-container-impl
When: {{inputs.parameters.cached-decision}} != true
Inputs:
Parameters:
Name: pod-spec-patch
Default: false
Name: cached-decision
Metadata:
Annotations:
sidecar.istio.io/inject: false
Name: system-container-executor
Outputs:
Container:
Command:
should-be-overridden-during-runtime
Env:
Name: KFP_POD_NAME
Value From:
Field Ref:
Field Path: metadata.name
Name: KFP_POD_UID
Value From:
Field Ref:
Field Path: metadata.uid
Env From:
Config Map Ref:
Name: metadata-grpc-configmap
Optional: true
Image: gcr.io/ml-pipeline/should-be-overridden-during-runtime
Name:
Resources:
Volume Mounts:
Mount Path: /kfp-launcher
Name: kfp-launcher
Init Containers:
Command:
launcher-v2
--copy
/kfp-launcher/launch
Image: gcr.io/ml-pipeline/kfp-launcher@sha256:8fe5e6e4718f20b021736022ad3741ddf2abd82aa58c86ae13e89736fdc3f08f
Name: kfp-launcher
Resources:
Limits:
Cpu: 500m
Memory: 128Mi
Requests:
Cpu: 100m
Volume Mounts:
Mount Path: /kfp-launcher
Name: kfp-launcher
Inputs:
Parameters:
Name: pod-spec-patch
Metadata:
Annotations:
sidecar.istio.io/inject: false
Name: system-container-impl
Outputs:
Pod Spec Patch: {{inputs.parameters.pod-spec-patch}}
Volumes:
Empty Dir:
Name: kfp-launcher
Dag:
Tasks:
Arguments:
Parameters:
Name: component
Value: {{workflow.annotations.pipelines.kubeflow.org/components-comp-preprocess-gpu-stage}}
Name: task
Value: {"cachingOptions":{"enableCache":true},"componentRef":{"name":"comp-preprocess-gpu-stage"},"taskInfo":{"name":"preprocess-gpu-stage"}}
Name: container
Value: {{workflow.annotations.pipelines.kubeflow.org/implementations-comp-preprocess-gpu-stage}}
Name: parent-dag-id
Value: {{inputs.parameters.parent-dag-id}}
Name: preprocess-gpu-stage-driver
Template: system-container-driver
Arguments:
Parameters:
Name: pod-spec-patch
Value: {{tasks.preprocess-gpu-stage-driver.outputs.parameters.pod-spec-patch}}
Default: false
Name: cached-decision
Value: {{tasks.preprocess-gpu-stage-driver.outputs.parameters.cached-decision}}
Depends: preprocess-gpu-stage-driver.Succeeded
Name: preprocess-gpu-stage
Template: system-container-executor
Arguments:
Parameters:
Name: component
Value: {{workflow.annotations.pipelines.kubeflow.org/components-comp-process-gpu-stage}}
Name: task
Value: {"cachingOptions":{"enableCache":true},"componentRef":{"name":"comp-process-gpu-stage"},"dependentTasks":["preprocess-gpu-stage"],"inputs":{"parameters":{"input_data":{"taskOutputParameter":{"outputParameterKey":"Output","producerTask":"preprocess-gpu-stage"}}}},"taskInfo":{"name":"process-gpu-stage"}}
Name: container
Value: {{workflow.annotations.pipelines.kubeflow.org/implementations-comp-process-gpu-stage}}
Name: parent-dag-id
Value: {{inputs.parameters.parent-dag-id}}
Depends: preprocess-gpu-stage.Succeeded
Name: process-gpu-stage-driver
Template: system-container-driver
Arguments:
Parameters:
Name: pod-spec-patch
Value: {{tasks.process-gpu-stage-driver.outputs.parameters.pod-spec-patch}}
Default: false
Name: cached-decision
Value: {{tasks.process-gpu-stage-driver.outputs.parameters.cached-decision}}
Depends: process-gpu-stage-driver.Succeeded
Name: process-gpu-stage
Template: system-container-executor
Inputs:
Parameters:
Name: parent-dag-id
Metadata:
Annotations:
sidecar.istio.io/inject: false
Name: root
Outputs:
Container:
Args:
--type
{{inputs.parameters.driver-type}}
--pipeline_name
gpu-dag-pipeline
--run_id
7f190382-7cfc-49da-b4e3-9d3b90580e71
--dag_execution_id
{{inputs.parameters.parent-dag-id}}
--component
{{inputs.parameters.component}}
--task
{{inputs.parameters.task}}
--runtime_config
{{inputs.parameters.runtime-config}}
--iteration_index
{{inputs.parameters.iteration-index}}
--execution_id_path
{{outputs.parameters.execution-id.path}}
--iteration_count_path
{{outputs.parameters.iteration-count.path}}
--condition_path
{{outputs.parameters.condition.path}}
Command:
driver
Image: gcr.io/ml-pipeline/kfp-driver@sha256:3c0665cd36aa87e4359a4c8b6271dcba5bdd817815cd0496ed12eb5dde5fd2ec
Name:
Resources:
Limits:
Cpu: 500m
Memory: 512Mi
Requests:
Cpu: 100m
Memory: 64Mi
Inputs:
Parameters:
Name: component
Default:
Name: runtime-config
Default:
Name: task
Default: 0
Name: parent-dag-id
Default: -1
Name: iteration-index
Default: DAG
Name: driver-type
Metadata:
Annotations:
sidecar.istio.io/inject: false
Name: system-dag-driver
Outputs:
Parameters:
Name: execution-id
Value From:
Path: /tmp/outputs/execution-id
Name: iteration-count
Value From:
Default: 0
Path: /tmp/outputs/iteration-count
Name: condition
Value From:
Default: true
Path: /tmp/outputs/condition
Dag:
Tasks:
Arguments:
Parameters:
Name: component
Value: {{workflow.annotations.pipelines.kubeflow.org/components-root}}
Name: runtime-config
Value: {}
Name: driver-type
Value: ROOT_DAG
Name: root-driver
Template: system-dag-driver
Arguments:
Parameters:
Name: parent-dag-id
Value: {{tasks.root-driver.outputs.parameters.execution-id}}
Name: condition
Value:
Depends: root-driver.Succeeded
Name: root
Template: root
Inputs:
Metadata:
Annotations:
sidecar.istio.io/inject: false
Name: entrypoint
Outputs:
Status:
Artifact GC Status:
Not Specified: true
Artifact Repository Ref:
Artifact Repository:
Archive Logs: true
s3:
Access Key Secret:
Key: accesskey
Name: mlpipeline-minio-artifact
Bucket: mlpipeline
Endpoint: minio-service.kubeflow:9000
Insecure: true
Key Format: artifacts/{{workflow.name}}/{{workflow.creationTimestamp.Y}}/{{workflow.creationTimestamp.m}}/{{workflow.creationTimestamp.d}}/{{pod.name}}
Secret Key Secret:
Key: secretkey
Name: mlpipeline-minio-artifact
Default: true
Conditions:
Status: False
Type: PodRunning
Status: True
Type: Completed
Finished At: 2025-01-02T11:54:23Z
Nodes:
gpu-dag-pipeline-5nfmx:
Children:
gpu-dag-pipeline-5nfmx-2521779877
Display Name: gpu-dag-pipeline-5nfmx
Finished At: 2025-01-02T11:54:23Z
Id: gpu-dag-pipeline-5nfmx
Name: gpu-dag-pipeline-5nfmx
Outbound Nodes:
gpu-dag-pipeline-5nfmx-411402406
Phase: Succeeded
Progress: 3/3
Resources Duration:
Cpu: 9
Memory: 6
Started At: 2025-01-02T11:53:52Z
Template Name: entrypoint
Template Scope: local/gpu-dag-pipeline-5nfmx
Type: DAG
gpu-dag-pipeline-5nfmx-1052282141:
Boundary ID: gpu-dag-pipeline-5nfmx-3955520920
Children:
gpu-dag-pipeline-5nfmx-2860418848
Display Name: preprocess-gpu-stage-driver
Finished At: 2025-01-02T11:54:06Z
Host Node Name: lke293878-490567-3333acbc0000
Id: gpu-dag-pipeline-5nfmx-1052282141
Inputs:
Parameters:
Name: component
Value: {"executorLabel":"exec-preprocess-gpu-stage","outputDefinitions":{"parameters":{"Output":{"parameterType":"STRING"}}}}
Name: task
Value: {"cachingOptions":{"enableCache":true},"componentRef":{"name":"comp-preprocess-gpu-stage"},"taskInfo":{"name":"preprocess-gpu-stage"}}
Name: container
Value: {"args":["--executor_input","{{$}}","--function_to_execute","preprocess_gpu_stage"],"command":["sh","-c","\nif ! [ -x "$(command -v pip)" ]; then\n python3 -m ensurepip || python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location 'kfp==2.11.0' '--no-deps' 'typing-extensions\u003e=3.7.4,\u003c5; python_version\u003c"3.9"' \u0026\u0026 "$0" "$@"\n","sh","-ec","program_path=$(mktemp -d)\n\nprintf "%s" "$0" \u003e "$program_path/ephemeral_component.py"\n_KFP_RUNTIME=true python3 -m kfp.dsl.executor_main --component_module_path "$program_path/ephemeral_component.py" "$@"\n","\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import *\nfrom typing import *\n\ndef preprocess_gpu_stage() -\u003e str:\n """\n Simulates data preprocessing. Returns a string representing processed data.\n """\n print("Preprocessing data...")\n return "processed_data"\n\n"],"image":"python:3.10","resources":{"cpuLimit":1}}
Name: parent-dag-id
Value: 68
Default: -1
Name: iteration-index
Value: -1
Default:
Name: kubernetes-config
Value:
Name: gpu-dag-pipeline-5nfmx.root.preprocess-gpu-stage-driver
Outputs:
Artifacts:
Name: main-logs
s3:
Key: artifacts/gpu-dag-pipeline-5nfmx/2025/01/02/gpu-dag-pipeline-5nfmx-system-container-driver-1052282141/main.log
Exit Code: 0
Parameters:
Name: pod-spec-patch
Value:
Value From:
Default:
Path: /tmp/outputs/pod-spec-patch
Default: false
Name: cached-decision
Value: true
Value From:
Default: false
Path: /tmp/outputs/cached-decision
Name: condition
Value: nil
Value From:
Default: true
Path: /tmp/outputs/condition
Phase: Succeeded
Progress: 1/1
Resources Duration:
Cpu: 3
Memory: 2
Started At: 2025-01-02T11:54:02Z
Template Name: system-container-driver
Template Scope: local/gpu-dag-pipeline-5nfmx
Type: Pod
gpu-dag-pipeline-5nfmx-1363621287:
Boundary ID: gpu-dag-pipeline-5nfmx-3955520920
Children:
gpu-dag-pipeline-5nfmx-411402406
Display Name: process-gpu-stage
Finished At: 2025-01-02T11:54:23Z
Id: gpu-dag-pipeline-5nfmx-1363621287
Inputs:
Parameters:
Name: pod-spec-patch
Value:
Default: false
Name: cached-decision
Value: true
Name: gpu-dag-pipeline-5nfmx.root.process-gpu-stage
Outbound Nodes:
gpu-dag-pipeline-5nfmx-411402406
Phase: Succeeded
Started At: 2025-01-02T11:54:23Z
Template Name: system-container-executor
Template Scope: local/gpu-dag-pipeline-5nfmx
Type: DAG
gpu-dag-pipeline-5nfmx-2521779877:
Boundary ID: gpu-dag-pipeline-5nfmx
Children:
gpu-dag-pipeline-5nfmx-3955520920
Display Name: root-driver
Finished At: 2025-01-02T11:53:56Z
Host Node Name: lke293878-490567-3333acbc0000
Id: gpu-dag-pipeline-5nfmx-2521779877
Inputs:
Parameters:
Name: component
Value: {"dag":{"tasks":{"preprocess-gpu-stage":{"cachingOptions":{"enableCache":true},"componentRef":{"name":"comp-preprocess-gpu-stage"},"taskInfo":{"name":"preprocess-gpu-stage"}},"process-gpu-stage":{"cachingOptions":{"enableCache":true},"componentRef":{"name":"comp-process-gpu-stage"},"dependentTasks":["preprocess-gpu-stage"],"inputs":{"parameters":{"input_data":{"taskOutputParameter":{"outputParameterKey":"Output","producerTask":"preprocess-gpu-stage"}}}},"taskInfo":{"name":"process-gpu-stage"}}}}}
Default:
Name: runtime-config
Value: {}
Default:
Name: task
Value:
Default: 0
Name: parent-dag-id
Value: 0
Default: -1
Name: iteration-index
Value: -1
Default: DAG
Name: driver-type
Value: ROOT_DAG
Name: gpu-dag-pipeline-5nfmx.root-driver
Outputs:
Artifacts:
Name: main-logs
s3:
Key: artifacts/gpu-dag-pipeline-5nfmx/2025/01/02/gpu-dag-pipeline-5nfmx-system-dag-driver-2521779877/main.log
Exit Code: 0
Parameters:
Name: execution-id
Value: 68
Value From:
Path: /tmp/outputs/execution-id
Name: iteration-count
Value: 0
Value From:
Default: 0
Path: /tmp/outputs/iteration-count
Name: condition
Value: nil
Value From:
Default: true
Path: /tmp/outputs/condition
Phase: Succeeded
Progress: 1/1
Resources Duration:
Cpu: 3
Memory: 2
Started At: 2025-01-02T11:53:52Z
Template Name: system-dag-driver
Template Scope: local/gpu-dag-pipeline-5nfmx
Type: Pod
gpu-dag-pipeline-5nfmx-2860418848:
Boundary ID: gpu-dag-pipeline-5nfmx-3955520920
Children:
gpu-dag-pipeline-5nfmx-4250179659
Display Name: preprocess-gpu-stage
Finished At: 2025-01-02T11:54:13Z
Id: gpu-dag-pipeline-5nfmx-2860418848
Inputs:
Parameters:
Name: pod-spec-patch
Value:
Default: false
Name: cached-decision
Value: true
Name: gpu-dag-pipeline-5nfmx.root.preprocess-gpu-stage
Outbound Nodes:
gpu-dag-pipeline-5nfmx-4250179659
Phase: Succeeded
Progress: 1/1
Resources Duration:
Cpu: 3
Memory: 2
Started At: 2025-01-02T11:54:13Z
Template Name: system-container-executor
Template Scope: local/gpu-dag-pipeline-5nfmx
Type: DAG
gpu-dag-pipeline-5nfmx-3955520920:
Boundary ID: gpu-dag-pipeline-5nfmx
Children:
gpu-dag-pipeline-5nfmx-1052282141
Display Name: root
Finished At: 2025-01-02T11:54:23Z
Id: gpu-dag-pipeline-5nfmx-3955520920
Inputs:
Parameters:
Name: parent-dag-id
Value: 68
Name: gpu-dag-pipeline-5nfmx.root
Outbound Nodes:
gpu-dag-pipeline-5nfmx-411402406
Phase: Succeeded
Progress: 2/2
Resources Duration:
Cpu: 6
Memory: 4
Started At: 2025-01-02T11:54:02Z
Template Name: root
Template Scope: local/gpu-dag-pipeline-5nfmx
Type: DAG
gpu-dag-pipeline-5nfmx-411402406:
Boundary ID: gpu-dag-pipeline-5nfmx-1363621287
Display Name: executor
Finished At: 2025-01-02T11:54:23Z
Id: gpu-dag-pipeline-5nfmx-411402406
Message: when 'true != true' evaluated false
Name: gpu-dag-pipeline-5nfmx.root.process-gpu-stage.executor
Phase: Skipped
Started At: 2025-01-02T11:54:23Z
Template Name: system-container-impl
Template Scope: local/gpu-dag-pipeline-5nfmx
Type: Skipped
gpu-dag-pipeline-5nfmx-4177297792:
Boundary ID: gpu-dag-pipeline-5nfmx-3955520920
Children:
gpu-dag-pipeline-5nfmx-1363621287
Display Name: process-gpu-stage-driver
Finished At: 2025-01-02T11:54:16Z
Host Node Name: lke293878-490567-3333acbc0000
Id: gpu-dag-pipeline-5nfmx-4177297792
Inputs:
Parameters:
Name: component
Value: {"executorLabel":"exec-process-gpu-stage","inputDefinitions":{"parameters":{"input_data":{"parameterType":"STRING"}}},"outputDefinitions":{"parameters":{"Output":{"parameterType":"STRING"}}}}
Name: task
Value: {"cachingOptions":{"enableCache":true},"componentRef":{"name":"comp-process-gpu-stage"},"dependentTasks":["preprocess-gpu-stage"],"inputs":{"parameters":{"input_data":{"taskOutputParameter":{"outputParameterKey":"Output","producerTask":"preprocess-gpu-stage"}}}},"taskInfo":{"name":"process-gpu-stage"}}
Name: container
Value: {"args":["--executor_input","{{$}}","--function_to_execute","process_gpu_stage"],"command":["sh","-c","\nif ! [ -x "$(command -v pip)" ]; then\n python3 -m ensurepip || python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location 'kfp==2.11.0' '--no-deps' 'typing-extensions\u003e=3.7.4,\u003c5; python_version\u003c"3.9"' \u0026\u0026 "$0" "$@"\n","sh","-ec","program_path=$(mktemp -d)\n\nprintf "%s" "$0" \u003e "$program_path/ephemeral_component.py"\n_KFP_RUNTIME=true python3 -m kfp.dsl.executor_main --component_module_path "$program_path/ephemeral_component.py" "$@"\n","\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import *\nfrom typing import *\n\ndef process_gpu_stage(input_data: str) -\u003e str:\n """\n Simulates data processing. Consumes preprocessed data and returns the result.\n """\n print(f"Processing data: {input_data}")\n return f"result_from_{input_data}"\n\n"],"image":"python:3.10","resources":{"accelerator":{"count":"1","type":"nvidia.com/gpu"},"cpuLimit":1}}
Name: parent-dag-id
Value: 68
Default: -1
Name: iteration-index
Value: -1
Default:
Name: kubernetes-config
Value:
Name: gpu-dag-pipeline-5nfmx.root.process-gpu-stage-driver
Outputs:
Artifacts:
Name: main-logs
s3:
Key: artifacts/gpu-dag-pipeline-5nfmx/2025/01/02/gpu-dag-pipeline-5nfmx-system-container-driver-4177297792/main.log
Exit Code: 0
Parameters:
Name: pod-spec-patch
Value:
Value From:
Default:
Path: /tmp/outputs/pod-spec-patch
Default: false
Name: cached-decision
Value: true
Value From:
Default: false
Path: /tmp/outputs/cached-decision
Name: condition
Value: nil
Value From:
Default: true
Path: /tmp/outputs/condition
Phase: Succeeded
Progress: 1/1
Resources Duration:
Cpu: 3
Memory: 2
Started At: 2025-01-02T11:54:13Z
Template Name: system-container-driver
Template Scope: local/gpu-dag-pipeline-5nfmx
Type: Pod
gpu-dag-pipeline-5nfmx-4250179659:
Boundary ID: gpu-dag-pipeline-5nfmx-2860418848
Children:
gpu-dag-pipeline-5nfmx-4177297792
Display Name: executor
Finished At: 2025-01-02T11:54:13Z
Id: gpu-dag-pipeline-5nfmx-4250179659
Message: when 'true != true' evaluated false
Name: gpu-dag-pipeline-5nfmx.root.preprocess-gpu-stage.executor
Phase: Skipped
Progress: 1/1
Resources Duration:
Cpu: 3
Memory: 2
Started At: 2025-01-02T11:54:13Z
Template Name: system-container-impl
Template Scope: local/gpu-dag-pipeline-5nfmx
Type: Skipped
Phase: Succeeded
Progress: 3/3
Resources Duration:
Cpu: 9
Memory: 6
Started At: 2025-01-02T11:53:52Z
Events:
Type Reason Age From Message
Normal WorkflowRunning 110s workflow-controller Workflow Running
Normal WorkflowNodeRunning 109s workflow-controller Running node gpu-dag-pipeline-5nfmx
Normal WorkflowNodeRunning 99s workflow-controller Running node gpu-dag-pipeline-5nfmx.root-driver
Normal WorkflowNodeSucceeded 99s workflow-controller Succeeded node gpu-dag-pipeline-5nfmx.root-driver
Normal WorkflowNodeRunning 99s workflow-controller Running node gpu-dag-pipeline-5nfmx.root
Normal WorkflowNodeRunning 89s workflow-controller Running node gpu-dag-pipeline-5nfmx.root.preprocess-gpu-stage-driver
Normal WorkflowNodeSucceeded 89s workflow-controller Succeeded node gpu-dag-pipeline-5nfmx.root.preprocess-gpu-stage-driver
Normal WorkflowNodeRunning 89s workflow-controller Running node gpu-dag-pipeline-5nfmx.root.preprocess-gpu-stage
Normal WorkflowNodeSucceeded 89s workflow-controller Succeeded node gpu-dag-pipeline-5nfmx.root.preprocess-gpu-stage
Normal WorkflowSucceeded 79s workflow-controller Workflow completed
Normal WorkflowNodeSucceeded 79s workflow-controller Succeeded node gpu-dag-pipeline-5nfmx
Normal WorkflowNodeSucceeded 79s workflow-controller Succeeded node gpu-dag-pipeline-5nfmx.root
Normal WorkflowNodeRunning 79s workflow-controller Running node gpu-dag-pipeline-5nfmx.root.process-gpu-stage-driver
Normal WorkflowNodeSucceeded 79s workflow-controller Succeeded node gpu-dag-pipeline-5nfmx.root.process-gpu-stage-driver
Normal WorkflowNodeRunning 79s workflow-controller Running node gpu-dag-pipeline-5nfmx.root.process-gpu-stage
Normal WorkflowNodeSucceeded 79s workflow-controller Succeeded node gpu-dag-pipeline-5nfmx.root.process-gpu-stage
`
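The describe output above is long, and the two `executor` nodes with `Phase: Skipped` are easy to miss. As a rough aid for reading it, here is a minimal sketch that lists skipped nodes and their messages from a workflow object loaded as a dictionary. It assumes the JSON form of the workflow uses `status.nodes` with `name`, `phase` and `message` keys, matching the fields shown above. The helper name `skipped_nodes` is hypothetical, and the sample data is trimmed to three nodes copied from this run.

```python
from typing import Dict, List, Tuple


def skipped_nodes(workflow: Dict) -> List[Tuple[str, str]]:
    """Return (node name, message) pairs for every node whose phase is Skipped."""
    nodes = workflow.get("status", {}).get("nodes", {})
    return sorted(
        (node.get("name", node_id), node.get("message", ""))
        for node_id, node in nodes.items()
        if node.get("phase") == "Skipped"
    )


# Trimmed sample: three nodes with values copied from the describe output above.
sample = {
    "status": {
        "nodes": {
            "gpu-dag-pipeline-5nfmx-411402406": {
                "name": "gpu-dag-pipeline-5nfmx.root.process-gpu-stage.executor",
                "phase": "Skipped",
                "message": "when 'true != true' evaluated false",
            },
            "gpu-dag-pipeline-5nfmx-4250179659": {
                "name": "gpu-dag-pipeline-5nfmx.root.preprocess-gpu-stage.executor",
                "phase": "Skipped",
                "message": "when 'true != true' evaluated false",
            },
            "gpu-dag-pipeline-5nfmx-2521779877": {
                "name": "gpu-dag-pipeline-5nfmx.root-driver",
                "phase": "Succeeded",
            },
        }
    }
}

for name, message in skipped_nodes(sample):
    print(f"{name}: {message}")
```

For the real object, the dictionary could be loaded with `json.load` from the output of `kubectl get workflow gpu-dag-pipeline-5nfmx -n kubeflow-user-example-com -o json`. In this run both executors were skipped because `cached-decision` was `true`, so the `When` condition on `system-container-impl` evaluated to false.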
If the CPU or GPU requirement of any one component cannot be met, the pipeline does not seem to run any component, including those whose resource requirements can be satisfied. Is this expected behaviour?
Below is the sample source I used to reproduce the scenario:
`from kfp import compiler
from kfp.dsl import pipeline, component


# Stage 1: Preprocessing Component
@component(base_image="python:3.10")
def preprocess_gpu_stage() -> str:
    """
    Simulates data preprocessing. Returns a string representing processed data.
    """
    print("Preprocessing data...")
    return "processed_data"


# Stage 2: Processing Component
@component(base_image="python:3.10")
def process_gpu_stage(input_data: str) -> str:
    """
    Simulates data processing. Consumes preprocessed data and returns the result.
    """
    print(f"Processing data: {input_data}")
    return f"result_from_{input_data}"


# Define the DAG pipeline
@pipeline(
    name="GPU DAG Pipeline",
    description="A sample pipeline with two stages using a DAG structure."
)
def gpu_dag_sample_pipeline():
    # Stage 1: Preprocessing
    preprocess_task = preprocess_gpu_stage().set_cpu_limit("1")
    # Stage 2: Processing. This call was missing from the pasted source; it is
    # reconstructed from the compiled spec above (cpuLimit 1, one nvidia.com/gpu).
    process_task = (
        process_gpu_stage(input_data=preprocess_task.output)
        .set_cpu_limit("1")
        .set_accelerator_type("nvidia.com/gpu")
        .set_accelerator_limit(1)
    )


# Compile the pipeline to a YAML file
if __name__ == "__main__":
    compiler.Compiler().compile(
        pipeline_func=gpu_dag_sample_pipeline,
        package_path="gpu_dag_sample_pipeline.yaml"
    )
`
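Both task specs in the workflow above carry `"cachingOptions":{"enableCache":true}`, and both drivers reported `cached-decision: true`. The GPU scheduling question is therefore probably easier to test with caching turned off. In the SDK, the usual route would be calling `set_caching_options(False)` on each task. The sketch below shows the same idea on the compiled task spec: it flips the flag in a dictionary shaped like the `components-root` annotation. The helper name `disable_cache` is hypothetical, and the sketch assumes the backend treats `enableCache: false` as "do not reuse cached results".

```python
import copy
from typing import Dict


def disable_cache(component_spec: Dict) -> Dict:
    """Return a copy of a DAG component spec with enableCache set to False on every task."""
    patched = copy.deepcopy(component_spec)
    for task in patched.get("dag", {}).get("tasks", {}).values():
        task.setdefault("cachingOptions", {})["enableCache"] = False
    return patched


# Shape copied from the components-root annotation above, trimmed to the relevant keys.
root = {
    "dag": {
        "tasks": {
            "preprocess-gpu-stage": {
                "cachingOptions": {"enableCache": True},
                "componentRef": {"name": "comp-preprocess-gpu-stage"},
                "taskInfo": {"name": "preprocess-gpu-stage"},
            },
            "process-gpu-stage": {
                "cachingOptions": {"enableCache": True},
                "componentRef": {"name": "comp-process-gpu-stage"},
                "dependentTasks": ["preprocess-gpu-stage"],
                "taskInfo": {"name": "process-gpu-stage"},
            },
        }
    }
}

patched = disable_cache(root)
for task_name, task in patched["dag"]["tasks"].items():
    print(task_name, task["cachingOptions"])
```

With caching off, the `system-container-impl` pods should be created, and a GPU shortage would then be expected to show up as a pod stuck in `Pending` rather than as a skipped executor.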
gpu_dag_sample_pipeline.yaml.zip
Please let me know if I am missing anything.
Thanks