@naquib314 naquib314 commented Oct 24, 2025

This PR works in tandem with the following draft PR in the depmap-deploy repo:

run_pipelines_temp.sh is a temporary file showing what the Jenkins preprocessing shell script should look like.

The pipeline directory has been reorganized in the following way:

Before (master):

pipeline/
├── _run_common.conseq
├── cell_lines.conseq
├── celligner/
├── cn_gene/
├── context_explorer/
├── cor_analysis/
├── jenkins-run-pipeline.sh
├── jenkins-run-nonquarterly.sh
├── predictability/
├── scripts/
└── (etc.)

After (depmap-pipeline-reorg-25q3):

pipeline/
├── base_pipeline_runner.py           # Base class for all runners
├── pipeline_config.yaml               # Centralized configuration
├── image-name                         # Docker image reference
├── run_pipelines_temp.sh              # Temporary script showing what the Jenkins shell script should look like
├── preprocessing-pipeline/            # All preprocessing logic
│   ├── preprocessing_pipeline_runner.py  # Runner implementation
│   ├── celligner/
│   ├── context_explorer/
│   ├── cor_analysis/
│   ├── predictability/
│   ├── scripts/
│   └── (all preprocessing conseq files)
├── data-prep-pipeline/                # Data preparation pipeline
│   ├── data_prep_pipeline_runner.py   # Runner implementation
│   ├── README.md                      # Documentation
│   ├── data_prep_pipeline/            # Conseq files
│   ├── scripts/                       # Data prep scripts
│   ├── poetry.lock
│   └── pyproject.toml
└── analysis-pipeline/                 
    ├── analysis_pipeline_runner.py    
    ├── predictability/
    └── publish.conseq
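The reorganization above centers on a shared base class (base_pipeline_runner.py) that each pipeline's runner extends. A minimal sketch of that pattern, with invented method and key names (the real runners read their settings from pipeline_config.yaml rather than taking a dict):

```python
from abc import ABC, abstractmethod

class PipelineRunner(ABC):
    """Shared behavior for all runners; config would normally be loaded
    from pipeline_config.yaml rather than passed as a dict."""

    def __init__(self, config: dict):
        self.config = config

    @abstractmethod
    def run(self) -> str:
        """Each concrete runner implements its own pipeline execution."""

class PreprocessingPipelineRunner(PipelineRunner):
    def run(self) -> str:
        # illustrative only; the real runner invokes conseq with this image
        return f"preprocessing with image {self.config['docker_image']}"
```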

Introduces a --rows-per-model flag to make_fusions_matrix.py to optionally transpose the output matrix. Updates predictability.conseq to use this flag. Also updates publish.conseq messages for consistency and enables cngene_log_2_transformation in run_common.conseq. Adds a local_run.sh script for easier local execution.
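A hypothetical sketch of how such a transpose flag could work; the real make_fusions_matrix.py almost certainly structures this differently, and the matrix contents here are toy values:

```python
import argparse

def build_matrix(rows_per_model: bool):
    # toy matrix: fusion features as rows, models as columns by default
    matrix = [[1, 2, 3], [4, 5, 6]]
    if rows_per_model:
        # transpose so each row corresponds to one model
        matrix = [list(col) for col in zip(*matrix)]
    return matrix

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--rows-per-model", action="store_true",
                        help="transpose the output so rows are models")
    return parser.parse_args(argv)
```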
Switched Docker image references to the depmap-consortium registry across multiple pipeline files for consistency and updated GCP project settings. Removed unused lineage rules and publish logic from data-prep-pipeline, deleted the obsolete jenkins-run-pipeline.sh, and added sparkles-config for Sparkles job configuration. Improved dstat_wrapper to handle additional terminal job states and made minor code cleanups.
README updated with revised setup and run instructions, including new dependency on depmap-deploy and changes to local execution steps. local_run.sh now supports 'internal' and 'external' environments with input validation. Removed unused transform_fusion.py script from predictability directory.
Refactored pipeline runners to load all hardcoded paths, credentials, Docker, and pipeline-specific settings from a new pipeline_config.yaml file. Added README_CONFIG.md to document the configuration structure and usage. Updated base, data prep, and preprocessing pipeline runners to use config values instead of hardcoded strings, improving maintainability and consistency.
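A minimal sketch of centralized config loading, assuming PyYAML; the section and key names are invented for illustration and may not match the real pipeline_config.yaml:

```python
import yaml  # assumes PyYAML, since the pipeline reads a YAML config

EXAMPLE_CONFIG = """\
docker:
  image: us.gcr.io/example-project/pipeline-run
paths:
  publish_dest: gs://example-bucket/results
"""

def load_config(text: str) -> dict:
    config = yaml.safe_load(text)
    # fail fast if a required section is missing
    for section in ("docker", "paths"):
        assert section in config, f"config missing section: {section}"
    return config
```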

Improve Docker image name handling and fix symlink

Refactored read_docker_image_name to handle missing or malformed image-name files more robustly, and improved handling of conseq_args in several methods. Replaced pipeline/image-name file with a symlink to ensure correct referencing.
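A hedged sketch of what "more robust" handling of the image-name file could look like; the actual read_docker_image_name may differ in both checks and error reporting:

```python
from pathlib import Path

def read_docker_image_name(path: str = "image-name") -> str:
    # surface missing or malformed files immediately instead of
    # failing later with a confusing Docker error
    p = Path(path)
    assert p.exists(), f"missing image-name file: {path}"
    name = p.read_text().strip()
    # a valid reference is a single non-empty token such as repo/image:tag
    assert name and len(name.split()) == 1, f"malformed image-name file: {path}"
    return name
```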

Update data_prep_pipeline_runner.py

Update _run_common.conseq

Move xrefs-external.template logic to xrefs-public.template

Migrated all dataset artifact definitions from xrefs-external.template to xrefs-public.template, replacing 'virtual_dataset_id' with 'virtual_permaname' for consistency. Updated xrefs-external.template to include xrefs-public.template and removed redundant logic, streamlining the preprocessing pipeline configuration.

Refactor pipeline runners for improved error handling and clarity

Replaces try/except blocks with assertions for file existence and content checks, and simplifies error handling in base_pipeline_runner.py. Refactors dataset usage tracking in data_prep_pipeline_runner.py and preprocessing_pipeline_runner.py to remove redundant exception handling and improve clarity. Also ensures that log backup and post-run tasks are consistently handled, and adds additional assertions for robustness.
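The assertion style described here can be sketched as follows; the function name and messages are illustrative, not taken from base_pipeline_runner.py:

```python
from pathlib import Path

def load_required_file(path: str) -> str:
    # Instead of wrapping reads in try/except and re-raising a custom
    # error, assert the preconditions up front and let any failure
    # surface immediately with a specific message.
    p = Path(path)
    assert p.is_file(), f"expected file does not exist: {path}"
    text = p.read_text()
    assert text.strip(), f"expected file is empty: {path}"
    return text
```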

Improve log formatting and add pipeline run script

Replaces log separators with clearer lines in base and preprocessing pipeline runners for better readability. Adds run_pipelines_temp.sh script to automate cleanup, setup, and execution of data prep and preprocessing pipelines, including error handling and optional DB rebuild trigger.
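A hypothetical outline of run_pipelines_temp.sh based on this description; the step names, runner invocation, and the REBUILD_DB variable are illustrative, not the script's actual contents:

```shell
#!/bin/bash
set -euo pipefail  # abort on the first failing step

cleanup() { echo "cleaning previous pipeline state"; }

run_pipeline() {
    echo "=== running $1 ==="
    # placeholder for: python "$1"/*_pipeline_runner.py ...
}

main() {
    cleanup
    run_pipeline "data-prep-pipeline"
    run_pipeline "preprocessing-pipeline"
    # optional DB rebuild trigger, gated by an (assumed) env var
    if [ "${REBUILD_DB:-false}" = "true" ]; then
        echo "triggering DB rebuild"
    fi
}

main "$@"
```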

Rename depmap_data_taiga_id to release_taiga_id

Replaces all references to 'depmap_data_taiga_id' with 'release_taiga_id' across pipeline and sample data scripts, including argument names, artifact types, and variable names. This change standardizes naming for clarity and consistency throughout the codebase.
@naquib314 naquib314 requested a review from pgm October 24, 2025 12:21
Updated local_run.sh to require an explicit 'internal' or 'external' parameter instead of defaulting to 'internal'. Improved error handling and updated README to reflect the new usage. Also added error raising in data_prep_pipeline_runner.py if release taiga ID is not found.
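The explicit-environment requirement can be sketched like this; the real local_run.sh may structure its validation differently:

```shell
#!/bin/bash
set -euo pipefail

validate_env() {
    # require an explicit 'internal' or 'external' argument -- no default
    case "${1:-}" in
        internal|external) echo "$1" ;;
        *) echo "usage: local_run.sh {internal|external}" >&2; return 1 ;;
    esac
}
```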
Updated the Celligner docker image SHA in celligner.conseq to use a newer version. Removed the unused dstat_wrapper.py script from the celligner directory.
Restrict supported Python version to >=3.9,<3.10 and update taigapy to version 4.1.0 in pyproject.toml. This ensures compatibility with the new taigapy release and clarifies the supported Python versions.
Replaces usage of release_taiga_id with RELEASE_PERMANAME from config throughout publish rules and upload_to_taiga.py. Simplifies upload_to_taiga.py by removing SHA256 checks and always updating the dataset. Updates preprocess_taiga_ids.py to export RELEASE_PERMANAME for downstream use and enables relevant includes in run_common.conseq.
@naquib314 naquib314 requested a review from pgm October 28, 2025 21:16
Moved common argument parsing and config building logic to the base PipelineRunner class. Updated data prep and preprocessing pipeline runners to use these shared methods, reducing code duplication. Centralized dataset usage tracking logic in the base class and updated runners to call the new method with the appropriate pipeline directory.
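The shared-parsing refactor might look roughly like this; the flag names and class shapes are assumptions for illustration, not the actual implementation:

```python
import argparse

class PipelineRunner:
    # shared flags defined once in the base class
    @staticmethod
    def build_arg_parser() -> argparse.ArgumentParser:
        parser = argparse.ArgumentParser()
        parser.add_argument("--env", choices=["internal", "external"], required=True)
        parser.add_argument("--publish-dest", default=None)
        return parser

class DataPrepPipelineRunner(PipelineRunner):
    def parse(self, argv):
        # reuse the shared parser instead of redefining flags per runner
        return self.build_arg_parser().parse_args(argv)
```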