Depmap pipeline reorg 25q3 #443
Open: naquib314 wants to merge 23 commits into master from depmap-pipeline-reorg-25q3
    Introduces a --rows-per-model flag to make_fusions_matrix.py to optionally transpose the output matrix. Updates predictability.conseq to use this flag. Also updates publish.conseq messages for consistency and enables cngene_log_2_transformation in run_common.conseq. Adds a local_run.sh script for easier local execution.
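The commit message does not show how make_fusions_matrix.py builds its matrix. Here is a minimal sketch, assuming the script produces a fusion-by-model matrix and that `--rows-per-model` only flips its orientation. It shows how the flag could be wired up with `argparse`. `build_parser`, `shape_output`, and the list-of-lists matrix are hypothetical stand-ins. If the script holds the matrix in a pandas DataFrame, `DataFrame.T` would do the transpose instead.

```python
import argparse
from typing import List

Matrix = List[List[int]]


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI; only the --rows-per-model flag name comes from the PR.
    parser = argparse.ArgumentParser(description="Build a fusions matrix (sketch).")
    parser.add_argument(
        "--rows-per-model",
        action="store_true",
        help="Transpose the output so that each row is one model.",
    )
    return parser


def transpose(matrix: Matrix) -> Matrix:
    # zip(*rows) pairs up the i-th element of every row, i.e. swaps rows and columns.
    return [list(column) for column in zip(*matrix)]


def shape_output(matrix: Matrix, rows_per_model: bool) -> Matrix:
    # Assumption: the default orientation is one row per fusion, one column per model.
    return transpose(matrix) if rows_per_model else matrix


if __name__ == "__main__":
    fusions_by_model = [[1, 0, 1], [0, 1, 0]]  # 2 fusions x 3 models
    args = build_parser().parse_args(["--rows-per-model"])
    print(shape_output(fusions_by_model, args.rows_per_model))  # [[1, 0], [0, 1], [1, 0]]
```

Because `store_true` defaults to `False`, existing callers that omit the flag keep the original orientation.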
Switched Docker image references to the depmap-consortium registry across multiple pipeline files for consistency and updated GCP project settings. Removed unused lineage rules and publish logic from data-prep-pipeline, deleted the obsolete jenkins-run-pipeline.sh, and added sparkles-config for Sparkles job configuration. Improved dstat_wrapper to handle additional terminal job states and made minor code cleanups.
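The PR doesn't list which terminal states dstat_wrapper gained. Here is a minimal sketch of the idea, assuming the wrapper polls a dsub-style `dstat` status string (RUNNING, SUCCESS, FAILURE, CANCELED). It treats every terminal state as a stop condition, not just success. `TERMINAL_STATES`, `is_terminal`, and `wait_for_job` are hypothetical names, and `get_status` stands in for however the real wrapper calls `dstat`.

```python
import time
from typing import Callable

# Assumption: dsub-style job statuses; the real wrapper's set is not shown in the PR.
TERMINAL_STATES = frozenset({"SUCCESS", "FAILURE", "CANCELED"})


def is_terminal(status: str) -> bool:
    # Normalise whitespace and case before comparing against known end states.
    return status.strip().upper() in TERMINAL_STATES


def wait_for_job(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until the job reaches any terminal state, then return it."""
    while True:
        status = get_status()
        if is_terminal(status):
            return status.strip().upper()
        sleep(poll_seconds)
```

If a loop like this only recognised SUCCESS and FAILURE, a cancelled job would keep it polling forever. Recognising every terminal state avoids that kind of hang.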
README updated with revised setup and run instructions, including a new dependency on depmap-deploy and changes to the local execution steps. local_run.sh now supports 'internal' and 'external' environments with input validation. Removed the unused transform_fusion.py script from the predictability directory.
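local_run.sh is a shell script whose contents aren't shown here. To keep all examples in one language, the same validation logic is sketched below in Python. It assumes the script takes the environment as its only positional argument. `VALID_ENVS` and `parse_env` are hypothetical.

```python
from typing import Sequence

VALID_ENVS = ("internal", "external")


def parse_env(argv: Sequence[str]) -> str:
    """Return the requested environment, or exit with a usage message.

    There is no default: a missing, unknown, or extra argument is an error.
    """
    if len(argv) != 1 or argv[0] not in VALID_ENVS:
        raise SystemExit("usage: local_run.sh internal|external")
    return argv[0]
```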
Refactored pipeline runners to load all hardcoded paths, credentials, Docker, and pipeline-specific settings from a new pipeline_config.yaml file. Added README_CONFIG.md to document the configuration structure and usage. Updated base, data prep, and preprocessing pipeline runners to use config values instead of hardcoded strings, improving maintainability and consistency.
Improve Docker image name handling and fix symlink
Refactored read_docker_image_name to handle missing or malformed image-name files more robustly, and improved handling of conseq_args in several methods. Replaced the pipeline/image-name file with a symlink to ensure correct referencing.
Update data_prep_pipeline_runner.py
Update _run_common.conseq
Move xrefs-external.template logic to xrefs-public.template
Migrated all dataset artifact definitions from xrefs-external.template to xrefs-public.template, replacing 'virtual_dataset_id' with 'virtual_permaname' for consistency. Updated xrefs-external.template to include xrefs-public.template and removed redundant logic, streamlining the preprocessing pipeline configuration.
Refactor pipeline runners for improved error handling and clarity
Replaces try/except blocks with assertions for file existence and content checks, and simplifies error handling in base_pipeline_runner.py. Refactors dataset usage tracking in data_prep_pipeline_runner.py and preprocessing_pipeline_runner.py to remove redundant exception handling and improve clarity. Also ensures that log backup and post-run tasks are consistently handled, and adds additional assertions for robustness.
Improve log formatting and add pipeline run script
Replaces log separators with clearer lines in base and preprocessing pipeline runners for better readability. Adds run_pipelines_temp.sh script to automate cleanup, setup, and execution of data prep and preprocessing pipelines, including error handling and an optional DB rebuild trigger.
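README_CONFIG.md documents the actual layout of pipeline_config.yaml, which isn't reproduced here. Here is a minimal sketch, assuming the file has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and uses the hypothetical top-level keys `gcp_project`, `docker`, and `pipelines`. It shows how a runner could turn that dict into a typed, fail-fast config object.

```python
from dataclasses import dataclass
from typing import Any, Dict, Mapping

REQUIRED_KEYS = ("gcp_project", "docker", "pipelines")  # hypothetical key names


@dataclass(frozen=True)
class PipelineConfig:
    gcp_project: str
    docker_image: str
    pipelines: Dict[str, Any]


def config_from_dict(raw: Mapping[str, Any]) -> PipelineConfig:
    """Validate the parsed YAML mapping and expose it as attributes."""
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise KeyError(f"pipeline_config.yaml is missing keys: {missing}")
    return PipelineConfig(
        gcp_project=raw["gcp_project"],
        docker_image=raw["docker"]["image"],
        pipelines=dict(raw["pipelines"]),
    )
```

In a runner, the dict would come from calling `yaml.safe_load` on the opened config file. Keeping parsing separate from validation lets the validation be tested without a file on disk.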
Rename depmap_data_taiga_id to release_taiga_id
Replaces all references to 'depmap_data_taiga_id' with 'release_taiga_id' across pipeline and sample data scripts, including argument names, artifact types, and variable names. This change standardizes naming for clarity and consistency throughout the codebase.
Deleted the dose_replicate_reformat.conseq file, which contained the rule for reformatting standardized dose replicate data. This change likely reflects a refactor or removal of this processing step from the pipeline.
              
pgm reviewed Oct 27, 2025
pipeline/data-prep-pipeline/data_prep_pipeline/upload_to_taiga.py (outdated, resolved)
Updated local_run.sh to require an explicit 'internal' or 'external' parameter instead of defaulting to 'internal'. Improved error handling and updated the README to reflect the new usage. Also made data_prep_pipeline_runner.py raise an error if the release Taiga ID is not found.
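The data prep runner now raises an error if the release Taiga ID can't be found, instead of continuing with an empty value. Here is a minimal sketch of that fail-fast lookup, assuming the IDs are available as a mapping. `ReleaseTaigaIdNotFound`, `get_release_taiga_id`, and the `release_taiga_id` key are illustrative. The key name follows the depmap_data_taiga_id rename in this PR.

```python
from typing import Mapping


class ReleaseTaigaIdNotFound(RuntimeError):
    """Hypothetical error type for a missing release Taiga ID."""


def get_release_taiga_id(ids: Mapping[str, str], key: str = "release_taiga_id") -> str:
    value = ids.get(key)
    if not value:
        # Fail fast: a missing or empty ID would otherwise surface much later in the run.
        raise ReleaseTaigaIdNotFound(f"{key!r} not found in pipeline inputs")
    return value
```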
Updated the Celligner docker image SHA in celligner.conseq to use a newer version. Removed the unused dstat_wrapper.py script from the celligner directory.
Restrict supported Python version to >=3.9,<3.10 and update taigapy to version 4.1.0 in pyproject.toml. This ensures compatibility with the new taigapy release and clarifies the supported Python versions.
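The constraint itself lives in pyproject.toml and is enforced at install time. A script that wanted to fail early under an interpreter outside that range could use a minimal runtime guard like the one below. `is_supported_python` is a hypothetical helper, not something this PR adds.

```python
import sys
from typing import Optional, Tuple

MIN_VERSION = (3, 9)   # inclusive lower bound, matching ">=3.9"
MAX_VERSION = (3, 10)  # exclusive upper bound, matching "<3.10"


def is_supported_python(version: Optional[Tuple[int, ...]] = None) -> bool:
    # Compare only (major, minor) so that patch releases such as 3.9.18 pass.
    major_minor = tuple(version if version is not None else sys.version_info)[:2]
    return MIN_VERSION <= major_minor < MAX_VERSION
```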
Replaces usage of release_taiga_id with RELEASE_PERMANAME from config throughout publish rules and upload_to_taiga.py. Simplifies upload_to_taiga.py by removing SHA256 checks and always updating the dataset. Updates preprocess_taiga_ids.py to export RELEASE_PERMANAME for downstream use and enables relevant includes in run_common.conseq.
Moved common argument parsing and config building logic to the base PipelineRunner class. Updated data prep and preprocessing pipeline runners to use these shared methods, reducing code duplication. Centralized dataset usage tracking logic in the base class and updated runners to call the new method with the appropriate pipeline directory.
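Moving shared logic into a base class is a standard template-method refactor. Here is a minimal sketch, assuming the runners differ only in their pipeline directory and in optional extra flags. `PipelineRunner` is named in the commit message. The subclass names are inferred from data_prep_pipeline_runner.py and preprocessing_pipeline_runner.py, and every argument, method, and directory below is hypothetical.

```python
import argparse
from pathlib import Path
from typing import Any, Dict, Optional, Sequence


class PipelineRunner:
    """Base runner holding argument parsing and config building shared by all pipelines."""

    def __init__(self, pipeline_dir: Path) -> None:
        self.pipeline_dir = pipeline_dir

    def build_parser(self) -> argparse.ArgumentParser:
        parser = argparse.ArgumentParser(prog=self.pipeline_dir.name)
        # Illustrative arguments only; the real runners' flags are not listed in the PR.
        parser.add_argument("env", choices=["internal", "external"])
        parser.add_argument("--dry-run", action="store_true")
        return parser

    def parse_args(self, argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
        return self.build_parser().parse_args(argv)

    def build_config(self, args: argparse.Namespace) -> Dict[str, Any]:
        # Subclasses inherit this instead of each assembling their own dict.
        return {"env": args.env, "dry_run": args.dry_run, "pipeline_dir": str(self.pipeline_dir)}


class DataPrepPipelineRunner(PipelineRunner):
    """Uses the shared parser and config unchanged."""


class PreprocessingPipelineRunner(PipelineRunner):
    def build_parser(self) -> argparse.ArgumentParser:
        # Extend, rather than duplicate, the base parser.
        parser = super().build_parser()
        parser.add_argument("--publish-dest", default=None)  # hypothetical extra flag
        return parser

    def build_config(self, args: argparse.Namespace) -> Dict[str, Any]:
        config = super().build_config(args)
        config["publish_dest"] = args.publish_dest
        return config
```

The subclass overrides call `super()`, so shared behaviour stays in one place. That is the duplication the commit describes removing.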
  
This PR works in tandem with the following draft PR in the depmap-deploy repo:
run_pipelines_temp.sh is a temporary file that shows what the shell script should look like in Jenkins preprocessing.
The pipeline directory has been reorganized as follows:
Before (master):
After (depmap-pipeline-reorg-25q3):