new tool: galaxy wrappers for the pcdl physicell data loader command line commands #7034

Open
wants to merge 14 commits into
base: main
17 changes: 17 additions & 0 deletions tools/pcdl/.shed.yml
@@ -0,0 +1,17 @@
name: pcdl
owner: iuc
description: pcdl PhysiCell Data Loader.
long_description: Galaxy wrapper for the pcdl PhysiCell Data Loader command line commands for downstream analysis from PhysiCell output.
homepage_url: https://github.com/elmbeech/physicelldataloader
remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/main/tools/pcdl/
type: unrestricted
categories: [Systems Biology, Data Export, Graphics]

auto_tool_repositories:
  name_template: "{{ tool_id }}"
  description_template: "Galaxy wrapper for physicell dataloader function: {{ tool_name }}."

suite:
  name: suite_pcdl
  description: Galaxy wrapper suite for the pcdl PhysiCell Data Loader command line commands.
  long_description: Galaxy wrapper suite for the pcdl PhysiCell Data Loader command line commands. Pcdl is paramount for downstream analysis from PhysiCell output. As such, the pcdl Galaxy tools are useful if you work with the interactive PhysiCell Studio Galaxy tool.
22 changes: 22 additions & 0 deletions tools/pcdl/README.txt
@@ -0,0 +1,22 @@
pcdl
====

Galaxy wrapper for the pcdl PhysiCell Data Loader command line commands.
Pcdl is paramount for downstream analysis from PhysiCell output.
As such, the pcdl Galaxy tools are useful if you work with the interactive
PhysiCell Studio Galaxy tool.
+ https://usegalaxy.eu/?tool_id=interactive_tool_pcstudio&version=latest

You will have to unzip the PhysiCell output folder before you can run
pcdl Galaxy tools on it.
+ https://usegalaxy.eu/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fimgteam%2Funzip%2Funzip%2F6.0%2Bgalaxy0&version=latest

More information about PhysiCell, PhysiCell Studio, and PhysiCell Data Loader
can be found here:
+ https://physicell.org/index.html
+ https://physicell-studio.readthedocs.io/en/latest/index.html
+ https://github.com/elmbeech/physicelldataloader

Date: 2025-06-06
License: BSD-3-Clause
Author: Elmar Bucher
216 changes: 216 additions & 0 deletions tools/pcdl/pcdl_get_anndata.xml
@@ -0,0 +1,216 @@
<tool id="pcdl_get_anndata" name="pcdl_get_anndata" version="3.0.1+galaxy0" profile="21.05">
<requirements>
<requirement type="package" version="3.3.7">pcdl</requirement>
</requirements>

<stdio> <exit_code range=":" level="log" description="hello world!"/> </stdio>
Member

hello world is not a good description :)
But I guess this entire line can be removed

Author

hi @bgruening,
thank you for reviewing!

unfortunately, this line cannot be removed.
if you write a python package with command line hooks, the python output is piped to stderr.
if i remove this line, galaxy thinks it caught an error every time, even when the script runs successfully.
this is why i have to catch all exit codes as log.
i agree, hello world! is not a good description.
i will change it to: catch python script output.

would this be acceptable?

Member

I think what you are looking for is https://docs.galaxyproject.org/en/latest/dev/schema.html#error-detection, e.g. detect_errors="exit_code" on the command tag. See other Python-based tools here in IUC.

Author

No, that does not work.
As you can see, if I do so, an error is detected even though there is none.
[image]

Member

OK, that means your tool is reporting an error code of 1. Is that the case?

For any other tool, text on stderr is not a problem: you can instruct Galaxy to react only to error codes, not to text on stderr.
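
A minimal sketch of this suggestion, assuming the `<stdio>` line is dropped and error detection is set on the command tag instead. With `detect_errors="exit_code"`, Galaxy should fail the job only on a non-zero exit code, so text on stderr alone would not count as an error. The command body is shortened for illustration, and whether this resolves the behavior the author reports has not been verified here.

```xml
<!-- sketch only: stands in for the <stdio> block; remaining pcdl options omitted for brevity -->
<command detect_errors="exit_code"><![CDATA[
    pcdl_get_anndata 'output_pc' $focus
        --verbose $verbose
        --collapse $collapse
]]></command>
```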

Author (@elmbeech, Jun 9, 2025)

no, it is not reporting an error code of 1.
as you can see, in this particular command (which in the pcdl command line version just outputs to the screen), the bash line catches standard error with 2> and writes it to version.txt.
and galaxy interprets this as error code 1.

Author

i followed this instruction in the galaxy documentation:

<command><![CDATA[
#import re
mkdir output_pc &&
#for $file in $path:
#set $filename = re.sub('[^\w\-\.\s]', '_', str($file.element_identifier))
ln -s '$file' output_pc/$filename &&
#end for

pcdl_get_anndata 'output_pc' $focus
--custom_data_type $custom_data_type
--microenv $microenv
--graph $graph
--physiboss $physiboss
--settingxml '$settingxml'
--verbose $verbose
--drop $drop
--keep $keep
--scale maxabs
--collapse $collapse
]]></command>

<inputs>
<section name="positional_arguments" title="positional arguments:" expanded="true">
<param
name="path" label="data_collection"
type="data_collection" collection_type="list"
help="PhysiCell output data collection."
/>
<param
name="focus" label="values"
type="integer" value="1" optional="false"
help="minimal number of values a variable has to have in any of the mcds time steps to be outputted. variables that have only 1 state carry no information. None is a state too. default is 1."
/>
</section>

<section name="options" title="options:" expanded="true">
<param
name="custom_data_type" label="custom_data_type"
type="text" value="" optional="true"
help="parameter to specify custom_data variable types other than float (namely: int, bool, str) like this var:dtype myint:int mybool:bool mystr:str. downstream float and int will be handled as numeric, bool as Boolean, and str as categorical data. default is an empty string."
/>
<param
name="microenv" label="microenv"
type="boolean" truevalue="true" falsevalue="false" checked="true"
help="should the microenvironment be extracted and loaded into the anndata object? setting microenv to False will use less memory and speed up processing, similar to the original pyMCDS_cells.py script. default is True."
/>
<param
name="graph" label="graph"
type="boolean" truevalue="true" falsevalue="false" checked="true"
help="should neighbor graph, attach graph, and attached spring graph be extracted and loaded into the anndata object? default is True."
/>
<param
name="physiboss" label="physiboss"
type="boolean" truevalue="true" falsevalue="false" checked="true"
help="if found, should physiboss state data be extracted and loaded into the anndata object? default is True."
/>
<param
name="settingxml" label="settingxml"
type="text" value="PhysiCell_settings.xml" optional="true"
help="the settings.xml file from which the cell type ID label mapping is extracted if this information is not found in the output xml file. set to None or False if the xml file is missing! default is PhysiCell_settings.xml."
/>
<param
Member

The verbose parameter is a bit useless in the Galaxy context. You can set it to true without exposing this parameter to the user.

Author (@elmbeech, Jun 9, 2025)

if possible, i would like to have it there for the user.
here are my arguments:

  1. my aim is that the galaxy wrapper does not differ from the regular pcdl command line command.
  2. if you click "info" (dataset details) in the history, you can see the difference between verbose and non-verbose in "Tool Standard Output". this might very well be interesting information for the power user.

Member

Yes, we do capture the verbose output. That is why I think you should just enable it by default if it contains useful information. This way, all users have all the information, and this param is no longer needed in the UI.
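
This suggestion could be sketched as follows, assuming the `verbose` param is removed from the inputs and the flag is fixed in the command instead. The command is shortened for illustration; the other options would stay as in the diff.

```xml
<!-- sketch only: verbose hard-coded to true, so no <param name="verbose"> is needed -->
<command><![CDATA[
    pcdl_get_anndata 'output_pc' $focus
        --verbose true
]]></command>
```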

Author

yes, but this might be a lot of output that uses up space unnecessarily.
additionally, non-verbose processing is faster.
I would prefer to have it as an option with the default set to false.

name="verbose" label="verbose"
type="boolean" truevalue="true" falsevalue="false" checked="false"
help="setting verbose to True for more text output, while processing. default is False."
/>
<param
name="drop" label="drop"
type="text" value="" optional="true"
help="set of column labels to be dropped for the dataframe. don't worry: essential columns like ID, coordinates and time will never be dropped. Attention: when the keep parameter is given, then the drop parameter has to be an empty string! default is an empty string."
/>
<param
name="keep" label="keep"
type="text" value="" optional="true"
help="set of column labels to be kept in the dataframe. set values=1 to be sure that all variables are kept. don't worry: essential columns like ID, coordinates and time will always be kept. default is an empty string."
/>
<param
name="scale" label="scale"
type="select" display="radio"
help="specify how the data should be scaled. possible values are None, maxabs, minmax, std. None: no scaling. set scale to None if you would like to have raw data or entirely scale, transform, and normalize the data later. maxabs: maximum absolute value distance scaler will linearly map all values into a [-1, 1] interval. if the original data has no negative values, the result will be the same as with the minmax scaler (except with attributes with only one value). if the attribute has only zeros, the value will be set to 0. minmax: minimum maximum distance scaler will map all values linearly into a [0, 1] interval. if the attribute has only one value, the value will be set to 0. std: standard deviation scaler will result in sigmas. each attribute will be mean centered around 0. ddof delta degree of freedom is set to 1 because it is assumed that the values are samples out of the population and not the entire population. it is incomprehensible to me that the equivalent sklearn method has ddof set to 0. if the attribute has only one value, the value will be set to 0. default is maxabs."
>
<option value="none">none</option>
<option value="maxabs" selected="true">maxabs</option>
<option value="minmax">minmax</option>
<option value="std">std</option>
</param>
<param
name="collapse" label="collapse"
type="boolean" truevalue="true" falsevalue="false" checked="true"
help="should all mcds time steps from the time series be collapsed into one big anndata h5ad file, or into many h5ad files, one for each time step? default is True."
/>
</section>
</inputs>

<outputs>
<collection name="anndata_h5ad" type="list">
<discover_datasets pattern="(?P&lt;designation&gt;.+)\.h5ad" format="h5ad" directory="output_pc" visible="false" />
</collection>
</outputs>

<tests>
<test expect_num_outputs="1">
<section name="positional_arguments">
<param name="path" >
<collection type="list">
<element name="initial_attached_cells_graph.txt" value="initial_attached_cells_graph.txt" />
<element name="initial_cell_neighbor_graph.txt" value="initial_cell_neighbor_graph.txt" />
<element name="initial_cells.mat" value="initial_cells.mat" />
<element name="initial_mesh0.mat" value="initial_mesh0.mat" />
<element name="initial_microenvironment0.mat" value="initial_microenvironment0.mat" />
<element name="initial_spring_attached_cells_graph.txt" value="initial_spring_attached_cells_graph.txt" />
<element name="initial.svg" value="initial.svg" />
<element name="initial.xml" value="initial.xml" />
<element name="legend.svg" value="legend.svg" />
<element name="output00000000_attached_cells_graph.txt" value="output00000000_attached_cells_graph.txt" />
<element name="output00000000_cell_neighbor_graph.txt" value="output00000000_cell_neighbor_graph.txt" />
<element name="output00000000_cells.mat" value="output00000000_cells.mat" />
<element name="output00000000_microenvironment0.mat" value="output00000000_microenvironment0.mat" />
<element name="output00000000_spring_attached_cells_graph.txt" value="output00000000_spring_attached_cells_graph.txt" />
<element name="output00000000.xml" value="output00000000.xml" />
<element name="output00000001_attached_cells_graph.txt" value="output00000001_attached_cells_graph.txt" />
<element name="output00000001_cell_neighbor_graph.txt" value="output00000001_cell_neighbor_graph.txt" />
<element name="output00000001_cells.mat" value="output00000001_cells.mat" />
<element name="output00000001_microenvironment0.mat" value="output00000001_microenvironment0.mat" />
<element name="output00000001_spring_attached_cells_graph.txt" value="output00000001_spring_attached_cells_graph.txt" />
<element name="output00000001.xml" value="output00000001.xml" />
<element name="final_attached_cells_graph.txt" value="final_attached_cells_graph.txt" />
<element name="final_cell_neighbor_graph.txt" value="final_cell_neighbor_graph.txt" />
<element name="final_cells.mat" value="final_cells.mat" />
<element name="final_microenvironment0.mat" value="final_microenvironment0.mat" />
<element name="final_spring_attached_cells_graph.txt" value="final_spring_attached_cells_graph.txt" />
<element name="final.svg" value="final.svg" />
<element name="final.xml" value="final.xml" />
<element name="PhysiCell_settings.xml" value="PhysiCell_settings.xml" />
</collection>
</param>
</section>
<section name="options">
<param name="verbose" value="true" />
<param name="collapse" value="true" />
</section>
<output_collection name="anndata_h5ad" count="1">
<element name="timeseries_cell_maxabs" file="timeseries_cell_maxabs.h5ad" ftype="h5ad" />
Member

We do have some specific asserts that you can use for HDF5 files; I guess they would make your testing more reliable.

https://docs.galaxyproject.org/en/latest/dev/schema.html#has-h5-attribute
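
As a rough sketch of such an assertion, the file comparison on the collection element could be replaced or complemented by an `assert_contents` block. `has_h5_keys` is a sibling of the linked `has_h5_attribute` assertion. The key names `X`, `obs`, and `var` are an assumption based on the usual AnnData h5ad layout and are not taken from the pcdl test data.

```xml
<output_collection name="anndata_h5ad" count="1">
    <element name="timeseries_cell_maxabs" ftype="h5ad">
        <assert_contents>
            <!-- assumed keys: top-level entries normally present in an h5ad file -->
            <has_h5_keys keys="X,obs,var"/>
        </assert_contents>
    </element>
</output_collection>
```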

Author

thank you for the link! i will look into it.

</output_collection>
</test>
<test expect_num_outputs="1">
<section name="positional_arguments">
<param name="path" >
<collection type="list">
<element name="initial_attached_cells_graph.txt" value="initial_attached_cells_graph.txt" />
<element name="initial_cell_neighbor_graph.txt" value="initial_cell_neighbor_graph.txt" />
<element name="initial_cells.mat" value="initial_cells.mat" />
<element name="initial_mesh0.mat" value="initial_mesh0.mat" />
<element name="initial_microenvironment0.mat" value="initial_microenvironment0.mat" />
<element name="initial_spring_attached_cells_graph.txt" value="initial_spring_attached_cells_graph.txt" />
<element name="initial.svg" value="initial.svg" />
<element name="initial.xml" value="initial.xml" />
<element name="legend.svg" value="legend.svg" />
<element name="output00000000_attached_cells_graph.txt" value="output00000000_attached_cells_graph.txt" />
<element name="output00000000_cell_neighbor_graph.txt" value="output00000000_cell_neighbor_graph.txt" />
<element name="output00000000_cells.mat" value="output00000000_cells.mat" />
<element name="output00000000_microenvironment0.mat" value="output00000000_microenvironment0.mat" />
<element name="output00000000_spring_attached_cells_graph.txt" value="output00000000_spring_attached_cells_graph.txt" />
<element name="output00000000.xml" value="output00000000.xml" />
<element name="output00000001_attached_cells_graph.txt" value="output00000001_attached_cells_graph.txt" />
<element name="output00000001_cell_neighbor_graph.txt" value="output00000001_cell_neighbor_graph.txt" />
<element name="output00000001_cells.mat" value="output00000001_cells.mat" />
<element name="output00000001_microenvironment0.mat" value="output00000001_microenvironment0.mat" />
<element name="output00000001_spring_attached_cells_graph.txt" value="output00000001_spring_attached_cells_graph.txt" />
<element name="output00000001.xml" value="output00000001.xml" />
<element name="final_attached_cells_graph.txt" value="final_attached_cells_graph.txt" />
<element name="final_cell_neighbor_graph.txt" value="final_cell_neighbor_graph.txt" />
<element name="final_cells.mat" value="final_cells.mat" />
<element name="final_microenvironment0.mat" value="final_microenvironment0.mat" />
<element name="final_spring_attached_cells_graph.txt" value="final_spring_attached_cells_graph.txt" />
<element name="final.svg" value="final.svg" />
<element name="final.xml" value="final.xml" />
<element name="PhysiCell_settings.xml" value="PhysiCell_settings.xml" />
</collection>
</param>
</section>
<section name="options">
<param name="verbose" value="false" />
<param name="collapse" value="false" />
</section>
<output_collection name="anndata_h5ad" count="2">
<element name="output00000000_cell_maxabs" file="output00000000_cell_maxabs.h5ad" ftype="h5ad" />
<element name="output00000001_cell_maxabs" file="output00000000_cell_maxabs.h5ad" ftype="h5ad" />
</output_collection>
</test>
</tests>

<help><![CDATA[
function to transform mcds time steps into one or many anndata objects for downstream analysis.

homepage: https://github.com/elmbeech/physicelldataloader
]]></help>

<citations>
<citation type="bibtex">
@misc{githubphysicelldataloader,
author = {Bucher, Elmar},
year = {2025},
title = {physicelldataloader},
publisher = {GitHub},
journal = {GitHub repository},
url = {https://github.com/elmbeech/physicelldataloader},
}</citation>
</citations>
</tool>