This repository contains dataset loaders and processing workflows.
%%| column: screen-inset-shaded
flowchart LR
file_dataset(Dataset+Pca+Hvg)
file_normalized(Normalized Dataset)
file_pca(Dataset+Pca)
file_raw(Raw Dataset)
comp_dataset_loader[/Dataset Loader/]
comp_normalization[/Normalization/]
comp_processor_hvg[/Processor Hvg/]
comp_processor_pca[/Processor Pca/]
file_raw---comp_normalization
file_pca---comp_processor_hvg
file_normalized---comp_processor_pca
comp_dataset_loader-->file_raw
comp_normalization-->file_normalized
comp_processor_hvg-->file_dataset
comp_processor_pca-->file_pca
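The edges of the flowchart above imply a fixed run order for the four components. As a minimal sketch, assuming only the node names from the diagram, the order can be derived with Python's standard `graphlib` module (Python 3.9+). The `dependencies` mapping is an illustrative transcription of the diagram, not a file from this repository.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes it depends on, read off the flowchart:
# a component depends on its input file, a file on the component that writes it.
dependencies = {
    "file_raw": {"comp_dataset_loader"},
    "comp_normalization": {"file_raw"},
    "file_normalized": {"comp_normalization"},
    "comp_processor_pca": {"file_normalized"},
    "file_pca": {"comp_processor_pca"},
    "comp_processor_hvg": {"file_pca"},
    "file_dataset": {"comp_processor_hvg"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())

# Keep only the components to get the execution order of the workflow.
component_order = [node for node in order if node.startswith("comp_")]
print(component_order)
```

Because the diagram is a single chain, the order is unique: dataset loader, normalization, PCA processor, HVG processor.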
Dataset+Pca+Hvg: a normalised dataset with a PCA embedding and HVG selection.
Used in:
- processor hvg: output (as output)
Slots:
struct | name | type | description |
---|---|---|---|
layers | counts | integer | Raw counts |
layers | normalized | double | Normalised expression values |
obs | celltype | string | Cell type information |
obs | batch | string | Batch information |
obs | tissue | string | Tissue information |
obs | size_factors | double | The size factors created by the normalisation method, if any. |
var | hvg | boolean | Whether or not the feature is considered to be a ‘highly variable gene’ |
var | hvg_score | integer | A ranking of the features by variance. |
obsm | X_pca | double | The resulting PCA embedding. |
varm | pca_loadings | double | The PCA loadings matrix. |
uns | dataset_id | string | A unique identifier for the dataset |
uns | normalization_id | string | Which normalization was used |
uns | pca_variance | double | The PCA variance objects. |
Example:
AnnData object
obs: 'celltype', 'batch', 'tissue', 'size_factors'
var: 'hvg', 'hvg_score'
uns: 'dataset_id', 'normalization_id', 'pca_variance'
obsm: 'X_pca'
varm: 'pca_loadings'
layers: 'counts', 'normalized'
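The slot table above can be turned into a simple presence check. The sketch below is an assumption-laden illustration, not a validator shipped with this repository: `missing_slots` is a hypothetical helper that only relies on each struct supporting the `in` operator, which should hold for the mapping-like attributes of an AnnData object and for the columns of `obs` and `var`. A plain stand-in object is used here so the example runs without extra dependencies.

```python
from types import SimpleNamespace

# Slots of the Dataset+Pca+Hvg format, copied from the table above.
DATASET_PCA_HVG_SLOTS = {
    "layers": ["counts", "normalized"],
    "obs": ["celltype", "batch", "tissue", "size_factors"],
    "var": ["hvg", "hvg_score"],
    "obsm": ["X_pca"],
    "varm": ["pca_loadings"],
    "uns": ["dataset_id", "normalization_id", "pca_variance"],
}


def missing_slots(data, spec):
    """Return 'struct/name' for every slot listed in spec that data lacks."""
    missing = []
    for struct, names in spec.items():
        container = getattr(data, struct, None)
        for name in names:
            if container is None or name not in container:
                missing.append(f"{struct}/{name}")
    return missing


# Hypothetical stand-in with the slots of a Dataset+Pca file (no HVG yet).
example = SimpleNamespace(
    layers={"counts": None, "normalized": None},
    obs={"celltype": None, "batch": None, "tissue": None, "size_factors": None},
    obsm={"X_pca": None},
    varm={"pca_loadings": None},
    uns={"dataset_id": None, "normalization_id": None, "pca_variance": None},
)

print(missing_slots(example, DATASET_PCA_HVG_SLOTS))
```

For this stand-in, only the two `var` slots written by the HVG processor are reported as missing.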
Normalized Dataset: a normalized dataset.
Used in:
- normalization: output (as output)
- processor pca: input (as input)
Slots:
struct | name | type | description |
---|---|---|---|
layers | counts | integer | Raw counts |
layers | normalized | double | Normalised expression values |
obs | celltype | string | Cell type information |
obs | batch | string | Batch information |
obs | tissue | string | Tissue information |
obs | size_factors | double | The size factors created by the normalisation method, if any. |
uns | dataset_id | string | A unique identifier for the dataset |
uns | normalization_id | string | Which normalization was used |
Example:
AnnData object
obs: 'celltype', 'batch', 'tissue', 'size_factors'
uns: 'dataset_id', 'normalization_id'
layers: 'counts', 'normalized'
Dataset+Pca: a normalised dataset with a PCA embedding.
Used in:
- processor hvg: input (as input)
- processor pca: output (as output)
Slots:
struct | name | type | description |
---|---|---|---|
layers | counts | integer | Raw counts |
layers | normalized | double | Normalised expression values |
obs | celltype | string | Cell type information |
obs | batch | string | Batch information |
obs | tissue | string | Tissue information |
obs | size_factors | double | The size factors created by the normalisation method, if any. |
obsm | X_pca | double | The resulting PCA embedding. |
varm | pca_loadings | double | The PCA loadings matrix. |
uns | dataset_id | string | A unique identifier for the dataset |
uns | normalization_id | string | Which normalization was used |
uns | pca_variance | double | The PCA variance objects. |
Example:
AnnData object
obs: 'celltype', 'batch', 'tissue', 'size_factors'
uns: 'dataset_id', 'normalization_id', 'pca_variance'
obsm: 'X_pca'
varm: 'pca_loadings'
layers: 'counts', 'normalized'
Raw Dataset: an unprocessed dataset as output by a dataset loader.
Used in:
- dataset loader: output (as output)
- normalization: input (as input)
Slots:
struct | name | type | description |
---|---|---|---|
layers | counts | integer | Raw counts |
obs | celltype | string | Cell type information |
obs | batch | string | Batch information |
obs | tissue | string | Tissue information |
uns | dataset_id | string | A unique identifier for the dataset |
Example:
AnnData object
obs: 'celltype', 'batch', 'tissue'
uns: 'dataset_id'
layers: 'counts'
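The four file formats differ only in the slots each processing step adds. As an illustrative sketch, the slot tables can be written as sets of `struct/name` strings and compared stage by stage. The `FORMATS` and `added_by_stage` names are introduced here for illustration only.

```python
# Slots per file format, transcribed from the slot tables in this document
# and listed in processing order.
FORMATS = {
    "Raw Dataset": {
        "layers/counts", "obs/celltype", "obs/batch", "obs/tissue",
        "uns/dataset_id",
    },
    "Normalized Dataset": {
        "layers/counts", "layers/normalized", "obs/celltype", "obs/batch",
        "obs/tissue", "obs/size_factors", "uns/dataset_id",
        "uns/normalization_id",
    },
    "Dataset+Pca": {
        "layers/counts", "layers/normalized", "obs/celltype", "obs/batch",
        "obs/tissue", "obs/size_factors", "obsm/X_pca", "varm/pca_loadings",
        "uns/dataset_id", "uns/normalization_id", "uns/pca_variance",
    },
    "Dataset+Pca+Hvg": {
        "layers/counts", "layers/normalized", "obs/celltype", "obs/batch",
        "obs/tissue", "obs/size_factors", "var/hvg", "var/hvg_score",
        "obsm/X_pca", "varm/pca_loadings", "uns/dataset_id",
        "uns/normalization_id", "uns/pca_variance",
    },
}

stages = list(FORMATS)
added_by_stage = {}
for before, after in zip(stages, stages[1:]):
    # No step drops a slot, so each format is a superset of the previous one.
    assert FORMATS[before] <= FORMATS[after]
    added_by_stage[after] = sorted(FORMATS[after] - FORMATS[before])

for stage, added in added_by_stage.items():
    print(f"{stage}: adds {', '.join(added)}")
```

Each step only adds slots: normalization adds three, the PCA processor three more, and the HVG processor the two `var` columns.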
Arguments of the Dataset Loader component:
Name | Type | Direction | Description |
---|---|---|---|
--output | Raw Dataset | output | An unprocessed dataset as output by a dataset loader. |
Arguments of the Normalization component:
Name | Type | Direction | Description |
---|---|---|---|
--input | Raw Dataset | input | An unprocessed dataset as output by a dataset loader. |
--output | Normalized Dataset | output | A normalized dataset |
--layer_output | string | input | The name of the layer in which to store the normalized data. |
--obs_size_factors | string | input | In which .obs slot to store the size factors (if any). |
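To make the normalization interface concrete, here is a minimal `argparse` sketch that mirrors the four arguments listed above. It is not how the repository's components are actually wired up. The default values (`normalized`, `size_factors`) are assumptions taken from the slot names in the Normalized Dataset table, and the file names are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser mirroring the normalization arguments (illustrative)."""
    parser = argparse.ArgumentParser(
        description="Normalization component (illustrative sketch)"
    )
    parser.add_argument("--input", required=True,
                        help="Raw Dataset file to read")
    parser.add_argument("--output", required=True,
                        help="Normalized Dataset file to write")
    # Assumed defaults: the slot names used in the Normalized Dataset table.
    parser.add_argument("--layer_output", default="normalized",
                        help="Layer in which to store the normalized data")
    parser.add_argument("--obs_size_factors", default="size_factors",
                        help=".obs slot in which to store the size factors")
    return parser


# Hypothetical file names, passed explicitly so the sketch runs as-is.
args = build_parser().parse_args(
    ["--input", "raw.h5ad", "--output", "normalized.h5ad"]
)
print(args.layer_output, args.obs_size_factors)
```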
Arguments of the Processor Hvg component:
Name | Type | Direction | Description |
---|---|---|---|
--input | Dataset+Pca | input | A normalised dataset with a PCA embedding |
--layer_input | string | input | Which layer to use as input for the HVG selection. |
--output | Dataset+Pca+Hvg | output | A normalised dataset with a PCA embedding and HVG selection |
--var_hvg | string | input | In which .var slot to store whether a feature is considered to be hvg. |
--var_hvg_score | string | input | In which .var slot to store a ranking of the features by variance. |
--num_features | integer | input | The number of HVGs to select. |
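The relationship between `--num_features`, the boolean `hvg` flag and the integer `hvg_score` ranking can be illustrated with a small standard-library sketch. This is not the repository's implementation, whose selection method is not described here. The sketch assumes plain per-feature variance as the criterion and assumes that rank 1 denotes the most variable feature. `select_hvg` is a hypothetical helper.

```python
from statistics import pvariance


def select_hvg(matrix, num_features):
    """Flag the num_features most variable columns of a cells x features matrix.

    Returns (hvg, hvg_score): a boolean flag and an integer rank per feature.
    """
    # Transpose rows (cells) into columns (features) and take each variance.
    variances = [pvariance(column) for column in zip(*matrix)]
    # Feature indices sorted from most to least variable; sorted() is stable,
    # so tied features keep their original order.
    by_variance = sorted(range(len(variances)), key=lambda j: -variances[j])
    hvg_score = [0] * len(variances)
    for rank, j in enumerate(by_variance, start=1):
        hvg_score[j] = rank  # assumption: rank 1 is the most variable feature
    hvg = [score <= num_features for score in hvg_score]
    return hvg, hvg_score


# Toy normalized layer: 3 cells x 3 features (made-up values).
normalized = [
    [0.0, 1.0, 5.0],
    [0.0, 2.0, 1.0],
    [0.0, 3.0, 9.0],
]
hvg, hvg_score = select_hvg(normalized, num_features=2)
print(hvg, hvg_score)
```

In this toy matrix the constant first feature ranks last and is the only one not flagged.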
Arguments of the Processor Pca component:
Name | Type | Direction | Description |
---|---|---|---|
--input | Normalized Dataset | input | A normalized dataset |
--layer_input | string | input | Which layer to use as input for the PCA. |
--output | Dataset+Pca | output | A normalised dataset with a PCA embedding |
--obsm_embedding | string | input | In which .obsm slot to store the resulting embedding. |
--varm_loadings | string | input | In which .varm slot to store the resulting loadings matrix. |
--uns_variance | string | input | In which .uns slot to store the resulting variance objects. |
--num_components | integer | input | Number of principal components to compute. Defaults to 50, or to one less than the minimum dimension size of the selected representation if that is smaller. |
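The default for `--num_components` is easy to misread, so here is a small sketch of one reading of it: use 50 components unless the smaller dimension of the input matrix does not allow it, in which case use that dimension minus one. The function name and the `cap` parameter are hypothetical.

```python
def default_num_components(n_obs, n_vars, cap=50):
    """Default number of principal components for an n_obs x n_vars matrix.

    Assumed rule: at most `cap` components, and never more than one less
    than the smaller of the two matrix dimensions.
    """
    return min(cap, min(n_obs, n_vars) - 1)


print(default_num_components(1000, 2000))  # large dataset: the cap applies
print(default_num_components(30, 2000))    # only 30 cells: 30 - 1 components
```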