openproblems-bio/datasets

openproblems datasets

This repository contains dataset loaders and processing workflows.

Pipeline topology

%%| column: screen-inset-shaded
flowchart LR
  file_dataset(Dataset+Pca+Hvg)
  file_normalized(Normalized Dataset)
  file_pca(Dataset+Pca)
  file_raw(Raw Dataset)
  comp_dataset_loader[/Dataset Loader/]
  comp_normalization[/Normalization/]
  comp_processor_hvg[/Processor Hvg/]
  comp_processor_pca[/Processor Pca/]
  file_raw---comp_normalization
  file_pca---comp_processor_hvg
  file_normalized---comp_processor_pca
  comp_dataset_loader-->file_raw
  comp_normalization-->file_normalized
  comp_processor_hvg-->file_dataset
  comp_processor_pca-->file_pca
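
The flowchart above describes a linear chain of components and files. As a minimal sketch (not part of the repository), the same edges can be written as a dependency map and sorted with Python's standard `graphlib` module to recover the order in which the components would have to run. The node names are copied from the flowchart; the `DEPENDS_ON` mapping and both helper functions are hypothetical names introduced for illustration.

```python
from graphlib import TopologicalSorter

# Each key maps a node to the set of nodes it depends on, transcribed from
# the flowchart edges: "file --- component" is read as "file is an input of
# component", and "component --> file" as "component produces file".
DEPENDS_ON = {
    "file_raw": {"comp_dataset_loader"},
    "comp_normalization": {"file_raw"},
    "file_normalized": {"comp_normalization"},
    "comp_processor_pca": {"file_normalized"},
    "file_pca": {"comp_processor_pca"},
    "comp_processor_hvg": {"file_pca"},
    "file_dataset": {"comp_processor_hvg"},
}


def execution_order(depends_on):
    """Return all nodes in an order that respects every dependency."""
    return list(TopologicalSorter(depends_on).static_order())


def component_order(depends_on):
    """Return only the component nodes, in execution order."""
    return [n for n in execution_order(depends_on) if n.startswith("comp_")]


print(component_order(DEPENDS_ON))
```

Because the graph is a single chain, the order is unique: dataset loader, normalization, PCA processor, then HVG processor.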

File format API

Dataset+Pca+Hvg

A normalised dataset with a PCA embedding and HVG selection

Used in: Processor Hvg (--output)

Slots:

| struct | name | type | description |
|---|---|---|---|
| layers | counts | integer | Raw counts |
| layers | normalized | double | Normalised expression values |
| obs | celltype | string | Cell type information |
| obs | batch | string | Batch information |
| obs | tissue | string | Tissue information |
| obs | size_factors | double | The size factors created by the normalisation method, if any. |
| var | hvg | boolean | Whether or not the feature is considered to be a ‘highly variable gene’ |
| var | hvg_score | integer | A ranking of the features by hvg. |
| obsm | X_pca | double | The resulting PCA embedding. |
| varm | pca_loadings | double | The PCA loadings matrix. |
| uns | dataset_id | string | A unique identifier for the dataset |
| uns | normalization_id | string | Which normalization was used |
| uns | pca_variance | double | The PCA variance objects. |

Example:

AnnData object
 obs: 'celltype', 'batch', 'tissue', 'size_factors'
 var: 'hvg', 'hvg_score'
 uns: 'dataset_id', 'normalization_id', 'pca_variance'
 obsm: 'X_pca'
 varm: 'pca_loadings'
 layers: 'counts', 'normalized'

Normalized Dataset

A normalized dataset

Used in: Normalization (--output), Processor Pca (--input)

Slots:

| struct | name | type | description |
|---|---|---|---|
| layers | counts | integer | Raw counts |
| layers | normalized | double | Normalised expression values |
| obs | celltype | string | Cell type information |
| obs | batch | string | Batch information |
| obs | tissue | string | Tissue information |
| obs | size_factors | double | The size factors created by the normalisation method, if any. |
| uns | dataset_id | string | A unique identifier for the dataset |
| uns | normalization_id | string | Which normalization was used |

Example:

AnnData object
 obs: 'celltype', 'batch', 'tissue', 'size_factors'
 uns: 'dataset_id', 'normalization_id'
 layers: 'counts', 'normalized'

Dataset+Pca

A normalised dataset with a PCA embedding

Used in: Processor Pca (--output), Processor Hvg (--input)

Slots:

| struct | name | type | description |
|---|---|---|---|
| layers | counts | integer | Raw counts |
| layers | normalized | double | Normalised expression values |
| obs | celltype | string | Cell type information |
| obs | batch | string | Batch information |
| obs | tissue | string | Tissue information |
| obs | size_factors | double | The size factors created by the normalisation method, if any. |
| obsm | X_pca | double | The resulting PCA embedding. |
| varm | pca_loadings | double | The PCA loadings matrix. |
| uns | dataset_id | string | A unique identifier for the dataset |
| uns | normalization_id | string | Which normalization was used |
| uns | pca_variance | double | The PCA variance objects. |

Example:

AnnData object
 obs: 'celltype', 'batch', 'tissue', 'size_factors'
 uns: 'dataset_id', 'normalization_id', 'pca_variance'
 obsm: 'X_pca'
 varm: 'pca_loadings'
 layers: 'counts', 'normalized'

Raw Dataset

An unprocessed dataset as output by a dataset loader.

Used in: Dataset Loader (--output), Normalization (--input)

Slots:

| struct | name | type | description |
|---|---|---|---|
| layers | counts | integer | Raw counts |
| obs | celltype | string | Cell type information |
| obs | batch | string | Batch information |
| obs | tissue | string | Tissue information |
| uns | dataset_id | string | A unique identifier for the dataset |

Example:

AnnData object
 obs: 'celltype', 'batch', 'tissue'
 uns: 'dataset_id'
 layers: 'counts'
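
The slot tables in this section can be treated as a checklist. The following is a minimal sketch, not code from this repository, of how a file could be checked against the Raw Dataset slots. `RAW_DATASET_SLOTS` and `missing_slots` are hypothetical names, and the example object is a plain stand-in rather than a real AnnData. The check only relies on attribute access and the `in` operator, so it is expected (but not verified here) to work on an AnnData object as well.

```python
from types import SimpleNamespace

# Required slots for a Raw Dataset, transcribed from the slot table as
# (struct, name) pairs. Rows from the other tables can be added the same way.
RAW_DATASET_SLOTS = [
    ("layers", "counts"),
    ("obs", "celltype"),
    ("obs", "batch"),
    ("obs", "tissue"),
    ("uns", "dataset_id"),
]


def missing_slots(data, required_slots):
    """Return the (struct, name) pairs that are absent from `data`.

    `data` is any object whose struct attributes (layers, obs, uns, ...)
    support the `in` operator for slot names.
    """
    missing = []
    for struct, name in required_slots:
        container = getattr(data, struct, None)
        if container is None or name not in container:
            missing.append((struct, name))
    return missing


# Stand-in object for illustration only; the 'tissue' column is left out
# on purpose so that the check reports it.
example = SimpleNamespace(
    layers={"counts": [[1, 0], [0, 3]]},
    obs={"celltype": ["T cell", "B cell"], "batch": ["a", "b"]},
    uns={"dataset_id": "example_dataset"},
)

print(missing_slots(example, RAW_DATASET_SLOTS))  # prints [('obs', 'tissue')]
```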

Component API

Dataset Loader

Arguments:

| Name | Type | Direction | Description |
|---|---|---|---|
| --output | Raw Dataset | output | An unprocessed dataset as output by a dataset loader. |

Normalization

Arguments:

| Name | Type | Direction | Description |
|---|---|---|---|
| --input | Raw Dataset | input | An unprocessed dataset as output by a dataset loader. |
| --output | Normalized Dataset | output | A normalized dataset |
| --layer_output | string | input | The name of the layer in which to store the normalized data. |
| --obs_size_factors | string | input | In which .obs slot to store the size factors (if any). |
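
To make the argument table concrete, here is a minimal sketch of a command-line interface with the same four arguments, using Python's standard `argparse`. This README does not say how the components are actually invoked, so `argparse` is only a stand-in. The default values (`normalized`, `size_factors`) are assumptions chosen to match the slot names in the Normalized Dataset format, and the file names are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser mirroring the Normalization component's arguments."""
    parser = argparse.ArgumentParser(prog="normalization")
    parser.add_argument("--input", required=True,
                        help="Path to a Raw Dataset file.")
    parser.add_argument("--output", required=True,
                        help="Path where the Normalized Dataset is written.")
    # Assumed default: the layer name used in the Normalized Dataset format.
    parser.add_argument("--layer_output", default="normalized",
                        help="Layer in which to store the normalized data.")
    # Assumed default: the .obs slot name used in the Normalized Dataset format.
    parser.add_argument("--obs_size_factors", default="size_factors",
                        help=".obs slot in which to store size factors, if any.")
    return parser


# Hypothetical invocation with placeholder file names.
args = build_parser().parse_args(
    ["--input", "raw.h5ad", "--output", "normalized.h5ad"]
)
print(args.layer_output, args.obs_size_factors)  # prints: normalized size_factors
```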

Processor Hvg

Arguments:

| Name | Type | Direction | Description |
|---|---|---|---|
| --input | Dataset+Pca | input | A normalised dataset with a PCA embedding |
| --layer_input | string | input | Which layer to use as input for the HVG selection. |
| --output | Dataset+Pca+Hvg | output | A normalised dataset with a PCA embedding and HVG selection |
| --var_hvg | string | input | In which .var slot to store whether a feature is considered to be hvg. |
| --var_hvg_score | string | input | In which .var slot to store a ranking of the features by variance. |
| --num_features | integer | input | The number of HVG to select |
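
The arguments above describe two outputs per feature: a boolean flag and a variance-based ranking. The sketch below shows one simple way to produce both in plain Python. It is an illustration under stated assumptions, not the repository's method: it ranks by plain population variance (real HVG methods often use dispersion-based statistics), treats rank 1 as the most variable feature, and `select_hvg` is a hypothetical name.

```python
from statistics import pvariance


def select_hvg(matrix, num_features):
    """Flag the `num_features` most variable features.

    `matrix` is a list of cells (rows), each a list of per-feature values.
    Returns (hvg, hvg_score) with one entry per feature, where hvg_score is
    an integer rank (1 = highest variance, an assumed convention).
    """
    n_features = len(matrix[0])
    variances = [
        pvariance([row[j] for row in matrix]) for j in range(n_features)
    ]
    # Feature indices ordered from most to least variable.
    order = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    hvg_score = [0] * n_features
    for rank, j in enumerate(order, start=1):
        hvg_score[j] = rank
    hvg = [score <= num_features for score in hvg_score]
    return hvg, hvg_score


# Three cells, three features: feature 0 is constant, feature 2 varies most.
normalized = [
    [0.0, 1.0, 5.0],
    [0.0, 2.0, 1.0],
    [0.0, 3.0, 9.0],
]
hvg, hvg_score = select_hvg(normalized, num_features=2)
print(hvg)        # prints [False, True, True]
print(hvg_score)  # prints [3, 2, 1]
```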

Processor Pca

Arguments:

| Name | Type | Direction | Description |
|---|---|---|---|
| --input | Normalized Dataset | input | A normalized dataset |
| --layer_input | string | input | Which layer to use as input for the PCA. |
| --output | Dataset+Pca | output | A normalised dataset with a PCA embedding |
| --obsm_embedding | string | input | In which .obsm slot to store the resulting embedding. |
| --varm_loadings | string | input | In which .varm slot to store the resulting loadings matrix. |
| --uns_variance | string | input | In which .uns slot to store the resulting variance objects. |
| --num_components | integer | input | Number of principal components to compute. Defaults to 50, or to the minimum dimension size of the selected representation minus 1 if that is smaller. |
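
The default for `--num_components` is easiest to see with numbers. The sketch below assumes the default means "50, unless the smaller matrix dimension is too small, in which case one less than that dimension". That reading is an interpretation of the description above, and `default_num_components` is a hypothetical helper, not a function from this repository.

```python
def default_num_components(n_obs, n_vars, requested=None, cap=50):
    """Resolve the number of principal components to compute.

    If the caller passes `requested`, it is returned unchanged. Otherwise
    the result is `cap` (50), limited to one less than the smaller of the
    two matrix dimensions.
    """
    if requested is not None:
        return requested
    min_dim = min(n_obs, n_vars)
    return min(cap, min_dim - 1)


print(default_num_components(n_obs=1000, n_vars=2000))  # prints 50
print(default_num_components(n_obs=30, n_vars=2000))    # prints 29
print(default_num_components(1000, 2000, requested=10))  # prints 10
```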
