
Test cases

Thomas Cokelaer edited this page Jul 12, 2023 · 5 revisions

The summary.html entry point leads you to a MultiQC report. This report contains many plots, some of which can be difficult to interpret.

Presence of ribosomal RNA

DGE on several GFF features

You will need to build your own GFF file using the sequana.gff3 module:

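from sequana.gff3 import GFF3

# replace "your_annotation.gff" with the path to your own GFF3 file
g = GFF3("your_annotation.gff")
# keep only the gene and mRNA features and save them to a new GFF file
g.save_gff_filtered(features=["gene", "mRNA"])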
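To make the filtering step concrete, here is a minimal plain-Python sketch of what keeping only some feature types amounts to. It assumes a standard tab-separated GFF3 file in which the feature type is the third column. The function filter_gff_lines is hypothetical and is not sequana's implementation; use GFF3.save_gff_filtered on real data.

```python
def filter_gff_lines(lines, features):
    """Hypothetical helper: keep GFF3 lines whose type (3rd column) is in `features`.

    Header/comment lines (starting with '#') are kept unchanged.
    """
    wanted = set(features)
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # keep directives such as ##gff-version 3
            continue
        fields = line.rstrip("\n").split("\t")
        # a valid GFF3 record has 9 tab-separated columns; column 3 is the type
        if len(fields) == 9 and fields[2] in wanted:
            kept.append(line)
    return kept


# a tiny made-up annotation with four records
example = [
    "##gff-version 3\n",
    "chr1\tsrc\tgene\t1\t900\t.\t+\t.\tID=gene1\n",
    "chr1\tsrc\tmRNA\t1\t900\t.\t+\t.\tID=rna1;Parent=gene1\n",
    "chr1\tsrc\texon\t1\t300\t.\t+\t.\tID=exon1;Parent=rna1\n",
    "chr1\tsrc\trRNA\t1000\t2500\t.\t-\t.\tID=rrna1\n",
]
for line in filter_gff_lines(example, ["gene", "mRNA"]):
    print(line, end="")
```

Only the header, the gene record and the mRNA record survive; the exon and rRNA records are dropped because their types are not in the requested list.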

Why are some cells empty in the CSV files?

Outliers are identified using Cook's distance: a large value indicates an outlier count.
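For intuition, the sketch below computes the classical Cook's distance for a simple linear regression, D_i = e_i² h_i / (p s² (1 − h_i)²), where e_i is the residual of point i, h_i its leverage, p the number of coefficients and s² the residual variance. DESeq2 computes an analogue of this on its negative binomial GLM fit, so the values differ, but the idea is the same: a point lying far from the fit gets a large distance. The function name cooks_distance and the data are made up for illustration.

```python
def cooks_distance(x, y):
    """Classical Cook's distance for the simple linear fit y = a + b*x (p = 2)."""
    n, p = len(x), 2
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - p)  # residual variance
    # leverage (diagonal of the hat matrix) for a simple regression
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    return [
        e ** 2 / (p * s2) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverage)
    ]


# y = 2x everywhere except the 4th point, which plays the role of an outlier count
x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.0, 6.0, 20.0, 10.0, 12.0]
d = cooks_distance(x, y)
print([round(v, 2) for v in d])
print("largest Cook's distance at index", d.index(max(d)))
```

Note that the outlier wins even though it sits in the middle of the x range (low leverage): its very large residual dominates.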

Note on p-values set to NA: some values in the results table can be set to NA for one of the following reasons:

  • If within a row, all samples have zero counts, the baseMean column will be zero, and the log2 fold change estimates, p value and adjusted p value will all be set to NA.
  • If a row contains a sample with an extreme count outlier then the p value and adjusted p value will be set to NA. These outlier counts are detected by Cook’s distance.
  • If a row is filtered by automatic independent filtering, for having a low mean normalized count, then only the adjusted p value will be set to NA.
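The three rules above can be turned into a small checker. This is a minimal sketch, assuming each row of the results CSV has been read into a dict using the DESeq2 column names baseMean, pvalue and padj, with empty cells represented as float("nan") or None. The functions is_na and explain_na are hypothetical and are not part of sequana.

```python
import math


def is_na(value):
    """True for an empty/NA cell, represented here as None or NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def explain_na(row):
    """Hypothetical helper: give the likely reason for NA values in one row."""
    if row["baseMean"] == 0:
        # rule 1: all samples have zero counts -> log2FC, pvalue and padj are NA
        return "all counts are zero"
    if is_na(row["pvalue"]):
        # rule 2: extreme count outlier (Cook's distance) -> pvalue and padj are NA
        return "count outlier detected by Cook's distance"
    if is_na(row["padj"]):
        # rule 3: only padj is NA -> removed by independent filtering
        return "low mean normalized count (independent filtering)"
    return "no NA"


nan = float("nan")
rows = {
    "geneA": {"baseMean": 0.0, "pvalue": nan, "padj": nan},
    "geneB": {"baseMean": 250.3, "pvalue": nan, "padj": nan},
    "geneC": {"baseMean": 1.7, "pvalue": 0.42, "padj": nan},
    "geneD": {"baseMean": 980.1, "pvalue": 1e-6, "padj": 3e-5},
}
for gene, row in rows.items():
    print(gene, "->", explain_na(row))
```

The order of the checks matters: a row with only zero counts also has an NA p-value, so the zero-count rule is tested first.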

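Outliers are flagged by comparing each sample's Cook's distance with a threshold. You could choose your own threshold (for instance, a given percentile of the Cook's distance distribution), but by default sequana/rnadiff lets DESeq2 decide. DESeq2's default threshold (the cooksCutoff argument of its results() function) is the 0.99 quantile of the F(p, m - p) distribution, where p is the number of model coefficients and m is the number of samples. Every gene with at least one sample above this threshold has its pvalue and padj set to NA. Samples belonging to experimental groups with only two replicates are not used in this test.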
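As a worked example, DESeq2 documents its default Cook's distance cutoff as the 0.99 quantile of F(p, m − p). In the common two-group design the model has p = 2 coefficients (intercept and condition), and for F(2, d) the quantile has a closed form, x_q = (d/2)((1 − q)^(−2/d) − 1). With m = 6 samples (3 vs 3), the 0.99 quantile of F(2, 4) is therefore 2 × (10 − 1) = 18. The plain-Python sketch below computes this cutoff and flags genes whose largest Cook's distance exceeds it. The function names and the Cook's distance values are hypothetical, and the sketch ignores DESeq2's refinements such as skipping groups with only two replicates.

```python
def f2_quantile(q, d2):
    """Quantile q of the F(2, d2) distribution (closed form, valid only for df1 = 2)."""
    return d2 / 2 * ((1 - q) ** (-2 / d2) - 1)


def default_cooks_cutoff(n_samples, q=0.99):
    """DESeq2-style default, q-quantile of F(p, m - p), restricted here to p = 2."""
    p = 2  # intercept + condition in a two-group design
    return f2_quantile(q, n_samples - p)


def flag_outlier_genes(cooks, cutoff):
    """Return genes with at least one sample above the cutoff (pvalue/padj -> NA)."""
    return sorted(gene for gene, values in cooks.items() if max(values) > cutoff)


cutoff = default_cooks_cutoff(n_samples=6)  # 3 vs 3 design: F(2, 4)
print(f"cutoff = {cutoff:.2f}")

# hypothetical Cook's distances for 6 samples per gene
cooks = {
    "geneA": [0.1, 0.3, 0.2, 0.4, 0.1, 0.2],
    "geneB": [0.2, 0.1, 25.0, 0.3, 0.2, 0.1],
    "geneC": [1.5, 2.0, 0.8, 3.1, 0.5, 1.2],
}
print(flag_outlier_genes(cooks, cutoff))
```

For designs with more coefficients there is no such simple closed form; in R, DESeq2 itself uses qf(0.99, p, m - p).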
