scylla

This pipeline performs classification and taxonomic read binning for metagenomic reads

For each sample provided, the following analyses are performed:

Input reads are QC filtered with fastp by length, quality, ambiguous bases and adaptors and polyX runs are trimmed (default settings)
Read length and quality stats are collected and single read files are checked for duplicate read names
Reads are classified using Kraken2 and the PlusPF database to get per-read taxon classifications
For species level taxa meeting the threshold percentage and counts (higher threshold for bacteria, lower for viruses etc), a file of binned taxon reads is extracted
The human read count in classifications is checked to make sure it is less than the threshold (1000). It is not possible to extract human-classified reads in any output.
Reads are screened for high consequence infectious diseases (HCID) using both classified counts and read mapping to a reference genome panel (including close confounders) and checking the proportion of the genome covered.
Reads are screened for the specified spike ins and a summary collected.
Separate files of the unclassified read fraction, the viral+unclassified read and all non-human reads are extracted.
A single sample report is generated

This pipeline can optionally perform a reclassification of the viral+unclassified read fraction with a custom viral database.

There is the option to perform de novo viral assembly and classification using virbot or genomad.

This pipeline can be run on a single fastq/pair of fastq files, or a directory of fastq files (which will be concatenated), or a demultiplexed directory of barcode subdirectories.

On personal computers we recommend running with the --local flag to ensure more reasonable resource requirements.

Example command

These examples use the --local flag to set resource requirements for a laptop. This will use a smaller kraken database, and therefore when working on a cluster we recommend you remove this flag.

Run in mSCAPE ingest mode (assumes input is a single sample, allows single/paired fastq file input).

nextflow run main.nf --fastq test/test_data/barcode01/barcode01.fq.gz -profile docker --local

or

nextflow run main.nf --fastq1 test/test_data/illumina/barcode02_R1.fq.gz --fastq2 test/test_data/illumina/barcode02_R2.fq.gz --paired -profile docker --local```

Run on a demultiplexed run directory (providing a run_dir). This must either contain a subdirectory per barcode, or pairs of files with names in the format *_R{1,2}*.f*q*.

nextflow run main.nf --run_dir test/test_data [--paired] -profile docker --local

Run a module independently of the main workflow. Supported modules are [preprocess, qc_checks, centrifuge_classification, kraken_classification, sourmash_classification, check_hcid_status, check_spike_status, extract_all, classify_novel_viruses].

nextflow run main.nf --module preprocess --fastq test/test_data/barcode01/barcode01.fq.gz -profile docker --local

Name		Name	Last commit message	Last commit date
Latest commit History 449 Commits
.github/workflows		.github/workflows
bin		bin
conf		conf
dockerfiles		dockerfiles
docs		docs
modules		modules
resources		resources
subworkflows		subworkflows
test		test
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
environment.yml		environment.yml
main.nf		main.nf
nextflow.config		nextflow.config
nextflow_schema.json		nextflow_schema.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

scylla

Example command

About

Releases 12

Packages

Contributors 4

Languages

License

artic-network/scylla

Folders and files

Latest commit

History

Repository files navigation

scylla

Example command

About

Resources

License

Stars

Watchers

Forks

Releases 12

Packages 0

Contributors 4

Languages

Packages