imgag/ngs-bits

ngs-bits - Short-read and long-read sequencing tools for diagnostics


Installation

Binaries of ngs-bits are available via Bioconda.

Alternatively, ngs-bits can be built from sources. Use git to clone the most recent release (the source code package provided by GitHub does not contain the required submodules):

> git clone --recursive https://github.com/imgag/ngs-bits.git
> cd ngs-bits
> git checkout 2025_09
> git submodule update --recursive --init

Depending on your operating system, building instructions vary slightly:

  • Building from sources for Linux
  • Building from sources for MacOS
  • Building from sources for Windows

Support

Please report any issues or questions to the ngs-bits issue tracker.

Documentation

The documentation of individual tools is linked in the tools list below.
For some tools the documentation pages contain only the command-line help; for other tools they contain more information.

If you want to contribute, check the development documentation.

License

ngs-bits is provided under the MIT license, but it is based on other software components with different licenses:

  • Qt is our base framework for the graphical user interface, platform abstraction, data structures and much more.
  • htslib for HTS data format support (BAM, VCF, ...)
  • SimpleCrypt for weak encryption
  • QR-Code-generator for QR code generation

ChangeLog

The change log is available on the releases page.

Citing

You can cite ngs-bits using Zenodo DOIs:

  • 2025_09: DOI
  • 2025_07: DOI

A list of all releases/DOIs can be found here.

Tools list

ngs-bits contains a lot of tools that are used for NGS-based diagnostics in our institute.

Some of the tools need the NGSD, a database that contains, for example, gene, transcript and exon data.
Installation instructions for the NGSD can be found here.

Main tools

  • SeqPurge - A highly-sensitive adapter trimmer for paired-end short-read data.
  • SampleSimilarity - Calculates pairwise sample similarity metrics from VCF/BAM files.
  • SampleGender - Determines sample gender based on a BAM file.
  • SampleAncestry - Estimates the ancestry of a sample based on variants.
  • CnvHunter - CNV detection from targeted resequencing data using non-matched control samples.
  • RohHunter - ROH detection based on a variant list annotated with AF values.
  • UpdHunter - UPD detection from trio variant data.

QC tools

The default output format of the quality control tools is qcML, an XML-based format for -omics quality control. It consists of an XML schema, which defines the overall structure of the format, and an ontology, which defines the QC metrics that can be used.

  • ReadQC - Quality control tool for FASTQ files.
  • MappingQC - Quality control tool for a BAM file.
  • VariantQC - Quality control tool for a VCF file.
  • SomaticQC - Quality control tool for tumor-normal pairs (paper).
  • TrioMaternalContamination - Detects maternal contamination of a child using SNPs from parents.
  • TrioMendelianErrors - Determines the Mendelian error rate from a trio VCF file.
  • RnaQC - Calculates QC metrics for RNA samples.
  • QcToTsv - Converts qcML files to a TSV file.

BAM tools

  • BamClipOverlap - (Soft-)Clips paired-end reads that overlap.
  • BamDownsample - Downsamples a BAM file to the given percentage of reads.
  • BamExtract - Extracts reads from BAM/CRAM by read name.
  • BamFilter - Filters a BAM file by multiple criteria.
  • BamHighCoverage - Determines high-coverage regions in a BAM file.
  • BamToFastq - Converts a coordinate-sorted BAM file to FASTQ files.
  • FastaFromBam - Downloads the reference genome FASTA file for a BAM/CRAM file.

BED tools

  • BedAdd - Merges regions from several BED files.
  • BedAnnotateFromBed - Annotates BED file regions with information from a second BED file.
  • BedAnnotateGC - Annotates the regions in a BED file with GC content.
  • BedAnnotateGenes - Annotates BED file regions with gene names (needs NGSD).
  • BedChunk - Splits regions in a BED file to chunks of a desired size.
  • BedCoverage - Annotates the regions in a BED file with the average coverage in one or several BAM files.
  • BedExtend - Extends the regions in a BED file by n bases.
  • BedGeneOverlap - Calculates how much of each overlapping gene is covered (needs NGSD).
  • BedHighCoverage - Detects high-coverage regions from a BAM file.
  • BedInfo - Prints summary information about a BED file.
  • BedIntersect - Intersects two BED files.
  • BedLiftOver - Lift-over of regions in a BED file to a different genome build.
  • BedLowCoverage - Calculates regions of low coverage based on an input BED and BAM file.
  • BedMerge - Merges overlapping regions in a BED file.
  • BedReadCount - Annotates the regions in a BED file with the read count from a BAM file.
  • BedShrink - Shrinks the regions in a BED file by n bases.
  • BedSort - Sorts the regions in a BED file.
  • BedSubtract - Subtracts one BED file from another BED file.
  • BedToFasta - Converts BED file to a FASTA file (based on the reference genome).
  • CnvReferenceCohort - Creates a reference cohort for CNV calling from a list of coverage profiles.

FASTQ tools

  • FastqAddBarcode - Adds sequences from separate FASTQ as barcodes to read IDs.
  • FastqConvert - Converts the quality scores from Illumina 1.5 offset to Sanger/Illumina 1.8 offset.
  • FastqConcat - Concatenates several FASTQ files into one output FASTQ file.
  • FastqDownsample - Downsamples paired-end FASTQ files.
  • FastqExtract - Extracts reads from a FASTQ file according to an ID list.
  • FastqExtractBarcode - Moves molecular barcodes of reads to a separate file.
  • FastqExtractUMI - Moves unique molecular identifiers from read sequence to read ID.
  • FastqFormat - Determines the quality score offset of a FASTQ file.
  • FastqList - Lists read IDs and base counts.
  • FastqMidParser - Counts the number of occurrences of each MID/index/barcode in a FASTQ file.
  • FastqToFasta - Converts FASTQ to FASTA format.
  • FastqTrim - Trims start/end bases from the reads in a FASTQ file.
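
To illustrate the quality score offsets mentioned for FastqConvert and FastqFormat: Illumina 1.5 encodes qualities as ASCII with offset 64, while Sanger/Illumina 1.8 uses offset 33, so a conversion shifts every quality character down by 31. The following is only a minimal shell sketch of that idea, not the implementation used by ngs-bits. The function name `phred64_to_phred33` is hypothetical, and the sketch assumes uncompressed FASTQ with exactly four lines per record.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of ngs-bits: shifts FASTQ quality strings
# from offset 64 (Illumina 1.5) to offset 33 (Sanger/Illumina 1.8).
# Assumes four lines per record (no wrapped sequence or quality lines).
phred64_to_phred33() {
  local n=0 line
  while IFS= read -r line; do
    n=$((n + 1))
    if [ $((n % 4)) -eq 0 ]; then
      # Every 4th line is a quality string: map ASCII 64..126 to 33..95.
      printf '%s\n' "$line" | LC_ALL=C tr '@-~' '!-_'
    else
      # Header, sequence and separator lines are passed through unchanged.
      printf '%s\n' "$line"
    fi
  done
}

# Usage (file names are placeholders):
#   phred64_to_phred33 < in.fastq > out.fastq
```

For real data, FastqConvert is the better choice; the sketch only makes the offset arithmetic concrete.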

VCF tools (small variants)

  • VcfAdd - Merges several VCF files into one VCF by appending one to the other.
  • VcfAnnotateConsequence - Adds transcript-specific consequence predictions to a VCF file (similar to Ensembl VEP).
  • VcfAnnotateFromBed - Annotates the INFO column of a VCF with data from a BED file.
  • VcfAnnotateFromBigWig - Annotates the INFO column of a VCF with data from a BigWig file.
  • VcfAnnotateFromVcf - Annotates a VCF file with data from one or more source VCF files.
  • VcfAnnotateHexplorer - Annotates a VCF with Hexplorer and HBond scores.
  • VcfAnnotateMaxEntScan - Annotates a VCF file with MaxEntScan scores.
  • VcfBreakMulti - Breaks multi-allelic variants into several lines, making sure that allele-specific INFO/SAMPLE fields are still valid.
  • VcfCalculatePRS - Calculates the Polygenic Risk Score(s) for a sample.
  • VcfCheck - Checks a VCF file for errors.
  • VcfExtractSamples - Extracts one or several samples from a VCF file. Can also be used to re-order sample columns.
  • VcfFilter - Filters a VCF based on the given criteria.
  • VcfLeftNormalize - Normalizes all variants and shifts indels to the left in a VCF file.
  • VcfReplaceSamples - Replaces sample identifiers in the VCF header.
  • VcfSort - Sorts variant lists according to chromosomal position.
  • VcfSplit - Splits a VCF into several chunks.
  • VcfStrip - Removes unwanted information from a VCF file.
  • VcfStreamSort - Sorts entries of a VCF file according to genomic position using a stream.
  • VcfSubtract - Subtracts the variants in a VCF from a second VCF.
  • VcfToBed - Converts a VCF file to a BED file.
  • VcfToBedpe - Converts a VCF file containing structural variants to BEDPE format.
  • VcfToTsv - Converts a VCF file to a tab-separated text file.

BEDPE tools (structural variants)

Gene handling tools

Phenotype handling tools

Misc tools

  • FastaFromBam - Downloads the reference genome FASTA file for a BAM/CRAM file.
  • FastaInfo - Basic info on a FASTA file containing DNA sequences.
  • FastaMask - Mask regions in a FASTA file with N bases.
  • HgvsToVcf - Transforms a TSV file with transcript ID and HGVS.c change into a VCF file (needs NGSD).
