- Add -s/--somatic/--mosaic to output low-AF somatic/mosaic variants
- Add -T/--trans-elem; output TE (transposable/mobile element, Alu/L1/SVA) information for INS/DEL
- Add INFO:TSD;REPNAME in VCF for TE INS/DEL
- Add --refine-aln: refine read alignment based on MSA in output SAM/BAM/CRAM
- Fix a SegFault in ONT mode regarding BAM/SA tag
- Add -Oz for compressed VCF output
- Add --exclude-ctg & --all-ctg; --autosome-XY is default now
- Fix lower case ref base
- Fix compilation on macOS-x64
# Download pre-built executables and test data (recommended)
# Linux-x64
wget https://github.com/yangao07/longcallD/releases/download/v0.0.5/longcallD-v0.0.5_x64-linux.tar.gz
tar -zxvf longcallD-v0.0.5_x64-linux.tar.gz && cd longcallD-v0.0.5_x64-linux
# macOS-arm64
wget https://github.com/yangao07/longcallD/releases/download/v0.0.5/longcallD-v0.0.5_arm64-macos.tar.gz
tar -zxvf longcallD-v0.0.5_arm64-macos.tar.gz && cd longcallD-v0.0.5_arm64-macos
# PacBio HiFi reads
./longcallD call ./test_data/chr11_2M.fa ./test_data/HG002_chr11_hifi_test.bam --hifi > HG002_hifi_test.vcf
# Oxford Nanopore reads
./longcallD call ./test_data/chr11_2M.fa ./test_data/HG002_chr11_ont_test.bam --ont > HG002_ont_test.vcf
- Updates (pre-release v0.0.5)
- Getting Started
- Table of Contents
- Introduction
- Installation
- Usage
- Acknowledgements
- Contact
LongcallD is a local-haplotagging-based variant caller designed for detecting small variants and structural variants (SVs) using long-read sequencing data. It supports both PacBio HiFi and Oxford Nanopore reads.
LongcallD phases long reads into haplotypes using SNPs and small indels before calling SVs. It outputs phased variant calls in VCF format, including SNPs, small indels, and large SVs (currently only supporting insertions and deletions).
LongcallD (≥v0.0.5) can also call low-allele-frequency mosaic variants when -s/--mosaic is used.
Currently, only mosaic SNVs and large indels are supported; mosaic small indels are not called.
In particular, longcallD can sensitively identify mosaic mobile element insertions (MEIs).
Providing the annotation sequences of common mobile elements (Alu/L1/SVA) with -T is highly recommended; the annotation file is included here.
Linux-x64
wget https://github.com/yangao07/longcallD/releases/download/v0.0.5/longcallD-v0.0.5_x64-linux.tar.gz
tar -zxvf longcallD-v0.0.5_x64-linux.tar.gz
macOS-arm64
wget https://github.com/yangao07/longcallD/releases/download/v0.0.5/longcallD-v0.0.5_arm64-macos.tar.gz
tar -zxvf longcallD-v0.0.5_arm64-macos.tar.gz
Linux-arm64/macOS-x64
There is no pre-built executable for Linux-arm64 or macOS-x64; please use conda or build from source.
For Linux and macOS
conda install -c bioconda longcalld
To compile longcallD from source, ensure that GCC/clang (9.0+) and zlib/libbz2/liblzma/libcurl (required by htslib) are installed. Using the latest release is recommended.
wget https://github.com/yangao07/longcallD/releases/download/v0.0.5/longcallD-v0.0.5.tar.gz
tar -zxvf longcallD-v0.0.5.tar.gz
cd longcallD-v0.0.5; make
LongcallD requires a reference genome (FASTA) and a long-read BAM/CRAM file as inputs. It outputs phased variant calls in VCF format.
longcallD call -t16 ref.fa hifi.bam > hifi.vcf # default for PacBio HiFi reads (--hifi)
longcallD call -t16 ref.fa ont.bam --ont > ont.vcf # for ONT reads
With -s, longcallD will detect both germline and somatic/mosaic variants.
For each somatic/mosaic variant, a SOMATIC tag will be added to the INFO field in the output VCF.
longcallD call -s -t16 ref.fa hifi.bam > hifi.vcf
longcallD call -s -t16 ref.fa hifi.bam -T AluY_L1_SVA_cons_noPA.fa > hifi.vcf # add MEI information in INFO field
longcallD call -s -t16 ref.fa ont.bam --ont > ont.vcf
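Because somatic/mosaic calls are marked with SOMATIC in the INFO field, they can be separated from germline calls with standard text tools. The following is a minimal sketch, not part of longcallD: the helper name somatic_only is hypothetical, and it assumes an uncompressed VCF in which SOMATIC appears either as a bare flag or as a SOMATIC=... entry.

```shell
# Hypothetical helper (not part of longcallD): keep the VCF header and only
# those records whose INFO column (field 8) contains a SOMATIC entry.
# Reads the file(s) given as arguments, or stdin when none are given.
somatic_only() {
    awk -F'\t' '
        /^#/ { print; next }                  # pass header lines through
        {
            n = split($8, info, ";")          # INFO entries are ";"-separated
            for (i = 1; i <= n; i++)
                if (info[i] == "SOMATIC" || info[i] ~ /^SOMATIC=/) {
                    print                     # print the record once
                    break
                }
        }
    ' "$@"
}
```

For example, `somatic_only hifi.vcf > hifi_somatic.vcf` would write only the somatic/mosaic records; a compressed VCF could be piped in with `gzip -dc hifi.vcf.gz | somatic_only`.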
LongcallD supports region-based variant calling, similar to samtools view.
longcallD call -t16 ref.fa hifi.bam chr11:10,229,956-10,256,221 > hifi_reg.vcf
longcallD call -t16 ref.fa hifi.bam chr11:10,229,956-10,256,221 chr12:10,576,356-10,583,438 > hifi_regs.vcf
longcallD call -t16 ref.fa hifi.bam --region-file reg.bed > hifi_regs.vcf
longcallD call -t16 ref.fa hifi.bam --autosome > hifi_autosome.vcf
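When the same regions are needed both on the command line and in a file for --region-file, the region strings can be converted mechanically. The sketch below assumes that reg.bed follows the standard BED convention (tab-separated, 0-based start, end-exclusive), which the README does not state explicitly; the helper name regions_to_bed is hypothetical.

```shell
# Hypothetical helper: convert samtools-style regions (1-based, inclusive,
# thousands separators allowed) into BED lines (0-based start, exclusive end).
# Usage: regions_to_bed REGION [REGION ...]
regions_to_bed() {
    for region in "$@"; do
        printf '%s\n' "$region"
    done | awk '
        {
            gsub(/,/, "")                         # drop thousands separators
            pos = match($0, /:[0-9]+-[0-9]+$/)    # find the trailing ":start-end"
            if (pos == 0) {
                print "skipping unparsable region: " $0 > "/dev/stderr"
                next
            }
            chrom = substr($0, 1, pos - 1)        # everything before the last ":"
            split(substr($0, pos + 1), se, "-")   # se[1] = start, se[2] = end
            printf "%s\t%d\t%d\n", chrom, se[1] - 1, se[2]
        }
    '
}
```

For example, `regions_to_bed chr11:10,229,956-10,256,221 chr12:10,576,356-10,583,438 > reg.bed` would produce a two-line file covering the same intervals as the command-line regions above.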
longcallD call -t16 ref.fa hifi.bam --hifi -b hifi_phased.bam > hifi.vcf # output phased HiFi reads (BAM tag: HP & PS)
longcallD call -t16 ref.fa ont.bam --ont -b ont_phased.bam > ont.vcf # output phased ONT reads (BAM tag: HP & PS)
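A quick way to sanity-check the phased BAM is to tally reads per haplotype using the HP tag. The following is a minimal sketch that works on SAM text, so it needs samtools only to decode the BAM; the helper name count_by_haplotype is hypothetical, and it assumes HP is written as an integer tag (HP:i:N).

```shell
# Hypothetical helper: count alignments per haplotype from SAM text on stdin.
# Alignments without an HP tag are reported as "unphased".
count_by_haplotype() {
    awk -F'\t' '
        /^@/ { next }                        # skip SAM header lines
        {
            hp = "unphased"
            for (i = 12; i <= NF; i++)       # optional tags start at column 12
                if ($i ~ /^HP:i:/) {
                    hp = substr($i, 6)       # value after the "HP:i:" prefix
                    break
                }
            count[hp]++
        }
        END { for (h in count) printf "%s\t%d\n", h, count[h] }
    ' | sort
}
```

For example, `samtools view hifi_phased.bam | count_by_haplotype` would print one line per haplotype label with its read count.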
ref=https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/references/GRCh38/GRCh38_GIABv3_no_alt_analysis_set_maskedGRC_decoys_MAP2K3_KMT2C_KCNJ18.fasta.gz
bam=https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/PacBio_HiFi-Revio_20231031/HG002_PacBio-HiFi-Revio_20231031_48x_GRCh38-GIABv3.bam
longcallD call -t16 $ref $bam chr11:10,229,956-10,256,221 chr12:10,576,356-10,583,438 > hifi_regs.vcf
LongcallD depends on the following libraries; we are grateful to all of their developers and maintainers:
- htslib: read/write BAM/CRAM/VCF
- abPOA: consensus calling
- WFA: pairwise alignment
- cgranges: interval operations
- sdust: identify low-complexity regions
For any questions or support, please contact:
- Yan Gao [email protected]
- Heng Li [email protected]