Description
I'm using mpileup to compute the mutation frequencies at a list of known variant sites in RNA-seq data. For a small number of indels, the AD values in the output are much lower than what I see in IGV. For instance, when I run the following command:
bcftools mpileup -r 11:69587265 -f <GRCm38 fasta> --annotate FORMAT/AD,FORMAT/DP,INFO/AD -F 0.001 --max-depth 10000 --max-idepth 10000 -Q20 -x -A --no-BAQ --tandem-qual 10000 <RNA-seq bam> | grep -v "^#"
Here is the output I'm getting:
11 69587265 . CA CAA 0 . INDEL;IDV=77;IMF=0.616;DP=124;AD=47,2;I16=24,23,1,1,1880,75200,80,3200,872,17236,40,800,1057,25169,46,1066;QS=0.908467,0.0915332;VDB=0.56;SGB=-0.453602;RPBZ=-0.571191;MQBZ=2.56432;MQSBZ=0;BQBZ=0;SCBZ=-1.87886;MQ0F=0 PL:DP:AD 0,106,55:49:47,2
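To make the discrepancy concrete, here is a minimal Python sketch that parses the record above and puts IDV next to the per-sample AD. It assumes the fields are tab-separated, as in a regular VCF file (they show up as spaces in the pasted output). The helper names `parse_info` and `summarize` are only for illustration and are not part of bcftools.

```python
# The record from the mpileup output above, rebuilt with tab separators.
RECORD = "\t".join([
    "11", "69587265", ".", "CA", "CAA", "0", ".",
    "INDEL;IDV=77;IMF=0.616;DP=124;AD=47,2;"
    "I16=24,23,1,1,1880,75200,80,3200,872,17236,40,800,1057,25169,46,1066;"
    "QS=0.908467,0.0915332;VDB=0.56;SGB=-0.453602;RPBZ=-0.571191;"
    "MQBZ=2.56432;MQSBZ=0;BQBZ=0;SCBZ=-1.87886;MQ0F=0",
    "PL:DP:AD", "0,106,55:49:47,2",
])


def parse_info(info):
    """Turn 'A=1;B=2;FLAG' into a dict; flags without '=' map to True."""
    parsed = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def summarize(record):
    """Collect the indel-related counts from one single-sample VCF line."""
    fields = record.split("\t")
    info = parse_info(fields[7])
    # Pair FORMAT keys (PL:DP:AD) with the sample values.
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    ref_ad, alt_ad = (int(x) for x in sample["AD"].split(","))
    idv = int(info["IDV"])
    return {
        "IDV": idv,                      # raw reads supporting the indel
        "IMF": float(info["IMF"]),
        "raw_DP": int(info["DP"]),
        "AD_ref": ref_ad,
        "AD_alt": alt_ad,
        "AD_alt_fraction": alt_ad / (ref_ad + alt_ad),
        "IDV_minus_AD_alt": idv - alt_ad,  # indel reads not counted in AD
    }


if __name__ == "__main__":
    for key, value in summarize(RECORD).items():
        print(key, value)
```

For this record, that leaves 75 of the 77 raw indel-supporting reads unaccounted for in AD.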
And here is how this site looks in IGV:

I am aware that some low-quality reads are filtered during variant calling, so the reported AD will be lower than the raw IDV. However, upon manual examination, most of the reads seem to be high-quality and correctly aligned. Such a large drop, from IDV=77 to only 2 insertion-supporting reads in AD, seems counter-intuitive to me, and I wonder whether this is the expected behavior of mpileup.
(One thing I noticed is that all but 2 of the read pairs containing the insertion are spliced, and 2 happens to be exactly the ALT count reported in AD.)
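The spliced/unspliced observation could be checked more systematically by tallying CIGAR strings. Below is a rough Python sketch that classifies a read by whether its CIGAR contains an insertion (`I`) and a splice junction (`N`). The CIGAR strings in `EXAMPLE_CIGARS` are made up for illustration and are not taken from my BAM. The function names are hypothetical, and the check looks for any insertion in the read rather than specifically the one at 11:69587265.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an op code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def classify(cigar):
    """Return (has_insertion, is_spliced) for a single CIGAR string."""
    ops = {op for _length, op in CIGAR_OP.findall(cigar)}
    return ("I" in ops, "N" in ops)


def tally_insertion_reads(cigars):
    """Count insertion-carrying reads, split by spliced vs. unspliced."""
    counts = Counter()
    for cigar in cigars:
        has_insertion, is_spliced = classify(cigar)
        if has_insertion:
            counts["spliced" if is_spliced else "unspliced"] += 1
    return counts


# Hypothetical CIGAR strings, for illustration only.
EXAMPLE_CIGARS = [
    "50M1I49M",          # insertion, no splice junction
    "20M1I10M500N70M",   # insertion and splice junction
    "30M1000N40M1I30M",  # insertion and splice junction
    "100M",              # neither
]

if __name__ == "__main__":
    print(tally_insertion_reads(EXAMPLE_CIGARS))
```

With real data, the CIGAR strings would come from the reads overlapping the site, for example column 6 of `samtools view` output for that region.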