Hi,
I call gvcf file using GATK HaplotypeCaller
as following:
gatk HaplotypeCaller -R my.fasta \
-I s-95.sort.noDup.bam \
-L 3R:23000000-27905053 \
-ERC GVCF \
-bamout test_s95.bamout.bam \
--native-pair-hmm-threads 28 \
-O test_s95.sort.noDup.g.vcf
The above ouput gvcf reports a variant at 3R:25063300
3R 25063300 . C T,<NON_REF> 804.64 . BaseQRankSum=-2.060;DP=59;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQandDP=212400,59;ReadPosRankSum=-1.269 GT:AD:DP:GQ:PGT:PID:PL:PS:SB 0|1:37,22,0:59:99:0|1:25063284_G_T:812,0,1487,924,1554,2477:25063284:17,20,12,10
The DP=59
for this sites based on the vcf, which if I understand correctly means there are 59 reads covering this poosition.
But when I check the position in the input bam, I found the number of reads at this position is only 36.
samtools view s-95.sort.noDup.bam "3R:25063300-25063300" | wc -l
# 36
[I also count the reads manually using samtools tview
and confirmed 36 reads here]
I couldn't understand how could depth in vcf could outnumber the reads in the bam file!
===
I further checked the number of read at this position in test_s95.bamout.bam
and found 54 reads covering this position! Although it's still less than the depth(59) in VCF, they are much closer than that in the raw bam file(s-95.sort.noDup.bam
). It seems that the HaplotypeCaller
grabs some other reads and call variants, rather than just using the reads mapped at this position.
I am totally confused, and appreciate it if someone could offer some help!
Thanks!
Thanks,
--dont-use-soft-clipped-bases
solved the problem. It seemssamtools view/tview
doesn't show the soft clip base.If I don't use the soft clip base to call variants, the position
3R:25063300
is not a variant - all bases are reference. The soft clip base caused the position to be a polymorphic sites.Anyway, thanks for your solution!