vcftools --freq output always zero
1
0
Entering edit mode
6.9 years ago
vinvan ▴ 50

Hello,

I'm trying to quantify the frequency of alleles at specific locations in a bam file generated by Samtools pileup using the command:

samtools mpileup -l positions.bed -uvf ref.fa -R in.bam | vcftools --vcf - --freq

The output of this returns a frequency of zero for each allele (even the reference allele) even though there are reads mapping at the locations indicated in the bed file. I thought maybe the removal of the DP4 field in the vcf files generated by the recent version of Samtools could be the reason for this but adding any of the output tags in the Samtools command did not resolve the issue. Also downgrading the versions of Samtools or vcftools did not work. My in.bam consists of many low coverage samples, so I thought that considering it as originating from one sample using the -R flag would help, but alas. Here an example of the vcf:

chr7    140453136       .       A       T,<*>   0       .       DP=28;I16=4,2,7,15,207,7209,762,26868,120,2400,440,8800,71,1307,367,8015;QS=0.216606,0.783394,0;VDB=0.254618;SGB=-0.692562;RPB=0.618774;MQB=1;MQSB=1;BQB=0.990252;MQ0F=0        PL      137,0,23,155,89,189
chr9    80409488        .       T       <*>     0       .       DP=18;I16=7,11,0,0,626,22228,0,0,326,6418,0,0,265,5029,0,0;QS=1,0;MQSB=0.983729;MQ0F=0  PL      0,54,167
chr10   89692904        .       C       <*>     0       .       DP=32;I16=6,26,0,0,1126,40100,0,0,640,12800,0,0,560,11438,0,0;QS=1,0;MQSB=1;MQ0F=0      PL      0,96,193
chr12   25398284        .       C       <*>     0       .       DP=117;I16=25,92,0,0,3989,139657,0,0,2340,46800,0,0,2233,50171,0,0;QS=1,0;MQSB=1;MQ0F=0 PL      0,255,248
chr14   105246551       .       C       <*>     0       .       DP=3;I16=0,3,0,0,108,3888,0,0,60,1200,0,0,47,875,0,0;QS=1,0;MQ0F=0      PL      0,9,49
chr15   66729162        .       C       <*>     0       .       DP=82;I16=45,37,0,0,2859,99993,0,0,1640,32800,0,0,1221,25177,0,0;QS=1,0;MQSB=1;MQ0F=0   PL      0,247,208
chr16   68772218        .       C       <*>     0       .       DP=4;I16=3,1,0,0,144,5184,0,0,80,1600,0,0,68,1444,0,0;QS=1,0;MQSB=1;MQ0F=0      PL      0,12,69
chr17   7578406 .       C       <*>     0       .       DP=80;I16=30,50,0,0,2834,100948,0,0,1600,32000,0,0,1556,33772,0,0;QS=1,0;MQSB=1;MQ0F=0  PL      0,241,224
chr17   37868208        .       C       <*>     0       .       DP=84;I16=33,51,0,0,2839,98471,0,0,1680,33600,0,0,1242,23780,0,0;QS=1,0;MQSB=1;MQ0F=0   PL      0,253,219
chr18   48591919        .       G       T,<*>   0       .       DP=123;I16=47,75,0,1,4189,146333,36,1296,2440,48800,20,400,2256,49520,25,625;QS=0.99177,0.00823045,0;SGB=-0.379885;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0      PL      0,255,230,255,233,231
chr19   1223125 .       C       <*>     0       .       DP=39;I16=16,23,0,0,1352,47800,0,0,780,15600,0,0,781,17635,0,0;QS=1,0;MQSB=1;MQ0F=0     PL      0,117,208
chr20   36031762        .       C       <*>     0       .       DP=19;I16=11,8,0,0,632,21880,0,0,380,7600,0,0,297,5873,0,0;QS=1,0;MQSB=1;MQ0F=0 PL      0,57,173
chr20   57484420        .       C       A,G,<*> 0       .       DP=2787;I16=1716,1063,3,5,94647,3.29852e+06,134,2668,55580,1.1116e+06,160,3200,45192,943972,136,2958;QS=0.997847,0.00189746,0.000255428,0;VDB=0.3669;SGB=-0.651104;RPB=0.718888;MQB=1;MQSB=1;BQB=0.000449926;MQ0F=0     PL      0,255,255,255,255,255,255,255,255,255

And the output of vcftools --freq:

CHROM   POS     N_ALLELES       N_CHR   {ALLELE:FREQ}
chr7    140453136       3       0       A:-nan  T:-nan  <*>:-nan
chr9    80409488        2       0       T:-nan  <*>:-nan
chr10   89692904        2       0       C:-nan  <*>:-nan
chr12   25398284        2       0       C:-nan  <*>:-nan
chr14   105246551       2       0       C:-nan  <*>:-nan
chr15   66729162        2       0       C:-nan  <*>:-nan
chr16   68772218        2       0       C:-nan  <*>:-nan
chr17   7578406 2       0       C:-nan  <*>:-nan
chr17   37868208        2       0       C:-nan  <*>:-nan
chr18   48591919        3       0       G:-nan  T:-nan  <*>:-nan
chr19   1223125 2       0       C:-nan  <*>:-nan
chr20   36031762        2       0       C:-nan  <*>:-nan
chr20   57484420        2       0       C:-nan  <*>:-nan

Samtools: 1.4.1 (same issue with newest version)

vcftools: 0.1.14

I'm pretty sure I'm missing something straightforward here, so thanks for your help!

vcftools allele • 2.3k views
ADD COMMENT
0
Entering edit mode
6.9 years ago

I believe that these 'asterisk' variants represent alleles that are assumed to be deleted - they were introduced relatively recently into the VCF format specification. I think that most are poor quality calls. As Asterisk is a wild character in so many contexts, the choice of character needs to be questioned (although I know that it's not an issue in Python at least).

When using samtools mpileup, I would recommend piping the mpileup into bcftools call, and then running a further step with varfilter.pl (provided with bcftools and runs as an executable and not via invocation from Perl) and then try the vcftools freq command after that.

For example:

samtools mpileup --redo-BAQ --min-BQ 30 --per-sample-mF --output-tags DP,AD -f ref.fa --BCF in.bam | bcftools call --multiallelic-caller --variants-only -Ob > Variants.bcf

bcftools view Variants.bcf | misc/vcfutils.pl varFilter -d 18 -w 1 -W 3 -a 1 -1 0.05 -2 0.05 -3 0.05 -4 0.05 -e 0.05 -p > Variants.vcf
ADD COMMENT

Login before adding your answer.

Traffic: 1839 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6