vcftools does not filter by GQ
8.5 years ago
AP ▴ 100

Hello,

I am trying to filter based on GQ < 15. I do the following:

vcftools --vcf infile.vcf --minGQ 15 --recode --out filtered

However, the filter does not seem to work; nothing is removed:

After filtering, kept 1287174 out of a possible 1287174 Sites

I have confirmed that the GQ tag is present in my VCF file. Other filters such as --minDP, --maxDP, or --minQ work just fine. I am using VCFtools v0.1.13.

Any thoughts on this would be greatly appreciated.

Thanks!

P.S. This is a cross-post from SEQanswers, where I did not receive any answers: http://seqanswers.com/forums/showthread.php?t=69468

vcftools GQ Filter • 4.6k views

What is the definition of GQ in the VCF header? Please show us a genotype and its FORMAT.


Thanks for your answer Pierre. In the VCF header, GQ stands for Genotype Quality. Here is a copy of the header containing the FORMAT fields:

##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">

Here is an example of a genotype (FORMAT column, then one sample):

GT:PL:DP:SP:GQ  1/1:83,33,0:11:0:40
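For anyone reading along: the per-sample values are matched to the FORMAT keys by position, so GQ is the fifth field here. A minimal sketch of pulling it out with awk, where `get_gq` is a hypothetical helper name and not part of vcftools or bcftools:

```shell
# Hypothetical helper: print the GQ value of one sample, given the FORMAT
# string and that sample's colon-separated genotype field.
get_gq() {
    printf '%s\t%s\n' "$1" "$2" | awk -F'\t' '{
        n = split($1, keys, ":")      # FORMAT keys, e.g. GT PL DP SP GQ
        split($2, vals, ":")          # per-sample values, same order
        for (i = 1; i <= n; i++)
            if (keys[i] == "GQ") { print vals[i]; exit }
    }'
}

get_gq "GT:PL:DP:SP:GQ" "1/1:83,33,0:11:0:40"   # prints 40
```

So this genotype has GQ 40 and would pass a threshold of 15.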

FYI, the VCF file was generated this way:

samtools mpileup -C 50 -E -t SP -t DP -u -I -f genome -b bam_list.txt > out.bcf
bcftools call -v -c -f gq out.bcf > out.vcf
8.5 years ago
AP ▴ 100

Here is an explanation:

GT is simply replaced by ./. when GQ is below the threshold. I had thought the genotype would be removed entirely. That is why the unfiltered and filtered files have the same number of lines, and why the GQ values can still be seen after filtering.
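To make the behaviour concrete, here is a rough awk emulation of the masking described above, run on a one-line toy record. This is only a sketch of the observed effect, not vcftools' actual code; `mask_low_gq` is a hypothetical name, the toy record is made up, and leaving genotypes with a missing GQ (".") untouched is my own assumption.

```shell
# Hypothetical helper: set GT to ./. for every sample whose GQ is below the
# threshold given as $1. Reads a VCF on stdin and keeps every line.
mask_low_gq() {
    awk -v min="$1" 'BEGIN { FS = OFS = "\t" }
    /^#/ { print; next }                     # pass header lines through
    {
        n = split($9, keys, ":")             # column 9 is FORMAT
        gt = gq = 0
        for (i = 1; i <= n; i++) {
            if (keys[i] == "GT") gt = i
            if (keys[i] == "GQ") gq = i
        }
        for (s = 10; s <= NF; s++) {         # columns 10+ are samples
            m = split($s, vals, ":")
            # Assumption: a missing GQ (".") is left alone.
            if (gt && gq && gq <= m && vals[gq] != "." && vals[gq] + 0 < min) {
                vals[gt] = "./."
                out = vals[1]
                for (i = 2; i <= m; i++) out = out ":" vals[i]
                $s = out
            }
        }
        print                                # the site itself is never dropped
    }'
}

# Toy record: sample 1 has GQ 40, sample 2 has GQ 9.
printf 'chr1\t100\t.\tA\tG\t50\t.\t.\tGT:PL:DP:SP:GQ\t1/1:83,33,0:11:0:40\t0/1:20,0,30:4:0:9\n' |
    mask_low_gq 15
```

With a threshold of 15, the second sample becomes ./.:20,0,30:4:0:9 while the line count stays the same, which matches what I see in the vcftools output.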

This is hard to tell from the documentation, though. For --minGQ, the current manual says: "Exclude all genotypes with a quality below the threshold specified. This option requires that the "GQ" FORMAT tag is specified for all sites". It does not really say whether data are removed or not (as most filters do).

An older manual version states: "These options are used to exclude genotypes from any analysis being performed by the program. If excluded, these values will be treated as missing. ... Exclude all genotypes with a quality below the threshold specified. This option requires that the "GQ" FORMAT tag is specified for all sites."

So every genotype with GQ below the threshold is changed to "./.", without any lines actually being removed or filtered out.
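If you want to check that the filter did something, or to drop sites left with no called genotypes, a small awk sketch along these lines may help. Both function names are hypothetical, and the code assumes GT is the first FORMAT key (as it is in my file). vcftools also has a --max-missing site filter that could probably be applied in a second pass, but I have not verified how it interacts with --minGQ in a single run.

```shell
# Hypothetical helper: count genotypes that are missing (".", "./." or ".|.")
# in a VCF read from stdin. Assumes GT is the first FORMAT key.
count_missing_gt() {
    awk -F'\t' '!/^#/ {
        for (s = 10; s <= NF; s++) {
            split($s, vals, ":")
            if (vals[1] == "." || vals[1] == "./." || vals[1] == ".|.") n++
        }
    } END { print n + 0 }'
}

# Hypothetical helper: keep header lines and only those sites where at least
# one sample still has a called genotype.
drop_all_missing_sites() {
    awk -F'\t' '/^#/ { print; next }
    {
        for (s = 10; s <= NF; s++) {
            split($s, vals, ":")
            if (vals[1] != "." && vals[1] != "./." && vals[1] != ".|.") {
                print; next
            }
        }
    }'
}

# Usage sketch, with the output name that --recode --out filtered produces:
#   count_missing_gt < infile.vcf
#   count_missing_gt < filtered.recode.vcf
#   drop_all_missing_sites < filtered.recode.vcf > filtered.sites.vcf
```

Comparing the two counts before and after filtering shows how many genotypes --minGQ masked, even though the number of sites is unchanged.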


Thank you for explaining this AP. I was troubled by the same situation.
