Qual score in VCF file
8.5 years ago

Hello. As I understand it, QUAL represents the accuracy of genotyping. But what does a '.' under the QUAL column in a VCF file represent? I do not have a numeric value for the Phred-scaled score for the assertion of the ALT allele anywhere in the column.

What does this mean for filtering low quality SNPs or genotypes?

Thank you.

EDIT: More information:

As I was looking at a filtered, recoded VCF file, I went back and checked the raw VCF file as well. That file had values for all the QUAL and INFO fields. My service provider responded: 'The TASSEL-GBS pipeline does not calculate quality scores for any sites, but assigns an arbitrary, uniform value of 20 to each SNP in the VCF files.' In my VCF files, in all four cases, there is only one QUAL score (20) for all SNPs, which somehow appears as a '.' in the filtered, recoded file. So I should not use minQ for filtering SNPs, right?

Thank you.


Just to rule out a possible glitch: are you sure the . is in the QUAL field? If you're just eyeballing the file, the header may well not line up with the right column, and you may be seeing the . from the FILTER field. Maybe try counting the values in that record, or use awk or cut to view individual fields?
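The column check suggested here can be sketched in Python: split a data line on tabs (VCF data lines are tab-delimited) and label the eight fixed columns, so QUAL is always the sixth value regardless of how the text lines up on screen. The helper name `fields_by_name` is hypothetical, and the record is the first data line from the example posted in this thread, re-joined with tabs.

```python
# Column names fixed by the VCF specification for the first eight fields.
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def fields_by_name(record_line: str) -> dict:
    """Split one tab-delimited VCF data line and label each fixed column."""
    values = record_line.rstrip("\n").split("\t")
    # zip stops after the eighth column, so FORMAT and sample columns are ignored.
    return dict(zip(FIXED_COLUMNS, values))


record = "ET_C6390828\t41\tS1_612208905\tG\tT\t.\tPASS\t.;DP=119\tGT:AD:DP:GQ:PL\t./.:0,0:0\t./.:0,0:0"
for name, value in fields_by_name(record).items():
    print(f"{name:>6}: {value}")
```

On the command line, `grep -v '^#' calls.vcf | cut -f 6` prints the same QUAL column for every record (`calls.vcf` is a placeholder file name).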


Thanks for your reply, Ram, but I am sure I am looking at the QUAL column. An example:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  1   10
ET_C6390828 41  S1_612208905    G   T   .   PASS    .;DP=119    GT:AD:DP:GQ:PL  ./.:0,0:0   ./.:0,0:0
ET_C6410100 69  S1_614033230    A   G   .   PASS    .;DP=2833   GT:AD:DP:GQ:PL  0/1:19,19:38:100:255,0,255  0/1:5,10:15:99:255,0,135

I am a biologist and still learning bioinformatics, so I may not be familiar with very technical terms.


You're right, it is in the QUAL field.

For questions like this, read the spec first. In VCF, a "." in the QUAL field means a missing value, i.e. the QUAL is unknown.
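The spec's rule, together with the point raised in the question that a uniform QUAL carries no filtering information, can be sketched in Python. This is a minimal illustration under assumptions: `parse_qual` and `passes_min_qual` are hypothetical helpers, and treating a missing QUAL as failing the threshold is a choice made here, not necessarily what vcftools `--minQ` does with '.'.

```python
from typing import Iterable, List, Optional


def parse_qual(raw: str) -> Optional[float]:
    """Return QUAL as a float, or None when the field holds the missing value '.'."""
    return None if raw == "." else float(raw)


def passes_min_qual(quals: Iterable[Optional[float]], min_q: float) -> List[bool]:
    """Apply a minimum-QUAL threshold to each site.

    Assumption for illustration: a missing QUAL (None) counts as failing.
    """
    return [q is not None and q >= min_q for q in quals]


# A uniform QUAL of 20 cannot separate good sites from bad ones:
uniform = [parse_qual(v) for v in ["20", "20", "20"]]
print(passes_min_qual(uniform, 10))  # every site kept
print(passes_min_qual(uniform, 30))  # every site dropped

# '.' is parsed as "unknown", not as zero.
print([parse_qual(v) for v in [".", "."]])
```

Whatever threshold is chosen, uniform scores either keep every site or drop every site, so such a filter removes nothing selectively.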

8.5 years ago
Ram

From this link,

The sites with ./. genotypes are no-call sites, [...]. A no-call site means there was not enough information to make a genotype call.

I'm not sure what it means when you have a genotype and a . for QUAL. You may have to do some digging, such as:

  1. Examine the VCF header for clues about the QUAL field, as well as the parameters used to generate the VCF.
  2. Go back to your pipeline/log files to check the commands used and figure out what they mean in your scenario.
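The no-call rule quoted above can be checked per sample by pairing the FORMAT keys with each sample's values and testing whether every allele in GT is missing. A minimal Python sketch using samples from the example records posted earlier; `sample_fields` and `is_no_call` are hypothetical names. Note that a called genotype such as 0/1 alongside a '.' QUAL is still a call; this check says nothing about why QUAL is missing.

```python
from typing import Dict


def sample_fields(format_field: str, sample_field: str) -> Dict[str, str]:
    """Pair FORMAT keys with one sample's values.

    The VCF spec allows trailing sample fields to be dropped, so keys with no
    value (GQ and PL for the no-call sample below) are simply absent.
    """
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def is_no_call(genotype: str) -> bool:
    """True when every allele in GT is missing ('.'), e.g. './.' or '.|.'."""
    alleles = genotype.replace("|", "/").split("/")
    return all(allele == "." for allele in alleles)


fmt = "GT:AD:DP:GQ:PL"
for sample in ["./.:0,0:0", "0/1:19,19:38:100:255,0,255"]:
    fields = sample_fields(fmt, sample)
    print(fields["GT"], "no-call" if is_no_call(fields["GT"]) else "called")
```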

Thank you for the link and your suggestions. As implied there, the first SNP is a no-call site, since it has no QUAL and no genotype. I am still confused about the second one, though! The VCF header says nothing specific about QUAL:

##fileformat=VCFv4.0                    
##Tassel=<ID=GenotypeTable,Version=5,Description="Reference allele is not known. The major allele was used as reference allele">                    
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">                    
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the reference and alternate alleles in the order listed">                 
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth (only filtered reads used for calling)">                  
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality">                 
##FORMAT=<ID=PL,Number=.,Type=Float,Description="Normalized, Phred-scaled likelihoods for AA,AB,BB genotypes where A=ref and B=alt; not applicable if site is not biallelic">                   
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">                  
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">                  
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">                   

I will keep trying.

Thanks again.


The header should have the command used to generate the VCF file. That should help.
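A minimal Python sketch for pulling such lines out of a header: it keeps every `##` meta-information line that is not a field definition (INFO, FORMAT, FILTER, ALT, contig). Which provenance lines exist, if any, depends entirely on the software that wrote the file; `provenance_lines` is a hypothetical helper, and the input below reuses a few of the header lines posted in this thread.

```python
from typing import List

# Meta-information keys that define fields rather than describe how the file was made.
FIELD_DEFINITIONS = ("INFO", "FORMAT", "FILTER", "ALT", "contig")


def provenance_lines(header_lines: List[str]) -> List[str]:
    """Return '##' header lines whose key is not a field definition."""
    kept = []
    for line in header_lines:
        if not line.startswith("##"):
            continue
        key = line[2:].split("=", 1)[0]
        if key not in FIELD_DEFINITIONS:
            kept.append(line.rstrip())
    return kept


header = [
    "##fileformat=VCFv4.0",
    '##Tassel=<ID=GenotypeTable,Version=5,Description="Reference allele is not known. The major allele was used as reference allele">',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">',
]
for line in provenance_lines(header):
    print(line)
```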


It appears that your VCF is malformed. In addition to the missing QUAL scores (missing data are represented by '.'), the INFO field is missing some of the data specified by the header (e.g., NS and AF values). I recommend that you contact your service provider to obtain the correct VCF.
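The comparison between the header and the INFO column can be sketched in Python: collect the IDs declared in `##INFO` lines and the keys actually used in a record's INFO field. It lists differences for inspection rather than deciding on its own whether the file is malformed. `declared_info_ids` and `info_keys` are hypothetical helpers; the inputs are the `##INFO` lines and the INFO value `.;DP=119` posted in this thread.

```python
import re
from typing import List, Set

INFO_ID = re.compile(r"^##INFO=<ID=([^,>]+)")


def declared_info_ids(header_lines: List[str]) -> Set[str]:
    """Collect the IDs of all ##INFO definitions in the header."""
    ids = set()
    for line in header_lines:
        match = INFO_ID.match(line)
        if match:
            ids.add(match.group(1))
    return ids


def info_keys(info_field: str) -> List[str]:
    """Return the keys used in one record's INFO column ('key=value' or flag)."""
    if info_field == ".":
        return []
    return [entry.split("=", 1)[0] for entry in info_field.split(";")]


header = [
    '##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">',
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">',
    '##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">',
]
declared = declared_info_ids(header)
used = info_keys(".;DP=119")
print("declared but absent:", sorted(declared - set(used)))
print("used but undeclared:", [k for k in used if k not in declared])
```

Besides NS and AF being absent, the stray '.' in front of `;DP=119` shows up as an undeclared key.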

8.5 years ago

From this thread in the GATK forum:

"Unified Genotyper writes LowQual if the variant fails the calling threshold, but only writes a dot if it passes."

Edit: the OP is correct that this statement applies to the FILTER field. The complete text explains that 'PASS' in the FILTER field (as in the OP's example) indicates filtering after variant calling.
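The FILTER conventions discussed here can be summarized in a small Python sketch based on the VCF specification's description of the field: `PASS` when the site passed all filters, `.` (missing) when no filters were applied, and otherwise a semicolon-separated list of the filter codes that failed, such as `LowQual`. `describe_filter` is a hypothetical helper.

```python
def describe_filter(filter_field: str) -> str:
    """Interpret a FILTER value using the VCF specification's conventions."""
    if filter_field == "PASS":
        return "passed all filters that were applied"
    if filter_field == ".":
        return "no filters applied (missing value)"
    # Anything else lists the codes of the filters the site failed.
    return "failed: " + ", ".join(filter_field.split(";"))


for value in ["PASS", ".", "LowQual"]:
    print(f"{value:>7} -> {describe_filter(value)}")
```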


Thanks for the thread. But in the example posted there, isn't LowQual specific to the FILTER column rather than the QUAL column? In that case, if the variant passes the filtering criteria/threshold, the genotyper inserts a dot; if the variant fails the filtering criteria/threshold, the genotyper inserts LowQual.

As explained in the GATK forum, though, in my case I see 'PASS' throughout the FILTER column, because my service provider subsequently filtered the VCF file for MAF and per-site missing data.
