8.4 years ago
cge
Hello, in my VCF file I have QUAL scores like these:
CHROM POS QUAL
1 12673 46.35
1 12719 77.62
1 12783 12856.7
1 12882 204.57
1 13110 889.88
1 13116 21002.6
1 13118 20715.5
1 13273 5693.8
....
Does anyone know how they can be converted to a Phred score in the 0-255 range? Thank you, S.
Can you give any additional information?
I don't really work with vcf files these days, but that looks pretty weird to me, and does not look like a vcf file. Is all that stuff on the same line? How long, exactly, is the line, and how many lines are there?
I assume there are three fields (CHROM, POS and QUAL). I'll edit the post a bit...
That looks much more likely. Still not a VCF file, though, or anything that (as far as I can tell) can be converted into anything usable. Not that I think VCF is a good format, but still.
That looks more like read depth/coverage information at a base. You would have to tell us which program made that file.
Hi, I'm sorry for the confusing format. Thank you for editing. I just copied the first 3 columns of a VCF file (version 4.1) created with GATK, removing all the header rows and the other columns. My question is: the QUAL field should be a Phred quality score ranging from 0-255, but in this file the values appear to be in a different range.
Thanks
That means the file in its current state violates the VCF 4.1 spec. Did you use Excel to open the file and cut the columns out, by any chance?
I'm sorry, but I think this is not relevant. The values were the same when looking at the VCF with the "cat" or "more" command. I want to filter my dataset on a quality score, so I'm wondering how to interpret this score, which looks different from a Phred score of 0-255. It is not the DP value; I obtained that using vcftools and it is different. Thanks Ignazio
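For the filtering part of the question, here is a minimal Python sketch. It assumes a standard tab-separated VCF in which QUAL is the sixth column, and it treats a missing QUAL (".") as failing the filter. The function name filter_by_qual and the REF/ALT alleles in the example records are made up for illustration; only the CHROM, POS and QUAL values come from the post above.

```python
def filter_by_qual(vcf_lines, min_qual):
    """Keep header lines and records whose QUAL is >= min_qual.

    Assumes tab-separated VCF records with QUAL in column 6 (index 5).
    Records with a missing QUAL (".") are dropped.
    """
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            # Meta-information and column header lines pass through unchanged.
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]
        if qual != "." and float(qual) >= min_qual:
            kept.append(line)
    return kept


# Example records: CHROM/POS/QUAL from the post, REF/ALT are placeholders.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t12673\t.\tA\tG\t46.35\t.\t.",
    "1\t12719\t.\tC\tT\t77.62\t.\t.",
    "1\t12783\t.\tG\tA\t12856.7\t.\t.",
]

for record in filter_by_qual(example, min_qual=50):
    print(record)
```

Because QUAL here is not capped at 255, the threshold is just a number on the same open-ended scale; a cutoff of 50 keeps the last two records and drops the first.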
Can you paste a few lines from the "cat" or "more" output instead of just those three columns? Use the "101010" button to format your output correctly. Also use the ADD REPLY button to respond to existing posts. SUBMIT ANSWER should only be used for new answers to the original question.
Thank you for the advice. Btw, I found this thread on the GATK forum; it looks like the QUAL score is Phred-scaled and can sometimes have very high values:
http://gatkforums.broadinstitute.org/gatk/discussion/2403/gatks-quality-scores-on-vcf-files
In my VCF there are no variants with QUAL < 30, so I suppose variants with lower quality scores were already filtered out. S.
Interesting. Thank you for posting the link for the GATK thread.
The answers in that thread are completely ridiculous. 441,453 could be called a "phred-scaled value", but it's clearly wrong (either not phred-scaled or an incorrect answer). What would you have to do to have a 1/10^44,145 chance of error in any kind of measurement or prediction? Humans have never created anything capable of that kind of precision, and certainly not sequencing platforms. For reference, the chance that a random 100bp sequence of junk DNA contaminating the library perfectly matches the human genome, apart from indicating that one SNP, is 1/4^100, or about 1/10^60.
If those values are intended to be phred-scaled, there are some serious flaws in GATK's math or assumptions.
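To make the scale of these numbers concrete, here is a small Python sketch. It assumes QUAL follows the VCF definition, QUAL = -10 * log10(P(call in ALT is wrong)), and works in log10 space because the probabilities themselves are far too small for a float. The function names are hypothetical; the QUAL values are the ones quoted in this thread.

```python
import math


def qual_to_log10_error(qual):
    """log10 of the error probability implied by a Phred-scaled QUAL.

    Assumes QUAL = -10 * log10(P(error)), so log10(P(error)) = -QUAL / 10.
    """
    return -qual / 10.0


def log10_random_match(length, alphabet_size=4):
    """log10 of the chance that a uniformly random sequence of the given
    length matches one specific target sequence exactly."""
    return -length * math.log10(alphabet_size)


# QUAL values quoted in the thread, smallest to largest.
for q in (46.35, 12856.7, 21002.6, 441453):
    print(f"QUAL {q} -> P(error) = 10^{qual_to_log10_error(q):.2f}")

# Comparison point: a random 100 bp sequence matching a fixed target.
print(f"random 100 bp match -> 10^{log10_random_match(100):.1f}")
```

Under that assumption, any QUAL above roughly 602 already implies an error probability smaller than the 100 bp random-match chance, which is why the largest values are hard to read as literal probabilities.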
More elaborate explanation: https://www.broadinstitute.org/gatk/guide/article?id=7258