Phred Score conversion
8.4 years ago
cge ▴ 20

Hello, in my VCF file I have QUAL scores like these:

CHROM  POS    QUAL
1      12673  46.35
1      12719  77.62
1      12783  12856.7
1      12882  204.57
1      13110  889.88
1      13116  21002.6
1      13118  20715.5
1      13273  5693.8
....

Does anyone know how they can be converted to Phred scores in the 0-255 range? Thank you, S.

Tags: next-gen, Phred

Can you give any additional information?

I don't really work with VCF files these days, but that looks pretty weird to me and does not look like a VCF file. Is all that stuff on the same line? How long, exactly, is the line, and how many lines are there?


I assume there are three fields (CHROM, POS and QUAL). I'll edit the post a bit...


That looks much more likely. Still not a VCF file, though, or anything that (as far as I can tell) can be converted into anything usable. Not that I think VCF is a good format, but still.


That looks more like read depth/coverage information at a base. You would have to tell us which program made that file.


Hi, I'm sorry for the confusing format, and thank you for editing. I just copied the first 3 columns of a VCF file (version 4.1) created with GATK, removing all the header rows and the other columns. My question is: the QUAL field should contain a Phred quality score ranging from 0-255, but in this file the values seem to be in a different range.

Thanks


That means the file, in its current state, violates the VCF 4.1 spec. Did you by any chance use Excel to open the file and cut the columns out?


I'm sorry, but I don't think this is relevant. The values are the same when I look at the VCF with the "cat" or "more" command. I want to filter my dataset by quality score, so I'm wondering how to interpret this score, which looks different from a Phred score in the 0-255 range. It is not the DP value: I obtained DP using vcftools and it is different. Thanks, Ignazio
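Since the goal is to filter by QUAL, here is a minimal Python sketch, assuming a standard tab-separated VCF where QUAL is the sixth column and a missing value is written as ".". The function name `filter_by_qual` and the threshold are hypothetical; the first two sample records reuse POS and QUAL from the question, and every other field is made up for illustration.

```python
from typing import Iterable, Iterator

# Fixed VCF columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO (0-based index 5 = QUAL)
QUAL_COLUMN = 5


def filter_by_qual(lines: Iterable[str], min_qual: float) -> Iterator[str]:
    """Pass header lines through and keep records with QUAL >= min_qual.

    Records whose QUAL is missing ('.') are dropped, because they cannot
    be compared against a numeric threshold.
    """
    for line in lines:
        if line.startswith("#"):
            # '##' meta-information lines and the '#CHROM' header line
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[QUAL_COLUMN]
        if qual != "." and float(qual) >= min_qual:
            yield line


# Hypothetical records: the first two reuse POS/QUAL from the question,
# everything else (including the missing-QUAL record) is invented.
sample = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t12673\t.\tA\tG\t46.35\t.\t.\n",
    "1\t12783\t.\tC\tT\t12856.7\t.\t.\n",
    "1\t13000\t.\tG\tA\t.\t.\t.\n",
]

for kept in filter_by_qual(sample, min_qual=100.0):
    print(kept, end="")
```

For a real file, the same generator can wrap an open file handle and its output can be written with `writelines`. vcftools, which is already in use here, has a `--minQ` option for site-level QUAL filtering, and bcftools accepts expressions such as `bcftools view -i 'QUAL>=100'`.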


Can you paste a few lines from the cat or more output instead of just those three columns? Use the "101010" button to format your output correctly. Also, use the ADD REPLY button to respond to existing posts; SUBMIT ANSWER should only be used for new answers to the original question.


Thank you for the advice. By the way, I found this thread on the GATK forum. It looks like the QUAL score is Phred-scaled and can sometimes have very high values:

http://gatkforums.broadinstitute.org/gatk/discussion/2403/gatks-quality-scores-on-vcf-files

In my VCF there are no QUAL values below 30, so I suppose variants with lower quality scores were already filtered out. S.
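The GATK explanation is consistent with the VCF 4.1 specification, which describes QUAL as a Phred-scaled score, -10 * log10 prob(call in ALT is wrong), stored as a number. As far as I can tell the spec sets no 0-255 cap, so large values are legal. A minimal Python sketch of the conversion in both directions, applied to QUAL values from the question (the helper names are illustrative, not from any library):

```python
import math


def phred_to_error_prob(qual: float) -> float:
    """Error probability implied by a Phred-scaled score: p = 10^(-Q/10)."""
    return 10 ** (-qual / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred-scaled score for an error probability: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


# QUAL values from the question; 10^(-2100.26) underflows to 0.0 in a double.
for qual in (30, 46.35, 77.62, 204.57, 889.88, 21002.6):
    print(f"QUAL {qual:>8} -> P(error) = {phred_to_error_prob(qual):.3g}")
```

Under this reading, the QUAL >= 30 cutoff in this file corresponds to an estimated error probability of at most 1 in 1,000. Forcing values into 0-255 with something like `min(qual, 255)` would collapse every call above 255 into one value, so filtering on the raw number seems more useful.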


Interesting. Thank you for posting the link for the GATK thread.


The answers in that thread are completely ridiculous. A value like 441,453 could be called a "phred-scaled value", but it's clearly wrong (either it is not phred-scaled or the answer is incorrect). What would you have to do to have a 1/10^44,145 chance of error in any kind of measurement or prediction? Humans have never created anything capable of that kind of precision, and certainly not sequencing platforms. For reference, the chance that a random 100 bp sequence of junk DNA contaminating the library perfectly matches the human genome at that locus, carrying that one SNP allele, is 1/4^100, or about 1/10^60.

If those values are intended to be phred-scaled, there are some serious flaws in GATK's math or assumptions.
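The arithmetic behind this objection, written out with the standard Phred relation (Q is the score, p the error probability) and the random-match comparison from the reply:

```latex
\begin{aligned}
p &= 10^{-Q/10} && \text{(inverse of } Q = -10\log_{10} p\text{)}\\
Q = 30 \;&\Rightarrow\; p = 10^{-3}\\
Q = 441{,}453 \;&\Rightarrow\; p = 10^{-44{,}145.3}\\
4^{-100} &= 10^{-100\log_{10} 4} \approx 10^{-60.2}
\end{aligned}
```

So a QUAL of 441,453, read literally as a Phred score, would claim an error probability tens of thousands of orders of magnitude smaller than the roughly 10^-60 chance of a random 100 bp exact match.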
