Hello,
In a VCF file, there is a GT/PL column holding the genotype and its likelihood values. If only two alleles are possible (the reference allele and one alternate allele), the column value looks like this:
0/1:56,0,80
The score 56 corresponds to homozygous reference, 0 to heterozygous, and 80 to homozygous alternate.
My question is: if there are more than two alleles (say 0 for the reference and 1 and 2 for the alternate alleles), there will be six scores, corresponding to:
- reference homozygous (0/0)
- alt 1 homozygous (1/1)
- alt 2 homozygous (2/2)
- ref and alt 1 heterozygous (0/1)
- ref and alt 2 heterozygous (0/2)
- alt 1 and alt 2 heterozygous (1/2)
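As a sanity check on the count, here is a minimal Python sketch that enumerates the unordered diploid genotypes for a given number of alleles. The function names are my own, not from any VCF library. It only confirms that three alleles give six genotypes; the order it prints is simply sorted text order and should not be taken as the order used in the PL field.

```python
from itertools import combinations_with_replacement


def genotype_count(n_alleles):
    """Number of unordered diploid genotypes for n_alleles alleles: n(n+1)/2."""
    return n_alleles * (n_alleles + 1) // 2


def unordered_genotypes(n_alleles):
    """All unordered allele pairs as 'a/b' strings with a <= b.

    The list order is just what itertools yields; it is NOT claimed
    to be the order of the PL values in a VCF record.
    """
    pairs = combinations_with_replacement(range(n_alleles), 2)
    return [f"{a}/{b}" for a, b in pairs]


if __name__ == "__main__":
    for n in (2, 3):
        print(n, "alleles:", genotype_count(n), "genotypes:", sorted(unordered_genotypes(n)))
```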
What order do these scores appear in within the actual VCF file? I don't know which position corresponds to which genotype. Below is an actual line from my VCF data.
1 226548932 . ACGGCGGCGGCGGCGGCGGCGGTGGCGGCGGCGG ACGGCGGCGGCGGTGGCGGCGGCGG,ACGGCGGCGGCGGCGGCGGTGGCGGCGGCGG 39.049 . INDEL;IDV=1;IMF=1;DP=9;VDB=0.0225004;SGB=-1.15236;MQSB=0.900802;MQ0F=0;ICB=0.153846;HOB=0.0555556;AC=1,1;AN=12;DP4=4,2,1,1;MQ=60 GT:PL ./.:0,0,0,0,0,0 0/0:0,3,60,3,60,60 0/0:0,3,60,3,60,60 ./.:0,0,0,0,0,0 0/1:60,3,0,60,3,60 0/0:0,3,60,3,60,60 0/0:0,3,60,3,60,60 0/2:50,56,132,0,81,78
Look at the GT/PL list below (I have 8 samples):
- Sample 1 : ./.:0,0,0,0,0,0
- Sample 2 : 0/0:0,3,60,3,60,60
- Sample 3 : 0/0:0,3,60,3,60,60
- Sample 4 : ./.:0,0,0,0,0,0
- Sample 5 : 0/1:60,3,0,60,3,60
- Sample 6 : 0/0:0,3,60,3,60,60
- Sample 7 : 0/0:0,3,60,3,60,60
- Sample 8 : 0/2:50,56,132,0,81,78
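To make the values easier to compare, here is a rough Python sketch that splits each sample field into its GT and PL parts and reports the 0-based position(s) of the smallest PL value. The helper names are hypothetical, and it assumes every field has exactly the GT:PL layout shown above. It does not assume anything about which genotype each position stands for.

```python
def parse_gt_pl(field):
    """Split a 'GT:PL' sample field such as '0/2:50,56,132,0,81,78'."""
    gt, pl_text = field.split(":")
    pl = [int(value) for value in pl_text.split(",")]
    return gt, pl


def min_pl_positions(pl):
    """Return every 0-based position that holds the smallest PL value."""
    smallest = min(pl)
    return [i for i, value in enumerate(pl) if value == smallest]


if __name__ == "__main__":
    # Sample fields copied from the record above.
    fields = [
        "./.:0,0,0,0,0,0",
        "0/0:0,3,60,3,60,60",
        "0/1:60,3,0,60,3,60",
        "0/2:50,56,132,0,81,78",
    ]
    for field in fields:
        gt, pl = parse_gt_pl(field)
        print(gt, pl, "smallest PL at position(s):", min_pl_positions(pl))
```

For sample 8 (0/2) the smallest value sits at position 3, while for sample 5 (0/1) it sits at position 2, which is part of what confuses me.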
Here are some more interesting results:
- Sample 1: 1/1:26,12,9,26,12,26
- Sample 2: 0/1:0,3,5,3,5,5
- Sample 3: 1/1:26,12,9,26,12,26
- Sample 4: 1/2:45,45,45,6,6,0
- Sample 5: 1/1:20,3,0,20,3,20
- Sample 6: ./.:0,0,0,0,0,0
- Sample 7: ./.:0,0,0,0,0,0
- Sample 8: 1/1:26,12,9,26,12,26
So, if anyone knows how to interpret these scores, please teach me, and if possible, explain the general concept. I tried reading the VCF documentation, but I don't think it is written there.