I want to determine the top 5 most deleterious variants in the protein-coding region of my favorite gene, but I also want to understand how deleterious they are relative to variants across the whole genome.
I downloaded the CADD scores for "All possible SNVs of GRCh38/hg38" and I just want to double-check my understanding on a few items:
1. Is the "PHRED" column in the above file calculated from the "RawScore" by ranking all variants on RawScore and then applying -10*log10(rank/total)?
2. Would any variant with a "PHRED" score above 10, 20, 30, or 40 in the above file be in the top 10%, 1%, 0.1%, or 0.01%, respectively, of the most deleterious variants in the genome?
3. Could I use "PHRED" to rank the most deleterious variants in my gene (e.g., the 5 variants with the highest PHRED in my gene are the top 5 most deleterious in my gene)?
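
To make the three items concrete, here is a minimal Python sketch of how I currently understand them. It assumes rank 1 is the variant with the highest RawScore and that "total" is the number of SNVs in the file; whether CADD computes PHRED exactly this way is what I am asking. The function names and the toy variants are made up for illustration and are not taken from the CADD file.

```python
import math

def phred_from_rank(rank, total):
    """Item 1 (assumed): PHRED = -10 * log10(rank / total), rank 1 = highest RawScore."""
    return -10 * math.log10(rank / total)

def top_fraction(phred):
    """Item 2 (assumed): fraction of all scored SNVs at or above this PHRED value."""
    return 10 ** (-phred / 10)

def top_n_by_phred(variants, n=5):
    """Item 3: the n variants with the highest PHRED, highest first."""
    return sorted(variants, key=lambda v: v["PHRED"], reverse=True)[:n]

# Hypothetical variants in my gene; positions and scores are invented.
gene_variants = [
    {"Pos": 101, "Ref": "A", "Alt": "G", "PHRED": 12.3},
    {"Pos": 102, "Ref": "C", "Alt": "T", "PHRED": 33.0},
    {"Pos": 103, "Ref": "G", "Alt": "A", "PHRED": 24.1},
]

for v in top_n_by_phred(gene_variants, n=2):
    # Report each top variant with its implied genome-wide percentile.
    print(v["Pos"], v["PHRED"], f"top {100 * top_fraction(v['PHRED']):.3g}%")
```

If this is right, the genome-wide context for each of my top 5 variants would come directly from its PHRED value, without needing to re-rank the whole file.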
Thanks