Question

Is It Nonsense To Have SIFT/PolyPhen2 Scores For Nonsense Variants?

11.7 years ago

Raony Guimarães ★ 1.4k

Hello All,

I'm using ANNOVAR (which sometimes uses dbNSFP) to annotate variants against different databases, such as SIFT, PolyPhen2, and MutationTaster.

Let's say we identified a mutation already associated with a disease at position chr3:52183866 G>A, genotype 1/1 (AA).

This nonsense variant was annotated by ANNOVAR and predicted to have a SIFT score of 1.0 and a PolyPhen2 score of 0.735458.

grep "52183866" hg19_avsift.txt: 3 52183866 52183866 G A 1 R *

grep "52183866" hg19_ljb_pp2.txt: 3 52183866 52183866 G A 0.735458 NA
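
A rough Python equivalent of the hg19_avsift.txt lookup above, as a minimal sketch. It assumes whitespace-separated columns in the order the grep output suggests (chromosome, start, end, ref, alt, score, reference amino acid, alternate amino acid). The names `SiftRecord`, `parse_avsift_line`, and `find_position` are hypothetical and not part of ANNOVAR, and the hg19_ljb_pp2.txt line has a different layout that this sketch does not handle.

```python
from typing import Iterable, Iterator, NamedTuple


class SiftRecord(NamedTuple):
    """One line of an avsift-style file (column layout is an assumption)."""
    chrom: str
    start: int
    end: int
    ref: str
    alt: str
    score: float
    aa_ref: str
    aa_alt: str


def parse_avsift_line(line: str) -> SiftRecord:
    # Assumed layout: chrom start end ref alt score aa_ref aa_alt
    chrom, start, end, ref, alt, score, aa_ref, aa_alt = line.split()
    return SiftRecord(chrom, int(start), int(end), ref, alt,
                      float(score), aa_ref, aa_alt)


def find_position(lines: Iterable[str], chrom: str, pos: int) -> Iterator[SiftRecord]:
    """Yield every record at the given chromosome and start position."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = parse_avsift_line(line)
        if record.chrom == chrom and record.start == pos:
            yield record


# The two avsift lines quoted in this question, used as in-memory test data.
example_lines = [
    "3 52183866 52183866 G A 1 R *",
    "1 10720360 10720360 T A 0 K *",
]
hits = list(find_position(example_lines, "3", 52183866))
for hit in hits:
    is_nonsense = hit.aa_alt == "*"
    print(hit.chrom, hit.start, hit.ref, hit.alt, hit.score, is_nonsense)
```

Unlike `grep`, which matches the number anywhere on the line, this matches the chromosome and start columns exactly, so the same coordinate on another chromosome is not returned. For the real file you would pass an open file handle instead of `example_lines`.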

1) Why can I find these scores for nonsense mutations in the database files provided by ANNOVAR if I cannot search for nonsense mutations directly on the SIFT and PolyPhen2 websites? How were they calculated?

2) What is the difference between a nonsense variant that has a SIFT score of 0 and one that has a SIFT score of 1.0?

Ex:

3 52183866 52183866 G A 1 R *

1 10720360 10720360 T A 0 K *

3) Is there any algorithm I could use to prioritize nonsense variants? I'm considering removing all SIFT and PolyPhen2 scores of nonsense variants from our internal database, so that when we filter our list of variants with a rule such as "SIFT <= 0.05" we don't exclude nonsense variants that have a SIFT score > 0.05. I would appreciate suggestions for this problem.
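
One way to express the filtering rule from question 3 without deleting any scores from the database, as a minimal sketch: nonsense variants bypass the SIFT threshold, and everything else is filtered as usual. The function name `passes_filter`, the consequence labels, and the two "hypothetical missense" rows are assumptions made for illustration; dropping unscored variants is also just one possible policy.

```python
from typing import Optional

SIFT_CUTOFF = 0.05  # threshold quoted in the question


def passes_filter(consequence: str, sift: Optional[float],
                  cutoff: float = SIFT_CUTOFF) -> bool:
    """Return True if a variant should be kept in the filtered list."""
    if consequence == "nonsense":
        # Nonsense variants are always kept; their SIFT score is ignored.
        return True
    if sift is None:
        # Policy choice (assumption): variants without a score are dropped.
        return False
    return sift <= cutoff


# (label, consequence, SIFT score); the first two rows come from the question,
# the last two are made-up rows to show the ordinary threshold behaviour.
variants = [
    ("3:52183866 G>A", "nonsense", 1.0),
    ("1:10720360 T>A", "nonsense", 0.0),
    ("hypothetical missense A", "missense", 0.01),
    ("hypothetical missense B", "missense", 0.40),
]

kept = [label for label, consequence, sift in variants
        if passes_filter(consequence, sift)]
print(kept)
```

With this rule both nonsense variants are kept regardless of their scores of 1.0 and 0, while the missense rows are still subject to the 0.05 cutoff.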

Btw, I read the post "Stop-gain mutation predicted as benign by SIFT/Polyphen2?" before posting :) so I know I should not trust SIFT scores for nonsense variants. I also know SIFT scores are calculated for all positions in the genome. I'm just trying to understand what is happening here.

Thank you for reading! Best Regards.

exome variant annovar • 9.8k views

score 1 · Answer 1 · 2013-04-02

11.7 years ago

Ashutosh Pandey 12k

I think that, for human, the SIFT and PolyPhen people have already precomputed the scores for all possible synonymous changes. That is my guess; I may not be correct. You can download these bulk files from their sites, and I think ANNOVAR uses these preprocessed files for its predictions.

Thank you very much. I see these predictions for synonymous changes in SIFT and PolyPhen2. What I'm asking about is the variants that are STOP_GAINED/NONSENSE and still have SIFT and PolyPhen2 scores in the files provided by ANNOVAR. How should I interpret these scores in this case?

Reply • 11.7 years ago by Raony Guimarães ★ 1.4k

Sorry, I didn't read your post carefully before. I read the SIFT manual and found that SIFT doesn't predict a damaging score for nonsense variants, so I am not sure how there are scores available for a substitution like R->* (stop codon). The only reason I can think of right now is that SIFT (or the way it was run) somehow missed that a particular variant was a nonsense one. Since SIFT uses multiple alignment and conservation scores to predict whether a mutation is deleterious, it will predict a score for any non-synonymous mutation, irrespective of whether that mutation could be further classified as nonsense.

For your second question, if you totally ignore the fact that these are nonsense variants, then the mutation with score 0 is more deleterious than the one with score 1 (see the SIFT manual; I think you already know about it).

For the third question, I don't think it would be right to compare a nonsense variant in one gene to a nonsense variant in another gene. Variants within a gene can be compared with each other to guess which are the most deleterious ones.

I have tried to answer your questions based on certain assumptions. You can email the dbNSFP or ANNOVAR guys to find out exactly what is going on.

Added later: My suggestion would be to ignore the SIFT scores for nonsense variants.

Reply • 11.7 years ago by Ashutosh Pandey 12k

I was checking that I'm not asking something totally absurd :) Looking at the file hg19_avsift.txt provided by ANNOVAR, I see there are 4.008.788 nonsense variants with a SIFT score, so I believe there must be a good reason for these variants to be in there. I also know SIFT scores are predicted without considering whether the mutation is nonsense or not. Does it make any sense to say a nonsense variant is more deleterious according to SIFT than the other? Sorry if I'm missing some important concept here ... According to the author of ANNOVAR, "a nonsense variant could be a missense variant, depending on which transcript you are using." I still have to think about this answer and interpret it better :)

Reply • 11.7 years ago by Raony Guimarães ★ 1.4k

The last line makes sense to me. Sometimes a nonsynonymous variant may introduce a premature stop codon in one of the transcripts of a gene and not in the others, for example when different transcripts have different reading frames. In that case the SIFT score may make sense for the transcript that is not truncated by that mutation. The number you mentioned got formatted, but I assume you are talking about a large number.

Also, can you rephrase the question "Does it make any sense to say a nonsense variant is more deleterious according to SIFT than the other?" What do you mean by 'other' here? Some other tool like PolyPhen, or a comparison of two different nonsense variants in two different genes? If you mean the second, I don't think you can really compare them. Both nonsense variants will truncate the mRNA of their respective genes and will be deleterious to them in most cases.

You may compare them in two genes by their location. For example, nonsense variant A in gene AA is in the first exon, and therefore the mRNA will ultimately be degraded, whereas nonsense variant B in gene BB is at the end of the gene, in the last exon, and does not alter the protein structure much. But that has nothing to do with the SIFT score. Sorry, I may have been confusing you rather than helping you.

Reply • 11.7 years ago by Ashutosh Pandey 12k
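
The location-based comparison suggested in the reply above (first exon versus last exon) could be sketched roughly as follows. This only restates the reply's idea in code and is not an established scoring method; the function name `describe_stop_location` and all of its inputs are hypothetical, and the example numbers are made up.

```python
def describe_stop_location(exon_index: int, exon_count: int,
                           aa_position: int, protein_length: int) -> dict:
    """Summarise where a premature stop codon falls in a transcript.

    exon_index and aa_position are 1-based; the stop codon replaces the
    residue at aa_position, so residues aa_position..protein_length are lost.
    """
    if not 1 <= exon_index <= exon_count:
        raise ValueError("exon_index must be between 1 and exon_count")
    if not 1 <= aa_position <= protein_length:
        raise ValueError("aa_position must be between 1 and protein_length")
    lost_residues = protein_length - aa_position + 1
    return {
        "in_first_exon": exon_index == 1,
        "in_last_exon": exon_index == exon_count,
        "fraction_lost": lost_residues / protein_length,
    }


# Made-up examples mirroring "variant A in gene AA" and "variant B in gene BB".
variant_a = describe_stop_location(exon_index=1, exon_count=10,
                                   aa_position=20, protein_length=400)
variant_b = describe_stop_location(exon_index=10, exon_count=10,
                                   aa_position=390, protein_length=400)
print(variant_a)
print(variant_b)
```

A summary like this uses only the position of the stop codon in the transcript, which is the point made in the reply: it is independent of the SIFT score.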

Yes, it's a big number: 4008788 lines of the file hg19_avsift.txt have a SIFT score predicted for different nonsense mutations (e.g. 3 52183866 52183866 G A 1 R *). I'm talking only about comparing SIFT scores of nonsense variants in different genes, using the same tool (SIFT or PolyPhen2), for the same individual. I also believe I cannot use these scores in this case; I'm just wondering why they are present in these files from ANNOVAR. Supposing the same position could be a missense variant in another transcript, I believe there should actually be another line in this file covering the missense mutation. Ex.

3 52183866 52183866 G A 1 R *

3 52183866 52183866 G A 0 R P

Yes, there is more than one prediction (SIFT score) for the same position, simply because you can have different mutations at that position. This file actually has 79876631 lines of SIFT score predictions, so I don't believe this is the reason why there are so many nonsense variants with a SIFT score in this file (about 5% of the variants in this file are nonsense). I'm still trying to discover why they are there ...

Reply • 11.7 years ago by Raony Guimarães ★ 1.4k
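
The counts quoted in the reply above could be reproduced with something like the following minimal sketch. It assumes the same eight-column layout as the example lines in this thread and treats a line as nonsense when its last column is `*`; the function name `nonsense_fraction` is hypothetical, and the sample rows are the example lines from the thread rather than a real file.

```python
from typing import Iterable, Tuple


def nonsense_fraction(lines: Iterable[str]) -> Tuple[int, int, float]:
    """Count lines whose alternate amino acid is '*' (a stop codon)."""
    total = 0
    nonsense = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        total += 1
        if fields[7] == "*":
            nonsense += 1
    fraction = nonsense / total if total else 0.0
    return nonsense, total, fraction


# Example lines from the thread; for the real file, pass an open file handle.
sample = [
    "3 52183866 52183866 G A 1 R *",
    "3 52183866 52183866 G A 0 R P",
    "1 10720360 10720360 T A 0 K *",
]
nonsense, total, fraction = nonsense_fraction(sample)
print(nonsense, total, fraction)

# Arithmetic check of the figures reported in the thread.
reported_percent = 100 * 4008788 / 79876631
print(f"{reported_percent:.2f}% of lines are nonsense")
```

The second part only checks the arithmetic: 4008788 out of 79876631 lines is roughly 5%, consistent with the figure given in the reply.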