Question

SIFT or PolyPhen

5

13.7 years ago

Dataminer ★ 2.8k

If one has to select only one server, SIFT or PolyPhen (both are used to identify functional missense mutations), which one should it be?

Note: the rule is that you can select only one, not both.

snp non mutation • 29k views

ADD COMMENT • link updated 5.0 years ago by always_learning ★ 1.1k • written 13.7 years ago by Dataminer ★ 2.8k

1

Just in case people are still using SIFT through its old host, here is the new and probably faster host: sift-dna.org/

ADD REPLY • link 13.6 years ago by Prateek ★ 1.0k

0

Where can we download the pre-built PolyPhen scores? http://genetics.bwh.harvard.edu/pph2/dokuwiki/downloads

Do I have to process these data to generate the final file?

Regards, Najeeb

ADD REPLY • link 5.0 years ago by always_learning ★ 1.1k

0

Please do not add questions in existing threads and do not use the answer field for anything except answers.

ADD REPLY • link 5.0 years ago by ATpoint 85k

0

I already added this comment to your abuse of the answer field yesterday in a different thread:

Please do not add questions in existing threads and do not use the answer field for anything except answers.

Please stop doing that. You are free to comment on existing threads but questions should be posted in a new thread, showing the necessary effort and providing the necessary details.

ADD REPLY • link 5.0 years ago by ATpoint 85k

score 8 · Answer 1 · 2011-03-16

8

13.7 years ago

Programmer ▴ 110

A soon-to-be-published Human Mutation article suggests that PolyPhen is less dependent on the multiple alignment used as input. If you are not able to produce your own alignments for your specific dataset, PolyPhen could be preferred for this reason.

On the other hand, if you can produce your own alignments, SIFT might be preferable: its web UI lets you specify the alignment, and with the correct alignment its results are at least comparable in accuracy to PolyPhen's.

ADD COMMENT • link 13.7 years ago by Programmer ▴ 110

0

In our experience with these tools, the alignment has a huge effect on the ability to do prediction, even down to the species used in the alignment. That Human Mutation article is very similar to the work we did (unpublished). Interesting, thanks for the link.

ADD REPLY • link 13.7 years ago by User 59 13k

score 5 · Answer 2 · 2011-03-16

5

13.7 years ago

User 59 13k

Why don't you test them both on a dataset that you already know the results for and see which one gives you the closest match to what you expect?

You never know, someone may have already done this and published it.

People will use SIFT, PolyPhen-2, or both. If you use both, you will get a subset of predictions that overlap, and each tool will give some predictions unique to its algorithm.
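A minimal sketch of that kind of comparison, assuming you have already reduced each tool's output to a yes/no "deleterious" call per variant and have a truth set you trust. The function names and the dict layout are my own, not something either server provides.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    sensitivity: float
    specificity: float
    accuracy: float


def benchmark(calls, truth):
    """Score one tool against known outcomes.

    calls, truth: dicts mapping variant id -> bool (True = deleterious).
    Only variants present in both dicts are scored.
    """
    shared = calls.keys() & truth.keys()
    tp = sum(1 for v in shared if calls[v] and truth[v])
    tn = sum(1 for v in shared if not calls[v] and not truth[v])
    fp = sum(1 for v in shared if calls[v] and not truth[v])
    fn = sum(1 for v in shared if not calls[v] and truth[v])
    nan = float("nan")
    return Metrics(
        sensitivity=tp / (tp + fn) if tp + fn else nan,
        specificity=tn / (tn + fp) if tn + fp else nan,
        accuracy=(tp + tn) / len(shared) if shared else nan,
    )


def overlap(calls_a, calls_b):
    """Return (called deleterious by both, only by A, only by B)."""
    a = {v for v, is_del in calls_a.items() if is_del}
    b = {v for v, is_del in calls_b.items() if is_del}
    return a & b, a - b, b - a
```

Run `benchmark` once per tool on the same truth set, then `overlap` on the two sets of calls to see how much each tool contributes on its own.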

ADD COMMENT • link 13.7 years ago by User 59 13k

3

We have unpublished results along similar lines to the paper below as well. Interestingly, these did suggest PolyPhen-2 was "worse" than PolyPhen-1, due to a wider variance in accuracy depending on the gene involved. This was seemingly due to "poor" alignments being generated for some genes in our datasets. It is quite hard to put together a fair benchmark, but our study suggested that PolyPhen-2's best-case accuracy was better than PolyPhen-1's, while for our data the average-case accuracy and variance were worse. This was using default settings in the web UI.
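For anyone wanting to look at best-case versus average-case accuracy in their own data, here is a rough sketch of a per-gene breakdown. It is not the analysis from the unpublished study above; the record layout and function names are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, pvariance


def per_gene_accuracy(records):
    """records: iterable of (gene, predicted, actual) tuples, with
    predicted/actual as booleans (True = deleterious)."""
    hits = defaultdict(list)
    for gene, predicted, actual in records:
        hits[gene].append(predicted == actual)
    # Fraction of correct calls for each gene.
    return {gene: sum(h) / len(h) for gene, h in hits.items()}


def summarise(accuracy_by_gene):
    """Best-case, average-case and spread of per-gene accuracy."""
    values = list(accuracy_by_gene.values())
    return {
        "best": max(values),
        "mean": mean(values),
        "variance": pvariance(values),
    }
```

A tool with a high "best" but a large "variance" is the pattern described above: excellent on some genes, poor on others.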

ADD REPLY • link 13.7 years ago by Programmer ▴ 110

1

Would you care to qualify that with some evidence? "Feeling" something is better is not enough ;)

ADD REPLY • link 13.7 years ago by User 59 13k

0

Well, SIFT and PolyPhen are complementary in approach, which is why they are used together. And getting an overlap of results from both servers is not always helpful. Somehow I feel PolyPhen-1 was better than PolyPhen-2.

ADD REPLY • link 13.7 years ago by Dataminer ★ 2.8k

score 5 · Answer 3 · 2011-03-16

5

13.7 years ago

Jarretinha 3.4k

Honestly, both are misleading. Unless you must annotate SNPs with them, try another approach. Check out this article. Of course, this article is mostly about clinical applications. Nevertheless, I'm currently researching some (proto)oncogenes and I can say for sure that relying on alignments to look for phenotypic effects is too risky. This kind of approach totally ignores correlations among positions. They are quite common, as you can see in this conservative analysis.

My advice is: just search the literature about the regions that you're studying. Most high-quality SNP data with phenotypic annotation isn't present in the popular public databases. E.g., the protein menin has about 150 variants annotated in UniProt, whereas you can find about 1000 of them (true protein-level evidence) in the specialized literature.

SIFT and PolyPhen-2 can say opposite things about the same SNP. How do you decide which is right? So, work a bit more to suffer a lot less!!!
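To see how often the two disagree on your own variants, something like the sketch below can help. Keep in mind the scores run in opposite directions: low SIFT scores and high PolyPhen-2 scores both point towards "damaging". The SIFT cutoff of 0.05 is the conventional one; the PolyPhen-2 cutoff here is only a placeholder assumption, so check the documentation for the classifier model you actually use.

```python
SIFT_CUTOFF = 0.05  # scores at or below this are conventionally "deleterious"
PPH2_CUTOFF = 0.5   # ASSUMPTION: illustrative placeholder, not an official value


def discordant(scores, sift_cutoff=SIFT_CUTOFF, pph2_cutoff=PPH2_CUTOFF):
    """Return the variants on which the two tools disagree.

    scores: dict mapping variant id -> (sift_score, polyphen2_score).
    """
    flagged = []
    for variant, (sift, pph2) in scores.items():
        sift_damaging = sift <= sift_cutoff  # low SIFT score = damaging
        pph2_damaging = pph2 >= pph2_cutoff  # high PolyPhen-2 score = damaging
        if sift_damaging != pph2_damaging:
            flagged.append(variant)
    return sorted(flagged)
```

The flagged variants are the ones worth checking against the literature first.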

-- Edit --

There are several other drawbacks in both approaches. They don't correct for paralogy or redundancy. This can give certain alignment positions a much higher weight than they should have. But we know that using alignment as a proxy for purifying selection only works well for low-redundancy sets of distant species, as pointed out here. The success rate of SIFT/PolyPhen-2 is mainly due to evident constraints in protein structure. You don't need them to assess that. A secondary structure prediction program with a good profile-guided alignment should return very similar results in a much more transparent way.

ADD COMMENT • link 13.6 years ago by Jarretinha 3.4k

2

+1 "SIFT and PolyPhen-2 can say opposite things about the same SNP". Yes Jarretinha, you are extremely right. Since they are complementary approaches, we can't say which one is right or wrong. I have seen several such examples.

ADD REPLY • link 13.7 years ago by Khader Shameer 18k

1

Shameer, if you had to choose one of them, which one would you choose and why?

ADD REPLY • link 13.7 years ago by Dataminer ★ 2.8k

0

Can I ask why it has to be SIFT or PolyPhen? It seems very restrictive and a bit unrealistic. We could perhaps help more if we knew the constraints you are operating under. However, you might be interested to know that recent studies have shown that there is no link between the effect of a SNP on protein stability and the deleteriousness of a SNP. I.e., just because PolyPhen classes a non-synonymous amino acid substitution as damaging, there is no increased likelihood that the SNP will be deleterious. BMC Bioinformatics 2009(10) S9

ADD REPLY • link 13.7 years ago by User 6659 ▴ 980

0

I'm quite aware of these issues. We have chaperones and related, right? But SNPs can have an effect (e.g. thalassemia, falciform anaemia). It would be very nice to predict it, within certain bounds at least. Any better clues?

ADD REPLY • link 13.7 years ago by Jarretinha 3.4k

0

Sorry, is that question directed at me or snpminer? If it is directed at me, please can you rephrase it, as I don't understand the question. Incidentally, my comment was directed at the OP, not at you. I thought he might be interested in the BMC paper. I'm familiar with you as a member of this forum and I didn't think you weren't aware of these issues.

ADD REPLY • link 13.7 years ago by User 6659 ▴ 980

0

Also, what are the SNP effects in thalassemia and falciform anaemia you refer to? I'm not aware of the impact of SNPs in these diseases. Are they non-synonymous SNPs? Actually, the thalassemia one is ringing a dim and distant bell now you mention it :)

ADD REPLY • link 13.7 years ago by User 6659 ▴ 980

0

Sorry for my Latin! Falciform anaemia = sickle cell disease. Like some thalassemias, it's caused (for sure) by a single non-synonymous substitution in the beta globin gene. It exists as a SNP with relatively high frequency in certain populations. Anyway, my comment still targets the question (mainly). And your citation suggests three more articles in BMC. Two by Ludwig people from Brazil! Nice!!!

ADD REPLY • link 13.7 years ago by Jarretinha 3.4k

score 4 · Answer 4 · 2011-04-05

Hi,

It doesn't have to be either/or. Condel integrates different outputs (like SIFT and PolyPhen-2), so you can just run both and integrate them.

But sticking to your rule of using one server: from Ensembl 62 on, you should be able to directly access an integrated score of PolyPhen-2 and SIFT (calculated by Condel) through their API (stated here).
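To give a feel for what "integrating" two scores can look like, here is a deliberately simple weighted average. This is not Condel's actual weighting scheme, which derives its weights from the score distributions of known deleterious and neutral variants; the equal default weights and the function name are assumptions for illustration only.

```python
def consensus_score(sift, polyphen2, w_sift=0.5, w_pph2=0.5):
    """Combine a SIFT and a PolyPhen-2 score into one value in [0, 1].

    Higher output = more likely damaging. SIFT is inverted first because
    low SIFT scores indicate damage while high PolyPhen-2 scores do.
    NOTE: simple illustrative average, not the Condel method.
    """
    if not (0.0 <= sift <= 1.0 and 0.0 <= polyphen2 <= 1.0):
        raise ValueError("both scores must lie in [0, 1]")
    total = w_sift + w_pph2
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return (w_sift * (1.0 - sift) + w_pph2 * polyphen2) / total
```

The main point is the inversion of the SIFT score; whatever weighting you choose, the two scales have to point the same way before they are combined.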

UPDATE:

This option is now available at Ensembl: you can access it directly on the web server, or you can query the API yourself.
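If you go the API route and end up with JSON output, pulling the two predictions out could look roughly like this. The sketch assumes the layout used by Ensembl's REST VEP output as I understand it (a list of records, each with a `transcript_consequences` list carrying `sift_score`, `sift_prediction`, `polyphen_score` and `polyphen_prediction`); verify the field names against the current Ensembl documentation before relying on them.

```python
FIELDS = ("sift_score", "sift_prediction", "polyphen_score", "polyphen_prediction")


def extract_predictions(vep_records):
    """Collect SIFT/PolyPhen fields from VEP-style JSON records.

    vep_records: list of dicts as decoded from JSON (layout is an
    assumption, see above). Transcript consequences without any
    SIFT or PolyPhen field (e.g. synonymous changes) are skipped.
    """
    rows = []
    for record in vep_records:
        for consequence in record.get("transcript_consequences", []):
            if not any(field in consequence for field in FIELDS):
                continue
            row = {"transcript_id": consequence.get("transcript_id")}
            for field in FIELDS:
                row[field] = consequence.get(field)  # None if absent
            rows.append(row)
    return rows
```

Each returned row has both tools' score and call side by side, one row per transcript.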

score 3 · Answer 5 · 2011-04-25

"They don't correct for paralogy or redundancy."

This is not true, at least not in the case of PolyPhen-2. It uses the PSIC conservation score, which is very robust and was specifically designed with highly redundant alignments in mind. PolyPhen-2 also has options to correct for paralogs (or to use a clean target database of true orthologs). Benchmarks show that paralog correction actually reduces accuracy slightly, so this option is disabled by default.

"The rate of success of SIFT/PolyPhen-2 is mainly due to evident constraints in protein structure."

SIFT does not use any secondary structure features for its predictions.

score 1 · Answer 6 · 2011-09-06

I'd be inclined to use the approach outlined in a recent paper entitled "The Predicted Impact of Coding Single Nucleotide Polymorphisms Database." The group used three computational tools—Grantham matrix, Polymorphism Phenotyping (PolyPhen), and Sorting Intolerant from Tolerant (SIFT) algorithms. Their Predicted Impact of Coding SNPs database is available at http://www.icr.ac.uk/cancgen/molgen/MolPopGen_PICS_database.htm and is an ongoing project that will continue to curate and release data on the putative functionality of coding SNPs.