If one has to choose only one server, SIFT or PolyPhen (both are used to identify functional missense mutations), which one should one go for?
Note: The rule is that you can select only one, not both.
A soon-to-be-published Human Mutation article suggests that PolyPhen is less dependent on the multiple alignment used as input. If you are not able to produce your own alignments for your specific dataset, then PolyPhen could perhaps be preferred for this reason.
On the other hand, if you can produce your own alignments, then SIFT might be preferable, since its web UI lets you specify the alignment, and with the correct alignment its results are at least comparable in accuracy to PolyPhen's.
Why don't you test them both on a dataset that you already know the results for and see which one gives you the closest match to what you expect?
You never know, someone may already have done this and published it.
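A minimal sketch of such a comparison, assuming you have already reduced each tool's output to a yes/no "deleterious" call per variant. The function names, variant IDs, and calls below are made up for illustration and do not come from either server.

```python
from math import sqrt


def confusion_counts(predicted, truth):
    """Count TP, FP, TN, FN for boolean 'deleterious' calls keyed by variant ID."""
    tp = fp = tn = fn = 0
    for variant, is_deleterious in truth.items():
        if variant not in predicted:
            continue  # the tool made no call for this variant, so skip it
        call = predicted[variant]
        if call and is_deleterious:
            tp += 1
        elif call and not is_deleterious:
            fp += 1
        elif not call and not is_deleterious:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summarize(predicted, truth):
    """Return accuracy, sensitivity, specificity and Matthews correlation."""
    tp, fp, tn, fn = confusion_counts(predicted, truth)
    total = tp + fp + tn + fn
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical toy data: True means "deleterious".
known = {"v1": True, "v2": True, "v3": False, "v4": False}
sift_calls = {"v1": True, "v2": False, "v3": False, "v4": False}
polyphen_calls = {"v1": True, "v2": True, "v3": True, "v4": False}

for name, calls in (("SIFT", sift_calls), ("PolyPhen", polyphen_calls)):
    print(name, summarize(calls, known))
```

On a real dataset, variants that a tool could not score are worth reporting separately, since skipping them (as this sketch does) can flatter a tool that only answers the easy cases.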
People will use either SIFT or PolyPhen-2, or both. If you use both, you will get a subset of predictions that overlap, and each tool will give some that are unique to its algorithm.
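If you do run both, the overlap and the tool-specific calls can be pulled apart with plain set operations. This is only a sketch: `compare_calls` and the variant IDs are hypothetical, and it assumes you have already extracted the variants each tool flagged as deleterious.

```python
def compare_calls(sift_deleterious, polyphen_deleterious):
    """Split deleterious calls into shared and tool-specific subsets."""
    sift_set = set(sift_deleterious)
    polyphen_set = set(polyphen_deleterious)
    return {
        "both": sift_set & polyphen_set,
        "sift_only": sift_set - polyphen_set,
        "polyphen_only": polyphen_set - sift_set,
    }


# Hypothetical variant IDs flagged by each tool.
result = compare_calls(["v1", "v2", "v3"], ["v2", "v3", "v4"])
for subset, variants in result.items():
    print(subset, sorted(variants))
```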
We have unpublished results along similar lines to the paper below as well. Interestingly, these did suggest PolyPhen-2 was "worse" than PolyPhen-1, due to a wider variance in accuracy depending on the gene involved. This was seemingly due to "poor" alignments being generated for some genes in our datasets. It is quite hard to put together a fair benchmark, but our study suggested that PolyPhen-2's best-case accuracy was better than PolyPhen-1's, while for our data its average-case accuracy and variance were worse. This was using default settings in the web UI.
Honestly, both are misleading. Unless you must annotate SNPs with them, try another approach. Check out this article. Of course, this article is mostly about clinical applications. Nevertheless, I'm currently researching some (proto-)oncogenes, and I can say for sure that relying on alignments to look for phenotypic effects is too risky. This kind of approach totally ignores correlations among positions. They are quite common, as you can see in this conservative analysis.
My advice is: just search the literature about the regions that you're studying. Most high-quality SNP data with phenotypic annotation isn't present in the popular public databases. E.g., the protein menin has about 150 variants annotated in UniProt, while you can find about 1000 of them (true protein-level evidence) in the specialized literature.
SIFT and PolyPhen-2 can say opposite things about the same SNP. How do you decide which is right? So, work a bit more to suffer a lot less!!!
-- Edit --
There are several other drawbacks to both approaches. They don't correct for paralogy or redundancy. This can give certain alignment positions much higher weight than they should have. But we know that using alignment as a proxy for purifying selection only works well for low-redundancy sets of distant species, as pointed out here. The success rate of SIFT/PolyPhen-2 is mainly due to evident constraints in protein structure. You don't need them to assess that. A secondary structure prediction program with a good profile-guided alignment should return very similar results in a much more transparent way.
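To make the redundancy point concrete, here is a sketch of position-based sequence weighting in the style of Henikoff and Henikoff, one standard way to down-weight near-identical sequences in an alignment. It only illustrates the general idea and is not a description of what SIFT or PolyPhen-2 do internally. The toy alignment is made up, and gap characters are treated like any other residue, which is a simplification.

```python
from collections import Counter


def henikoff_weights(alignment):
    """Position-based sequence weights, normalized to sum to 1.

    For each column, a sequence receives 1 / (k * n), where k is the number
    of distinct residues in the column and n is the number of sequences
    sharing this sequence's residue at that column.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    weights = [0.0] * len(alignment)
    for col in range(length):
        column = [seq[col] for seq in alignment]
        counts = Counter(column)
        distinct = len(counts)
        for i, residue in enumerate(column):
            weights[i] += 1.0 / (distinct * counts[residue])

    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical toy alignment: three identical sequences and one divergent one.
toy_alignment = ["ACDE", "ACDE", "ACDE", "GCKE"]
for seq, weight in zip(toy_alignment, henikoff_weights(toy_alignment)):
    print(seq, round(weight, 3))
```

The three identical sequences share their weight, so together they no longer dominate the two columns where the fourth sequence differs.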
Can I ask why it has to be SIFT or PolyPhen? It seems very restrictive and a bit unrealistic. We could perhaps help more if we knew the constraints you are operating under. However, you might be interested to know that recent studies have shown no link between the effect of a SNP on protein stability and the deleteriousness of the SNP. I.e., just because PolyPhen classes a non-synonymous amino acid substitution as damaging, there is no increased likelihood that the SNP will be deleterious. BMC Bioinformatics 2009(10) S9
Sorry, is that question directed at me or at snpminer? If it is directed at me, could you please rephrase it? I don't understand the question. Incidentally, my comment was directed at the OP, not at you. I thought he might be interested in the BMC paper. I'm familiar with you as a member of this forum, and I assumed you were already aware of these issues.
Sorry for my Latin! Falciform anaemia = sickle cell disease. Like some thalassemias, it's caused (for sure) by a single non-synonymous substitution in the beta globin genes. It exists as a SNP with relatively high frequency in certain populations. Anyway, my comment still targets the question (mainly). And your citation suggests three more articles in BMC. Two by Ludwig people from Brazil! Nice!!!
Hi
It doesn't have to be either/or. Condel integrates different outputs (like SIFT and PolyPhen-2). So you can just run both and integrate them.
But sticking to your rule of using one server: from Ensembl 62 onward, you should be able to directly access an integrated PolyPhen-2 and SIFT score (calculated by Condel) through their API (stated here).
UPDATE:
Now this option is already available at Ensembl: on the web server you can access it directly, or you can query the API yourself.
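For anyone wondering what "integrating" two scores means in practice, here is a deliberately simple sketch. It is not Condel's actual weighting scheme, and `combined_score` and `sift_weight` are names invented for this example. The one thing it does rely on is that the two scores run in opposite directions: a low SIFT score means intolerant (likely deleterious), while a high PolyPhen-2 score means likely damaging, so one of them has to be flipped before averaging.

```python
def combined_score(sift_score, polyphen_score, sift_weight=0.5):
    """Weighted mean of two damage scores on a shared 0-1 scale.

    SIFT is inverted first (1 - score) so that, like PolyPhen-2,
    higher values mean "more likely damaging".
    """
    for name, value in (("sift_score", sift_score),
                        ("polyphen_score", polyphen_score),
                        ("sift_weight", sift_weight)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    sift_damage = 1.0 - sift_score
    return sift_weight * sift_damage + (1.0 - sift_weight) * polyphen_score


# Hypothetical scores for a single variant.
print(combined_score(sift_score=0.02, polyphen_score=0.90))
```

The equal default weights are an arbitrary assumption. Any real use would need the weights (and a decision threshold) calibrated on variants with known effects.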
They don't correct for paralogy or redundancy.
This is not true, at least not in the case of PolyPhen-2. It uses the PSIC conservation score, which is very robust and was specifically designed with highly redundant alignments in mind. PolyPhen-2 also has options to correct for paralogs (or to use a clean target database of true orthologs). Benchmarks show that paralog correction actually reduces accuracy slightly, so this option is disabled by default.
The rate of success of SIFT/PolyPhen-2 is mainly due to evident constraints in protein structure.
SIFT does not use any secondary structure features for its predictions.
I'd be inclined to use the approach outlined in a recent paper entitled "The Predicted Impact of Coding Single Nucleotide Polymorphisms Database." The group used three computational tools: the Grantham matrix, Polymorphism Phenotyping (PolyPhen), and Sorting Intolerant from Tolerant (SIFT) algorithms. Their Predicted Impact of Coding SNPs database is available at http://www.icr.ac.uk/cancgen/molgen/MolPopGen_PICS_database.htm and is an ongoing project that will continue to curate and release data on the putative functionality of coding SNPs.
In case people are still using SIFT through its old host, here is the new and probably faster host: sift-dna.org/
Where can we download the Pre-built Polyphen Score? http://genetics.bwh.harvard.edu/pph2/dokuwiki/downloads
Does it seem that I have to process these data to generate the final file?
Regards, Najeeb
Please do not add questions in existing threads and do not use the answer field for anything except answers.
I already added this comment to your abuse of the answer field yesterday in a different thread:
Please stop doing that. You are free to comment on existing threads but questions should be posted in a new thread, showing the necessary effort and providing the necessary details.