Is there a tool that looks for phosphorylation sites in multiple aligned sequences and scores them by conservation (e.g. by combining the normal scores for the same site in multiple species)?
There are at least a dozen online tools that predict phosphorylation sites, using a variety of methods. Most of them simply scan a single input sequence for known motifs of specific kinase families, built using some fairly standard method (e.g. HMMs).
Surprisingly, there does not seem to be a tool that looks for conserved sites across multiple sequences. However, the approach has been used in the literature: see, for example, "Comparative Analysis Reveals Conserved Protein Phosphorylation Networks Implicated in Multiple Diseases".
I guess it would not be too difficult to build such a tool, using the alignment module of your favourite Bio* project to extract and score the appropriate columns from an alignment.
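To make the idea concrete, here is a minimal sketch in plain Python (no Bio* dependency). It is not an existing tool and not a kinase-motif predictor: it only walks the columns of an aligned FASTA and, for every S/T/Y in a chosen query sequence, reports the fraction of sequences that also carry S, T or Y in that column. The function names (`read_fasta_alignment`, `conserved_phosphosites`) and the toy alignment are made up for illustration. In a real pipeline you would probably replace the simple fraction with a combination of per-species predictor scores.

```python
PHOSPHO = set("STY")  # residues that can be phosphorylated


def read_fasta_alignment(text):
    """Parse aligned FASTA text into {name: gapped_sequence}."""
    seqs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {n: "".join(parts) for n, parts in seqs.items()}


def conserved_phosphosites(alignment, query):
    """Score each S/T/Y of the query by how often the column holds S/T/Y.

    Returns a list of (query_position, residue, fraction), where
    query_position is 1-based and counts only non-gap query residues.
    """
    if query not in alignment:
        raise KeyError(f"query {query!r} not present in the alignment")
    if len({len(s) for s in alignment.values()}) != 1:
        raise ValueError("sequences are not all the same aligned length")

    ref = alignment[query]
    results = []
    pos = 0
    for col, res in enumerate(ref):
        if res == "-":
            continue  # gap in the query: no query position here
        pos += 1
        if res.upper() in PHOSPHO:
            column = [s[col].upper() for s in alignment.values()]
            frac = sum(r in PHOSPHO for r in column) / len(column)
            results.append((pos, res.upper(), frac))
    return results


# Toy alignment, invented for this example.
example = """>human
MKS-PTAY
>mouse
MKSAPTAF
>fly
MRT-PAAY
"""

for pos, res, frac in conserved_phosphosites(read_fasta_alignment(example), "human"):
    print(f"{res}{pos}\t{frac:.2f}")
```

Note that this counts S, T and Y as interchangeable, which is a simplification; you may prefer to treat Y separately from S/T, or to weight species by evolutionary distance.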
If you are interested in conservation, take a look at Claudia Chica's server (http://conscore.embl.de/html/index.html). It is designed to give a conservation score for protein motifs.
More information is available in the phospho.ELM server paper, which cites the actual implementation and describes how the conservation score is used in phospho.ELM.
Manuscript: http://nar.oxfordjournals.org/content/early/2010/11/08/nar.gkq1104.abstract
Webservice here: conscore.embl.de/CS.wsdl
Very interesting! What is the required format for the multiple sequence alignment? I tried FASTA, but I got an error: "Query sequence (or query sequence name) not present in the alignment." (specifying 9606.ENSP00000255289 from this alignment: http://pastebin.com/Y6F7m2H6 )