Hi all,
I have some proteins whose function I want to predict. What are the best practices for this? I can use BLAST and other tools, but I want better predictions.
2. Domain search: Search for domains using InterProScan, Pfam or CDART.
3. Signal peptide and TM search: Search for signal peptides using SignalP and for transmembrane (TM) helices using TMHMM or Phobius.
4. Comparative modelling: Do homology modelling using SWISS-MODEL; if your sequence has less than 40% identity to its BLAST hits, go for ab initio modelling using I-TASSER.
5. Gene Ontology classification: You can classify your sequence into GO terms using Blast2GO or STRAP.
6. Functional association prediction: Try searching with your sequence in STRING.
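The identity rule in step 4 could be sketched roughly as below. This is only an illustration, assuming BLAST was run with the default tabular output (`-outfmt 6`, where column 3 is percent identity and column 12 is the bit score). The function names `best_hits` and `modelling_route` are made up for this example, and the 40% cutoff is simply the figure quoted above, not a validated threshold.

```python
import csv
import io

IDENTITY_CUTOFF = 40.0  # percent identity quoted in step 4 (a rule of thumb, not a standard)


def best_hits(blast_tabular):
    """Return {query: (subject, pident, bitscore)} keeping the top-scoring hit per query.

    Expects BLAST default tabular output (-outfmt 6): 12 tab-separated columns.
    """
    best = {}
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if len(row) < 12 or row[0].startswith("#"):
            continue  # skip blank, truncated, or comment lines
        query, subject = row[0], row[1]
        pident, bitscore = float(row[2]), float(row[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, pident, bitscore)
    return best


def modelling_route(pident):
    """Apply the step 4 rule: template-based modelling at or above the cutoff."""
    if pident >= IDENTITY_CUTOFF:
        return "homology modelling (e.g. SWISS-MODEL)"
    return "ab initio modelling (e.g. I-TASSER)"


if __name__ == "__main__":
    # Hypothetical BLAST rows, for illustration only.
    example = (
        "q1\ts1\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-100\t350.0\n"
        "q2\ts3\t28.5\t150\t107\t3\t1\t150\t10\t159\t1e-05\t60.0\n"
    )
    for query, (subject, pident, _) in best_hits(example).items():
        print(query, subject, pident, modelling_route(pident))
```

Queries with no BLAST hit at all will not appear in the result, so they need to be handled separately.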
For further reading: David Lee, Oliver Redfern & Christine Orengo. Predicting protein function from sequence and structure. Nature Reviews Molecular Cell Biology 8, 995-1005 (December 2007). doi:10.1038/nrm2281
Give InterProScan a try. It's an all-in-one solution that classifies your proteins into families, predicts protein domains, and annotates putative functions and GO terms.
You can install it locally if you need to annotate an entire genome. More info here.
Please do not use that InterProScan link; it is a testing/development interface only. The public InterProScan interface is part of the InterPro website: http://www.ebi.ac.uk/interpro/
EMBL-EBI provide Web Services for InterProScan (REST or SOAP), which can be used if you do not have the compute resources to run InterProScan locally.
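For anyone who wants to script this, here is a minimal sketch of a REST client using only the Python standard library. The base URL, the form fields (`email`, `sequence`, `goterms`), the status strings and the `tsv` result type are my assumptions about the EMBL-EBI job service for InterProScan 5, so check them against the current EBI documentation before relying on this. The function names are invented for the example.

```python
import time
import urllib.parse
import urllib.request

# Assumed base URL of the EMBL-EBI InterProScan 5 REST service; verify before use.
BASE_URL = "https://www.ebi.ac.uk/Tools/services/rest/iprscan5"


def build_submission(email, sequence, goterms=True):
    """Return (url, form_body) for a job submission without sending anything."""
    params = {
        "email": email,        # contact address requested by the service
        "sequence": sequence,  # protein sequence (plain or FASTA)
        "goterms": "true" if goterms else "false",  # assumed flag for GO term lookup
    }
    return f"{BASE_URL}/run", urllib.parse.urlencode(params).encode("ascii")


def submit(email, sequence):
    """POST the job and return the job identifier sent back as plain text."""
    url, body = build_submission(email, sequence)
    with urllib.request.urlopen(urllib.request.Request(url, data=body)) as resp:
        return resp.read().decode().strip()


def wait_and_fetch(job_id, result_type="tsv", poll_seconds=30):
    """Poll the job status until it finishes, then download one result type."""
    while True:
        with urllib.request.urlopen(f"{BASE_URL}/status/{job_id}") as resp:
            status = resp.read().decode().strip()
        if status == "FINISHED":
            break
        if status not in ("RUNNING", "QUEUED", "PENDING"):
            raise RuntimeError(f"job {job_id} ended with status {status}")
        time.sleep(poll_seconds)
    with urllib.request.urlopen(f"{BASE_URL}/result/{job_id}/{result_type}") as resp:
        return resp.read().decode()
```

Typical use would be `job = submit("you@example.org", seq)` followed by `wait_and_fetch(job)`. Submit sequences in small batches and keep the polling interval generous, since this is a shared public service.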
If you are working with protein sequences, I think Pfam is a pretty commonly used tool to predict functional domains.
Blast2GO wins. I'm currently working on this problem, and the workflow in Blast2GO is really sleek.
You can also try the powerful profile HMM vs. profile HMM search implemented in HHpred. This is especially useful if your sequences have low sequence similarity to known proteins. You can search against several precomputed HMM databases, including most protein domain databases, PDB and even COGs.
Sequence similarity can only get you so far. If all you're after is molecular function, e.g. enzymatic activity, sequence analysis is definitely where you should start. If by function you mean the biological process the protein is involved in, then you will probably need more than sequence analysis. If you're lucky, your protein has a well-characterized ortholog in another organism, so using good orthology resources (e.g. TreeFam) will help. Otherwise, you can use a gene function prediction (a.k.a. gene prioritization) tool such as funl or one of the many tools listed here. The basic idea is simple: given a query composed of some genes, rank the rest of the genome by some measure of functional similarity to the query. So you could use your protein as the query and see which genes are the most similar (i.e. functionally related). Many of these tools were designed with disease gene prioritization in mind, but some are suitable for measuring functional similarity; just make sure you understand what data they use and how.
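To make the ranking idea concrete, here is a toy sketch. It assumes each gene comes with a set of evidence features (domains, interaction partners, expression contexts, and so on) and uses Jaccard overlap as the similarity measure. Real prioritization tools use much richer data and scoring; the gene names, features and function names below are invented for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_similarity(query_genes, features):
    """Rank all non-query genes by feature overlap with the pooled query features.

    features: dict mapping gene name -> set of evidence features (hypothetical data model).
    Returns a list of (gene, score) sorted from most to least similar.
    """
    query = set(query_genes)
    # Pool the features of every query gene into one profile.
    query_profile = set().union(*(features.get(g, set()) for g in query))
    scores = {
        gene: jaccard(query_profile, feats)
        for gene, feats in features.items()
        if gene not in query
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Invented example data.
    toy = {
        "geneA": {"kinase_domain", "binds_X", "expr_liver"},
        "geneB": {"kinase_domain", "binds_X"},
        "geneC": {"expr_brain"},
        "geneD": {"binds_X", "expr_brain"},
    }
    for gene, score in rank_by_similarity(["geneA"], toy):
        print(f"{gene}\t{score:.2f}")
```

The choice of features determines what "functionally related" means here, which is exactly why you need to know what data a given tool uses.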
Why would you do comparative modelling if you are only interested in the function? The homology search done before should be enough to realize what kind of protein you have and what function it might perform. Additionally, ab initio modelling is close to useless for proteins.
These steps are not a hard-and-fast rule. I think it is always better to get at the structure-function relationship for the protein of interest. Say we have a hypothetical protein: structure prediction always helps in some way to learn something about it.
I understand; I'm just questioning the inclusion of homology modelling / ab initio modelling. Structure prediction only helps if it is reliable. Ab initio modelling is often not reliable, so it will not help. More likely, it will bias the researcher in the wrong direction.