Can anyone recommend some methods/tools for predicting the subcellular localization of proteins, e.g. which of a group of gene products is likely to be extracellular? Wikipedia has a nice article on the topic: protein subcellular localization.
It even lists some tools. These predictors tend to be specialized for proteins from particular organisms. Only limited info is provided on each tool, and several are quite old now. I am interested in any of the following:
A high level explanation of how these tools work or perhaps a citation for a review of informatic methods related to the problem.
Recommended tools for de novo prediction of protein subcellular localization in human and mouse. And why you would recommend that tool.
Databases where these predictions have already been determined for all proteins
Comments on whether it might be possible to identify cases where a single amino acid change (resulting from a mutation perhaps) modifies subcellular localization of the protein.
I believe all four points of my question have now been covered to varying degrees by some great responses. Neilfws reminded me to be more specific about which species I am studying so if anyone has comments on tools/methods/databases that are particularly relevant to human and mouse that would be appreciated ...
All answers were useful and provide distinct info. I'm giving the check mark to the one pointing to a Nature Protocols paper because it is a good place to start.
Locating proteins in the cell using TargetP, SignalP and related tools
Olof Emanuelsson, Søren Brunak, Gunnar von Heijne & Henrik Nielsen
Abstract
Determining the subcellular localization of a protein is an important first step toward understanding its function. Here, we describe the properties of three well-known N-terminal sequence motifs directing proteins to the secretory pathway, mitochondria and chloroplasts, and sketch a brief history of methods to predict subcellular localization based on these sorting signals and other sequence properties. We then outline how to use a number of internet-accessible tools to arrive at a reliable subcellular localization prediction for eukaryotic and prokaryotic proteins. In particular, we provide detailed step-by-step instructions for the coupled use of the amino-acid sequence-based predictors TargetP, SignalP, ChloroP and TMHMM, which are all hosted at the Center for Biological Sequence Analysis, Technical University of Denmark. In addition, we describe and provide web references to other useful subcellular localization predictors. Finally, we discuss predictive performance measures in general and the performance of TargetP and SignalP in particular.
+1 - I have felt for a long time that many of the tools at the CBS at the Technical University of Denmark are good. One reason is their use of high-quality training sets.
There are lots of subcellular localization (SCL) prediction tools, but increasingly there's also a lot of experimental data.
One of the best resources is the LOCATE database. It's a curated database with SCL information for human and mouse proteins derived from the literature, high-throughput immunofluorescence experiments and a computational pipeline for membrane protein annotation.
WoLF PSORT is the successor to the PSORT software described in the paper; it tries to predict localization sites.
Most methods you'll find for subcellular localization depend on signal peptide prediction, usually with a hidden Markov model (HMM).
For my dataset, fortunately, a previous proteomics study on secreted proteins had produced a candidate list of proteins. What I found is that only around 20% of the predicted secreted proteins overlapped with the proteomics study.
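For anyone who wants to run the same kind of comparison, here is a minimal sketch of the overlap calculation. The function name and the protein identifiers are made up for illustration; it assumes both lists use the same identifier scheme.

```python
def overlap_fraction(predicted, observed):
    """Return the fraction of predicted IDs that also appear in observed."""
    predicted, observed = set(predicted), set(observed)
    if not predicted:
        return 0.0
    return len(predicted & observed) / len(predicted)


# Made-up identifiers, for illustration only.
predicted_secreted = {"P1", "P2", "P3", "P4", "P5"}
proteomics_secreted = {"P2", "P9"}

fraction = overlap_fraction(predicted_secreted, proteomics_secreted)
print(f"{fraction:.0%} of predicted secreted proteins were seen in the proteomics set")
```

Note that the denominator here is the predicted set; dividing by the proteomics set instead would answer a different question (how many experimentally observed proteins the predictor recovered).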
This could have been just due to the limited taxonomic categories in the various programs. WoLF PSORT only has options for animals, plants and fungi. Perhaps if you made your own set of HMMs based on known mammalian signal peptide sequences, you would have better luck.
Thanks for another great reference and interesting comments from your own experiences. We have been using Wolf-PSort recently and it seems promising. Part of the reason for posting this question was to see if there are any popular alternatives.
Have you tried MultiLoc2? The paper claims that it performs better than WoLF PSORT and other eukaryotic subcellular localization prediction tools. But it's 2 years old now.
Although signal peptides have a highly conserved positive-hydrophobic-polar structure over a length of 25 AAs (e.g. in the case of the general signal peptide), the AA similarity is not so high. So my guess is that a neutral point mutation in the signal peptide region should not affect the signal peptide prediction.
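As a rough illustration of that guess, the sketch below compares the mean Kyte-Doolittle hydropathy of a hydrophobic stretch before and after a substitution. This is only a crude proxy and not how SignalP or similar predictors score sequences; the example stretch and the helper names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Average Kyte-Doolittle score over all residues of seq."""
    return sum(KD[aa] for aa in seq) / len(seq)


def substitute(seq, pos, new_aa):
    """Return seq with the residue at 1-based position pos replaced by new_aa."""
    return seq[:pos - 1] + new_aa + seq[pos:]


# Hypothetical hydrophobic core, not taken from a real protein.
h_region = "LLLAVLALLAL"
conservative = substitute(h_region, 3, "V")  # L -> V, both hydrophobic
disruptive = substitute(h_region, 3, "R")    # L -> R, introduces a charge

for label, seq in [("wild type", h_region),
                   ("L3V", conservative),
                   ("L3R", disruptive)]:
    print(f"{label:10s} {seq} {mean_hydropathy(seq):.2f}")
```

The hydrophobic-for-hydrophobic swap barely moves the average, while the charged substitution lowers it noticeably, which is the intuition behind expecting "neutral" mutations to leave the prediction unchanged.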
But the TAT signal peptide (which should have 'RR' in the signal peptide sequence) and the lipoprotein signal peptide (which should have 'C' at the signal peptide cleavage site) have highly conserved motifs/AAs at certain positions. A mutation at these positions will definitely affect the subcellular localization of the protein (but if that happens, it's not a TAT or lipoprotein signal peptide anymore :) ).
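A minimal sketch of checking those two conserved features before and after a substitution is shown below. It only tests for the bare 'RR' pair and for a cysteine immediately after an assumed cleavage position; real tools such as TatP and LipoP use much richer models. The sequences, the cleavage position and the function names are all hypothetical.

```python
def has_twin_arginine(signal_peptide):
    """True if the signal peptide contains two consecutive arginines."""
    return "RR" in signal_peptide


def has_cleavage_site_cysteine(seq, cleavage_after):
    """True if the residue right after the cleavage site is a cysteine.

    cleavage_after is the 1-based position of the last signal peptide residue
    (an assumption supplied by the caller, not predicted here).
    """
    return cleavage_after < len(seq) and seq[cleavage_after] == "C"


def substitute(seq, pos, new_aa):
    """Return seq with the residue at 1-based position pos replaced by new_aa."""
    return seq[:pos - 1] + new_aa + seq[pos:]


tat_sp = "MSNRRDFLKAAGLTALGAAA"  # hypothetical TAT-like signal peptide
lipo = "MKKLLIAAALLAGCSSNA"      # hypothetical; cleavage assumed after residue 13

print("TAT wild type:", has_twin_arginine(tat_sp))
print("TAT R4K      :", has_twin_arginine(substitute(tat_sp, 4, "K")))
print("Lipo wild type:", has_cleavage_site_cysteine(lipo, 13))
print("Lipo C14S     :", has_cleavage_site_cysteine(substitute(lipo, 14, "S"), 13))
```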
Play with SignalP, TatP or LipoP tools (these are bacterial signal peptide prediction tools) and see how a 'mutation' affects the signal peptide predictions, which eventually determines the final subcellular localization.
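To make that kind of experiment less tedious, something like the sketch below could generate every single-residue substitution in the N-terminal region and write them out as FASTA, ready to paste into a prediction server. The 25-residue default follows the signal peptide length mentioned above; the example sequence, record naming and function names are made up for illustration, and nothing here calls the prediction tools themselves.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def point_mutants(seq, region_len=25):
    """Yield (label, mutant) for every single substitution in the first
    region_len residues; labels look like 'M1A' (wild type, position, new)."""
    for i, wild_type in enumerate(seq[:region_len]):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                yield f"{wild_type}{i + 1}{aa}", seq[:i] + aa + seq[i + 1:]


def to_fasta(name, seq, region_len=25):
    """Return FASTA text with the wild type followed by all point mutants."""
    records = [f">{name}_WT\n{seq}"]
    records += [f">{name}_{label}\n{mutant}"
                for label, mutant in point_mutants(seq, region_len)]
    return "\n".join(records) + "\n"


example = "MKKLLIAAALLAGCSSNA"  # hypothetical sequence, not a real protein
fasta = to_fasta("demo", example, region_len=5)
print(fasta.count(">"), "FASTA records written")
```

Comparing the predictions for the wild-type record against each mutant record would then show which positions the predictor is sensitive to.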
Just to add one more point, if the protein's subcellular localization is determined by a signal peptide, mutation at other parts of the protein sequence should not have any effect on the sorting of the protein to its native subcellular localization.
Thanks. We will check out MultiLoc2. I also appreciate your interesting comments on the 4th part of my question: the potential effect of single amino acid changes on localization.
The first two answers posted for this question provide some good insight on the first two of my points above.
Are you looking for subcellular localization data for all proteomes out there, or do you have a specific organism or species in mind?
Good point. I meant to specify this in the question. I'm particularly interested in human and mouse...