I'm looking for something that's more rigorous than BLAST and less rigorous than a global alignment-esque algorithm. Is there a paper out there that compares methods? What's the most popular method?
I'm doing a BLAST search against one protein, and then trying to find closer hits with a better search method. Is there a better way of going about this?
Edit: This question is probably too broad so here are some additional details. I'm interested in transmembrane proteins, so some type of analysis on transmembrane segments of the proteins will be important.
By "more rigorous than BLAST and less rigorous than a global alignment-esque algorithm" do you mean less rigorous than a dynamic programming algorithm (Smith-Waterman and Needleman-Wunsch)? Neither of these is rigorous enough to detect remote homology because they are pairwise alignments. When you do a search, these algorithms have absolutely no way of knowing which sequence positions are more important than others. Please see my MSA-based response below for an alternative to pairwise alignments.
I see that by "rigorous" you mean fewer false positives. Even a "global alignment-esque algorithm" would not be rigorous enough (and sensitive enough!) to detect remote homology well because those are all pairwise alignments. When you do a search, these algorithms have absolutely no way of knowing which sequence positions are more important than others. Please see my MSA-based response below for an alternative to pairwise alignments.
Btw, you cannot 'find homology'; you can only hypothesize that sequences are homologous, based on the sequence similarity you detected with a sequence similarity search.
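To make the point about pairwise alignments concrete, here is a minimal sketch of position-specific scoring. It is not the scoring scheme of PSI-BLAST or HMMer: the toy alignment, the uniform background frequency, the pseudocount and the function names are all assumptions made up for illustration, and the alignment is assumed to be ungapped.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(msa, pseudocount=1.0):
    """Per-column log-odds scores from equal-length, ungapped aligned sequences."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    total = len(msa) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in msa)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts.get(aa, 0) + pseudocount) / total
            column[aa] = math.log2(freq / background)
        profile.append(column)
    return profile


def profile_score(profile, seq):
    """Sum of column scores: conserved columns contribute the most."""
    return sum(column[aa] for column, aa in zip(profile, seq))


def identities(a, b):
    """Number of identical positions between two equal-length sequences."""
    return sum(1 for x, y in zip(a, b) if x == y)


# Hypothetical family: columns 1, 2 and 4 (G, D, L) are conserved.
toy_msa = ["GDSLA", "GDTLV", "GDALI", "GDKLM"]
profile = build_profile(toy_msa)

query = toy_msa[0]
keeps_conserved = "GDQLW"   # matches the query only at the conserved columns
breaks_conserved = "AESLA"  # matches the query mostly at variable columns
```

Both candidates share three identities with the single query sequence, so a plain identity count cannot separate them, while the profile scores the candidate that keeps the conserved columns higher.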
For practical purposes, PSI-BLAST or HMMer searches are the tools of choice for finding (remote) homologs. If you know the domain, HMMer will do the trick. Most likely, the transmembrane elements are included in the HMM from SMART or PFAM.
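As a rough sketch of what an HMMer search could look like in practice, the snippet below assembles `hmmbuild` and `hmmsearch` command lines and filters the tabular output. It assumes HMMER 3 is installed and that the `--tblout` table has the target name in column 1 and the full-sequence E-value, score and bias in columns 5 to 7. The file names and the E-value cutoff are placeholders, not recommendations.

```python
def hmmer_commands(alignment, hmm_file, database, table_out, evalue=1e-5):
    """Build argument lists for hmmbuild and hmmsearch (nothing is executed here)."""
    build = ["hmmbuild", hmm_file, alignment]
    search = ["hmmsearch", "--tblout", table_out, "-E", str(evalue),
              hmm_file, database]
    return build, search


def parse_tblout(text, max_evalue=1e-5):
    """Return hits from hmmsearch --tblout text that pass the E-value cutoff."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        hit = {
            "target": fields[0],
            "evalue": float(fields[4]),  # full-sequence E-value
            "score": float(fields[5]),   # full-sequence bit score
            "bias": float(fields[6]),    # composition bias correction
        }
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    return hits


# Placeholder file names; pass each list to subprocess.run(cmd, check=True) to execute.
build_cmd, search_cmd = hmmer_commands(
    "family.sto", "family.hmm", "proteins.fasta", "hits.tbl")

# Made-up table rows, only to exercise the parser.
example_table = """# target name  accession  query name  accession  E-value  score  bias
seqA - family - 1e-30 100.5 0.1
seqB - family - 0.5 12.0 8.0
"""
good_hits = parse_tblout(example_table)
```

A bias value that is large relative to the score is worth a second look, since it suggests the hit may be driven by composition.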
There are many comparisons, but you will need to define your task more precisely. Do you want to find homologs for an orphan protein or detect all members of a protein family in a genome?
Does more rigorous mean fewer false-positives?
And you're not running BLAST against a database of one protein, are you?
The recent implementation of HMMer (3.0) is fast and reliable but if you have transmembrane regions, it pays to do a reverse BLAST/PSI-BLAST with candidate hits to confirm and weed out hits to regions with composition biases.
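The reverse-search confirmation can be sketched as a simple filter. This assumes you have already run the reverse BLAST/PSI-BLAST yourself and collected the best hit for each candidate. The identifiers and the dictionary layout are hypothetical, and the entropy function is only a crude stand-in for a real low-complexity filter.

```python
import math
from collections import Counter


def confirm_by_reverse_search(candidates, reverse_best_hit, family):
    """Split candidates by whether their reverse-search best hit is in the family."""
    confirmed, rejected = [], []
    for candidate in candidates:
        best = reverse_best_hit.get(candidate)  # None if the reverse search found nothing
        if best in family:
            confirmed.append(candidate)
        else:
            rejected.append(candidate)
    return confirmed, rejected


def residue_entropy(seq):
    """Shannon entropy (bits) of the residue composition; low values mean biased composition."""
    counts = Counter(seq)
    total = len(seq)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical identifiers, for illustration only.
family = {"famP1", "famP2"}
candidates = ["cand1", "cand2", "cand3"]
reverse_best_hit = {"cand1": "famP2", "cand2": "unrelatedX"}

confirmed, rejected = confirm_by_reverse_search(candidates, reverse_best_hit, family)
```

Candidates that fail the reverse check and whose matched region has low entropy are the ones most likely to be composition-driven hits.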
I'm looking for homologs for a group of proteins. I just re-read the HMMer doc, and it seems like it's exactly what I'm looking for! I'm not sure how I missed this profiling feature before. Rigorous does mean fewer false positives. I think my language was unclear above: I'm running BLAST against NCBI's non-redundant set. Thanks so much!
I am fairly new here, but since this question relates to my work, I'll share what I know. I agree with bilouweb. If you have a template structure but very low sequence identity to it, you could run a secondary structure prediction on the template using, for example, Jpred. It returns a set of aligned sequences with similar secondary structure. In addition, you can run a simple protein BLAST against the nr database with your model sequence and then align the hits. Finally, you can combine the two MSAs to generate the pairwise alignment.
The other strategy, if you do not have a template structure, would of course be structure prediction tools. I once used I-TASSER and it worked quite nicely. You get much more information about your protein than just the structure.
The last option would be prediction tools designed for membrane proteins, for example the SPLIT server.
(Answer written 13.9 years ago by Pals; updated 5.2 years ago by Ram.)
Some tools try to detect homology from the structure (secondary or tertiary) of proteins, but not many transmembrane protein structures are available.
I think the web server Phyre is a good tool to begin with. From an amino acid sequence, it gives:
- secondary structure predictions from 3 predictors
- disorder region predictions
- homologous proteins found
Protein fold recognition programs can help you because some are based on homolog searches. You will find a good list of them on the CASP experiment web page.
99% of the sequences I'm working with don't have structure data. I didn't consider using a structure based prediction before, I will definitely try this out. Thanks!
If you are interested in detecting transmembrane proteins, the tool of choice might be TMHMM.
Here's a web server for TMHMM. Also the SignalP program might help.
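For a quick first look before submitting sequences to TMHMM, a sliding-window hydropathy scan can flag candidate membrane-spanning stretches. To be clear, this is not what TMHMM does (TMHMM is a hidden Markov model). The sketch below uses the Kyte-Doolittle hydropathy scale with the commonly quoted window of 19 residues and threshold of 1.6, and it assumes the input contains only the 20 standard residues. The test sequences are made up.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        scores.append(sum(KYTE_DOOLITTLE[aa] for aa in segment) / window)
    return scores


def candidate_tm_starts(seq, window=19, threshold=1.6):
    """0-based start positions of windows whose mean hydropathy reaches the threshold."""
    return [start for start, value in enumerate(hydropathy_windows(seq, window))
            if value >= threshold]


# Made-up sequences: a hydrophobic core between charged flanks, and a purely polar one.
with_core = "DEKRDEKRDE" + "L" * 20 + "DEKRDEKRDE"
polar_only = "DEKR" * 10

core_starts = candidate_tm_starts(with_core)
polar_starts = candidate_tm_starts(polar_only)
```

Overlapping windows above the threshold usually belong to the same stretch, so in practice you would merge adjacent start positions before comparing against TMHMM's predictions.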
I like (and have voted up) the responses above, particularly the motif and MSA answers. If you want real quality to your MSA, make certain to keep your domains, even your transmembrane segments intact. In other words, an MSA that retains or is aligned to secondary and tertiary structural elements will be of higher quality and allow you to move forward with greater confidence.