Hi
I want to know how much ortholog identification affects the phylogenetic tree depending on whether you use cDNA, CDS, or protein sequences to find orthologs with BLAST. Which sequence type is best for starting ortholog identification?
If you work with protein sequences you can reach out to more distant orthology relationships.
In my experience, using HMMER's jackhmmer tool to search for homologs of a query protein against a set of target proteins is the approach that detects the most distant relationships.
BLAST+ is a very good option in terms of speed/sensitivity if your proteomes are not extremely distant. OrthoMCL is a good option for simplicity of use.
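A common first step with BLAST+ output is to keep reciprocal best hits (RBH): gene a in species A and gene b in species B are paired only if each is the other's top hit. The sketch below is a minimal illustration, assuming you have run both directions (A against B, B against A) with the default 12-column tabular output (`-outfmt 6`, bitscore in the last column). RBH is a widely used heuristic, not a method from this thread, and like any similarity shortcut it does not by itself prove orthology.

```python
import csv
import io
from typing import Dict, Iterable, Set, Tuple


def best_hits(rows: Iterable[list]) -> Dict[str, str]:
    """Return the highest-bitscore subject for each query.

    Assumes BLAST+ default tabular columns (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best: Dict[str, Tuple[str, float]] = {}
    for row in rows:
        if not row:  # skip blank lines
            continue
        qseqid, sseqid, bitscore = row[0], row[1], float(row[11])
        # strict '>' keeps the first-seen subject when scores tie
        if qseqid not in best or bitscore > best[qseqid][1]:
            best[qseqid] = (sseqid, bitscore)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b: str, b_vs_a: str) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    ab = best_hits(csv.reader(io.StringIO(a_vs_b), delimiter="\t"))
    ba = best_hits(csv.reader(io.StringIO(b_vs_a), delimiter="\t"))
    return {(a, b) for a, b in ab.items() if ba.get(b) == a}
```

A gene whose best hit prefers some other gene (for example, a second human copy that hits the same chimp gene with a lower score) is simply left unpaired.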
If the two proteomes or sets of cDNAs are close, as with human and chimp, it is important to be able to separate one-to-one from one-to-many and many-to-many orthologues, and gene trees usually help with that. A recently published method that is especially relevant for analyzing gene trees of closely related species is DLCoal:
http://www.ncbi.nlm.nih.gov/pubmed/22271778
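Once you have ortholog pairs between two species (for example, read off gene trees), sorting them into one-to-one, one-to-many, and many-to-many relationships amounts to finding connected groups of genes and counting how many genes each species contributes. The following is a minimal sketch under that assumption; the input format (a list of `(geneA, geneB)` tuples) and the function name are hypothetical, and "one-to-many" here covers both directions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Node = Tuple[str, str]  # (species tag, gene id)


def classify_orthogroups(
    pairs: Iterable[Tuple[str, str]]
) -> List[Tuple[str, Set[str], Set[str]]]:
    """Group (geneA, geneB) ortholog pairs into connected components and
    label each component by how many genes each species contributes."""
    graph: Dict[Node, Set[Node]] = defaultdict(set)
    for a, b in pairs:
        # tag nodes so identical IDs in species A and B do not collide
        node_a, node_b = ("A", a), ("B", b)
        graph[node_a].add(node_b)
        graph[node_b].add(node_a)

    seen: Set[Node] = set()
    groups = []
    for start in graph:
        if start in seen:
            continue
        # depth-first search to collect one connected component
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        genes_a = {g for sp, g in component if sp == "A"}
        genes_b = {g for sp, g in component if sp == "B"}
        if len(genes_a) == 1 and len(genes_b) == 1:
            label = "one-to-one"
        elif len(genes_a) == 1 or len(genes_b) == 1:
            label = "one-to-many"
        else:
            label = "many-to-many"
        groups.append((label, genes_a, genes_b))
    return groups
```

Classifying whole groups rather than individual pairs matters: in a group such as a1-b1, a1-b2, a2-b2, each pair looks "one-to-many" on its own, but the group is many-to-many.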
Have a look at:
If you're interested in protein-coding genes and their orthologs, then use a protein sequence as your query in sequence similarity searches.
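If what you have is a CDS rather than a protein, you can translate it before searching. The sketch below assumes the CDS is already in frame from its first base and uses the NCBI standard genetic code (translation table 1); genes that use another code (for example, mitochondrial genes) would need a different table. The function name is hypothetical.

```python
# NCBI standard genetic code (translation table 1); bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_cds(cds: str) -> str:
    """Translate an in-frame CDS into protein, stopping at the first stop codon.

    RNA input (U) is accepted; codons with ambiguous bases (e.g. N) become 'X';
    a trailing partial codon is ignored.
    """
    seq = cds.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

In practice, Biopython's `Seq.translate` (with its `table` and `to_stop` parameters) does the same job with more options; the hand-rolled version only shows what happens underneath.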
Orthology and paralogy are evolutionary concepts (defined by Fitch in 1970). Orthologous genes are homologous sequences that started to diverge through a speciation event; paralogs, by contrast, originate from duplication events. BLAST finds sequence similarities, and based on these you can state whether two sequences are homologous or not. But it tells you nothing about orthology versus paralogy.
Thanks for your reply. Suppose I have cDNAs from two genomes (human and chimp). I make a BLAST database from them, set a BLAST cutoff of, say, 1e-10, and run the search. Then I do not select those chimp cDNAs that have two or more cDNA hits, since those hits are very similar (their e-values are very close), and I also apply a length criterion. Isn't that removing the paralogs?
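The filter described in this question can be sketched in a few lines. This is a hedged illustration, not a recommended pipeline: it assumes BLAST+ was run with the custom tabular format `-outfmt "6 qseqid sseqid evalue bitscore length qlen"`, and the coverage and near-tie thresholds are hypothetical placeholders. It compares bitscores rather than e-values to decide whether two hits are "very near", because BLAST+ reports very strong hits with an e-value of 0, which makes ratios of e-values awkward. Note that it only discards ambiguous queries; it does not by itself establish orthology.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical thresholds (only the e-value cutoff comes from the question).
EVALUE_CUTOFF = 1e-10       # cutoff mentioned in the question
MIN_QUERY_COVERAGE = 0.7    # alignment length / query length
NEAR_TIE_FRACTION = 0.95    # second-best bitscore >= 95% of best => ambiguous


def unambiguous_best_hits(blast_tsv: str) -> Dict[str, str]:
    """Keep one best hit per query, dropping queries whose top hits are near-ties.

    Expects columns: qseqid sseqid evalue bitscore length qlen
    """
    hits: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue
        qseqid, sseqid = row[0], row[1]
        evalue, bitscore = float(row[2]), float(row[3])
        aln_len, qlen = int(row[4]), int(row[5])
        # e-value cutoff plus the length (coverage) criterion
        if evalue > EVALUE_CUTOFF or aln_len / qlen < MIN_QUERY_COVERAGE:
            continue
        hits[qseqid].append((bitscore, sseqid))

    result: Dict[str, str] = {}
    for qseqid, scored in hits.items():
        # a subject can appear in several HSP rows; keep its best score only
        best_per_subject: Dict[str, float] = {}
        for bits, subject in scored:
            best_per_subject[subject] = max(bits, best_per_subject.get(subject, 0.0))
        ranked = sorted(((b, s) for s, b in best_per_subject.items()), reverse=True)
        if len(ranked) >= 2 and ranked[1][0] >= NEAR_TIE_FRACTION * ranked[0][0]:
            continue  # two or more near-equal hits: likely paralogs, skip this query
        result[qseqid] = ranked[0][1]
    return result
```

Collapsing multiple HSPs per subject first is a deliberate choice: without it, one gene hit twice by the same query would look like two near-equal hits and the query would be discarded for the wrong reason.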
The strategy you are describing is similar in concept to defining inparalogs and using that definition to filter out one-to-many and many-to-many orthologs from your pair of species.
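A simplified way to make "inparalogs" concrete: start from a seed ortholog pair (a0, b0), for example a reciprocal best hit, and call a gene from a0's own species an inparalog if it is more similar to a0 than a0 is to b0 (a duplication after the speciation). Seeds with no inparalogs on either side are one-to-one. This sketch is similar in spirit to the rule used by InParanoid, but the published tool is more involved; the score dictionary and function names here are hypothetical.

```python
from typing import Dict, FrozenSet, Set

# Similarity score (e.g. BLAST bitscore) per unordered gene pair; missing pairs score 0.
Scores = Dict[FrozenSet[str], float]


def score(scores: Scores, x: str, y: str) -> float:
    return scores.get(frozenset((x, y)), 0.0)


def inparalogs(seed: str, ortholog: str, same_species: Set[str], scores: Scores) -> Set[str]:
    """Genes from the seed's own species that are more similar to the seed
    than the seed is to its cross-species ortholog."""
    cutoff = score(scores, seed, ortholog)
    return {g for g in same_species if g != seed and score(scores, seed, g) > cutoff}


def is_one_to_one(seed_a: str, seed_b: str, genes_a: Set[str],
                  genes_b: Set[str], scores: Scores) -> bool:
    """True if neither side of the seed pair has inparalogs."""
    return (not inparalogs(seed_a, seed_b, genes_a, scores)
            and not inparalogs(seed_b, seed_a, genes_b, scores))
```

Dropping every seed pair for which `is_one_to_one` is false corresponds to filtering out the one-to-many and many-to-many cases described above.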
Thanks again. Do you think that if I use protein sequences instead of cDNA I can get a better-resolved phylogenetic tree, or will the tree be more or less the same if my species are not very diverged? The databases definitely provide a great resource for the analysis, but if you take this strategy, how close are you to the correct phylogenetic tree? Regards
@victor: it should work for a pair of species, but not always. Have a look here (Fig. 2) for a comparison of methods.
And keep in mind that the evolution of many gene families is complex (including many duplications and losses), so assigning orthology without phylogenetic reconstruction often leads to wrong assignments. As for which is better for phylogenetic reconstruction, DNA or protein, look here: http://biostar.stackexchange.com/questions/3739/protein-phylogenetic-analysis (the same link is in my post below)
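To illustrate how a gene tree resolves orthology where similarity alone cannot, here is a minimal sketch of the species-overlap rule: an internal node whose two child subtrees share no species is treated as a speciation, and genes split across it are orthologs; a node whose children share a species is treated as a duplication. It assumes a correctly rooted binary gene tree written as nested tuples and a hypothetical leaf naming convention `SPECIES_gene`. It also ignores incomplete lineage sorting, which matters for closely related species.

```python
from typing import List, Set, Tuple, Union

# leaf = "SPECIES_gene" (hypothetical naming convention); internal node = (left, right)
Tree = Union[str, Tuple["Tree", "Tree"]]


def species_of(leaf: str) -> str:
    return leaf.split("_", 1)[0]  # species prefix before the first "_"


def leaves(tree: Tree) -> List[str]:
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def orthologous_pairs(tree: Tree) -> Set[Tuple[str, str]]:
    """Collect ortholog pairs using the species-overlap rule."""
    if isinstance(tree, str):
        return set()
    left, right = tree
    pairs = orthologous_pairs(left) | orthologous_pairs(right)
    left_leaves, right_leaves = leaves(left), leaves(right)
    left_species = {species_of(g) for g in left_leaves}
    right_species = {species_of(g) for g in right_leaves}
    if not left_species & right_species:
        # speciation node: every gene on one side is orthologous to every gene on the other
        pairs |= {tuple(sorted((a, b))) for a in left_leaves for b in right_leaves}
    # otherwise a duplication node: genes split across it are paralogs, add nothing
    return pairs
```

For example, in the tree `(("HUMAN_a1", "CHIMP_a1"), ("HUMAN_a2", "CHIMP_a2"))` the root is a duplication, so HUMAN_a1 and CHIMP_a2 are not called orthologs even if a BLAST search happened to pair them.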
Answering "viktor Mar 2 at 7:32": resolution depends on distance. Within the same species, we detect SNPs by essentially aligning genomes. For different species at short genetic distances, genomic alignments at the 5' UTR, cDNA, and even intronic level better reflect the species phylogeny. At longer distances, cDNA overlaps with protein similarity, and at even longer distances, protein similarity and conserved domains resolve phylogenies better than anything else. So it's a continuum from genome alignments to gene/cDNA alignments to protein alignments to conserved domains. Hope this helps.