I need to run a phylogenetic analysis on protein sequences. I've noticed that some of the sequences are annotated as "conceptual translation". Would you suggest retrieving and using the gene (nucleotide) sequences rather than the protein ones?
Best regards
It doesn't matter where the sequence originated; if you have to do a protein phylogeny, then do it with amino acid sequences. If, once you have done that, there isn't enough variability to resolve the proteins in the way you want (i.e. they are too similar), you could repeat the process with the nucleotide sequences, as you will get much more variation.
In truth you might find that almost all the amino acid sequences you are dealing with are "conceptual translations" of DNA sequences. Very few sequences originate from protein sequencing, and lots come from DNA sequences that have been translated. This doesn't matter for your purposes.
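To make the two points above concrete, here is a minimal Python sketch (not taken from any particular tool). It performs a "conceptual translation" of two toy coding sequences with the standard genetic code, then compares how different they look at the nucleotide level and at the protein level. The sequences `cds_a` and `cds_b` and the helper names `translate` and `p_distance` are made up for illustration, and the uncorrected p-distance is only the simplest possible measure of divergence.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Conceptually translate an in-frame coding sequence ('*' marks a stop codon)."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def p_distance(a, b):
    """Proportion of differing positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical toy sequences: they differ only at synonymous sites.
cds_a = "ATGGCTCTGAAA"
cds_b = "ATGGCCTTAAAG"

prot_a, prot_b = translate(cds_a), translate(cds_b)
print("proteins:", prot_a, prot_b)
print("nucleotide p-distance:", p_distance(cds_a, cds_b))
print("protein p-distance:", p_distance(prot_a, prot_b))
```

In this toy case the two proteins are identical while the underlying DNA differs at several positions, which is the situation where going back to the nucleotide sequences can add resolution.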
Phylogenetic analysis using protein sequences (20-letter alphabet) gives better resolution than using the corresponding gene sequences (DNA: 4-letter alphabet).
This article (link to full text: http://nar.oxfordjournals.org/content/31/13/3537.full) may answer your question. Protein sequences generally have a better signal-to-noise ratio than DNA sequences, protein alignments also benefit from substitution matrices, and phylogenetic signal disappears more rapidly between DNA sequences than between protein sequences.
Details about the best approach to building a phylogenetic tree, and the advantages and disadvantages of using DNA versus protein sequences, can be found in "Bioinformatics and Functional Genomics" by Jonathan Pevsner.
Wish you luck :)
This isn't really right, RaghuM. Amino acid sequences are better for some questions and worse for others. Taxa or genes that are very divergent may have too much homoplasy in the nucleotide sequence. Conversely, closely related sequences may carry too little information at the amino acid level. It is also very unlikely that many positions in a protein are free to vary among all 20 amino acids.
Provided the right algorithm is chosen, shouldn't DNA sequences give better results because they contain more information (4^3 = 64 possible codons per amino acid position, versus 20 amino acids)? Also, parameters like omega = dN/dS can only be calculated from DNA sequences.
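For what it's worth, the raw capacity per position can be worked out in a couple of lines. This is only a back-of-the-envelope sketch: it assumes every codon or amino acid is equally likely, which real sequences do not satisfy, so both numbers are upper bounds rather than measured information content.

```python
import math

# Maximum information per position, assuming all states are equally likely.
n_codons = 4 ** 3                         # 64 possible nucleotide triplets
bits_per_codon = math.log2(n_codons)      # 6.0 bits per codon
bits_per_residue = math.log2(20)          # about 4.32 bits per amino acid

print(f"codon: {bits_per_codon:.2f} bits, amino acid: {bits_per_residue:.2f} bits")
```

The gap between the two figures largely reflects the degeneracy of the genetic code: several codons encode the same amino acid, so part of the extra nucleotide-level variation is synonymous and never shows up in the protein.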
Which one to prefer is context dependent. I agree that for Ka/Ks and similar calculations DNA is better, provided the sequences are closely related. For distantly related sequences, protein analysis is better. In any case, the question was about a "protein phylogeny".
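To illustrate why Ka/Ks-type measures need the nucleotide sequences, here is a rough Python sketch that classifies codon differences between two aligned coding sequences as synonymous or nonsynonymous. It is deliberately simplified and is not a dN/dS estimator: a real estimate also normalises by the number of synonymous and nonsynonymous sites and corrects for multiple substitutions, which dedicated tools do. The function name `count_changes` and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_changes(cds_a, cds_b):
    """Count codon differences between two aligned, in-frame, ungapped coding sequences.

    Codons differing at exactly one nucleotide are classed as synonymous or
    nonsynonymous; codons differing at two or three nucleotides are only tallied
    as 'multiple', because the order of the changes is ambiguous.
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be the same length and a multiple of 3")
    counts = {"synonymous": 0, "nonsynonymous": 0, "multiple": 0}
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        n_diff = sum(x != y for x, y in zip(codon_a, codon_b))
        if n_diff == 0:
            continue
        if n_diff > 1:
            counts["multiple"] += 1
        elif CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            counts["synonymous"] += 1
        else:
            counts["nonsynonymous"] += 1
    return counts


# Hypothetical toy alignment: one silent change, one amino acid change, one double hit.
seq_1 = "ATGGCTCTGAAA"
seq_2 = "ATGGCCCCGAGG"
print(count_changes(seq_1, seq_2))
```

None of this can be recovered from an amino acid alignment, since synonymous changes are invisible there. It also hints at why the approach degrades for distant sequences: as divergence grows, more codons fall into the ambiguous "multiple" class and synonymous sites saturate.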