Hi all, I'm working on a batch of RNA-seq data. Unfortunately this species is poorly understood, so very little information is available. I have already extracted a list of differentially expressed gene IDs and used it to retrieve the protein and DNA sequences (using annotation files from Ensembl). Now I want to do some GO and KEGG analysis. As far as I know, Blast2GO is a good choice. But should I use the protein sequences or the DNA sequences? Is one better, or are they equivalent? (I found that one transcript ID can map to several protein sequences, which makes the results messier.) Thanks for your attention! :)
Definitely go with protein sequences. For virtually all comparative genomic, phylogenetic, and sundry other analyses, protein sequences are superior. This is primarily for the reason listed above (multiple isoforms with different pathways, localizations, etc.), but also because, especially for potentially very divergent sequences, there is less chance of substitutional saturation. Distant homologs are easier to detect in protein space than in nucleotide space. And for anything using HMM profile matching, profiles based on amino acids contain more information.
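If several proteins per ID are cluttering the results, one common simplification (a convention, not something Blast2GO requires) is to keep only the longest protein per gene before annotating. Below is a minimal Python sketch, assuming Ensembl-style peptide FASTA headers that carry a `gene:` field. The function names and the toy records are made up for illustration. Keep in mind that this discards the other isoforms, which may have different functions.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gene_id(header):
    """Return the value of the 'gene:' field (assumed Ensembl-style header).

    Falls back to the first token of the header if no such field exists.
    """
    parts = header.split()
    for token in parts:
        if token.startswith("gene:"):
            return token[len("gene:"):]
    return parts[0] if parts else ""


def longest_per_gene(records):
    """Map each gene ID to the (header, sequence) of its longest protein."""
    best = {}
    for header, seq in records:
        gid = gene_id(header)
        if gid not in best or len(seq) > len(best[gid][1]):
            best[gid] = (header, seq)
    return best


# Toy input; real use would pass an open file handle instead.
example = """>P1 pep gene:G1 transcript:T1
MKT
>P2 pep gene:G1 transcript:T2
MKTAYIAK
>P3 pep gene:G2 transcript:T3
MSS
""".splitlines()

kept = longest_per_gene(read_fasta(example))
for header, seq in kept.values():
    print(f">{header}\n{seq}")
```

For a real file, replace `example` with something like `open("proteins.fa")` (a hypothetical filename) and write the printed records to a new FASTA file for the Blast2GO run.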
Dan, very well written with cogent explanations +1. I just felt like the question warranted a different tone in response ;)
Thanks, both of you, perfect explanation!