If I have genomic DNA sequence of an organism, what are the tools I can use to predict the proteins encoded by it?
In general, the first task is to predict the genes; the protein sequences then follow directly by translating the predicted coding sequences. I will give you a couple of links that may be useful:
Predicting genes in prokaryotes is relatively straightforward and is usually done with high accuracy. This is because prokaryotic genomes have high gene density (most of their DNA codes for genes) and their genes are continuous (in one piece, without introns). I suggest you try Prodigal for prokaryotic gene prediction.
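To illustrate why continuous genes make the problem easier, here is a minimal sketch of a naive open reading frame (ORF) scan in Python. This is not how Prodigal works; real gene finders use considerably more sophisticated statistical models. The function names and the `min_codons` cutoff are my own choices for illustration, and the scan assumes ATG as the only start codon and an unambiguous A/C/G/T sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=30):
    """Naive ORF scan over all six reading frames.

    Returns a list of (strand, start, end, orf_dna) tuples. Coordinates are
    0-based, end-exclusive, and relative to the strand that was scanned
    (for "-" that is the reverse-complemented sequence).
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    # Count codons from the ATG up to (not including) the stop.
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None
    return orfs


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCC" * 5 + "TAA" + "GG"
    for orf in find_orfs(toy, min_codons=3):
        print(orf)
```

A scan like this only works because a prokaryotic gene is one uninterrupted stretch from start to stop codon. For eukaryotic genes with introns it breaks down, which is why dedicated gene predictors are needed there.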
Eukaryotic genes are more difficult to predict, and that first link is mostly about eukaryotic gene prediction. This is because eukaryotic genomes have lower coding density and their genes are discontinuous (there are introns that have to be cut out and exons that have to be spliced together). There are many programs for this purpose, but they are in general less accurate at predicting gene locations and structure.
For a very basic nucleotide-to-protein sequence translation tool, you could try the Expasy Translate tool: https://web.expasy.org/translate/
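If you would rather do this kind of basic translation locally, a minimal sketch in plain Python could look like the following. It translates a DNA sequence in all six reading frames using the standard genetic code. The function names are my own, it is not the code behind the Expasy tool, and it assumes the standard code applies to your organism (some organisms and organelles use alternative codes).

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate DNA codon by codon; '*' marks a stop, 'X' an unknown codon."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)  # trailing partial codon is ignored
    )


def six_frame_translation(seq):
    """Return a dict mapping frame labels ('+1'..'+3', '-1'..'-3') to proteins."""
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCTAA").items():
        print(label, protein)
```

Keep in mind that this only translates whatever sequence you give it. On raw genomic DNA it tells you nothing about where the genes actually are, so it is no substitute for the gene prediction step.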