What is the 'standard' protocol for gene prediction in a novel genome nowadays?
Also, is it meaningful to compare the number of genes predicted with different parameters or software in similarly sized genomes?
As far as I know, there is no single standard protocol; it depends on how much effort you want to put in and how much evidence you have to support the predictions. If you have no evidence at all (proteins, RNA-seq, ESTs), you have to go for ab initio gene prediction with tools like Augustus. If you do have evidence, the options range from highly automated, easy-to-use methods like MAKER to methods that are part of a bigger pipeline that gives you the whole hog, like Ensembl Gene Sets. There are many others in between that can be listed as answers.
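On the second part of the question, if you do want to put gene counts from different runs side by side, here is a minimal Python sketch that counts features in GFF-style output. It assumes tab-separated rows with nine columns and the feature type in column 3, which is how GFF3 is laid out. The function name `count_features` and the example rows are made up for illustration, and the feature type to count (`gene`, `CDS`, ...) depends on what your predictor actually writes, so check that first.

```python
from typing import Iterable


def count_features(gff_lines: Iterable[str], feature_type: str = "gene") -> int:
    """Count rows whose third column equals feature_type in GFF-style text."""
    count = 0
    for line in gff_lines:
        # Skip blank lines and comment/directive lines such as "##gff-version 3".
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # A GFF row has nine tab-separated columns; column 3 is the feature type.
        if len(cols) >= 9 and cols[2] == feature_type:
            count += 1
    return count


# Made-up rows for illustration only, not real predictions.
example = [
    "##gff-version 3\n",
    "scaffold1\tAUGUSTUS\tgene\t100\t900\t.\t+\t.\tID=g1\n",
    "scaffold1\tAUGUSTUS\tmRNA\t100\t900\t.\t+\t.\tID=g1.t1;Parent=g1\n",
    "scaffold2\tAUGUSTUS\tgene\t50\t400\t.\t-\t.\tID=g2\n",
]

print(count_features(example))
```

For a real file you would pass an open file handle instead of the list, e.g. `count_features(open("predictions.gff3"))`, where the filename is a placeholder. Keep in mind that the raw count only tells you how many models each run emitted, not whether those models are correct.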
Gene prediction is usually an organism-specific problem. More information about which organism you are working on would help people give more constructive answers. I mainly work on bacterial genomes and like Prodigal by ORNL (http://prodigal.ornl.gov/).
Can avilella or anyone else elaborate on this and point to a systematic guide or resource?