Hi,
What is the best bioinformatics approach to find missing genes and their CDS coordinates in an annotated draft genome?
Thanks
The answer depends on how the "known" (i.e. non-missing) genes were found. Your question needs more detail: how was the current annotation generated, and what is your definition of "missing"? That said, two approaches come to mind. The first is to find RNA-Seq data and either assemble it (e.g. with Trinity) or take it through the Tuxedo suite (TopHat/Cufflinks). That should give you transcribed genes that are likely absent from your current set. The second approach would be to simply use an alternate gene predictor, something different from what gave you your current set.
Hi, this question was raised more than a year ago. However, if anyone is still interested in this topic, here is another answer: homology-based gene prediction might be useful in your case. Several options are discussed in this thread: Repairing Old Genomes With Homology Based Gene Prediction
Thanks Seidel. The annotation was done using RAST. Sequences of the supposedly "missing" genes from a reference genome were BLASTed against the contigs of the annotated draft genome. Although good matches between the reference genome genes and the draft genome contigs were found, it's unclear how to link the match positions to the RAST CDS coordinates.
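One way to link the match positions to the CDS coordinates is a plain interval-overlap check per contig. Below is a minimal Python sketch, not a tested pipeline. It assumes the BLAST search was run with tabular output (`-outfmt 6`, standard 12 columns, where `sstart`/`send` are positions on the contig), that the RAST annotation was exported as GFF3, and that contig names are identical in both files. The gene names, contig names and coordinates in the example are made up for illustration.

```python
from collections import defaultdict

def parse_blast_tab(lines):
    """Yield (query, contig, start, end, strand) from BLAST -outfmt 6 lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        sstart, send = int(f[8]), int(f[9])
        # On minus-strand hits BLAST reports sstart > send, so normalise.
        strand = "+" if sstart <= send else "-"
        yield f[0], f[1], min(sstart, send), max(sstart, send), strand

def parse_gff_cds(lines):
    """Return {contig: [(start, end, strand, attributes), ...]} for CDS rows."""
    cds = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "CDS":
            continue
        cds[f[0]].append((int(f[3]), int(f[4]), f[6], f[8]))
    return cds

def link_hits_to_cds(hits, cds_by_contig):
    """For each hit, list the CDS features it overlaps and the overlap in bp."""
    results = []
    for query, contig, hstart, hend, hstrand in hits:
        overlaps = []
        for cstart, cend, cstrand, attrs in cds_by_contig.get(contig, []):
            # Both formats use 1-based inclusive coordinates.
            ov = min(hend, cend) - max(hstart, cstart) + 1
            if ov > 0:
                overlaps.append((attrs, cstart, cend, cstrand, ov))
        results.append((query, contig, hstart, hend, hstrand, overlaps))
    return results

# Hypothetical example data; replace with lines read from your own files.
blast_lines = [
    "geneA\tcontig_1\t98.5\t300\t4\t0\t1\t300\t1200\t901\t1e-150\t540",
    "geneB\tcontig_2\t95.0\t450\t22\t0\t1\t450\t5000\t5449\t0.0\t700",
]
gff_lines = [
    "##gff-version 3",
    "contig_1\tRAST\tCDS\t850\t1250\t.\t-\t0\tID=peg.12",
    "contig_2\tRAST\tCDS\t100\t900\t.\t+\t0\tID=peg.40",
]

results = link_hits_to_cds(parse_blast_tab(blast_lines), parse_gff_cds(gff_lines))
for query, contig, start, end, strand, overlaps in results:
    status = overlaps if overlaps else "no overlapping CDS"
    print(query, contig, start, end, strand, status)
```

A hit that overlaps an existing CDS on the same strand suggests the gene was called but perhaps named or bounded differently. A hit with no overlapping CDS is a candidate for a gene that RAST did not call, and the hit coordinates give a starting point for defining its CDS.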