I have a FASTA genome sequence of a particular accession of a species that I'm using. It was also annotated (genes and CDS) by unknown people some unknown time ago. This genome has been shown to have issues, so I have re-assembled it into a new genome. Of course I could run gene prediction myself, but I don't have much experimental evidence, and I would like to transfer the existing annotation because other parts of my analysis rely on the old one (i.e. I'd be better off not generating new gene models).
Is there a way, given an old FASTA and its GFF annotation, to annotate a new FASTA?
I was trying to code it myself, extracting the CDS sequences from the old FASTA using the GFF, and mapping them against the new FASTA, but it's very tedious and laborious, so I'm wondering whether there are tools out there that do this better. I tried RATT but couldn't really use it (needs ENSEMBL IDs).
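For reference, the do-it-yourself approach described above can be sketched in a few lines of Python. This is only a minimal sketch under strong assumptions: it places each feature by exact string match (so any SNP or indel inside a CDS makes it drop out), it skips features that match zero times or more than once, and it ignores parent/child relationships between GFF records. The function names (`read_fasta`, `transfer_features`) are made up for illustration and do not come from any existing tool.

```python
from typing import Dict, Iterable, List

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {sequence_id: upper-case sequence}."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # ID is the first word of the header
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line.upper())
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}


def transfer_features(old_genome: Dict[str, str],
                      new_genome: Dict[str, str],
                      gff_lines: Iterable[str],
                      feature_type: str = "CDS") -> List[str]:
    """Re-locate features of one type by exact match; return new GFF lines."""
    transferred = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        if seqid not in old_genome:
            continue
        # GFF coordinates are 1-based and inclusive, on the plus strand.
        fragment = old_genome[seqid][start - 1:end]
        queries = [(fragment, False)]
        rc = reverse_complement(fragment)
        if rc != fragment:  # avoid double-counting palindromic fragments
            queries.append((rc, True))
        hits = []
        for new_id, new_seq in new_genome.items():
            for query, flipped in queries:
                pos = new_seq.find(query)
                while pos != -1:
                    hits.append((new_id, pos, flipped))
                    pos = new_seq.find(query, pos + 1)
        if len(hits) != 1:
            continue  # not found, or ambiguous: leave for manual inspection
        new_id, pos, flipped = hits[0]
        if flipped and strand in ("+", "-"):
            strand = "-" if strand == "+" else "+"
        cols[0], cols[3], cols[4], cols[6] = (
            new_id, str(pos + 1), str(pos + len(fragment)), strand)
        transferred.append("\t".join(cols))
    return transferred
```

To use it, open both FASTA files and the GFF, pass the file handles to `read_fasta` and `transfer_features`, and write the returned lines out. Anything that fails the exact-match test would still need a proper aligner, which is where dedicated tools come in.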
If you do not trust the original assembly, then how can you trust the original annotation?
That said, have you tried to assess how similar your new assembly is to the old one? That should give you an idea of whether the annotation is transferable.
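One rough, alignment-free way to get a first impression of how similar two assemblies are is to compare their k-mer content. The sketch below computes a Jaccard index over canonical k-mers (each k-mer is merged with its reverse complement). The choice of k = 21 and the function names are assumptions for illustration only; a whole-genome aligner will tell you far more, in particular about rearrangements, which this measure cannot see.

```python
from typing import Iterable, Set

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(sequences: Iterable[str], k: int = 21) -> Set[str]:
    """Collect canonical k-mers (min of k-mer and its reverse complement)."""
    kmers: Set[str] = set()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" in kmer:
                continue  # skip k-mers spanning unknown bases
            kmers.add(min(kmer, reverse_complement(kmer)))
    return kmers


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |a & b| / |a | b|; defined here as 0.0 if both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)
```

Pass the contig sequences of the old and new assemblies to `canonical_kmers` and compare the two sets with `jaccard`. A value close to 1 suggests the sequence content is largely shared, so a lift-over style transfer has a reasonable chance of working.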
I don't know of anything that works specifically with GFF, but there is RATT.
Alternatively, many de novo annotation tools (e.g. prokka for bacteria) will accept a list of 'trusted proteins' from which to begin their annotations.
I can't use RATT sadly.
@genomax I don't necessarily trust the original annotation, I just want to give it a try if there is a simple way to transfer it, before doing my own gene prediction which is more time-consuming.
What genome are you working with? You can also create EMBL files from GFF (look for GFF-to-EMBL converter scripts).
Might consider this. I'm working with E. coli so... then perhaps I can use RATT with that!
Doing straight Mauve comparisons with your data would be feasible, since you now say that you are working with E. coli. http://darlinglab.org/mauve/mauve.html
Already done, but thanks.
Oh my bad, totally missed that from a skim read!
What kind of issues? And will they affect the gene models? Other options in addition to RATT are:
Thanks, I'll give them a try!