How to annotate FASTA genome using GFF genes from another genome?
5.4 years ago

I have a FASTA genome sequence for the particular accession of a species that I'm using. It was annotated (genes and CDS) by unknown people at some unknown time. This genome has been shown to have issues, so I have re-assembled it into a new genome. Of course I could run gene prediction myself, but I don't have much experimental evidence, and I would like to transfer the existing annotation because other parts of my analysis rely on the old one (i.e. I'd be better off not generating new gene models).

Is there a way, given an old FASTA and its GFF annotation, to annotate a new FASTA?

I was trying to code it myself, extracting the CDS sequences from the old FASTA using the GFF, and mapping them against the new FASTA, but it's very tedious and laborious, so I'm wondering whether there are tools out there that do this better. I tried RATT but couldn't really use it (needs ENSEMBL IDs).
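
For reference, the extraction step described above could look roughly like the sketch below. It is a minimal, standard-library-only illustration, not a replacement for a dedicated tool: the function names `read_fasta` and `extract_features` are made up for this example, it assumes a tab-separated GFF with 1-based inclusive coordinates, and it yields each CDS row separately rather than joining the exons of a multi-part CDS.

```python
# Complement table used to reverse-complement minus-strand features.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse an iterable of FASTA lines into {sequence_id: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Use the first whitespace-delimited token as the sequence ID.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def extract_features(gff_lines, genome, feature_type="CDS"):
    """Yield (attributes, sequence) for every GFF row of the given type."""
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        # GFF is 1-based inclusive; Python slices are 0-based half-open.
        seq = genome[seqid][start - 1:end]
        if strand == "-":
            seq = seq.translate(_COMPLEMENT)[::-1]
        yield cols[8], seq


if __name__ == "__main__":
    fasta = [">chr1", "ATGAAATAGCCC", "TTTCAT"]
    gff = [
        "##gff-version 3",
        "chr1\tsrc\tCDS\t1\t9\t.\t+\t0\tID=cds1",
        "chr1\tsrc\tCDS\t13\t18\t.\t-\t0\tID=cds2",
    ]
    for attrs, seq in extract_features(gff, read_fasta(fasta)):
        print(attrs, seq)
```

The extracted sequences would then still have to be aligned to the new assembly with a mapper of your choice, which is the laborious part.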

FASTA GFF annotation program genome • 3.5k views

If you do not trust the original assembly then how can you trust original annotation?

That said, have you tried to assess how similar your new assembly is to the old one? Perhaps that should give you an idea if the annotation would be transferable.
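
One crude way to get a first impression of similarity is a k-mer Jaccard index between the two assemblies. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, k=21 is an arbitrary choice, and holding every k-mer in a Python set is only practical for small (e.g. bacterial) genomes. A whole-genome aligner will tell you far more about rearrangements.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k=21):
    """Return the set of strand-independent (canonical) k-mers in seq."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        revcomp = kmer.translate(_COMPLEMENT)[::-1]
        # Keep the lexicographically smaller of the k-mer and its reverse
        # complement, so contig orientation does not affect the result.
        kmers.add(min(kmer, revcomp))
    return kmers


def jaccard(seq_a, seq_b, k=21):
    """Jaccard index of the canonical k-mer sets of two sequences."""
    a, b = canonical_kmers(seq_a, k), canonical_kmers(seq_b, k)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

For multi-contig assemblies you would take the union of the k-mer sets over all contigs before comparing. A value close to 1 suggests the sequence content is nearly identical, although it says nothing about the order of that content.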


I don't know of anything that specifically uses GFF, but there is RATT.

Alternatively, many de novo annotation tools (e.g. prokka for bacteria) will accept a list of 'trusted proteins' from which to begin their annotations.


I tried RATT but couldn't really use it (needs ENSEMBL IDs).

I can't use RATT sadly.

@genomax I don't necessarily trust the original annotation, I just want to give it a try if there is a simple way to transfer it, before doing my own gene prediction which is more time-consuming.


What genome are you working with? You can also create EMBL files from GFF (look for GFF-to-EMBL converter scripts).


Might consider this. I'm working with E. coli so... then perhaps I can use RATT with that!


Doing straight Mauve comparisons with your data would be feasible, since you now say that you are working with E. coli: http://darlinglab.org/mauve/mauve.html


Already done, but thanks.


Oh my bad, totally missed that from a skim read!


This genome has been shown to have issues, so I have re-assembled it into a new genome

What kind of issues? And will they affect the gene models? Other options in addition to RATT are:


Thanks, I'll give them a try!

5.4 years ago

In the end, what did the trick was this tool: http://bacteria.ensembl.org/Escherichia_coli/Tools/AssemblyConverter?db=core

It is based on CrossMap.
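
To illustrate the general idea of coordinate conversion between assemblies, here is a deliberately simplified sketch. It is not CrossMap's actual implementation: the block format `(old_start, new_start, length)` and the function names are assumptions made up for this example, all blocks are taken to be ungapped and on the same strand, and a feature is dropped if either of its ends falls outside an aligned block.

```python
from typing import List, Optional, Tuple

# One ungapped aligned block between assemblies:
# (old_start, new_start, length), all 0-based.
Block = Tuple[int, int, int]


def lift_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a 0-based position from the old assembly to the new one."""
    for old_start, new_start, length in blocks:
        if old_start <= pos < old_start + length:
            return new_start + (pos - old_start)
    return None  # position lies in an unaligned region


def lift_feature(start: int, end: int,
                 blocks: List[Block]) -> Optional[Tuple[int, int]]:
    """Lift a 1-based inclusive (GFF-style) interval to the new assembly."""
    new_start = lift_position(start - 1, blocks)
    new_end = lift_position(end - 1, blocks)
    if new_start is None or new_end is None or new_end < new_start:
        return None
    return new_start + 1, new_end + 1


if __name__ == "__main__":
    # Toy example: the new assembly has a 50 bp insertion after position 100.
    blocks = [(0, 0, 100), (100, 150, 200)]
    print(lift_feature(10, 50, blocks))
    print(lift_feature(120, 180, blocks))
    print(lift_feature(290, 310, blocks))
```

Real converters work from chain files that also encode strand and gaps, and they can split a feature that spans several blocks, so treat this only as a picture of the principle.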
