How To Annotate A Newly Sequenced Genome
11.4 years ago

Hello World! I need your help.

I have many contig sequences of a new microorganism which I'd like to characterize, for example: identify putative genes and assign putative function to them, also record the domains present with their respective e-values, orientation of the strand, etc. and finally save that information (for each contig) as a *.embl file.

My first thought is to use software such as GeneMark to identify putative genes, but after that I don't know what to do.

Can anybody give me some guidelines, or suggest a pipeline, for how I could do that using BioPython?

Thanks

biopython visualization • 11k views

MAKER is a popular pipeline for gene predictions. http://gmod.org/wiki/MAKER


I agree, MAKER is possibly the way to go. Try reading the tutorial http://gmod.org/wiki/MAKER_Tutorial_2012 to understand the basics. Also, I think that language preference should be regarded as secondary.


Also RAST can be useful.


We're using MAKER right now on an Amazon EC2 instance. I can't say the installation is especially user-friendly, but we ended up choosing MAKER because it seemed to be the best option.

11.4 years ago
cts ★ 1.7k

For bacteria/archaea we use Prokka, from the Victorian Bioinformatics Consortium:

Prokka is a software tool to annotate bacterial, archaeal and viral genomes very rapidly, and produce output files that require only minor tweaking to submit to Genbank/ENA/DDBJ

And when they say minor tweaking, it really is minor. Prokka gives you a GenBank file and a Sequin file for rapid upload to NCBI, as well as other files that are useful in different circumstances (like a GFF file). Like any pipeline it has a few dependencies, but Prokka itself is very easy to install.

In comparison to MAKER, Prokka does not handle multi-exon gene models (no introns), so it is only useful for bacteria/archaea, but it does annotate proteins, tRNAs and rRNAs. I'm not sure whether MAKER will also annotate the RNAs (its website doesn't say so, but I may have missed it). Prokka also uses both BLAST and HMMER for functional annotation, with custom subsets of UniRef, CDD, Pfam and TIGRFAMs. Again, the MAKER website only mentions using BLAST. (Perhaps others can correct me if I'm wrong; I've never used MAKER.)
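Since the question asks about strand orientation and EMBL output, here is a minimal sketch of how features from a GFF3 file (such as the one Prokka writes) could be turned into EMBL-style feature table lines. It uses only the Python standard library; the function names are hypothetical, and it only handles simple single-span features with the `locus_tag` and `product` attributes. It does not write a complete EMBL record. For that, Biopython's `SeqIO` with the `"embl"` format is probably the more robust route.

```python
from urllib.parse import unquote

# Qualifier lines in an EMBL feature table start at column 22.
QUALIFIER_PREFIX = "FT" + " " * 19


def parse_gff3_line(line):
    """Return a dict for one GFF3 feature line, or None for anything else.

    Comment lines, blank lines and the trailing FASTA section are skipped
    because they do not have the nine tab-separated GFF3 columns.
    """
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return None
    seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
    attributes = {}
    for pair in attrs.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key] = unquote(value)  # GFF3 values are URL-escaped
    return {
        "seqid": seqid,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


def embl_location(start, end, strand):
    """Format a simple location; minus-strand features use complement()."""
    span = f"{start}..{end}"
    return f"complement({span})" if strand == "-" else span


def embl_feature_lines(feature):
    """Build the FT lines for one feature (key at column 6, location at 22)."""
    location = embl_location(feature["start"], feature["end"], feature["strand"])
    lines = [f"FT   {feature['type']:<16}{location}"]
    for key in ("locus_tag", "product"):
        value = feature["attributes"].get(key)
        if value is not None:
            lines.append(f'{QUALIFIER_PREFIX}/{key}="{value}"')
    return lines
```

To use it, loop over the lines of the GFF file, skip the `None` results, group the features by `seqid` (one group per contig), and print the lines returned by `embl_feature_lines` for each group.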


I would agree with the last part of this and suggest that you shouldn't use BLAST alone if you want to have accurate functional annotation of the coding sequences that are predicted.


MAKER uses BLAST, Exonerate, and SNAP. I don't know how it compares to Prokka.


Prokka has trouble working with SPAdes assemblies. I'm getting "contig ID too long". Various blog posts suggest using some additional flags (e.g. --compliant, --centre), but so far I have not been able to solve the issue. I know some people have gone back to older versions of Prokka, as the bug was introduced relatively recently. So far I cannot really recommend Prokka. If it stumbles at the first obstacle, I wonder how accurate it is at the more complex stuff.


If you want, I wrote a script that renames contigs assembled by SPAdes so that Prokka can annotate them.
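The script mentioned above is not shown in the thread, so what follows is only a minimal sketch of the same idea, not the poster's code. It assumes the usual SPAdes header style (`NODE_1_length_5000_cov_12.5`) and replaces each FASTA header with a short sequential ID, keeping a mapping back to the original names. The function names, the `contig` prefix and the zero-padding width are all hypothetical choices.

```python
def rename_fasta_headers(lines, prefix="contig", width=6):
    """Replace every FASTA header with a short sequential ID.

    Returns (renamed_lines, mapping), where mapping goes from the new ID
    to the first word of the original header.
    """
    renamed = []
    mapping = {}
    count = 0
    for line in lines:
        if line.startswith(">"):
            count += 1
            new_id = f"{prefix}_{count:0{width}d}"
            words = line[1:].split()
            mapping[new_id] = words[0] if words else ""
            renamed.append(f">{new_id}\n")
        else:
            # Sequence lines are passed through, with a newline guaranteed.
            renamed.append(line if line.endswith("\n") else line + "\n")
    return renamed, mapping


def rename_fasta_file(in_path, out_path, prefix="contig"):
    """Read a FASTA file, write the renamed copy, and return the ID mapping."""
    with open(in_path) as handle:
        renamed, mapping = rename_fasta_headers(handle, prefix=prefix)
    with open(out_path, "w") as handle:
        handle.writelines(renamed)
    return mapping
```

For example, `rename_fasta_file("contigs.fasta", "contigs.renamed.fasta")` (both file names are placeholders) writes a copy with short IDs that can then be passed to Prokka. Keep the returned mapping if you need to trace annotations back to the original SPAdes node names.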
