Question

Bacterial Annotation Pipeline

13.5 years ago

scapella ▴ 390

Hi,

This may be an old question, but I haven't found a real answer. Does anyone know of an annotation pipeline (automatic or not) for bacterial species? In my case, there is no reference genome close to my species.

bacteria next-gen-sequencing • 12k views

Thanks guys for your answers! I'll try RAST and BG7. Both look very promising!

Reply • 13.5 years ago by scapella ▴ 390

Hopefully I'll be releasing and publishing my Prokka in early 2012.

Reply • 13.5 years ago by Torst ▴ 980

Hello everybody,

Can anyone suggest a solution? I annotated my genome sequence with Prokka, but when I checked the sequence with BLAST I found that some ORFs do not start or end at the same positions as the corresponding BLAST hits. Is there another server that gives a good annotation with correct ORF boundaries, or a server that I can use to correct them manually?

Thank you very much
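Before correcting anything by hand, it can help to sort the mismatches by type. Two calls of the same bacterial gene often share the stop codon and differ only in the chosen start, and a BLAST alignment is local, so it may simply not reach the ends of the ORF. Below is a minimal Python sketch of such a triage. The `Feature` record, the category labels, and the coordinates in the usage note are hypothetical and only for illustration; they are not the output format of Prokka or BLAST.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """A gene call in 1-based inclusive contig coordinates (hypothetical record)."""
    start: int   # leftmost coordinate on the contig
    end: int     # rightmost coordinate on the contig
    strand: str  # "+" or "-"


def five_prime(f: Feature) -> int:
    """Coordinate where translation starts (depends on the strand)."""
    return f.start if f.strand == "+" else f.end


def three_prime(f: Feature) -> int:
    """Coordinate where translation stops (depends on the strand)."""
    return f.end if f.strand == "+" else f.start


def classify(predicted: Feature, evidence: Feature) -> str:
    """Describe how a predicted ORF differs from an evidence-based one."""
    if predicted.strand != evidence.strand:
        return "different strand"
    same_start = five_prime(predicted) == five_prime(evidence)
    same_stop = three_prime(predicted) == three_prime(evidence)
    if same_start and same_stop:
        return "identical"
    if same_stop:
        return "same stop, different start"
    if same_start:
        return "same start, different stop"
    return "different start and stop"
```

For example, `classify(Feature(1200, 2100, "+"), Feature(1230, 2100, "+"))` returns `"same stop, different start"`, which is the kind of case where only the start codon choice needs a manual look.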

Reply • written 9.9 years ago by etudiantscience • 0 (updated 2.2 years ago by Ram 45k)

Ram · Answer 1 · 2011-10-18

Hi!

We (at Oh no sequences!) have developed an annotation system specifically designed for bacterial and NGS data. It's called BG7, and probably its most interesting feature for you is that a close reference genome is not needed.

Unlike other annotation pipelines, such as those based on ORF prediction with Glimmer, where the annotation depends strongly on having a close reference genome, the BG7 system works very well even when you don't have one. You just need a set of what we call 'reference proteins' to guide the annotation. These proteins don't need to be very similar to the proteins you expect to find in your genome, so it's no problem if you don't have a close reference. We've tested it on lots of genomes (some of them with no similar sequences) and are very happy with the results.

The system is open-source (AGPL-V3 license) so you can freely use it.

We're about to launch its website. Meanwhile, you can take a look at these slides describing it and at the result files for the E. coli Germany outbreak that we published in this GitHub repository (the system also outputs the annotations in other formats, such as gbk and embl; this is just an example of the annotations).

Please let me know if you want to know anything else. @pablopareja is the main developer, so you can also ask him.

HTH

Marina

EDIT: We've just launched the BG7 website: http://bg7.ohnosequences.com/ Please feel free to try it; any feedback is highly appreciated :)

Ram · Answer 2 · 2011-10-18

13.5 years ago

Martin A Hansen 3.0k

RAST works really well.

RAST (Rapid Annotation using Subsystem Technology) is a fully-automated service for annotating bacterial and archaeal genomes. It provides high quality genome annotations for these genomes across the whole phylogenetic tree.

score 4 · Answer 3 · 2011-10-18

The GMOD project has several alternatives, of which MAKER (mentioned above) is one, though it leans a little towards the euks. Another option which was designed for work with prokaryotes is DIYA (though looking at that page now it looks like SourceForge is messing with our wiki page). There is also Ergatis which was designed by the people at TIGR/JCVI for doing bacterial annotation, which they know how to do very well (they are now at the University of Maryland). Ergatis is by far the most powerful, but overkill to install if you are only doing one genome. If you are only doing one genome, you might want to look at CloVR, which I am pretty sure is powered by Ergatis but is inside a virtual machine that you can download and run (I think they have options for running it on the cloud too, but I haven't talked to them in a while).

Ram · Answer 4 · 2011-10-18

13.5 years ago

Haibao Tang 3.0k

It takes a bit of time to set up, but try MAKER.

Ram · Answer 5 · 2015-01-22

10.2 years ago

dago ★ 2.8k

Prokka is quite good and fast, and you do not need a reference genome.

It performs ORF prediction and annotation for you using several well-established tools.
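To make "ORF prediction" concrete, here is a toy Python scan that reports the longest start-to-stop stretch before each stop codon in all six reading frames. This is only a sketch of the idea and not how Prokka works internally: Prokka hands gene finding to Prodigal, which scores candidates instead of simply taking the longest ORF. The function names, the `min_codons` threshold, and the 0-based half-open coordinates are assumptions made for this example.

```python
STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "GTG", "TTG"}  # common bacterial start codons
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq: str, min_codons: int):
    """Yield (start, end) of ORFs on one strand, 0-based half-open.

    For each stop codon, the ORF begins at the first start codon seen
    in that frame since the previous stop. Length includes the stop.
    """
    for frame in range(3):
        first_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOPS:
                if first_start is not None and (i + 3 - first_start) // 3 >= min_codons:
                    yield first_start, i + 3
                first_start = None
            elif first_start is None and codon in STARTS:
                first_start = i


def find_orfs(seq: str, min_codons: int = 30):
    """Return sorted (start, end, strand) tuples in forward-strand coordinates."""
    seq = seq.upper()
    n = len(seq)
    orfs = [(s, e, "+") for s, e in orfs_in_strand(seq, min_codons)]
    # Map reverse-strand hits back onto forward-strand coordinates.
    orfs += [(n - e, n - s, "-")
             for s, e in orfs_in_strand(reverse_complement(seq), min_codons)]
    return sorted(orfs)


demo = "CC" + "ATG" + "AAA" * 5 + "TAA" + "GG"
print(find_orfs(demo, min_codons=5))  # [(2, 23, '+')]
```

A real gene finder also has to choose between alternative starts and resolve overlapping candidates, which is where the toy approach and dedicated tools diverge.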

Ram · Answer 6 · 2015-01-22

10.2 years ago

wrf ▴ 70

This thread seems to have died even though this is not a solved problem. One could also try Prodigal. It predicts protein-coding genes very quickly, in about 10 seconds. It is a single binary to download, and it runs fast because bacterial genomes are small. If it doesn't work, not much time is lost.
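A minimal sketch of a basic Prodigal run driven from Python, assuming the `prodigal` binary is on your PATH. The options used (`-i` input FASTA, `-o` coordinate output, `-f` output format, `-a` protein translations) are Prodigal's own; the helper names and the file names are hypothetical.

```python
import shutil
import subprocess


def prodigal_command(genome_fasta: str, out_prefix: str) -> list[str]:
    """Build the argument list for a basic Prodigal run (nothing is executed)."""
    return [
        "prodigal",
        "-i", genome_fasta,          # input nucleotide FASTA (contigs or genome)
        "-o", f"{out_prefix}.gff",   # predicted gene coordinates
        "-f", "gff",                 # write the coordinates as GFF
        "-a", f"{out_prefix}.faa",   # protein translations of the predicted genes
    ]


def run_prodigal(genome_fasta: str, out_prefix: str) -> None:
    """Run Prodigal, failing early with a clear message if it is not installed."""
    if shutil.which("prodigal") is None:
        raise FileNotFoundError("prodigal was not found on PATH")
    subprocess.run(prodigal_command(genome_fasta, out_prefix), check=True)
```

Calling `run_prodigal("contigs.fna", "my_genome")` would then write `my_genome.gff` and `my_genome.faa` next to your input. Note that this gives gene predictions only; assigning functions to the predicted proteins is a separate step.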
