Question

Prokka Annotation or NCBI Annotation

2

4.5 years ago

Optimist ▴ 190

Dear All,

I have two sets of annotation files for 15 bacterial genomes: one from NCBI (RefSeq) and the other from Prokka, which I ran on my local machine.

Which of the two is advisable for downstream analysis?

Awaiting your valuable feedback.

Thank You

Prokka NCBI GenBank • 5.4k views

ADD COMMENT • link updated 3.6 years ago by Emilie-Martine Sixhøj Jepsen • 0 • written 4.5 years ago by Optimist ▴ 190

score 1 · Accepted Answer · 2020-11-15

1

4.5 years ago

Mensur Dlakic ★ 29k

If I remember correctly, Prokka ships with only the HAMAP database of HMMs, which on its own will produce terrible annotations for prokaryotic genomes. To get good annotations you would need to install at least Pfam and TIGRFAMs. I don't know whether you have done that, but you can find out by looking at Prokka's annotations: if Prokka reports many hypothetical proteins where the NCBI files have meaningful annotations, chances are that you don't have any extra Prokka HMM databases. If you are comparing literally identical genomes, it may be better to go with the NCBI annotations.
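One rough way to make that comparison is to count how many protein records each annotation labels as hypothetical. The sketch below assumes both annotations are available as protein FASTA files whose header lines carry the product name (as in Prokka's .faa output and NCBI's protein FASTA downloads). The function names are illustrative and not part of Prokka.

```shell
# Count FASTA records whose header mentions "hypothetical protein".
# grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
count_hypothetical() {
    grep -ci '^>.*hypothetical protein' "$1" || true
}

# Count all FASTA records in a file.
count_proteins() {
    grep -c '^>' "$1" || true
}

# Print a one-line summary for one protein FASTA file.
summarize_annotation() {
    echo "$1: $(count_hypothetical "$1") hypothetical of $(count_proteins "$1") proteins"
}
```

Running `summarize_annotation` on the Prokka .faa file and on the matching NCBI protein FASTA for the same genome gives two numbers to compare. A much larger hypothetical fraction on the Prokka side would be consistent with missing HMM databases.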

ADD COMMENT • link 4.5 years ago by Mensur Dlakic ★ 29k

0

I have just checked my Prokka installation, and it has the following databases:

Looking for databases in: /home/bio2/miniconda3/envs/prokka_env/db

Kingdoms: Archaea Bacteria Mitochondria Viruses
Genera: Enterococcus Escherichia Staphylococcus
HMMs: HAMAP
CMs: Archaea Bacteria Viruses

As you rightly pointed out, it doesn't have Pfam and TIGRFAMs.

Is there a way to add these databases to my Prokka installation?

For now, since there are only 15 bacterial genomes, I can download the annotated GenBank files from NCBI. But if I want to add more genomes in the future, I will have to fall back on Prokka for large-scale annotation.

Thank You

ADD REPLY • link 4.5 years ago by Optimist ▴ 190

1

It should be enough to download the databases and place them in prokka's hmm directory (for you that seems to be /home/bio2/miniconda3/envs/prokka_env/db/hmm).

After gunzipping, I suggest you rename the databases to specify the order in which they will be searched during annotation:

mv TIGRFAMs_15.0_HMM.LIB 1-TIGRFAMs_15.0.hmm
mv Pfam-A.hmm 2-Pfam-A.hmm
mv HAMAP.hmm 3-HAMAP.hmm

When all is done run:

prokka --setupdb
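To confirm the result, the sketch below lists the HMM libraries in a directory in glob order and reports whether each one has an index file next to it. It assumes that `prokka --setupdb` indexes each library with HMMER's hmmpress, which writes files such as `<library>.h3i`. The function name `check_hmm_dir` is hypothetical.

```shell
# List the *.hmm libraries in a directory in lexical (glob) order and report
# whether each has an hmmpress index file (.h3i) alongside it.
check_hmm_dir() {
    local dir="$1" f
    for f in "$dir"/*.hmm; do
        [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
        if [ -e "$f.h3i" ]; then
            echo "pressed   $(basename "$f")"
        else
            echo "unpressed $(basename "$f")"
        fi
    done
}
```

For example, `check_hmm_dir /home/bio2/miniconda3/envs/prokka_env/db/hmm` should show the three renamed libraries in their numbered order. Any line marked "unpressed" suggests the setup step did not index that file.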

ADD REPLY • link 4.5 years ago by Mensur Dlakic ★ 29k

0

Should I modify the command so that Prokka uses the three databases, or does it pick them up automatically?

I have run 'prokka --setupdb'.

This is the command I generally use to run Prokka:

prokka GCA_000168335.1_ASM16833v1_genomic.fasta --outdir GCA_000168335.1_ASM16833v1_prokka_compliant_out_29-10-20 --prefix GCA_000168335.1_ASM16833v1_prokka --genus Pseudomonas --species aeruginosa --kingdom Bacteria --usegenus --compliant --cpus 64 --rfam

Kindly give your valuable suggestions & feedback

Thank You

ADD REPLY • link 4.5 years ago by Optimist ▴ 190

1

If the newly added databases were listed after running prokka --setupdb, you should be able to run everything as intended. That particular command may not give you anything different on Pseudomonas whether or not you invoke --usegenus, as I think that Prokka has genus-specific information only about some enterobacteria (prokka --listdb will give you the database information).
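As a small sketch of that check, the helper below pulls the HMM database names out of `prokka --listdb`-style text, assuming the listing contains an `HMMs:` line like the one posted earlier in this thread. The function name `list_hmm_dbs` is hypothetical, and the exact names Prokka prints for renamed files are not assumed here.

```shell
# Read a Prokka database listing on stdin and print the names found on the
# "HMMs:" line, one per line. Any prefix before "HMMs:" (such as a log
# timestamp) is discarded.
list_hmm_dbs() {
    sed -n 's/^.*HMMs:[[:space:]]*//p' | tr -s ' ' '\n'
}
```

A possible invocation is `prokka --listdb 2>&1 | list_hmm_dbs`; the `2>&1` covers the case where the listing goes to standard error. If only HAMAP appears, the extra libraries were not picked up.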

ADD REPLY • link 4.5 years ago by Mensur Dlakic ★ 29k

0

Is prokka not designed and optimised for prokaryotic genome annotation?

If annotating with Pfam, how would you do that?

ADD REPLY • link 4.4 years ago by robert.murphy ▴ 110