I have two sets of annotation files for 15 bacterial genomes: one from NCBI (RefSeq) and the other from Prokka, which I ran on my local machine.
Which of the two is advisable to use for all downstream analysis?
If I remember correctly, prokka ships with only the HAMAP database of HMMs, which on its own tends to give poor annotations for prokaryotic genomes. To get good annotations you would need to install at least Pfam and TIGRFAMs. I don't know whether you have done that, but you can find out by looking at prokka's annotations: if prokka reports many hypothetical proteins where the NCBI files have meaningful annotations, chances are that you don't have any extra prokka HMM databases. If you are literally comparing identical genomes, it may be better to go with the NCBI annotations.
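A quick way to make that comparison is to count the "hypothetical protein" products in each GenBank file. This is only a minimal sketch: the file names in the usage comments are hypothetical, and it assumes each /product qualifier fits on a single line, which is true for short products such as this one.

```shell
# Count CDS features in a GenBank flat file.
# Feature keys start in column 6, i.e. after five spaces.
count_cds() {
    grep -c '^     CDS ' "$1" || true
}

# Count products annotated exactly as "hypothetical protein".
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_hypothetical() {
    grep -c '/product="hypothetical protein"' "$1" || true
}

# Usage (hypothetical file names, substitute your own):
#   count_cds genome_prokka.gbk;  count_hypothetical genome_prokka.gbk
#   count_cds genome_refseq.gbff; count_hypothetical genome_refseq.gbff
```

If the hypothetical fraction is much higher in the prokka file than in the RefSeq file for the same genome, that points to the missing HMM databases.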
I have just checked my Prokka installation, and it has the following databases:
Looking for databases in: /home/bio2/miniconda3/envs/prokka_env/db
Kingdoms: Archaea Bacteria Mitochondria Viruses
Genera: Enterococcus Escherichia Staphylococcus
HMMs: HAMAP
CMs: Archaea Bacteria Viruses
As you rightly pointed out, it doesn't have Pfam and TIGRfams.
Is there a way to add these databases to my prokka?
Currently, since there are only 15 bacterial genomes, I can download the annotated GenBank files from NCBI. But if I want to add more genomes in future, I will have to fall back on Prokka for large-scale annotation.
Thank You
It should be enough to download the databases and place them in prokka's hmm directory (for you that seems to be /home/bio2/miniconda3/envs/prokka_env/db/hmm). After gunzipping, I suggest you rename the databases to specify the order in which they will be searched during annotation.
When all is done, run prokka --setupdb.
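The renaming step can be sketched as below. This is not the exact set of commands from the original post: the helper name prefix_hmms and the file names in the usage comments are assumptions, and it assumes prokka picks up the .hmm files in alphabetical order, which is what the renaming advice implies.

```shell
# Prefix each HMM file with a running number so that alphabetical
# order matches the order in which the files are listed.
# Usage: prefix_hmms DIR FILE1 FILE2 ...
prefix_hmms() {
    hmm_dir="$1"
    shift
    n=1
    for f in "$@"; do
        mv -- "$hmm_dir/$f" "$hmm_dir/${n}_$f"
        n=$((n + 1))
    done
}

# Usage (assumed file names, adjust to what you actually downloaded):
#   prefix_hmms /home/bio2/miniconda3/envs/prokka_env/db/hmm HAMAP.hmm TIGRFAMs.hmm Pfam-A.hmm
#   prokka --setupdb   # re-index the databases
#   prokka --listdb    # check that the new names appear under "HMMs:"
```

Make sure every file ends in .hmm after renaming, since that is the extension the existing HAMAP database uses in that directory.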
Should I modify the command so that prokka uses the three databases, or does it pick them up automatically?
I have run prokka --setupdb.
This is the command I generally use to run prokka:
prokka GCA_000168335.1_ASM16833v1_genomic.fasta --outdir GCA_000168335.1_ASM16833v1_prokka_compliant_out_29-10-20 --prefix GCA_000168335.1_ASM16833v1_prokka --genus Pseudomonas --species aeruginosa --kingdom Bacteria --usegenus --compliant --cpus 64 --rfam
Kindly give your valuable suggestions & feedback
Thank You
If the newly added databases were listed after running prokka --setupdb, you should be able to run everything as intended. That particular command may not give you anything different on Pseudomonas whether or not you invoke --usegenus, as I think that prokka has genus-specific information only for a few genera (your listing shows Enterococcus, Escherichia and Staphylococcus); prokka --listdb will give you database information.
Is prokka not designed and optimised for prokaryotic genome annotation?
If annotating with Pfam, how would you do that?