PROKKA annotation problem - wrong reference?
3.4 years ago
blur ▴ 280

Hi,

I am running PROKKA for the very first time and ran a test on a bacterium (NOT E. coli).

I ran this command, expecting each gene in the output to be labelled with its gene name:

prokka --outdir assembly_test/mydir_genes --prefix mygenome --addgenes  assembly_test/assembly.fasta

What I get is this [no gene name, mostly]:

>BFMIPPFK_00026 Lipid A biosynthesis myristoyltransferase 

I thought maybe I need to set up a specific database for the bacterium I am using, but I couldn't find how to do it in the manual. I saw this option, but it was not clear to me what files it needs in order to run.

--setupdb         Index all installed databases

I also saw an option to add --species, but it didn't change anything when I ran it (I might not have used it right; I just wrote the name of my bacterium).

Any help would be appreciated,

prokka annotation • 2.2k views
3.4 years ago
h.mon 35k

You can improve the annotation by passing a GenBank (or FASTA) file containing the annotation of a closely related species with the --proteins flag. Be sure to read the Prokka documentation; it is very detailed and well written.

The --species flag you used is just one of the flags for adding taxonomic details to your genome:

Organism details:
  --genus [X]       Genus name (default 'Genus')
  --species [X]     Species name (default 'species')
  --strain [X]      Strain name (default 'strain')
  --plasmid [X]     Plasmid name or identifier (default '')

Something like --genus Escherichia --species coli --strain POO247
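
To show how the organism flags sit alongside --proteins and --addgenes in one call, here is a minimal Python sketch that assembles the command line. The helper name `build_prokka_cmd` and the file paths are placeholders of mine, not anything from Prokka itself; it only uses flags already mentioned in this thread, and it prints the command rather than running it.

```python
import shlex


def build_prokka_cmd(fasta, outdir, prefix, genus=None, species=None,
                     strain=None, proteins=None, addgenes=False):
    """Assemble a prokka argument list (hypothetical helper, not part of Prokka)."""
    cmd = ["prokka", "--outdir", outdir, "--prefix", prefix]
    # Genus, species and strain are separate flags, each taking one value.
    for flag, value in (("--genus", genus), ("--species", species),
                        ("--strain", strain)):
        if value:
            cmd += [flag, value]
    if proteins:
        cmd += ["--proteins", proteins]
    if addgenes:
        cmd.append("--addgenes")
    cmd.append(fasta)  # the contigs FASTA goes last
    return cmd


if __name__ == "__main__":
    # Placeholder values; substitute your own organism and paths.
    cmd = build_prokka_cmd("assembly_test/assembly.fasta",
                           "assembly_test/mydir_genes", "mygenome",
                           genus="Escherichia", species="coli",
                           strain="POO247", proteins="ref.gb",
                           addgenes=True)
    print(shlex.join(cmd))
```

The list returned by the helper could be handed to `subprocess.run` once the printed command looks right.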


I have also downloaded the Genbank of a reference genome and ran it using --proteins. It gives similar results...

prokka   --outdir /assembly_test/mydir_genes_ref --prefix mygenome --proteins /ref.gb   /assembly_test/assembly.fasta

The Genbank looks like this:

     gene            162628..163530
                     /gene="lpxC"
                     /locus_tag="BAL062_00145"
     CDS             162628..163530
                     /gene="lpxC"
                     /locus_tag="BAL062_00145"
                     /codon_start=1
                     /transl_table=11
                     /product="UDP-3-O-[3-hydroxymyristoyl] N-acetylglucosamine
                     deacetylase,UDP-3-O-[3-hydroxymyristoyl]

The resulting names are the product, not the gene name...

I also tried using the --addgenes flag with the exact same results:

prokka   --outdir assembly_test/mydir_genes_ref_genes --prefix mygenome --proteins ref.gb --addgenes assembly_test/assembly.fasta

I have checked the CSV file and it contains the right gene names. I assumed the --addgenes flag didn't work, so I tried again with --compliant and --rawproduct.

Could it be a problem with the order of flags? I went over the manual several times - if the answer is there I was not able to find it...


Did you check the GenBank output (.gbff, if I am not mistaken), or the .gff output? What names are added to these files?


I can see the gene names I was expecting in the generated GFF file, as well as in the CSV, just not in the final ffn/faa files :(

prokka  gene    209965  210882  .   +   .   ID=BFMIPPFK_01470_gene;Name=lpxC;gene=lpxC;locus_tag=BFMIPPFK_01470
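
Since the gene names are present in the GFF, one possible workaround is to copy them into the FASTA headers afterwards. The sketch below is only an illustration under a few assumptions: the GFF is tab-separated with `locus_tag` and `gene` attributes as in the line above, the FASTA headers start with the locus tag followed by the product, and the `gene=<name>` header layout and function names are my own choice rather than anything Prokka produces.

```python
import sys


def gene_names_from_gff(gff_lines):
    """Map locus_tag -> gene name using the attribute column of GFF3 lines."""
    names = {}
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequences embedded after this marker are not features
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if "locus_tag" in attrs and "gene" in attrs:
            names[attrs["locus_tag"]] = attrs["gene"]
    return names


def relabel_fasta(fasta_lines, names):
    """Yield FASTA lines, adding 'gene=<name>' to headers with a known gene."""
    for line in fasta_lines:
        if line.startswith(">"):
            seq_id, _, desc = line[1:].rstrip("\n").partition(" ")
            gene = names.get(seq_id)
            if gene:
                line = f">{seq_id} gene={gene} {desc}".rstrip() + "\n"
        yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example call (file names are placeholders):
    #   python add_gene_names.py mygenome.gff mygenome.ffn > mygenome.named.ffn
    with open(sys.argv[1]) as gff:
        names = gene_names_from_gff(gff)
    with open(sys.argv[2]) as fasta:
        sys.stdout.writelines(relabel_fasta(fasta, names))
```

Headers whose locus tag has no gene name in the GFF are passed through unchanged, so hypothetical proteins keep their original labels.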