Question

Prokka bacteria genome annotation

6.3 years ago

agata88 ▴ 870

Hi all!

I annotated a bacterial genome with Prokka. The run finished, but I don't fully understand the results. Maybe somebody more familiar with this program can help?

I have multiple contigs assigned to the same annotation. I ran this command:

./prokka --outdir contigs_prokka --kingdom Bacteria --genus X --proteins uniprot_bacteria.fasta --usegenus --evalue 0.01 --rfam --cpu 8 --norrna contigs.fasta &

As a result I have a TSV file listing the contigs and their annotations. For some of the results, multiple contigs are assigned to the same annotation. For example:

contig1   CDS   1965   Zinc-transporting ATPase OX=224308 GN=zosA PE=1 SV=1
contig2   CDS   918    Zinc-transporting ATPase OX=224308 GN=zosA PE=1 SV=1

I am not sure how to interpret this:

Are these unconnected contigs?
Does one sequence represent the gene, while the rest are pseudogenes?
Can I take the longest one for the final annotation and ignore the rest, or should I annotate the rest as potential pseudogenes?

Many thanks for any suggestions. Agata

prokka • 3.9k views

ADD COMMENT • link 6.3 years ago by agata88 ▴ 870

Both could be real and just happen to be zinc-transporting ATPases. Did you check for sequence redundancy in your contigs before running Prokka? For example, contig2 could match contig1 over its entire length (and so be contained within it).
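
One quick way to test the containment idea is a plain substring check on both strands. This is only a minimal sketch: the function names are made up for illustration, and it detects exact containment only, so a contig that differs by a single base will not be flagged (a clustering or alignment tool is needed for that).

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(complement)[::-1]


def is_contained(short, long):
    """True if `short` occurs exactly within `long` on either strand."""
    short, long = short.upper(), long.upper()
    return short in long or reverse_complement(short) in long
```

If `is_contained(contig2_seq, contig1_seq)` returns True, the two annotations most likely describe the same stretch of sequence twice.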

ADD REPLY • link 6.3 years ago by GenoMax 147k

Yes, I used CD-HIT; it produced 10905 clusters from 10942 contigs.

This is not an isolated case; most records are duplicated.
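
To see how widespread the duplication is, the rows can be grouped by product description. A minimal sketch, assuming whitespace-separated rows laid out like the example in the question (contig, feature type, length, product); the function name is hypothetical, and a real Prokka TSV may use a different column layout.

```python
from collections import defaultdict


def contigs_per_product(lines):
    """Map each product seen on more than one row to its (contig, length) pairs."""
    groups = defaultdict(list)
    for line in lines:
        # Assumed layout: contig, feature type, length, then the product text.
        fields = line.split(None, 3)
        if len(fields) < 4 or not fields[2].isdigit():
            continue  # skip blank, header, or incomplete rows
        contig, _ftype, length, product = fields
        groups[product.strip()].append((contig, int(length)))
    # Keep only products that appear on more than one row.
    return {p: hits for p, hits in groups.items() if len(hits) > 1}
```

Calling it on the open TSV file gives a dictionary whose size is the number of products reported more than once.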

ADD REPLY • link 6.3 years ago by agata88 ▴ 870

score 0 · Answer 1 · 2018-08-24

6.3 years ago

agata88 ▴ 870

Hi all!

I have a solution to my question. It turned out that my sample was contaminated, which is why I had such a huge number of contigs. After filtering, the annotation went well. I hope this helps with similar dilemmas in the future.

Btw, I filtered the contigs using blastn against a species-specific nt database.
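
The exact commands were not posted, so the following is only a rough sketch of this kind of filtering, not what was actually run. It assumes blastn was run with tabular output (`-outfmt 6`, where column 1 is the query ID, column 3 the percent identity and column 4 the alignment length). The function names and both cutoffs are illustrative assumptions.

```python
def contigs_with_hits(blast_lines, min_identity=90.0, min_length=500):
    """Return IDs of query contigs with at least one hit passing both cutoffs."""
    keep = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        qseqid, pident, aln_length = cols[0], float(cols[2]), int(cols[3])
        if pident >= min_identity and aln_length >= min_length:
            keep.add(qseqid)
    return keep


def filter_fasta(fasta_lines, keep_ids):
    """Yield FASTA lines only for records whose ID is in keep_ids."""
    writing = False
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            # The record ID is the first word after '>'.
            writing = bool(header) and header[0] in keep_ids
        if writing:
            yield line
```

Contigs with no acceptable hit against the species-specific database are dropped; the cutoffs would need tuning for a real data set.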

Best,

Agata

ADD COMMENT • link 6.3 years ago by agata88 ▴ 870

Hi Agata,

Could you please send me the command line you used to filter the contigs with blastn?

ADD REPLY • link 5.4 years ago by hjafar ▴ 10

You may see this as a solution, but having contaminated data go into an assembly is not a good thing. If you submit this assembly to NCBI, you may throw someone else off if they use the data for genome comparisons.