Differences in annotations using Prokka on assembly done by SPAdes
6.1 years ago
jeetsahu ▴ 10

I am annotating two assemblies of the same data set (PRJNA387062 on NCBI): one that I assembled with SPAdes and one that I downloaded from NCBI. When I run Prokka on both assemblies, I get large differences between the two annotations. Below is the summary for the SPAdes assembly.

organism: Genus species strain
contigs: 491
bases: 3928941
repeat_region: 1
CDS: 3629
tRNA: 79
tmRNA: 1
rRNA: 4

The following is the summary for the assembly submitted to NCBI.

organism: Genus species strain
contigs: 16
bases: 4599140
tRNA: 80
tmRNA: 1
CDS: 4420
repeat_region: 2
rRNA: 21

There is a vast difference in the rRNA and CDS counts. Is this difference acceptable?

prokka spades annotation • 1.9k views

It’s common to get different results from different annotators. Prokka is particularly conservative in its calling of features.

Whether it is “acceptable” depends on what you want to do with the data.

Both would be acceptable submissions. There are plenty of genomes in NCBI that have been annotated with Prokka.

I’d be more concerned by the fact that your two assemblies differ by more than half a megabase in size, which could reasonably account for the ~800 CDS difference.


I used SPAdes for assembly. So what difference in the number of base pairs is considered "acceptable"?

How did you arrive at the CDS difference?


There isn't an 'acceptable' value. It depends what's going on with your data. An 'acceptable' difference is one you can explain, without it negatively affecting your analyses.

Check both assemblies for genome coverage, contamination, the presence of mobile elements, and so on.
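As a first pass at that kind of check, basic contiguity statistics can be computed straight from each FASTA file. The following is only a minimal sketch in plain Python: the function names and the `min_len` cutoff are illustrative assumptions, not part of Prokka or SPAdes, and it does not measure coverage or contamination, which need dedicated tools.

```python
def read_contig_lengths(fasta_path):
    """Return the length of every sequence in a FASTA file, in file order."""
    lengths = []
    current = 0
    seen_header = False
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if seen_header:
                    lengths.append(current)
                seen_header = True
                current = 0
            elif seen_header:
                current += len(line)
    if seen_header:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length of the contig at which the sorted running total reaches half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def summarize(lengths, min_len=0):
    """Contig count, total bases, N50 and longest contig, ignoring contigs shorter than min_len."""
    kept = [n for n in lengths if n >= min_len]
    return {
        "contigs": len(kept),
        "bases": sum(kept),
        "n50": n50(kept),
        "longest": max(kept, default=0),
    }
```

Running `summarize(read_contig_lengths(path))` on both assemblies, once with `min_len=0` and once with a cutoff such as 500, would show how much of the 491-contig assembly sits in very short contigs. The path and the cutoff are placeholders to adapt to your own files.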

The difference in CDS is apparent from your data. One genome has ~3600, the other has ~4400: a difference of roughly 800 loci. For most prokaryotes, there are (very) approximately 1000 CDSs per megabase. Your assemblies differ by about 0.67 megabases, so a difference of several hundred genes is roughly what you would expect.
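The arithmetic can be checked directly from the two Prokka summaries in the question. This is a small sketch; the 1000 CDS per megabase figure is the rough rule of thumb used here, not a measured value, and the variable names are only for illustration.

```python
# Values copied from the two Prokka summaries in the question.
spades = {"contigs": 491, "bases": 3928941, "CDS": 3629, "rRNA": 4}
ncbi = {"contigs": 16, "bases": 4599140, "CDS": 4420, "rRNA": 21}

# Rule of thumb only: roughly 1000 CDSs per megabase in most prokaryotes.
CDS_PER_MB_RULE_OF_THUMB = 1000


def cds_per_mb(summary):
    """Observed coding density of one assembly, in CDSs per megabase."""
    return summary["CDS"] / (summary["bases"] / 1e6)


size_gap_mb = (ncbi["bases"] - spades["bases"]) / 1e6
cds_gap = ncbi["CDS"] - spades["CDS"]
expected_cds_gap = size_gap_mb * CDS_PER_MB_RULE_OF_THUMB

print(f"size difference: {size_gap_mb:.2f} Mb")
print(f"CDS difference: {cds_gap}")
print(f"CDS difference expected from size alone: about {expected_cds_gap:.0f}")
print(f"SPAdes density: {cds_per_mb(spades):.0f} CDS/Mb")
print(f"NCBI density: {cds_per_mb(ncbi):.0f} CDS/Mb")
```

Both assemblies have a similar coding density, which suggests the CDS gap mostly follows from the size gap rather than from Prokka behaving differently on the two inputs.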
