Hi all,
I have assembled multiple bacterial genomes sequenced on an Oxford Nanopore MinION (FLO-MIN106 flow cell).
I used the Pomoxis and Unicycler assemblers to perform the genome assembly. Upon annotating the resulting FASTA files with RAST and PATRIC, I observed that the CDS count is abnormally high (double in some cases) compared to existing assemblies.
The CDS ratio ranges from 0.44 to 0.60 (the normal CDS ratio prescribed by NCBI ranges between 0.8 and 1.2).
How can I overcome this abnormal CDS count? What is the way forward?
Thanking you all
If your consensus accuracy is 99.9%, you still have on average 1 error every 1000 bp. A typical bacterial gene is ~1000 bp long. That error is usually an indel, which causes a frameshift in your CDS. A gene finder like Prodigal (used in Prokka) will then split the gene at the frameshift, so you get ~2 predicted CDS for every real CDS. You need to also sequence the isolates with Illumina and polish the nanopore assembly with the short reads.
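To put rough numbers on that argument, here is a minimal Python sketch. It assumes errors are independent and evenly spread along the genome, which is a simplification (nanopore errors tend to cluster in homopolymers), and the function names are made up for illustration.

```python
def expected_errors(gene_length_bp: int, accuracy: float) -> float:
    """Average number of erroneous bases in a gene of the given length."""
    return gene_length_bp * (1.0 - accuracy)


def prob_at_least_one_error(gene_length_bp: int, accuracy: float) -> float:
    """Chance that a gene contains one or more errors.

    Assumes every base is correct independently with probability `accuracy`.
    """
    return 1.0 - accuracy ** gene_length_bp


if __name__ == "__main__":
    gene_length = 1000  # typical bacterial gene length in bp, as in the answer
    for accuracy in (0.999, 0.9999, 0.99999):
        errors = expected_errors(gene_length, accuracy)
        p_hit = prob_at_least_one_error(gene_length, accuracy)
        print(f"accuracy={accuracy}: {errors:.2f} errors/gene, "
              f"P(>=1 error)={p_hit:.1%}")
```

Under these assumptions, at 99.9% accuracy roughly 63% of 1000 bp genes carry at least one error. Not every error is a frameshifting indel, so treat this as an upper bound on the fraction of genes that get split.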
One note about the Illumina suggestion: there may already be Illumina data out there for your strains of interest. Check this rather nice program to locate and download SRA or ENA data more quickly:
https://ewels.github.io/sra-explorer/
There is no Illumina data available for the isolates under study.