Question

High CDS Count in my assembled Genome using Nanopore reads (ONT) data

0

5.7 years ago

Optimist ▴ 190

Hi all,

I have assembled multiple bacterial genomes sequenced on an Oxford Nanopore MinION sequencer (FLO-MIN106 flow cell).

I used the Pomoxis and Unicycler assemblers to perform the genome assembly. After annotating the resulting FASTA files with RAST and PATRIC, I observed that the CDS count is abnormally high (double in some cases) compared with existing assemblies.

The CDS ratio ranges from 0.44 to 0.60 (the normal CDS ratio prescribed by NCBI ranges between 0.8 and 1.2).

How can I overcome this abnormal CDS count? What is the way forward?

Thanking you all

High CDS WGS Nanopore Assembly • 1.6k views

ADD COMMENT • link updated 5.7 years ago by h.mon 35k • written 5.7 years ago by Optimist ▴ 190

score 2 · Answer 1 · 2019-08-17

2

5.7 years ago

h.mon 35k

As this is a Nanopore-only assembly, there are many errors (mainly indels) which negatively affect gene prediction:

Nanopore only assembly errors

Mind the gaps – ignoring errors in long read assemblies critically affects protein prediction

ADD COMMENT • link 5.7 years ago by h.mon 35k

2

If your consensus accuracy is 99.9%, you still have 1 error every 1000 bp. A typical bacterial gene is ~1000 bp long. That one error is usually an indel, which causes a frameshift in your CDS. If you use a gene finder like Prodigal (used in Prokka), you will get ~2 predicted CDS for every real CDS. You also need to sequence it with Illumina and polish the Nanopore assembly.
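To make that arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from any of the tools mentioned. The function names are made up for illustration, and the model rests on simplifying assumptions: errors are independent and uniformly distributed, every error is an indel, and each indel splits one real CDS into two predicted fragments.

```python
def expected_errors_per_gene(accuracy: float, gene_length_bp: int) -> float:
    """Expected number of consensus errors falling inside one gene."""
    return (1.0 - accuracy) * gene_length_bp


def prob_gene_has_error(accuracy: float, gene_length_bp: int) -> float:
    """Probability that a gene contains at least one error,
    assuming errors are independent at each base."""
    return 1.0 - accuracy ** gene_length_bp


def rough_predicted_cds_per_gene(accuracy: float, gene_length_bp: int) -> float:
    """Crude estimate of predicted CDS per real CDS.

    Assumption (hypothetical model): every error is a frameshifting indel
    and each one adds exactly one extra predicted fragment.
    """
    return 1.0 + expected_errors_per_gene(accuracy, gene_length_bp)


if __name__ == "__main__":
    acc, length = 0.999, 1000  # the figures quoted in the comment above
    print(f"expected errors per gene: {expected_errors_per_gene(acc, length):.2f}")
    print(f"P(gene has >= 1 error):   {prob_gene_has_error(acc, length):.2f}")
    print(f"predicted CDS per gene:   {rough_predicted_cds_per_gene(acc, length):.2f}")
```

With 99.9% accuracy and 1000 bp genes this gives about one error per gene and roughly two predicted CDS per real CDS, in line with the estimate above. Real assemblies will deviate, since errors cluster (for example in homopolymers) and not every indel produces two called fragments.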

ADD REPLY • link 5.7 years ago by Torst ▴ 980

0

One note about this: there is probably already Illumina data out there for your strains of interest. Check out this rather nice program for locating and downloading SRA or ENA data more quickly:

https://ewels.github.io/sra-explorer/

ADD REPLY • link 5.7 years ago by colindaven 7.4k

0

There is no Illumina data available for the isolates under study.

ADD REPLY • link 5.7 years ago by Optimist ▴ 190