Question

Bacterial genome assembly with nanopore sequence data

4.4 years ago

rthapa ▴ 90

Hi,

I am assembling the genomes of several bacterial strains from barcoded nanopore reads. The expected genome size is 3.8 Mb. For one strain the assembly has 2 contigs, the longest being 3.6 Mb; for another strain it has 9 contigs, the longest being 1.5 Mb. Surprisingly, the strain with the more fragmented assembly had almost twice as many reads as the other strain.

1) Would another round of nanopore sequencing improve the assembly quality?
2) Why might the strain with the larger number of reads give the shorter contigs?

Thanks

assembly nanopore • 928 views

ADD COMMENT • link 4.4 years ago by rthapa ▴ 90

The read length is critical for genome assembly.

Check the read lengths of both datasets. I find stats.sh from the BBMap package excellent for this (it is intended for assembly contig stats, but it works well for long reads too).
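If stats.sh is not at hand, the same comparison can be sketched in a few lines of Python. This is only a rough illustration, not a replacement for the tool: it assumes plain 4-line FASTQ records (optionally gzipped), the function names are made up for this example, and the 3.8 Mb default genome size is simply taken from the question.

```python
import gzip


def read_lengths(fastq_path):
    """Yield the length of every read in a FASTQ file (plain or .gz).

    Assumes strict 4-line records, which is an assumption of this sketch.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                yield len(line.strip())


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def summarize(fastq_path, genome_size=3_800_000):
    """Basic read-length summary plus a rough coverage estimate."""
    lengths = list(read_lengths(fastq_path))
    total = sum(lengths)
    return {
        "reads": len(lengths),
        "total_bases": total,
        "mean_length": total / len(lengths) if lengths else 0,
        "n50": n50(lengths),
        "max_length": max(lengths, default=0),
        "approx_coverage": total / genome_size,
    }
```

Calling `summarize()` on the FASTQ file of each barcode and comparing the `n50` and `approx_coverage` values side by side should show whether the strain with more reads actually has shorter reads.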

Also, which assembler are you using? I think Flye performs best these days.

ADD REPLY • link 4.4 years ago by colindaven 7.4k

I used Canu for genome assembly.

ADD REPLY • link 4.4 years ago by rthapa ▴ 90

Here are the contig stats for the assembly with the larger number of contigs:

A       C       G       T       N       IUPAC   Other   GC      GC_stdev
0.2318  0.2686  0.2677  0.2319  0.0000  0.0000  0.0000  0.5363  0.0594

Main genome scaffold total:             9
Main genome contig total:               9
Main genome scaffold sequence total:    3.852 MB
Main genome contig sequence total:      3.852 MB    0.000% gap
Main genome scaffold N/L50:             2/1.005 MB
Main genome contig N/L50:               2/1.005 MB
Main genome scaffold N/L90:             4/608.997 KB
Main genome contig N/L90:               4/608.997 KB
Max scaffold length:                    1.157 MB
Max contig length:                      1.157 MB
Number of scaffolds > 50 KB:            4
% main genome in scaffolds > 50 KB:     97.57%


Minimum     Number          Number          Total           Total           Scaffold
Scaffold    of              of              Scaffold        Contig          Contig  
Length      Scaffolds       Contigs         Length          Length          Coverage
--------    --------------  --------------  --------------  --------------  --------
    All                  9               9       3,851,716       3,851,716   100.00%
   1 KB                  9               9       3,851,716       3,851,716   100.00%
 2.5 KB                  8               8       3,849,312       3,849,312   100.00%
   5 KB                  6               6       3,841,766       3,841,766   100.00%
  10 KB                  6               6       3,841,766       3,841,766   100.00%
  25 KB                  6               6       3,841,766       3,841,766   100.00%
  50 KB                  4               4       3,758,227       3,758,227   100.00%
 100 KB                  4               4       3,758,227       3,758,227   100.00%
 250 KB                  4               4       3,758,227       3,758,227   100.00%
 500 KB                  4               4       3,758,227       3,758,227   100.00%
   1 MB                  2               2       2,161,944       2,161,944   100.00%

The contig stats for the assembly with only 2 contigs are:

A       C       G       T       N       IUPAC   Other   GC      GC_stdev
0.2315  0.2671  0.2688  0.2326  0.0000  0.0000  0.0000  0.5359  0.0165

Main genome scaffold total:             2
Main genome contig total:               2
Main genome scaffold sequence total:    3.839 MB
Main genome contig sequence total:      3.839 MB    0.000% gap
Main genome scaffold N/L50:             1/3.795 MB
Main genome contig N/L50:               1/3.795 MB
Main genome scaffold N/L90:             1/3.795 MB
Main genome contig N/L90:               1/3.795 MB
Max scaffold length:                    3.795 MB
Max contig length:                      3.795 MB
Number of scaffolds > 50 KB:            1
% main genome in scaffolds > 50 KB:     98.86%


Minimum     Number          Number          Total           Total           Scaffold
Scaffold    of              of              Scaffold        Contig          Contig  
Length      Scaffolds       Contigs         Length          Length          Coverage
--------    --------------  --------------  --------------  --------------  --------
    All                  2               2       3,838,763       3,838,763   100.00%
  25 KB                  2               2       3,838,763       3,838,763   100.00%
  50 KB                  1               1       3,795,002       3,795,002   100.00%
 100 KB                  1               1       3,795,002       3,795,002   100.00%
 250 KB                  1               1       3,795,002       3,795,002   100.00%
 500 KB                  1               1       3,795,002       3,795,002   100.00%
   1 MB                  1               1       3,795,002       3,795,002   100.00%
 2.5 MB                  1               1       3,795,002       3,795,002   100.00%

I wonder if the strain with many contigs has more than one plasmid. Does anyone have any idea?
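As a starting point for the plasmid question, the cumulative table for the 9-contig assembly can be converted into size bins to see how much sequence sits in the small contigs. The sketch below is only arithmetic on the numbers posted above; the `rows` list and the `size_bins` helper are illustrative names, and it assumes stats.sh uses 1 KB = 1,000 bp. It cannot tell a plasmid from a fragmented piece of chromosome.

```python
# Rows copied from the stats.sh table of the 9-contig assembly:
# (minimum length in bp, number of contigs >= that length, total bp in them)
rows = [
    (0, 9, 3_851_716),
    (1_000, 9, 3_851_716),
    (2_500, 8, 3_849_312),
    (5_000, 6, 3_841_766),
    (10_000, 6, 3_841_766),
    (25_000, 6, 3_841_766),
    (50_000, 4, 3_758_227),
    (100_000, 4, 3_758_227),
    (250_000, 4, 3_758_227),
    (500_000, 4, 3_758_227),
    (1_000_000, 2, 2_161_944),
]


def size_bins(rows):
    """Turn cumulative '>= length' rows into (low, high, n_contigs, total_bp) bins."""
    bins = []
    for (low, n_low, bp_low), (high, n_high, bp_high) in zip(rows, rows[1:]):
        count = n_low - n_high
        if count:  # skip thresholds between which no contig falls
            bins.append((low, high, count, bp_low - bp_high))
    low, n_last, bp_last = rows[-1]
    bins.append((low, None, n_last, bp_last))  # open-ended top bin
    return bins


for low, high, count, total_bp in size_bins(rows):
    upper = f"{high:,}" if high is not None else "max"
    print(f"{low:>9,} to {upper:>9} bp: {count} contig(s), {total_bp:,} bp")
```

For this table the bins come out as one contig of 2,404 bp, two contigs between 2.5 and 5 KB totalling 7,546 bp, two between 25 and 50 KB totalling 83,539 bp, two between 500 KB and 1 MB totalling 1,596,283 bp, and two above 1 MB totalling 2,161,944 bp. So about 93.5 KB of the assembly is in the five small contigs, while the four large contigs together look like one chromosome split into pieces.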

ADD REPLY • link 4.4 years ago by rthapa ▴ 90