Hi,
I am assembling the genomes of several bacterial strains from barcoded nanopore reads. The expected genome size is 3.8 Mb. For one strain the assembly gives 2 contigs, the longest being 3.6 Mb; for another strain it gives 9 contigs, the longest being 1.5 Mb. Surprisingly, the strain with the more fragmented assembly had almost twice as many reads as the other strain. 1) Would another round of nanopore sequencing improve the assembly quality? 2) What could explain the shorter contigs for the strain that had the larger number of reads?
Thanks
Read length is critical for genome assembly: more reads will not help much if they are short.
Check the read lengths of both datasets. I find stats.sh from the BBMap package excellent for this (it's actually intended for checking assembly contig stats but works well for long reads too).
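If you would rather compare the two barcodes without installing anything, here is a minimal Python sketch of the same read-length check. It is not what stats.sh does internally, just a rough equivalent. It assumes plain 4-line FASTQ records (optionally gzipped), and the function names and the 3.8 Mb default genome size (taken from the question) are my own choices for illustration.

```python
import gzip
import sys


def read_lengths(fastq_path):
    """Return the length of every read in a 4-line-per-record FASTQ file."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    lengths = []
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                lengths.append(len(line.strip()))
    return lengths


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def summarize(lengths, genome_size=3_800_000):
    """Basic read statistics; genome_size is an assumed value from the question."""
    total = sum(lengths)
    return {
        "reads": len(lengths),
        "total_bases": total,
        "mean_length": total / len(lengths) if lengths else 0,
        "n50": n50(lengths),
        "max_length": max(lengths, default=0),
        "approx_coverage": total / genome_size,
    }


if __name__ == "__main__":
    # Usage: python read_stats.py barcode01.fastq.gz barcode02.fastq.gz
    for path in sys.argv[1:]:
        print(path, summarize(read_lengths(path)))
```

Run it on the FASTQ of each barcode and compare the read N50 and approximate coverage. A dataset with twice as many reads but a much lower read N50 can easily give the more fragmented assembly.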
Also, which assembler are you using? I think Flye is performing the best these days.
I used Canu for genome assembly.
Here are the contig stats for the assembly with the large number of contigs:
The contig stats for the assembly with only 2 contigs are:
I wonder if the strain with many contigs has more than one plasmid. Does anyone have any idea?