Question

Why are contigs and scaffolds identical in number and length after SPAdes de novo assembly, despite their different definitions?

20 months ago

dejenieshif • 0

I ran into an issue while performing de novo assembly of a phage genome from Illumina paired-end reads using SPAdes. The contigs and scaffolds are identical in number and length, despite their different definitions. This suggests that scaffolding did not merge any contigs. How can contigs and scaffolds end up identical in number and length?

I would also appreciate advice on how to generate a consensus genome in this situation.

Thank you for your attention to this matter.

de-novo assembly contigs scaffolds • 1.3k views

updated 20 months ago by Ram 45k • written 20 months ago by dejenieshif

score 0 · Answer 1 · 2023-08-16

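Strictly speaking, a scaffold should consist of several adjacent contigs joined by runs of N. However, if SPAdes cannot find a neighbor for a contig, it still writes that contig unchanged to "scaffolds.fasta". When no contigs can be joined at all, "contigs.fasta" and "scaffolds.fasta" therefore contain the same sequences.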
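To confirm that this is what happened in your run, you can check whether any scaffold contains a run of N. Below is a minimal sketch in plain Python; the function names are made up for illustration, and it assumes that any N in a scaffold marks a join between contigs (an ambiguous base inside a contig would also be counted).

```python
import re
from io import StringIO

def read_fasta(handle):
    """Return {record_name: sequence} from an open FASTA handle."""
    records, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # Keep only the first word of the header as the record name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records

def summarize_scaffolding(contigs, scaffolds):
    """Compare two {name: sequence} dicts and count N runs per scaffold."""
    gapped = {
        name: len(re.findall(r"N+", seq))
        for name, seq in scaffolds.items()
        if "N" in seq
    }
    return {
        "n_contigs": len(contigs),
        "n_scaffolds": len(scaffolds),
        # Scaffolds listed here were actually built from joined contigs.
        "scaffolds_with_gaps": gapped,
        # True when both files hold the same set of sequences.
        "same_sequences": sorted(contigs.values()) == sorted(scaffolds.values()),
    }

if __name__ == "__main__":
    # Toy data standing in for contigs.fasta and scaffolds.fasta.
    contigs = read_fasta(StringIO(">c1\nACGTAC\n>c2\nGGTT\n"))
    scaffolds = read_fasta(StringIO(">s1\nACGTAC\n>s2\nGGTT\n"))
    print(summarize_scaffolding(contigs, scaffolds))
```

On real data you would pass `open("contigs.fasta")` and `open("scaffolds.fasta")` from the SPAdes output directory instead of the toy strings. An empty `scaffolds_with_gaps` together with `same_sequences` being true means no contigs were joined.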

This usually happens when the contigs are separated by long repeats which are not spanned by read pairs.
The standard solution is to add long-read sequencing (Oxford Nanopore or PacBio), since long reads can span such repeats.
You can visualize the assembly graph produced by SPAdes with Bandage (https://github.com/asl/BandageNG) to better understand the repeats in your genome.

A possible alternative explanation is that there are long regions with very low read coverage in your genome.
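A rough first check for the coverage explanation is to look at the coverage values SPAdes writes into its sequence names (for example `NODE_1_length_1000_cov_100.0`). The sketch below flags sequences whose coverage is far below the length-weighted mean. The function names and the 0.2 threshold are arbitrary choices for illustration. Note that the `cov` value in SPAdes headers is k-mer coverage, not read coverage, and that it describes whole contigs, so mapping the reads back to the assembly would give a more direct picture of low-coverage regions.

```python
import re

# SPAdes names sequences like NODE_<id>_length_<bp>_cov_<kmer coverage>.
HEADER = re.compile(r"NODE_(\d+)_length_(\d+)_cov_([0-9.]+)")

def parse_header(name):
    """Return (length, coverage) parsed from a SPAdes sequence name (no '>')."""
    match = HEADER.match(name)
    if match is None:
        raise ValueError(f"not a SPAdes-style name: {name!r}")
    return int(match.group(2)), float(match.group(3))

def low_coverage(names, fraction=0.2):
    """List names whose coverage is below `fraction` of the length-weighted mean."""
    parsed = {name: parse_header(name) for name in names}
    total_length = sum(length for length, _ in parsed.values())
    mean_cov = sum(length * cov for length, cov in parsed.values()) / total_length
    return [name for name, (_, cov) in parsed.items() if cov < fraction * mean_cov]

if __name__ == "__main__":
    # Made-up names for illustration only.
    names = [
        "NODE_1_length_1000_cov_100.0",
        "NODE_2_length_1000_cov_100.0",
        "NODE_3_length_500_cov_5.0",
    ]
    print(low_coverage(names))
```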