Why are contigs and scaffolds identical in number and length after SPAdes de novo assembly, despite their different definitions?
15 months ago

I ran into an issue while performing de novo assembly of phage genome reads (Illumina paired-end) with SPAdes. The contigs and scaffolds are identical in number and length, despite their different definitions. This suggests that scaffolding did not join any contigs. How can contigs and scaffolds end up identical in number and length?

I would also appreciate advice on how to generate a consensus genome in this situation.

Thank you for your attention to this matter.

de-novo assembly contigs scaffolds • 1.2k views
15 months ago
shelkmike ★ 1.4k

Strictly speaking, a scaffold consists of several adjacent contigs joined by runs of N. However, if SPAdes cannot find any contig adjacent to a given contig, it still writes that contig, unchanged, to "scaffolds.fasta". When this is true of every contig, the two files contain the same sequences.

This usually happens when contigs are separated by long repeats that are not spanned by read pairs.
The standard solution is to generate long reads (Oxford Nanopore or PacBio).
You can also visualize the assembly graph produced by SPAdes with Bandage (https://github.com/asl/BandageNG) to better understand the repeats in your genome.
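
To confirm that no scaffold actually joins contigs, one option is to look for runs of N in "scaffolds.fasta". The following is only a minimal sketch: the function names are hypothetical helpers written for this illustration (they are not part of SPAdes), and it assumes a plain FASTA file in which gaps between joined contigs are written as N.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gap_runs(seq: str) -> List[Tuple[int, int]]:
    """Return (start, length) for every run of N in a sequence."""
    return [(m.start(), m.end() - m.start()) for m in re.finditer(r"[Nn]+", seq)]


def scaffolds_with_gaps(lines: Iterable[str]) -> Dict[str, List[Tuple[int, int]]]:
    """Map each scaffold that contains at least one N run to its gap positions."""
    result = {}
    for name, seq in read_fasta(lines):
        gaps = gap_runs(seq)
        if gaps:
            result[name] = gaps
    return result
```

Passing an open file handle for "scaffolds.fasta" to `scaffolds_with_gaps` should return an empty dictionary if no contigs were joined with N gaps, which would be consistent with the contig and scaffold files being identical.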

A possible alternative explanation is that there are long regions with very low read coverage in your genome.
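
A rough way to look for low-coverage sequences is to read the coverage value that SPAdes embeds in its FASTA headers (names of the form NODE_<id>_length_<len>_cov_<cov>). The sketch below rests on several assumptions: the headers follow that naming pattern, the reported value is a k-mer coverage rather than a read depth, and the cutoff of 20% of the median is an arbitrary choice made for illustration. The function names are hypothetical. Note that it only flags whole contigs with low coverage; finding low-coverage regions inside a contig would require mapping the reads back to the assembly.

```python
import re
from statistics import median
from typing import Iterable, List, Optional, Tuple

# Assumed SPAdes-style header, e.g. ">NODE_3_length_500_cov_4.2"
HEADER = re.compile(r"^>?NODE_(\d+)_length_(\d+)_cov_(\d+(?:\.\d+)?)")


def parse_header(header: str) -> Optional[Tuple[int, int, float]]:
    """Return (node id, length, coverage), or None if the header does not match."""
    m = HEADER.match(header.strip())
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), float(m.group(3))


def low_coverage_nodes(
    headers: Iterable[str], fraction: float = 0.2
) -> List[Tuple[int, int, float]]:
    """Flag nodes whose coverage is below `fraction` times the median coverage."""
    parsed = [p for p in map(parse_header, headers) if p is not None]
    if not parsed:
        return []
    cutoff = fraction * median(cov for _, _, cov in parsed)
    return [(node, length, cov) for node, length, cov in parsed if cov < cutoff]
```

The headers can be collected by keeping only the lines of "contigs.fasta" that start with ">". Contigs flagged this way are worth inspecting in Bandage before deciding whether they belong in the consensus genome.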
