What is the exact definition for scaffold?
8.2 years ago
k.kathirvel93 ▴ 310

Hi Everyone

I would like to know the exact definition of a scaffold (for better understanding). Also, how do de novo assembly tools pick which two contigs to join into a scaffold? For example, do de novo tools pick the 1st and 2nd contigs to create a scaffold? If so, why the 1st and 2nd, and not the 1st and 3rd? Thanks in advance.

Assembly genome alignment sequence next-gen • 11k views
8.2 years ago
igor 13k

PacBio recently had a nice blog post explaining this with a nice figure: http://www.pacb.com/blog/genomes-vs-gennnnes-difference-contigs-scaffolds-genome-assemblies/

8.2 years ago
Calvin ▴ 80

Before explaining what a scaffold is: a contig is a set of overlapping DNA segments that together represent a consensus region of DNA (from Wikipedia).

A scaffold, then, is two or more contigs joined across gaps, which in most cases are represented as runs of Ns (NNNNNN). So however long a contig is, if it contains no gaps we still call it a contig. The order of contigs within a scaffold is mostly determined by paired-end reads: the two reads of a pair have a known orientation and approximate separation, so contigs can be placed relative to each other according to where the paired reads you supply map. This is why the assembler joins the 1st and 2nd contigs rather than the 1st and 3rd: the read pairs link the 1st to the 2nd. I hope this answers your question.
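As a rough illustration of why the 1st contig gets joined to the 2nd and not the 3rd, here is a minimal Python sketch that counts read pairs whose two mates map to different contigs. The contig names, the `pair_hits` list, and the `min_links` threshold are all made up for this example; real scaffolders also use read orientation and insert-size information, which this sketch ignores.

```python
from collections import Counter

def count_links(pair_hits):
    """Count read pairs whose two mates map to different contigs."""
    links = Counter()
    for contig_a, contig_b in pair_hits:
        if contig_a != contig_b:
            # Store each contig pair in a fixed order so A-B and B-A are pooled.
            links[tuple(sorted((contig_a, contig_b)))] += 1
    return links

def supported_joins(links, min_links=2):
    """Keep only contig pairs linked by at least min_links read pairs."""
    return sorted(pair for pair, n in links.items() if n >= min_links)

# Hypothetical mapping results: (contig of mate 1, contig of mate 2).
pair_hits = [
    ("contig1", "contig2"),
    ("contig2", "contig1"),
    ("contig1", "contig2"),
    ("contig1", "contig1"),  # both mates inside one contig: no link
    ("contig2", "contig3"),
    ("contig2", "contig3"),
    ("contig1", "contig3"),  # a single stray pair: too weak to trust
]

links = count_links(pair_hits)
print(supported_joins(links))
```

In this toy data contig1-contig2 and contig2-contig3 are each supported by several pairs, while contig1-contig3 has only one, so only the first two joins are kept.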


Unsequenced regions between mate pairs in contigs and between scaffolds are often represented as runs of 'N's in the final assembly

A beginner's guide to eukaryotic genome annotation

Sometimes contigs contain Ns too:
How to interpret N's in contigs
How do denovo genome/transcriptome assemblers treat ambiguous bases?


Contigs and scaffolds can actually be identical; see: scaffolds and contigs SPADES

8.2 years ago

It would help to review how mate-pair sequences are used to order contigs.

When you assemble, you always get many contigs that are not connected to each other.

Mate-pair sequences provide a way to connect them. Read this information to learn how mate-pair libraries are built and how they differ from paired-end libraries.

Also have a look at this short review to learn how mate pairs are used to order contigs.

This will clarify the concept of a scaffold. Mate-pair sequences can order contigs and, as Calvin says, this explains the Ns that fill the gaps between them.
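To make the link between mate pairs and the Ns concrete, here is a small Python sketch that estimates a gap size from a single spanning pair and pads the join with that many Ns. The function names, the coordinate convention, and the numbers are assumptions for illustration only; real tools combine many pairs and a distribution of insert sizes rather than one value.

```python
def estimate_gap(insert_size, left_contig_len, left_mate_start, right_mate_end):
    """Estimate the gap between two contigs from one spanning mate pair.

    left_mate_start: 0-based position on the left contig where the first mate begins.
    right_mate_end:  distance from the start of the right contig to the end
                     of the second mate.
    """
    # Part of the insert that lies on the left contig.
    covered_left = left_contig_len - left_mate_start
    # Whatever the two contigs do not account for is the estimated gap.
    return insert_size - covered_left - right_mate_end

def join_with_gap(left, right, gap):
    """Join two contigs into one scaffold, filling the gap with Ns."""
    return left + "N" * max(gap, 0) + right

# Hypothetical numbers: a 3 kb insert, 1 kb of it on the left contig
# and 1.5 kb on the right contig, leaving roughly 500 bp unsequenced.
print(estimate_gap(3000, 10000, 9000, 1500))

# Toy sequences with a 4 bp gap.
print(join_with_gap("ACGTT", "GTTGTGT", 4))
```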

8.2 years ago

To formalize the answer -

A contig is a contiguous sequence of nucleotides (A, C, G, T).

A scaffold consists of one or more contigs, typically joined by Ns which represent unknown sequence. So, for example:

This is a contig, and also a scaffold:
ACGTTGCTG

This is one scaffold, but two contigs (if you treat a run of 4 or more Ns as a gap between contigs):
ACGTTNNNNGTTGTGT

This is one contig, if you decide the maximum number of consecutive Ns within a contig is 10:
ACGTTNNNNGTTGTGT

Note that the last two sequences are identical, but one was declared as two contigs and the other as a single contig. It's a matter of definition, and up to the observer. In general, every contig can be a scaffold; a scaffold may contain multiple contigs separated by Ns, or may be a single contig with zero or more Ns. A contig can never encompass multiple scaffolds.
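The threshold-dependent definition can be sketched in a few lines of Python. The function name `split_scaffold` and its `min_gap` parameter are hypothetical, chosen for this example; the idea is only that a run of at least `min_gap` Ns counts as a break between contigs, while shorter runs stay inside a contig.

```python
import re

def split_scaffold(scaffold, min_gap=1):
    """Split a scaffold into contigs at every run of at least min_gap Ns."""
    gap = re.compile(r"N{%d,}" % min_gap)
    # Drop empty pieces that appear if the scaffold starts or ends with a gap.
    return [piece for piece in gap.split(scaffold) if piece]

scaffold = "ACGTTNNNNGTTGTGT"

# A run of 4 Ns counts as a gap: two contigs.
print(split_scaffold(scaffold, min_gap=4))

# Up to 10 consecutive Ns are allowed inside a contig: one contig.
print(split_scaffold(scaffold, min_gap=11))
```

The same sequence yields two contigs or one depending only on the threshold.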
