Why do repeat sequences interfere with the assembly of contigs?
8.2 years ago

Hi, can you explain in very simple terms why repeat sequences interfere with the assembly of contigs in sequencing? I can't figure it out. And why does paired-end sequencing help to overcome this problem? I read that it involves sequencing both ends of a fragment, but I can't figure out how this helps with detecting genomic rearrangements, repeats, etc. Thanks

sequence Assembly sequencing next-gen • 3.4k views

Put simply, a repeat is a repeated run of letters such as "AAAAAAAAAAAAAAAAAAAAAAAAAA", or any other type of repeat. The problem arises when your genome contains repeats that are longer than your reads. For example, if your read is 100 bp and the repeat is 1000 bp, where within those 1000 bp does the 100 bp read fit: at the beginning, at the end, or somewhere in between? You can't tell, and that is the problem.

How does paired-end sequencing resolve it? Assume that you have

ATATATATATATAATTGAAAGGAA

and a paired-end read whose first read is ATAT and whose second read is AAGG, with a distance of 14 bp between them. ATAT on its own fits at several positions in the repeat, but because you know the distance to its mate, only one placement works for both reads:

ATATATATATATAATTGAAAGGAA
ATAT--------------AAGG

So with paired-end reads you have more information, such as the distance between the reads and their orientation.

More can be found here: http://seqanswers.com/forums/showpost.php?p=1350&postcount=5

To quote:

Structural rearrangements can be deduced when your read pairs map to a reference at a distance that is substantially different from how that library was constructed (~500bp in the above example). Let's say you had two reads that mapped to your reference 1000bp apart...this suggests there has been a deletion between those two sequence reads within your genome. Same thing with an insertion, if your reads mapped 100bp apart on the reference, this suggests that your genome has an insertion.
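
The rule in the quote can be sketched roughly as below. The function name `classify_pair` and the 100 bp tolerance are assumptions made for illustration; real tools estimate the expected distance and its spread from the library itself.

```python
def classify_pair(mapped_distance, expected=500, tolerance=100):
    """Compare the distance between mapped mates with the library's expected distance.

    expected:  distance the library was built with (~500 bp in the quoted example)
    tolerance: assumed allowance for normal size variation (hypothetical value)
    """
    if mapped_distance > expected + tolerance:
        # Mates are farther apart on the reference than in the sample:
        # the sample is missing sequence that the reference has.
        return "possible deletion"
    if mapped_distance < expected - tolerance:
        # Mates are closer on the reference than in the sample:
        # the sample carries extra sequence between them.
        return "possible insertion"
    return "concordant"


for d in (1000, 100, 520):
    print(d, "->", classify_pair(d))
```

A single odd pair proves little; callers usually ask for several pairs supporting the same event.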

8.2 years ago

Assemblers create graphs by overlapping the ends of reads to determine which sequence comes after what. See example below:

ATGGTCGATC                  --------------->  ATGGTCGATCGTGTAGCT
       ATCGTGTAGCT
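
A minimal sketch of that overlap step, assuming exact matches only and a hypothetical minimum overlap of 3 bases (real assemblers tolerate sequencing errors and use much longer overlaps or k-mers):

```python
def merge_reads(left, right, min_overlap=3):
    """Merge two reads if a suffix of `left` equals a prefix of `right`.

    Tries the longest possible overlap first and returns None when no
    overlap of at least `min_overlap` bases exists.
    """
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return None


merged = merge_reads("ATGGTCGATC", "ATCGTGTAGCT")
print(merged)
```

Here the two reads share the three bases ATC, so the merge reproduces the sequence shown in the example.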

Reads that come from repeat regions have similar or identical ends, so assemblers often cannot tell which overlaps are real.

A repeat region:    ATATATATATATATAT------ATATATATATATAT-----ATATATTATAT

Reads from repeat region: [1] ATATATATAT   [2] TATATATAT   [3] ATATATTATAT

Now, it is hard to decide whether to merge [1] and [2], or [2] and [3] or [3] and [1] or all of them. Any incorrect merging will lead to false assembly.
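
To see the ambiguity concretely, the sketch below lists every exact suffix-prefix overlap between the three reads. The function name `all_overlaps` and the minimum overlap of 3 bases are illustrative assumptions, not what any particular assembler does.

```python
from itertools import permutations


def all_overlaps(left, right, min_overlap=3):
    """Return every length k >= min_overlap where the last k bases of `left`
    equal the first k bases of `right`."""
    longest = min(len(left), len(right))
    return [k for k in range(min_overlap, longest + 1) if left.endswith(right[:k])]


reads = {
    "[1]": "ATATATATAT",
    "[2]": "TATATATAT",
    "[3]": "ATATATTATAT",
}

# Check every ordered pair of distinct reads.
for (name_a, seq_a), (name_b, seq_b) in permutations(reads.items(), 2):
    print(name_a, "->", name_b, "overlap lengths:", all_overlaps(seq_a, seq_b))
```

Several pairs overlap in more than one way, and each overlap length would produce a merged sequence of a different length, so overlap alone cannot decide which merge is right.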

We lose two important pieces of information:

  1. How many repeat regions are actually there? Assemblers often merge repeats (false interpretation).

  2. What was the location or order of repeats?

So how does PE sequencing help?

Paired-end sequencing reads from both ends of a DNA fragment, and is capable of pairing ends together -- so you know what's on the ends of your fragments, even if each individual read doesn't overlap with its mate. Also, we know the distance between pairs.

Now, when one read of a pair falls in a repeat and its mate aligns to a unique region flanking that repeat, you can identify which copy of the repeat the read belongs to. See the image below.

Image Courtesy: http://www.anthonybaldor.com

8.2 years ago

Thank you both for your clear answers. One thing is still unclear. With paired-end sequencing, do you get the sequence of the central part? I mean, your reads sequence the ends, if I understood correctly, but what about the central part?


You should post this as a comment, not an answer.
As for your question: you do not get the middle part from a single pair, but other paired-end reads will fall in that range, so overall you will have it covered. For example:

ATAT ATAT ATATAATTGAAAGGAA
 ATAT- - - - - - - - - - - - - - AAGG  
     ATAT- - - - - - - - - - - - -- - -AA--  
         ATAT - - - -- - - - - - -- - -A --

and so on..


For paired-end sequencing, the central part is typically not sequenced. The average insert size is known (+/- some range), so the distance between the unique end and repeat end sequences is defined.

Note that the central part can be sequenced if the insert size is less than the combined length of the two end reads. Then the reads overlap, and can be merged into a single longer read.
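
The arithmetic behind this is simple. The sketch below assumes that "insert size" means the full length of the sequenced fragment, both reads included (conventions differ between tools), and the function name `unsequenced_middle` is made up for illustration.

```python
def unsequenced_middle(fragment_len, read_len):
    """Bases left unread in the middle of a fragment sequenced from both ends.

    A negative result means the two reads overlap by that many bases.
    """
    return fragment_len - 2 * read_len


# 500 bp fragment, 2 x 100 bp reads: the middle is not read by this pair.
gap = unsequenced_middle(500, 100)

# 150 bp fragment, 2 x 100 bp reads: the reads overlap and could be merged.
overlap = -unsequenced_middle(150, 100)

print("unsequenced middle:", gap, "bp")
print("overlap between mates:", overlap, "bp")
```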
