Question

Paired-end reads - what is what?

17

9.9 years ago

darxsys ▴ 240

This is probably a dummy question, but I haven't found an explanation and I want to be sure. When paired-end sequencing is done, there are usually two FASTQ files generated, one with "left" mates and the other with "right" mates.

Assuming that the mates in the first file are read from the forward strand, are the mates in the second file then read from the reverse strand?

If so, what exactly is written in the second file: the reverse complement of the read sequence (i.e., the corresponding sequence on the forward strand), or the reverse-strand sequence as it was read? And in which direction?

RNA-Seq • 34k views

updated 2.4 years ago by Ram 45k • written 9.9 years ago by darxsys ▴ 240

Ram · Accepted Answer · 2015-06-08

38

9.9 years ago

thackl ★ 3.0k

Illumina paired-end sequencing is based on the idea that you have initial DNA fragments (longer than your actual read length) and you sequence both ends of each fragment. On the Illumina chip, each fragment is amplified by bridge amplification prior to the actual sequencing. This approach results in two reads per fragment, with the first read in forward orientation and the second read in reverse-complement orientation. Depending on the initial fragment size and the read length, the two reads may or may not overlap.

For example, with 100 bp reads:

------[f-180bp]----->
----[r1]----->
        <----[r2]----

------------------[f-350bp]----------------->
----[r1]----->
                                <----[r2]----

Therefore, the first FASTQ file will contain all "r1" reads and the second file all "r2" reads.
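The two cases in the diagram come down to simple arithmetic on fragment length and read length. Here is a minimal sketch of that calculation; the function name `read_overlap` and the default read length of 100 bp are just illustrative choices matching the example above, not anything defined by Illumina software.

```python
def read_overlap(fragment_len: int, read_len: int = 100) -> int:
    """Return the number of bases covered by both mates of a pair.

    A positive value means r1 and r2 overlap by that many bases.
    A negative value means there is an unsequenced gap of that size
    between the end of r1 and the end of r2.
    """
    # A read cannot extend past the end of the fragment it comes from.
    effective_len = min(read_len, fragment_len)
    return 2 * effective_len - fragment_len


for fragment_len in (180, 350):
    overlap = read_overlap(fragment_len)
    if overlap > 0:
        print(f"{fragment_len} bp fragment: mates overlap by {overlap} bp")
    else:
        print(f"{fragment_len} bp fragment: gap of {-overlap} bp between mates")
```

For the 180 bp fragment the mates share 20 bp; for the 350 bp fragment there are 150 bp in the middle that neither read covers.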

updated 2.4 years ago by Ram 45k • written 9.9 years ago by thackl ★ 3.0k

2

So in the second file, if it says: TTCAG, that actually corresponds to CTGAA in the forward strand?

9.9 years ago by darxsys ▴ 240

5

Yes, to have r1 and r2 in the same orientation, you need to reverse-complement one of the two reads. Keep in mind, though, that you usually do not know which of the two reads corresponds to the forward strand of your initial template. Then again, "forward strand" is just a convention anyway, usually based on transcription direction.
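To make the TTCAG / CTGAA example concrete, here is a minimal sketch of reverse-complementing a read in plain Python. The helper name `reverse_complement` is just chosen for illustration, and the sketch only handles A, C, G, T and N (no other IUPAC ambiguity codes).

```python
# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


r2 = "TTCAG"
print(reverse_complement(r2))  # CTGAA
```

Applying the function twice gives back the original sequence, so it does not matter which of the two mates you flip as long as you flip exactly one.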

For double-stranded genomic DNA, there is transcribed stuff on both strands. In standard protocols, fragments are generated at random, and so is the strand they originate from. I actually do not know whether the Illumina machine uses a rule to decide which read to output as r1 and which as r2.

For RNAs, there obviously is a forward strand. But only with strand-specific RNA-seq do you actually know how your reads are oriented with respect to your initial template.

updated 2.4 years ago by Ram 45k • written 9.9 years ago by thackl ★ 3.0k