Hi.
I'm working with RAD-sequencing where I'm specifically doing paired-end sequencing. I'm a bit confused, and I would greatly appreciate any answers to my questions.
According to the protocol I'm using, my understanding is that, prior to Illumina sequencing, the fragments produced by PCR are size-selected, i.e., cut down to a uniform size of 500 bp. These 500 bp fragments are then sequenced from both ends. The resulting reads are 150 bp each. My first question is: how does sequencing initiate at the "sheared" end (reverse read, R2) of the fragment? For the forward read (R1), the P1 adapter, which I assume is necessary for initiating sequencing, can bind to the restriction site because the restriction site "overhang" sequence is known. But the sheared end, i.e., the end from which the reverse read is made, doesn't have a known sequence. How then is the P2 adapter able to bind to the sheared (reverse) end of the fragment to allow for sequencing?
My second question: to my understanding, the result of the sequencing is that 150 bp are read from each end of a 500 bp fragment. This leaves a gap of 200 bp between the two reads. How then is it possible to match the forward read with the reverse read?
My third question is related to question 2. Once I have the reads, they are to be processed with Trimmomatic. Here, I should use the paired option. But what's the point of it if the reads can't be aligned to each other anyway (see question 2)? Wouldn't it be more logical to just concatenate the forward and reverse reads of each sample into one file and treat them as single reads?
Thank you!
Thanks for your reply, but it didn't really answer any of my questions. I have been trying to look up this information but can't find it explained anywhere, which is why I'm guessing I'm missing something fundamental.
You are definitely missing something fundamental. You need to learn how next-generation sequencing works, but again, how the sequencing works is way outside the scope of the forum. You should watch the video mentioned by joe and look for similar ones describing the Illumina technology.
All of your questions are related to the sequencing methods and have nothing to do with RAD-seq. The barcode information and the way the sequencer stores/saves the base-calling/cluster information are how the read-pairing information is conserved. I strongly recommend you talk to somebody in your lab/work, who you think has strong knowledge of this topic, and ask them. If you are unaware of how the technology works you need to have a significant discussion that is much longer than is possible to type out.
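To make the pairing point a bit more concrete, here is a minimal Python sketch. It is not how Trimmomatic or any other tool implements it, just an illustration. It assumes the common Illumina FASTQ header layout, where both mates of a pair share the same name (instrument, run, flowcell, lane, tile, x, y) and differ only in the comment field after the space (starting with 1 or 2) or in a trailing /1 or /2. The function names and the example reads are made up for illustration. The two reads are matched by that shared name and not by overlapping sequence, which is why the 200 bp gap does not matter.

```python
def unsequenced_gap(fragment_len: int, read_len: int) -> int:
    """Bases in the middle of the fragment covered by neither read."""
    return max(0, fragment_len - 2 * read_len)


def read_id(header: str) -> str:
    """Return the cluster name shared by both mates of a pair."""
    name = header[1:].split()[0]  # drop '@' and the comment field
    if name.endswith(("/1", "/2")):  # older style mate suffix
        name = name[:-2]
    return name


def parse_fastq(text: str):
    """Yield (read id, sequence) from FASTQ text with 4 lines per record."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        yield read_id(lines[i]), lines[i + 1]


def pair_reads(r1_text: str, r2_text: str):
    """Match R1 and R2 records by shared read id, not by sequence overlap."""
    r2_by_id = dict(parse_fastq(r2_text))
    return [
        (rid, seq, r2_by_id[rid])
        for rid, seq in parse_fastq(r1_text)
        if rid in r2_by_id
    ]


# Made-up example records (hypothetical instrument and flowcell names).
R1_EXAMPLE = """@SIM:1:FCX:1:1101:1000:2000 1:N:0:ACGT
TGCAGGAATT
+
IIIIIIIIII
@SIM:1:FCX:1:1101:1500:2500 1:N:0:ACGT
TGCAGGCCAA
+
IIIIIIIIII
"""

R2_EXAMPLE = """@SIM:1:FCX:1:1101:1000:2000 2:N:0:ACGT
GGTTACCA
+
IIIIIIII
"""

if __name__ == "__main__":
    print(unsequenced_gap(500, 150))
    for rid, fwd, rev in pair_reads(R1_EXAMPLE, R2_EXAMPLE):
        print(rid, fwd, rev)
```

In this toy input the second R1 read has no mate in R2, so it drops out of the paired list. That is roughly the situation the paired mode of a trimmer has to handle: if one mate is discarded, the other can no longer be treated as part of a pair. Concatenating R1 and R2 into one file would throw away this link between the two ends of the same fragment.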
Thank you. OK, I'll brush up on my NGS and come back, or go somewhere else if I can't figure it out.