In Illumina paired-end sequencing, how do read pairs end up with different sequences that overlap?
I will give a basic example of paired end sequencing results from this DNA strand:
5' ATTTGCCCGC 3'
3' TAAACGGGCG 5'
This DNA strand is 10 bp long, and for this example I will run 7 cycles, giving 7 bp reads.
For the forward strands, we would end up with the following DNA molecules bound to the flow cell (after bridge amplification, with the reverse strands cleaved and washed away). The 5' end is bound to the flow cell and the 3' end is exposed:
F1: 5' GCGGGCAAAT 3'
F2: 5' ATTTGCCCGC 3'
which would result in the following 7bp reads for F1 and F2:
F1: 3' CCGTTTA 5'
F2: 3' ACGGGCG 5'
Then, for the reverse strands, we would end up with the following DNA molecules bound to the flow cell (after the second round of bridge amplification, with the forward strands cleaved and washed away):
R1: 5' ATTTGCCCGC 3'
R2: 5' GCGGGCAAAT 3'
which would result in the following 7bp reads for R1 and R2:
R1: 3' ACGGGCG 5'
R2: 3' CCGTTTA 5'
Therefore, for my pairs F1R2 and F2R1, I would end up with the following reads:
F1: 3' CCGTTTA 5'
R2: 3' CCGTTTA 5'
F2: 3' ACGGGCG 5'
R1: 3' ACGGGCG 5'
Within each pair, the two reads are exactly identical. I know that in real sequencing you would not expect the reads of the F1R2 and F2R1 pairs to be identical. Going into this example, I assumed they would overlap: with 7 bp reads and a 10 bp insert, they should overlap by 4 bp and each have 3 bp that the other read does not cover.
Have I got something wrong in my theory?
Thanks!
I made this drawing some time ago (published in JOSS); it might be of help:
In real life, one rarely wants sequences to overlap (special-case libraries, short inserts, etc.). Having sequences overlap gives you a second read-out on the data, but Illumina sequencing has become standard enough that one no longer needs to worry about technical replication.
The graphic in this thread should prove useful: What is the difference between paired end reads and overlapping reads, and then why merge overlapping reads before assembly?
Edit: Sequencing always proceeds in a 5' --> 3' manner (on either strand) so that should be kept in mind.
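To make the 5' --> 3' point concrete, here is a minimal Python sketch using the 10 bp insert and 7 cycles from the question. It is a simplification that ignores adapters and the flow-cell chemistry, and it assumes read 1 is reported as the first 7 bases of one strand and read 2 as the first 7 bases of the opposite strand, each written 5' --> 3'. The function names (`reverse_complement`, `paired_end_reads`) are made up for illustration and do not come from any library.

```python
# Simplified model of a paired-end read pair (no adapters, no chemistry).
# Assumption: both reads are reported 5'->3', one from each strand.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of seq, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def paired_end_reads(insert_top: str, cycles: int) -> tuple[str, str]:
    """Return (read1, read2) for an insert given as its top strand 5'->3'."""
    read1 = insert_top[:cycles]                      # starts at the 5' end of the top strand
    read2 = reverse_complement(insert_top)[:cycles]  # starts at the 5' end of the bottom strand
    return read1, read2


insert = "ATTTGCCCGC"  # top strand from the question, 5'->3'
cycles = 7

read1, read2 = paired_end_reads(insert, cycles)

# Flip read 2 back onto the top strand to see which positions it covers.
read2_on_top = reverse_complement(read2)
overlap = max(0, 2 * cycles - len(insert))

print("read 1 (5'->3'):      ", read1)
print("read 2 (5'->3'):      ", read2)
print("read 2 on top strand: ", read2_on_top)
print("overlap (bp):         ", overlap)
print("overlapping bases:    ", insert[len(insert) - cycles:cycles])
```

Under these assumptions the two reads are different sequences (`ATTTGCC` and `GCGGGCA`), and once read 2 is reverse-complemented it covers positions 4 to 10 of the top strand, so the reads share the 4 bp `TGCC` in the middle and each has 3 bp the other does not cover, which is the result the question expected.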