Hello everyone, sorry for posting this naive sequencing question. I aligned a paired-end DNA sample to hg19 with bowtie2 using the parameters "-q -N 1 -X 2000 --no-mixed --no-discordant". I saw some strange read pairs in my BAM file, like the ones below (I deleted some bases for illustration). Read1 can be either forward or reverse, which confuses me.
(1)read1 is forward, chr1:10033-10110
reference CCCTAACCCAACCCTAACCCAA
read1 CCCTAACCCAACCCTAACCC
read2 CTACCCCAACCCTAACCCTA
(2)read1 is reverse, chr1:10018-10095
reference TAACCCAACCCTAACC
read1 ACCCAACCCTAACC
read2 TAACCCAACCCTAA
My questions are:
(1) bowtie2 uses --fr in the alignment procedure by default. Does this mean read1 should come from the forward strand and read2 from the reverse strand?
(2) How did the above phenomenon occur?
(3) If this is normal, how do I determine the start and end of the DNA fragment? Which end is 5' and which is 3'?
Thanks everyone!
For paired-end sequencing you are sequencing from both ends of a DNA fragment. If the fragment is short, the reads can overlap each other. For example, if your fragment length is 100 bases and you are sequencing 75 bases from each end, the forward and reverse reads will have 50 bases of overlap in the middle (75 + 75 - 100 = 50).
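The overlap arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `mate_overlap` and its parameters are made up for this example, and it assumes both mates have the same length and that adapter sequence has been trimmed, so a read never covers more than the fragment itself.

```python
def mate_overlap(fragment_len, read_len):
    """Number of fragment bases covered by both mates.

    Hypothetical helper for illustration; assumes equal read lengths
    and adapter-trimmed reads.
    """
    # A read cannot cover more bases than the fragment contains.
    covered = min(read_len, fragment_len)
    # Two reads of `covered` bases, one from each end, share whatever
    # exceeds the fragment length; no overlap if they do not meet.
    return max(0, 2 * covered - fragment_len)


print(mate_overlap(100, 75))  # 100-base fragment, 75-base reads
print(mate_overlap(200, 75))  # long fragment, the mates do not meet
print(mate_overlap(50, 75))   # fragment shorter than one read
```

When the fragment is shorter than a single read, the two mates cover the same bases, so the overlap equals the fragment length.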
Yes, I know this. But why can read1 come from either the forward or the reverse strand? I thought read1 should be the leftmost segment of the DNA fragment.
In this situation, you mean that the DNA fragment should be "ACCCAACCCTAA". But the adapters have already been removed, so I think the DNA fragment should be "TAACCCAACCCTAACC".
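On questions (1) and (3): per the bowtie2 manual, --fr only requires that the upstream mate aligns forward and the downstream mate aligns reverse-complemented; either mate 1 or mate 2 may be the upstream one, so read1 is not necessarily the leftmost segment. The fragment span and the strand of read1 can be read off the SAM fields FLAG, POS and TLEN. The sketch below is a rough illustration, not a definitive method: the helper names are made up, the example values (POS 10033, TLEN 78, FLAG 99 and 83) are illustrative and not taken from the BAM file in question, and it assumes end-to-end alignments without soft clipping and a properly paired record.

```python
# SAM FLAG bits, as defined in the SAM specification.
FLAG_REVERSE = 0x10  # read aligned to the reverse strand
FLAG_READ1 = 0x40    # first read in the pair


def fragment_span(pos, tlen):
    """1-based inclusive (start, end) of the fragment.

    Hypothetical helper: expects POS and TLEN from the leftmost mate,
    whose TLEN is positive.
    """
    if tlen <= 0:
        raise ValueError("use the record of the leftmost mate (TLEN > 0)")
    return pos, pos + tlen - 1


def read1_strand(flag):
    """Strand that read1 aligned to, '+' or '-'. Hypothetical helper."""
    if not flag & FLAG_READ1:
        raise ValueError("this record is not read1")
    return "-" if flag & FLAG_REVERSE else "+"


# Illustrative values only.
print(fragment_span(10033, 78))
print(read1_strand(99))  # read1 forward, mate reverse
print(read1_strand(83))  # read1 reverse, mate forward
```

If read1 is on '+', its 5' end sits at the left coordinate of the span; if read1 is on '-', its 5' end sits at the right coordinate. Both cases are normal, because a fragment can enter the sequencer in either orientation.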