I am trying to understand the circumstances under which the two partner reads of a paired-end run can align to the same strand. I always thought the idea of a paired-end run is to sequence the insert from its two ends, in opposite directions.
This puzzles me particularly when both partner reads align to the same chromosome, as in the following example line from a sorted GAII ELAND file:
awk -F'\t' '$14=="R" && $21=="R" && $18==""' s21_sorted.txt
HWUSI-EAS656 29 2 71 10638 1514 GGCTAC 1 GGATCAGACAGTGCCCTCAAAGAGCCATCTTCACAACAGTAGAGAGTAAG bbbbbbcbbbbbbbbbbbbbbabbcbbabbbbbbbbbabbccccab``ac hsrefchr11.fa 92108870 R 15T34 164 0 -3333 R
or,
awk -F'\t' '$14=="F" && $21=="F" && $18==""' s21_sorted.txt
HWUSI-EAS656 29 2 10 16626 9436 TGACCA 1 CTTTCCAGATACTAGGTTAAGCACTAGAGGTTAAGCACTGGATATACAAA bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcbbbbbbbbbbbbbbbbbbb hsrefchr11.fa 92105586 F 41GG7 125 0 3015 F
where $14, $21 and $18 are the match strand, partner strand and partner chromosome, respectively (an empty $18 meaning the partner aligned to the same chromosome).
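To get a feel for how common each orientation is, the two filters above could be folded into a single tally. This is only a rough sketch: it assumes the same tab-separated column layout described above ($14 match strand, $21 partner strand, $18 partner chromosome), and the function name count_orientations is made up for illustration.

```shell
# Tally strand combinations (FF, FR, RF, RR) for pairs whose partner
# is on the same chromosome.
# Assumed layout: $14 = match strand, $21 = partner strand,
# $18 = partner chromosome (empty when it is the same chromosome).
count_orientations() {
    awk -F'\t' '
        # keep only rows where both strands are known and $18 is empty
        $14 != "" && $21 != "" && $18 == "" { n[$14 $21]++ }
        END { for (k in n) print k "\t" n[k] }
    ' "$@" | sort
}

# Usage (file name taken from the commands above):
# count_orientations s21_sorted.txt
```

Comparing the FF and RR counts against FR and RF would show whether the same-strand pairs are a small minority or a sizeable fraction of the run.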
Any help is appreciated.
Thanks a lot for your answers. If many of these reads are incorrectly mapped, how useful is it to retain them as single reads for downstream analysis? I am seeing thousands of them.