I am confused by the orientation of reads in read pairs in a bam file I am looking at. I have aligned paired end Illumina read data with bwa mem -M
. The log states
[M::process] read 36146 sequences (2937732 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 2961, 0, 0)
which I take to mean that bwa thinks all my pairs are FR, as I gather is to be expected from paired end sequencing (though I am confused that bwa reads 36146 sequences and only finds 2961 unique pairs). For downstream processing, I have since run samtools sort -n
and used the bbtools reformat.sh
to retain bit 3 and remove bit 3844, as well as update to sam format 1.4. There are now only reads left with bit flags 83, 99, 147, and 163, as confirmed by
samtools view "${bam}" | awk '{print $2}' | sort | uniq
Based on the two above facts, I'm therefore assuming that all my reads are FR, inward-facing. However, when I run samtools stats "${bam}"
, I see that
SN inward oriented pairs: 7028
SN outward oriented pairs: 4196
SN pairs with other orientation: 0
SN pairs on different chromosomes: 0
which confuses me. A bit of manual inspection seems to corroborate this. Running
samtools view "${bam}" |
awk 'BEGIN{print "id", "flag", "start", "end"} {split($1,a,"."); print a[2], $2, $4, $4+length($10)}'
I get e.g. (recall that these are sorted by name):
id flag start end
19 83 10768 10869
19 163 10760 10861
2117 83 10241 10342
2117 163 10285 10386
2654 99 11558 11657
2654 147 11539 11633
2893 83 133841665 133841766
2893 163 133841620 133841718
6029 83 176995687 176995788
6029 163 176995614 176995710
Flags 83 and 147 tell me that the read is reverse. Now, if the reads were FR, in paired end sequencing I would expect the reverse read to lie after the forward read, which is true for ids 19, 2893, and 6029, but not for ids 2117 or 2654, in accordance with the results from samtools stats
.
Based on the above, I am confused as to whether my read pairs are properly oriented. What is going on here?