I am so confused about this SAM file data.
SRR1972739.9835 131 AF086833 127 60 101M = 285 159 ...
SRR1972739.9835 67 AF086833 285 60 85M16S = 127 -159 ...
One read has FLAG 131 and the other has FLAG 67. (FLAG 131 decodes to 'PAIRED,PROPER_PAIR,READ2' and FLAG 67 decodes to 'PAIRED,PROPER_PAIR,READ1'.)
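For reference, the decoding above can be reproduced with a few lines of Python. This is only a sketch: the bit values come from the SAM specification, the short labels follow the ones used in this post, and the function name decode_flag is made up for illustration.

```python
# Bit values are from the SAM specification; the labels mirror the ones
# quoted in the post. decode_flag is a hypothetical helper name.
SAM_FLAG_BITS = [
    (0x1, "PAIRED"),
    (0x2, "PROPER_PAIR"),
    (0x4, "UNMAP"),
    (0x8, "MUNMAP"),
    (0x10, "REVERSE"),
    (0x20, "MREVERSE"),
    (0x40, "READ1"),
    (0x80, "READ2"),
    (0x100, "SECONDARY"),
    (0x200, "QCFAIL"),
    (0x400, "DUP"),
    (0x800, "SUPPLEMENTARY"),
]


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


for flag in (131, 67):
    print(flag, ",".join(decode_flag(flag)))
```

Neither 131 nor 67 contains 0x10 or 0x20, which is why both records are reported on the forward strand.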
I know that if one read is on the forward strand, the other should be on the reverse strand.
r1 -------> forward
===========================
reverse <-------- r2
So I think exactly one read of a pair should carry the reverse-complement flag 0x10 (REVERSE), and it should be impossible for both mates to have 0x10 set, or for both to have 0x20 (MREVERSE) set. Therefore, I think alignments with these FLAG bits
        0x10  0x20
read1   0     1
read2   1     0
and
        0x10  0x20
read1   1     0
read2   0     1
are possible and
        0x10  0x20
read1   0     0
read2   0     0
and
        0x10  0x20
read1   1     1
read2   1     1
are impossible. Am I wrong, or is the data wrong? As far as I know, a mate is sometimes aligned to a different chromosome, but I don't understand how that happens. I would appreciate any help.
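The four combinations above can be checked mechanically. The sketch below relies only on the SAM definition that 0x20 on one record is a copy of the mate's 0x10; the function names are hypothetical. Under that definition the all-0 and all-1 tables are internally consistent records (they describe two mates on the same strand). Whether an aligner should call such a pair "proper" is a separate question that this code does not answer.

```python
REVERSE = 0x10   # this read is reverse-complemented
MREVERSE = 0x20  # the mate is reverse-complemented


def strand_bits(flag):
    """Return (read_is_reverse, mate_is_reverse) as 0/1 values."""
    return int(bool(flag & REVERSE)), int(bool(flag & MREVERSE))


def mates_agree(flag1, flag2):
    """True when each record's 0x20 bit matches the other record's 0x10 bit."""
    r1_rev, r1_mrev = strand_bits(flag1)
    r2_rev, r2_mrev = strand_bits(flag2)
    return r1_rev == r2_mrev and r2_rev == r1_mrev


def orientation(flag1, flag2):
    """Say whether the two mates are reported on opposite strands or the same one."""
    r1_rev, _ = strand_bits(flag1)
    r2_rev, _ = strand_bits(flag2)
    return "opposite strands" if r1_rev != r2_rev else "same strand"


for f1, f2 in [(67, 131), (83, 163), (99, 147)]:
    print(f1, f2, mates_agree(f1, f2), orientation(f1, f2))
```

For the pair in the question (67/131) the two records agree with each other, and both describe a forward-strand read with a forward-strand mate.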
There are a few cases where you'll definitely want to check the alignments by hand. That said, I tend to regard "properly paired" with a bit of suspicion.
67/131 only set the paired and proper-pair bits plus READ1 or READ2, with neither record marked as "reverse strand", which has interesting implications for orientation. You're correct that one or the other should be on the reverse strand, but that clearly isn't marked here.
Also, in addition to flag 67, do you see any flag 83 (read reverse) or 99 (mate reverse)?
In my own data I see lots of 83/163 (83: paired, proper pair, read reverse, first in pair; 163: paired, proper pair, mate reverse, second in pair) and 99/147 (99: paired, proper pair, mate reverse, first in pair; 147: paired, proper pair, read reverse, second in pair); assembly was with MEGAHIT, mapping with BWA.
I'm just parsing out the sam flags via
which will let you look at the full distribution of flags detected in the BAM file, along with counts. You may want to check which sets of flags make up the majority in your data.
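The command itself is not shown above, so the following is not it, just a stand-in sketch of the same idea in Python: tally the FLAG column (field 2) of SAM-formatted text. The function name count_flags is hypothetical, and the "*" values in the example stand in for the sequence and quality fields that are elided in the question.

```python
from collections import Counter


def count_flags(sam_lines):
    """Count how often each FLAG value occurs in SAM-formatted text lines."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        counts[int(fields[1])] += 1  # FLAG is the second tab-separated field
    return counts


# Two records from the question; SEQ and QUAL are '*' placeholders.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "SRR1972739.9835\t131\tAF086833\t127\t60\t101M\t=\t285\t159\t*\t*",
    "SRR1972739.9835\t67\tAF086833\t285\t60\t85M16S\t=\t127\t-159\t*\t*",
]

for flag, n in count_flags(example).most_common():
    print(flag, n)
```

For a BAM file, the same function could be fed the text output of samtools view, read line by line.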
I have calculated the counts of each FLAG pair (supplementary and secondary alignments excluded) using Python.
I can confirm that mates that are 'proper' and have exactly one of 'reverse' or 'mreverse' set (flags 83/163 and 99/147) are dominant, but I also see that there are still a lot of pairs that are 'proper' with neither 'reverse' nor 'mreverse' set (flags 67/131).
Also, I used the BWA-MEM aligner with no optional parameters.
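In case it helps anyone repeat the count, here is a rough sketch of the kind of tally described above. It is a reconstruction under assumptions rather than the exact script: it groups primary records by read name, skips secondary (0x100) and supplementary (0x800) alignments, and counts (read1 FLAG, read2 FLAG) combinations. The function name and the example read names are made up.

```python
from collections import Counter, defaultdict

SECONDARY = 0x100
SUPPLEMENTARY = 0x800
READ1 = 0x40


def count_flag_pairs(sam_lines):
    """Count (read1_flag, read2_flag) combinations over primary alignments."""
    by_name = defaultdict(dict)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # keep only primary alignments
        # Records without the READ1 bit are filed under "read2".
        which = "read1" if flag & READ1 else "read2"
        by_name[name][which] = flag
    pairs = Counter()
    for mates in by_name.values():
        if len(mates) == 2:  # ignore reads whose mate record is missing
            pairs[(mates["read1"], mates["read2"])] += 1
    return pairs


# Made-up records with only the first two SAM fields filled in meaningfully.
records = [
    "readA\t67\tref\t285\t60\t85M16S\t=\t127\t-159\t*\t*",
    "readA\t131\tref\t127\t60\t101M\t=\t285\t159\t*\t*",
    "readA\t2115\tref\t900\t60\t16M85H\t=\t127\t0\t*\t*",  # supplementary, skipped
    "readB\t83\tref\t500\t60\t101M\t=\t300\t-301\t*\t*",
    "readB\t163\tref\t300\t60\t101M\t=\t500\t301\t*\t*",
]

for pair, n in count_flag_pairs(records).most_common():
    print(pair, n)
```

This keeps every read name in memory, which is fine for a small file but would need a name-sorted input and streaming for a large one.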