Let us have a look at r001/1 and r001/2 from the example in http://samtools.sourceforge.net/SAM1.pdf .
Alignment example given at the beginning:
Coor     12345678901234  5678901234567890123456789012345
ref      AGCATGTTAGATAA**GATAGCTGTGCTAGTAGGCAGTCAGCGCCAT
+r001/1        TTAGATAAAGGATA*CTG
+r002         aaaAGATAA*GGATA
+r003       gcctaAGCTAA
+r004                     ATAGCT..............TCAGC
-r003                            ttagctTAGGC
-r001/2                                        CAGCGGCAT
It is later specified in SAM format as:
@HD VN:1.5 SO:coordinate
@SQ SN:ref LN:45
r001 163 ref 7 30 8M2I4M1D3M = 37 39 TTAGATAAAGGATACTG *
r002 0 ref 9 30 3S6M1P1I4M * 0 0 AAAAGATAAGGATA *
r003 0 ref 9 30 5S6M * 0 0 GCCTAAGCTAA * SA:Z:ref,29,-,6H5M,17,0;
r004 0 ref 16 30 6M14N5M * 0 0 ATAGCTTCAGC *
r003 2064 ref 29 17 6H5M * 0 0 TAGGC * SA:Z:ref,9,+,5S6M,30,1;
r001 83 ref 37 30 9M = 7 -39 CAGCGGCAT * NM:i:1
The FLAG field is set to 163 = 1010,0011 (binary) and 83 = 0101,0011 (binary), respectively. In the first flag the bit 0x80 is set, and in the second the bit 0x40 (among others).
Now, according to the specification (p.4) this means r001/1 is the "last segment in the template", and r001/2 is the "first segment in the template". But apparently it is the other way around.
It seems that r001/1 is located in the first file and aligns to the forward strand. Why is it labeled as "second in the template"? What do you say?
Indeed 163 is the second in pair while 83 is the first in pair. See http://picard.sourceforge.net/explain-flags.html
Thanks for the link. But r001/1 comes first in the alignment and r001/2 comes second, and their names /1 and /2 also indicate that r001/1 is the first and r001/2 is the second. Why then are the flags set to 163 and 83? It should be the other way around, i.e. I dare to say there is an error in the documentation.
Perhaps it aligned to the reverse strand, and hence the ordering swap.
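The strand guess can be checked directly against the two FLAG values, since bit 0x10 marks a reverse-complemented read and bit 0x20 a reverse-complemented mate. A small Python sketch, where the helper names `strand` and `mate_strand` are hypothetical and chosen only for this example:

```python
REVERSE = 0x10       # SEQ of this read is reverse complemented
MATE_REVERSE = 0x20  # SEQ of the next segment (mate) is reverse complemented


def strand(flag):
    """Strand of the read itself, derived from bit 0x10."""
    return "-" if flag & REVERSE else "+"


def mate_strand(flag):
    """Strand of the mate, derived from bit 0x20."""
    return "-" if flag & MATE_REVERSE else "+"


for flag in (163, 83):
    print(flag, "read:", strand(flag), "mate:", mate_strand(flag))
```

With these flags, the record flagged 163 (last segment) is on the forward strand with its mate on the reverse strand, and the record flagged 83 (first segment) is on the reverse strand with its mate on the forward strand.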