Is The Example In The Sam Format Specification Documentation Incorrect?
1
1
Entering edit mode
12.5 years ago
ilonpilaaja ▴ 50

Let us have a look at r001/1 and r001/2 from the example in http://samtools.sourceforge.net/SAM1.pdf .

Alignment example given at the beginning:

Coor 12345678901234 5678901234567890123456789012345
ref AGCATGTTAGATAA**GATAGCTGTGCTAGTAGGCAGTCAGCGCCAT
+r001/1 TTAGATAAAGGATA*CTG
+r002 aaaAGATAA*GGATA
+r003 gcctaAGCTAA
+r004 ATAGCT..............TCAGC
-r003 ttagctTAGGC
-r001/2 CAGCGGCAT

Is later specified in the SAM format as:

@HD VN:1.5 SO:coordinate
@SQ SN:ref LN:45
r001 163 ref 7 30 8M2I4M1D3M = 37 39 TTAGATAAAGGATACTG *
r002 0 ref 9 30 3S6M1P1I4M * 0 0 AAAAGATAAGGATA *
r003 0 ref 9 30 5S6M * 0 0 GCCTAAGCTAA * SA:Z:ref,29,-,6H5M,17,0;
r004 0 ref 16 30 6M14N5M * 0 0 ATAGCTTCAGC *
r003 2064 ref 29 17 6H5M * 0 0 TAGGC * SA:Z:ref,9,+,5S6M,30,1;
r001 83 ref 37 30 9M = 7 -39 CAGCGGCAT * NM:i:1

The FLAG field is set to 163 = 1010,0011(binary) and 83 = 0101,0011(binary), respectively. In the first flag the bit 0x80 is set, and in the second -- the bit 0x40 (among others).

Now, according to the specification (p.4) this means r001/1 is the "last segment in the template", and r001/2 is the "first segment in the template". But apparently it is the other way around.

It seems that r001/1 is located in the first file and it aligns to the forward strand. Why is it labeled as "second in the template"

What do you say?

sam • 3.2k views
ADD COMMENT
0
Entering edit mode

Indeed 163 is the second in pair while 83 is the first in pair. See http://picard.sourceforge.net/explain-flags.html

ADD REPLY
0
Entering edit mode

Thanks for the link. But in r001/1 comes first in the alignment, and r001/2 comes the second, their names /1 and /2 also indicate that r001/1 is the first, r001/2 is the second. Why then the bits are set to 163 and 83? It should be the other way around, i.e. I dare to say there is an error in the documentation.

ADD REPLY
0
Entering edit mode

Perhaps it aligned to the reverse strand, and hence the ordering swap.

ADD REPLY
1
Entering edit mode
11.1 years ago
samtools view -S  -X examples/toy.sam  | grep r001

r001    pPR2    ref    7    30    8M4I4M1D3M    =    37    39    TTAGATAAAGAGGATACTG    *    XX:B:S,12561,2,20,112
r001    pPr1    ref    37    30    9M    =    7    -39    CAGCGCCAT    *
  • r001 (83) is read paired , read mapped in proper pair, read reverse strand , first in pair
  • r001 (163) is read paired, read mapped in proper pair, mate reverse, strand second in pair

first in pair: is not the position in the alignment, it 's he type of FASTQ in a paired-alignment = forward sequencing of the fragment

second in pair: is not the position in the alignment, it's the type of FASTQ in a paired-alignment = reverse sequencing of the fragment

ADD COMMENT
0
Entering edit mode

what parameters/tool did you use to align? I can't get it with neither bwa mem or bwa aln, and your cigar 8M4I4M1D3M is different from the one reported in the alignment 8M2I4M1D3M

(it is an interesting example that shows how is not that easy to reproduce exactly even an toy example)

also doesn't first in pair mean that the read was stored in file1 and reads in file1 have /1

ADD REPLY

Login before adding your answer.

Traffic: 2111 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6