Question

SAM flags for Illumina paired-end reads mapped to a transcriptome

7.8 years ago

Matteo Schiavinato ★ 3.7k

I am mapping Illumina paired-end reads against a transcriptome with HISAT2, and I'm confused about something that might actually be simple:

Is it normal to have SAM flags that place both reads on the same strand? The sequencing technology should produce one read that maps to the forward strand and one that maps to the reverse strand (when mapping against a genome), but I'm unsure what I should observe when mapping against a transcriptome.

I guess the program maps one of the two mates and reverse-complements the other, but does the output flag (and the mapping result in general) reflect the strand of the reverse complement or of the original read? Is it normal to have many pairs with both mates on the same strand?

Please help me out of this "theoretical" quicksand.

Tags: sam, flags, transcriptome, alignment, paired-end

written 7.8 years ago by Matteo Schiavinato ★ 3.7k; updated 7.8 years ago by Devon Ryan 105k

score 1 · Accepted Answer · 2017-09-13

7.8 years ago

Devon Ryan 105k

It doesn't matter what you're aligning against: you expect PE mates to align in opposite orientations. The strandedness of the underlying data plays absolutely no role in this (your reads don't have strands; they have orientations). Unless a very atypical library prep was used, the flags you should most commonly see in the resulting BAM file are 99, 147, 83, and 163. If those don't make up the overwhelming majority of the flags, then something likely went amiss.
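
To see why those four values are the "normal" ones, it helps to decode them bit by bit. The sketch below uses the bit values of the FLAG field from the SAM format specification; the short names (`paired`, `mate_reverse`, etc.) and the helper functions are my own illustrative labels, not part of any library. It only checks the strand bits (0x10 for the read, 0x20 for the mate), so it confirms opposite orientations but does not check positions.

```python
from typing import List

# Bit values of the FLAG field as defined in the SAM format specification.
# The names are illustrative labels chosen for this sketch.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",          # this read is reported on the reverse strand
    0x20: "mate_reverse",     # its mate is reported on the reverse strand
    0x40: "first_in_pair",    # read 1
    0x80: "second_in_pair",   # read 2
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> List[str]:
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def mates_in_opposite_orientations(flag: int) -> bool:
    """True if the read (0x10) and its mate (0x20) are on opposite strands."""
    return bool(flag & 0x10) != bool(flag & 0x20)


if __name__ == "__main__":
    for f in (99, 147, 83, 163):
        print(f, decode_flag(f), mates_in_opposite_orientations(f))
```

All four values are paired, properly paired reads whose mate sits on the opposite strand: 99/147 is "read 1 forward, read 2 reverse" and 83/163 is "read 1 reverse, read 2 forward". A value such as 65 (paired, read 1, neither bit 0x10 nor 0x20 set) would put both mates on the same strand. On the command line, `samtools flags 99` gives a similar decoding.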

Eight weeks later, the problem arises again in a different form: would you keep 83 and 163? They are the ONLY flags I have after filtering. I find this suspicious because my reads should face each other; they shouldn't map like mate-pair reads.

7.7 years ago by Matteo Schiavinato ★ 3.7k

83 and 163 are facing each other (as are 99 and 147). Having only this combination makes sense if you're aligning against a transcriptome and have strand-specific data.
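
A likely reason for seeing only 83/163, stated here as an assumption rather than something confirmed in this thread: transcript sequences in a transcriptome FASTA are normally written in the sense orientation, and in a dUTP-type stranded protocol read 1 comes from the antisense strand. Read 1 would then always align to the reverse strand (83) and read 2 to the forward strand (163). A protocol where read 1 is sense would give only 99/147, and unstranded data a mix of all four. The hypothetical helper below tallies proper-pair flags along those lines; the 0.9 threshold is an arbitrary choice for illustration.

```python
from collections import Counter
from typing import Iterable


def infer_library_strandedness(flags: Iterable[int], min_fraction: float = 0.9) -> str:
    """Rough strandedness guess from reads aligned to a transcriptome.

    Assumes transcripts are stored in sense orientation. Only the exact
    values 99, 147, 83 and 163 are counted; reads carrying extra bits
    (e.g. 0x400, duplicate) are ignored in this sketch.
    """
    counts = Counter(f for f in flags if f in (99, 147, 83, 163))
    total = sum(counts.values())
    if total == 0:
        return "no properly paired reads"
    # Read 1 forward on the transcript: 99 (read 1) and 147 (its mate, read 2).
    r1_sense = (counts[99] + counts[147]) / total
    # Read 1 reverse on the transcript: 83 (read 1) and 163 (its mate, read 2).
    r1_antisense = (counts[83] + counts[163]) / total
    if r1_antisense >= min_fraction:
        return "stranded, read 1 antisense (e.g. dUTP-type protocol)"
    if r1_sense >= min_fraction:
        return "stranded, read 1 sense"
    return "unstranded or mixed"


if __name__ == "__main__":
    print(infer_library_strandedness([83, 163] * 500))
```

With pysam installed, the flags could be streamed from a BAM file as `(r.flag for r in pysam.AlignmentFile("aln.bam", "rb"))`, where `aln.bam` is a placeholder path. This logic only applies to transcriptome alignments: against a genome, genes on both strands mix the four flags even for stranded data.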

7.7 years ago by Devon Ryan 105k