I would like to understand how to interpret the SAM flags (with respect to strandedness) for both strand-specific and non-strand-specific protocols. Does the following make sense?
5'--------------------------------------------3'
read/1 --------> <--------- read/2
/1 has flag 99, /2 has flag 147
read/2 --> <--- read/1
/2 has flag 163, /1 has flag 83
How does this change when the protocol is strand-specific?
Thanks!
It's simple: when you turn RNA into cDNA as part of a standard (non-strand-specific) RNA-seq protocol, you lose the information about whether the RNA transcript came from the forward or reverse strand. So what you sequence can tell you where the RNA came from, but not whether it's AAATC or actually GATTT (the reverse complement) and you just happened to have sequenced the other strand of the cDNA.
Newer protocols chemically tag the second cDNA strand by synthesizing it with dUTP instead of dTTP. Then, right before PCR, the uracil-containing strands are degraded enzymatically, so you know not only where the RNA was made but also which strand it came from, forward or reverse.
The BAM file doesn't know or care about any of this :) It's up to you, if you used a strand-specific protocol, to do something meaningful with the read sequence and the flag information :)
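To make "do something meaningful with the flag" a bit more concrete, here is a minimal Python sketch. The bit values come from the SAM specification; the helper names `decode` and `transcript_strand` are made up for illustration. It assumes a dUTP-style library, where read 2 has the same orientation as the original transcript and read 1 is antisense; other stranded protocols can be the other way round, so check your kit.

```python
# SAM flag bits, as defined in the SAM specification
PAIRED = 0x1
PROPER_PAIR = 0x2
READ_REVERSE = 0x10
MATE_REVERSE = 0x20
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80

FLAG_NAMES = [
    (PAIRED, "paired"),
    (PROPER_PAIR, "proper_pair"),
    (READ_REVERSE, "read_reverse"),
    (MATE_REVERSE, "mate_reverse"),
    (FIRST_IN_PAIR, "first_in_pair"),
    (SECOND_IN_PAIR, "second_in_pair"),
]


def decode(flag):
    """Return the names of the bits set in a SAM flag (only the six bits above)."""
    return [name for bit, name in FLAG_NAMES if flag & bit]


def transcript_strand(flag, read2_is_sense=True):
    """Hypothetical helper: guess the transcript strand from a paired read's flag.

    read2_is_sense=True  -> dUTP-style library (read 2 matches the transcript).
    read2_is_sense=False -> the opposite convention (read 1 matches the transcript).
    For a non-strand-specific library the result is meaningless.
    """
    read_on_plus = not (flag & READ_REVERSE)
    is_read2 = bool(flag & SECOND_IN_PAIR)
    # The read is "sense" if it is the mate that matches the transcript.
    read_is_sense = (is_read2 == read2_is_sense)
    on_plus = read_on_plus if read_is_sense else not read_on_plus
    return "+" if on_plus else "-"


for flag in (99, 147, 163, 83):
    print(flag, decode(flag), transcript_strand(flag))
```

Under that assumption, the 99/147 pair from the question points to a transcript on the reverse strand and the 163/83 pair to one on the forward strand. With a non-strand-specific library the flags are the same, but you can't draw that conclusion.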