Hello, I have a single-end strand-specific RNA-seq data (Illumina protocol), and I would like to split the bam file in two parts by strand. For the alignment I am using TopHat 2.
I see this post: C: How To Separate Illumina Based Strand Specific Rna-Seq Alignments By Strand
However I don't understand how to apply that script to my data since my SAM flags are different (filtering with samtools view -b -f 128 -F 16 etc. give me an empty file).
samtools view sample.bam | cut -f2 | sort | uniq
0x0 (0)
0x10 (16 r read reverse strand)
0x100 (256 s not primary alignment)
0x110 (272 rs read reverse strand + not primary alignment)
Can anyone help me?
Thanks very much!
Best, Alberto
Hi Devon, I'm trying to understand FLAG and strands of alignments and it's getting confusing. I asked a question two months ago here A: SAM flag and select reads that map uniquely and you answered:
if bit 16 is set, then then the read is mapped as a reverse complement (thus, if it's unset, then it's not mapped as a reverse complement). Whether "mapped as a reverse complement" can be taken to mean "maps to the minus strand" will be specific to how the library was prepared.
here you used -f 16 for reads originating from the + strand and it infers that -f 0 is will yield reads originating from the - strand, which is the same thing you mentioned in your comment in that post. This only applies to illumina TruSeq stranded RNA-seq protocol, right? besides stranded RNA-seq, is there any other application of sequencing technologies that use stranded library prep? And is there cases where -f 0 will yield reads originating from the + strand?
-f 0
will probably return every alignment in the file, since you're doingflag & 0 == 0
, which should always be true. The opposite of-f 16
is-F 16
.-F 16
is doing aflag | 16 > 0
, which will return true only if bit 16 is set. These are bit operations, they're not directly comparing numbers likeflag == 0
.BTW, BSseq also uses strand information, though it gets much more complicated than with RNAseq. You also sometimes see strand-specific ChIP experiments where some factor is binding a single strand of DNA (i.e., not your typical histone or transcription factor ChIP experiment). Those will work the same as RNAseq. Also, yes the flag stuff is specific to the current TruSeq and related protocols that are dUTP-based. For some older stuff, everything is swapped.
I think I was confused with numeric comparison from bit operations..
-f 0
and-F 0
both return all the alignments in the bam file. Yeah, BS-seq strand is so confusing.. i need to spend more time to understand that when it comes up...