Single-end strand specific Rna-Seq: how to separate alignments by strand
2
2
Entering edit mode
8.8 years ago
termanini ▴ 20

Hello, I have a single-end strand-specific RNA-seq data (Illumina protocol), and I would like to split the bam file in two parts by strand. For the alignment I am using TopHat 2.

I see this post: C: How To Separate Illumina Based Strand Specific Rna-Seq Alignments By Strand

However I don't understand how to apply that script to my data since my SAM flags are different (filtering with samtools view -b -f 128 -F 16 etc. give me an empty file).

samtools view sample.bam | cut -f2 | sort | uniq

0x0     (0)
0x10    (16 r   read reverse strand)
0x100   (256    s   not primary alignment)
0x110   (272    rs  read reverse strand + not primary alignment)

Can anyone help me?

Thanks very much!

Best, Alberto

RNA-Seq alignment sequencing strand • 6.6k views
ADD COMMENT
4
Entering edit mode
8.8 years ago

samtools view -f 16 ... will yield reads originating from the + strand.

samtools view -F 16 ... will do the same for reads originating from the - strand.

If you look at Istvan's script, just pay attention to the "first in pair" parts and ignore any bits > 16.

ADD COMMENT
0
Entering edit mode

Hi Devon, I'm trying to understand FLAG and strands of alignments and it's getting confusing. I asked a question two months ago here A: SAM flag and select reads that map uniquely and you answered:

if bit 16 is set, then then the read is mapped as a reverse complement (thus, if it's unset, then it's not mapped as a reverse complement). Whether "mapped as a reverse complement" can be taken to mean "maps to the minus strand" will be specific to how the library was prepared.

here you used -f 16 for reads originating from the + strand and it infers that -f 0 is will yield reads originating from the - strand, which is the same thing you mentioned in your comment in that post. This only applies to illumina TruSeq stranded RNA-seq protocol, right? besides stranded RNA-seq, is there any other application of sequencing technologies that use stranded library prep? And is there cases where -f 0 will yield reads originating from the + strand?

ADD REPLY
1
Entering edit mode

-f 0 will probably return every alignment in the file, since you're doing flag & 0 == 0, which should always be true. The opposite of -f 16 is -F 16. -F 16 is doing a flag | 16 > 0, which will return true only if bit 16 is set. These are bit operations, they're not directly comparing numbers like flag == 0.

BTW, BSseq also uses strand information, though it gets much more complicated than with RNAseq. You also sometimes see strand-specific ChIP experiments where some factor is binding a single strand of DNA (i.e., not your typical histone or transcription factor ChIP experiment). Those will work the same as RNAseq. Also, yes the flag stuff is specific to the current TruSeq and related protocols that are dUTP-based. For some older stuff, everything is swapped.

ADD REPLY
0
Entering edit mode

I think I was confused with numeric comparison from bit operations.. -f 0 and -F 0 both return all the alignments in the bam file. Yeah, BS-seq strand is so confusing.. i need to spend more time to understand that when it comes up...

ADD REPLY
0
Entering edit mode
8.8 years ago
termanini ▴ 20

Thanks very much Devon,

I will try as you suggested! Just a curiosity: do you know if the flag XS:A+ and XS:A:- assigned by TopHat can be used in order to separate by strands?

Thanks again! Alberto

ADD COMMENT
1
Entering edit mode

If your aligner happens to add them (they're custom tags only used by tophat2 (and STAR if you ask it to)) then you could alternatively use them. Note that it's faster to do what I wrote than to use those tags.

ADD REPLY

Login before adding your answer.

Traffic: 1806 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6