Question

STAR split reads into two strands

0

Entering edit mode

3.4 years ago

Alexander • 0

Hi all,

I'm trying to find a way to split reads mapped with STAR into two separate bam-files, one for each strand. STAR is able to do that when creating bedgraph or bigwig outputs (--outWigType and --outWigStrand), however I can not find any option to split raw reads into two separate files correponding to the strand they were transcribed from. I do not want to get counts per gene, I know how to do that, instead I need to get actual reads for each strand. Any ideas?

Thanks a lot in advance

rnachip stranded split STAR • 3.4k views

ADD COMMENT • link 3.4 years ago by Alexander • 0

0

Entering edit mode

Based on strand of genome, or transcript?

ADD REPLY • link 3.4 years ago by swbarnes2 14k

0

Entering edit mode

Based on strand of genome

ADD REPLY • link 3.4 years ago by Alexander • 0

0

Entering edit mode

Why can't you split on the binary flags with samtools?

ADD REPLY • link 3.4 years ago by swbarnes2 14k

0

Entering edit mode

Because this will give me 50/50 split. 50% of all reads will map to plus strand and another 50% will map to minus strand, but what I care about is from which strand of the genome the transcript came from. May be I was not clear?

ADD REPLY • link 3.4 years ago by Alexander • 0

1

Entering edit mode

You can filter for both alignment direction and which read of the pair you are looking at.

ADD REPLY • link 3.4 years ago by swbarnes2 14k

0

Entering edit mode

Hmmm.. do you mean flags for first in pair and second in pair? Yea, I thought about scripting this, but decided to check if there is a ready solution. Is it such a rare thing to do that no available tool offer such a function?

ADD REPLY • link 3.4 years ago by Alexander • 0

1

Entering edit mode

I'm not sure why you need a script. samtools view -f 80 will, for instance, get all the R1 reads that map to your genome in reverse. Isn't that one of the categories you want?

ADD REPLY • link 3.4 years ago by swbarnes2 14k

0

Entering edit mode

Well... technically yes, but I probably want to check that R1 is indeed downstream of R2 by the coordinates with this filtering, just to be on the safe side. May be I'm overcomplicating things :)

ADD REPLY • link 3.4 years ago by Alexander • 0

1

Entering edit mode

Okay, won't f -82 do that for you, by demanding proper pairing? Is this a non-model organism, so you can't just look up the orientation of each gene, instead of deducing it from reads?

ADD REPLY • link 3.4 years ago by swbarnes2 14k

0

Entering edit mode

Sounds good! Thanks a lot for the ideas! Good discussion always helps :) I will do first filtering for -f 2 and then do further filtering steps on the outcome file and put it together.

ADD REPLY • link 3.4 years ago by Alexander • 0

score 0 · Answer 1 · 2021-08-15

0

Entering edit mode

3.4 years ago

Divon ▴ 230

You can use my Genozip program:

genozip myfile.bam
genocat myfile.bam.genozip --FLAG -0x10 --output myfile.forward-strand-reads.bam
genocat myfile.bam.genozip --FLAG +0x10 --output myfile.reverse-strand-reads.bam

See here: https://genozip.com

Paper: https://www.researchgate.net/publication/349347156_Genozip_-_A_Universal_Extensible_Genomic_Data_Compressor

ADD COMMENT • link 3.4 years ago by Divon ▴ 230

0

Entering edit mode

Thanks. Hm... but how different is it from usng samtools tags? Would not it give me 50/50 split for all reads in the bam file?

ADD REPLY • link 3.4 years ago by Alexander • 0

0

Entering edit mode

It will give you the strand of the reference genome that your read maps to. For example, if your entire data is from a single RNA molecule, then it will all map to the same strand. Whether that is actually the strand your RNA came from in the DNA of the individual you sampled from, depends on whether their DNA matches the reference genome in that particular region.

ADD REPLY • link 3.4 years ago by Divon ▴ 230

0

Entering edit mode

It will give you the strand of the reference genome that your read maps to

Yes, but in paired end sequencing one read will always map to the opposite strand, regardless of the transcript origin, so I will get 50/50 split. What I care about is that both mates for read pairs that came from transcript produced in sense orientation would be labeled as originating from plus strand, and both mates from read pairs that came from anti-sense transcript would be labeled as originating from minus strand.

ADD REPLY • link 3.4 years ago by Alexander • 0