4.8 years ago
Susmita Mandal
Hello,
I have paired-end data and an alignment file (SAM/BAM). I want to split it into forward and reverse strand with respect to the reference genome. Looking at the SAM file, I see reads with flags 83, 163, 99 and 147. Which flags should I consider for forward and reverse? Please help me out.
Thanks,
Susmita
I did look at it and many other web pages, and I'm more confused now.
It is quite simple, actually, once you get the logic. Flag 16 means the read is on the reverse strand.
Therefore:
-F 16 means "do not use reads that are on the reverse strand", so "use only forward", and
-f 16 means "use only reverse strand".
samtools is smart, so these flags alone are sufficient to filter by strand. Your flags, e.g. 83, contain additional information, but this is not required here. The minimal flags I provided should do the trick.
The thing is, I have tried this command and I'm getting overlapping reads between the two strands, which should not happen. I'm doing allele-specific variant calling. I should say the RNA-seq is not strand-specific; would that be an issue?
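To make the bit-16 logic from the answer above concrete, here is a minimal Python sketch that decodes the four flags from the question. The bit values come from the SAM FLAG field; the `describe` helper and the variable names are made up for illustration and are not part of samtools.

```python
# Minimal sketch: decode SAM flags to see which reads -F 16 / -f 16 would keep.
# Bit values are from the SAM FLAG field; the helper name is hypothetical.
REVERSE = 0x10   # 16: this read is aligned to the reverse strand
FIRST = 0x40     # 64: first read in the pair
SECOND = 0x80    # 128: second read in the pair


def describe(flag):
    """Return (alignment strand, mate) for a SAM flag."""
    strand = "reverse" if flag & REVERSE else "forward"
    if flag & FIRST:
        mate = "first"
    elif flag & SECOND:
        mate = "second"
    else:
        mate = "unpaired"
    return strand, mate


flags = [83, 163, 99, 147]

# Equivalent of `-F 16`: keep reads where bit 16 is NOT set.
forward = [f for f in flags if not f & REVERSE]
# Equivalent of `-f 16`: keep reads where bit 16 IS set.
reverse = [f for f in flags if f & REVERSE]

for f in flags:
    print(f, describe(f))
print("forward:", forward)
print("reverse:", reverse)
```

With these flags, 99 and 163 have bit 16 unset (forward) and 83 and 147 have it set (reverse). Note that in a properly paired fragment the two mates align to opposite strands (99 pairs with 147, and 83 with 163), so filtering on bit 16 alone splits reads by their own alignment strand and puts the two mates of one fragment into different files.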
What do you mean by "overlapping"? Well yeah, a non-strand-specific library would be a problem for strand-specific analysis ;-)
Well, I'm doing this by creating personalized genomes (mouse strains). By overlapping I mean: take Xist and Tsix, which are antisense to each other. Most of the reads are coming from the same allele in these genes, and in other genes too. Could it be a case of allele mapping bias?
Hmm, I think to answer this you really need a stranded library prep.
So there is no way to avoid that in unstranded RNA-Seq? :(
Are you sure this is unstranded data? Nowadays people rarely do unstranded preps. Are you using publicly available data and are just not sure if the prep is stranded?
I am very sure. It's not publicly available data.