Hi everyone, I'm new to RNA-Seq data analysis, and a few things about strand-specific information have confused me a lot. Any suggestions would be greatly appreciated.
Say I have SOLiD RNA-Seq paired-end data (50 x 35 bp) from a strand-specific library. I use TopHat to map the reads with the parameter "--library-type fr-secondstrand" and get accepted_hits.bam.
Now I want to see whether there are transcripts from the antisense strand. That is, if a gene lies on the forward strand of the chromosome, I want to see whether some reads map to the reverse strand and could come from transcripts in the reverse orientation.
For this purpose, I plan to extract the reads mapping to each strand separately and then compare them. But I have some questions:
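A minimal sketch of that split in Python, working on SAM text lines (for example the output of `samtools view`). It assumes the fr-secondstrand rule as paraphrased from the TopHat manual: the first mate (FLAG 0x40) is sequenced from the sense strand, so its alignment strand is taken as the transcript strand, while the second mate's (FLAG 0x80) is flipped. That rule and the helper names `transcript_strand_fr_secondstrand` and `split_by_strand` are illustrative assumptions, not TopHat's own code, and the records are made up.

```python
# Hypothetical helpers. ASSUMPTION: fr-secondstrand means the first mate is
# sequenced from the transcript (sense) strand and the second mate from the
# opposite strand.
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_READ2 = 0x80


def transcript_strand_fr_secondstrand(flag: int) -> str:
    """Return '+' or '-' for the transcript strand implied by one alignment."""
    aligned_reverse = bool(flag & FLAG_REVERSE)
    if flag & FLAG_READ2:
        # The second mate reads the opposite strand, so flip it.
        aligned_reverse = not aligned_reverse
    return "-" if aligned_reverse else "+"


def split_by_strand(sam_lines):
    """Bucket read names from SAM records into '+' and '-' lists."""
    buckets = {"+": [], "-": []}
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & FLAG_UNMAPPED:
            continue
        buckets[transcript_strand_fr_secondstrand(flag)].append(fields[0])
    return buckets


# Made-up records: pairA looks like a forward-strand gene, pairB a reverse one.
records = [
    "@HD\tVN:1.0",
    "pairA\t99\tchr1\t100\t50\t50M\t=\t300\t250\t*\t*",
    "pairA\t147\tchr1\t300\t50\t35M\t=\t100\t-250\t*\t*",
    "pairB\t83\tchr1\t900\t50\t50M\t=\t700\t-250\t*\t*",
    "pairB\t163\tchr1\t700\t50\t35M\t=\t900\t250\t*\t*",
]
print(split_by_strand(records))
# {'+': ['pairA', 'pairA'], '-': ['pairB', 'pairB']}
```

Splitting on the 0x10 bit alone would put the two mates of one fragment into opposite bins, which is why the second mate is flipped here.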
1. For SOLiD paired-end data, do the F3 and F5 reads in a pair map to different strands? That is, for a gene on the forward strand is it F3(+) and F5(-), and for a gene on the reverse strand F3(-) and F5(+)? Reading the SOLiD protocol and examining my BAM file in IGV both suggest this, but in this thread http://seqanswers.com/forums/showthread.php?t=6317 the last post says the F3 and F5 reads in a pair are actually on the same strand, so I'm not sure which is correct. Any suggestions, discussion or comments are welcome. Thanks!
2. Although I used the "--library-type" parameter for TopHat mapping, I still don't understand the manual's explanation of the three library types. Can anyone explain them clearly? Thanks.
3. It is said that the XS:A tag indicates which strand the read comes from. But in my data, both the F3 and F5 reads have XS:A:+ when they map to a gene on the forward strand, even though F3 aligns to + and F5 to -. So I'm wondering whether the XS:A tag just tells us the gene orientation, or whether I made a mistake somewhere in my procedure.
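To make questions 2 and 3 concrete, here is a small Python sketch that encodes the three library types as paraphrased from the TopHat manual (fr-unstranded: no strand information; fr-firststrand, e.g. dUTP/NSR/NNSR: the first or only read comes from the antisense strand; fr-secondstrand, e.g. ligation/standard SOLiD: the first or only read comes from the sense strand) and compares the implied transcript strand with the XS:A tag. The helpers `implied_strand` and `xs_tag` are hypothetical, the SAM lines are made up to mimic question 3, and whether TopHat's XS reflects transcript strand in exactly this way is an assumption to check against its documentation.

```python
from typing import Optional

# Hypothetical helpers. ASSUMPTION: rules paraphrase the TopHat manual:
#   fr-unstranded   -> no strand information
#   fr-firststrand  -> first (or only) read is from the antisense strand
#   fr-secondstrand -> first (or only) read is from the sense strand


def implied_strand(library_type: str, flag: int) -> Optional[str]:
    """Transcript strand implied by one alignment, or None if unstranded."""
    if library_type == "fr-unstranded":
        return None
    if library_type not in ("fr-firststrand", "fr-secondstrand"):
        raise ValueError(f"unknown library type: {library_type}")
    aligned = "-" if flag & 0x10 else "+"  # strand the read itself aligned to
    is_first = not (flag & 0x80)  # single-end reads count as "first"
    first_is_sense = library_type == "fr-secondstrand"
    if is_first == first_is_sense:
        return aligned  # read comes from the sense strand
    return "+" if aligned == "-" else "-"  # read comes from the antisense strand


def xs_tag(sam_line: str) -> Optional[str]:
    """Value of the XS:A optional tag, if present (optional fields start at column 12)."""
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("XS:A:"):
            return field[len("XS:A:"):]
    return None


# Made-up pair: both mates carry XS:A:+ but align to opposite strands.
pair = [
    "pairA\t99\tchr1\t100\t50\t50M\t=\t300\t250\t*\t*\tXS:A:+",
    "pairA\t147\tchr1\t300\t50\t35M\t=\t100\t-250\t*\t*\tXS:A:+",
]
for line in pair:
    flag = int(line.split("\t")[1])
    print(flag, "aligned", "-" if flag & 0x10 else "+",
          "implied", implied_strand("fr-secondstrand", flag), "XS", xs_tag(line))
# 99 aligned + implied + XS +
# 147 aligned - implied + XS +
```

In this made-up example the strand implied under fr-secondstrand matches XS for both mates even though their alignment strands differ, which is the pattern described in question 3.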
Thank you for the excellent explanation, Albert. Now that I understand F3 and F5, something in my mapping results seems strange. As I said, I used TopHat to map SOLiD RNA-Seq PE data with --library-type fr-secondstrand, then used IGV to inspect accepted_hits.bam. For a gene on the forward strand, most of the properly paired reads show F3 with "Alignment start (+)" and F5 with "Alignment start (-)". I also grepped one specific properly paired read to check its FLAGs, which are 99 and 147, meaning the second read in the pair is mapped to the reverse strand. So I still need to make sure of the following:
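The two FLAG values can be decoded bit by bit. This Python sketch uses the bit definitions from the SAM format specification; `decode_flag` is a hypothetical helper name. It shows that 99 is the first mate aligned to the forward strand with its mate on the reverse strand, and 147 is the second mate aligned to the reverse strand.

```python
from typing import List

# Bit meanings of the FLAG column, from the SAM format specification.
FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper pair"),
    (0x4, "unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "secondary"),
    (0x200, "QC fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag: int) -> List[str]:
    """List the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS if flag & bit]


for flag in (99, 147):
    print(flag, decode_flag(flag))
# 99 ['paired', 'proper pair', 'mate reverse strand', 'first in pair']
# 147 ['paired', 'proper pair', 'read reverse strand', 'second in pair']
```

The 0x10 and 0x20 bits describe the strand each read aligned to in the reference, not which strand the transcript came from.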
Doesn't the +/- shown with each alignment in IGV indicate the positive or negative strand the read mapped to? If so, this contradicts the statement that "F3 and F5 fragments come from the same strand", and I can't tell whether I'm just confused or there is a problem with my data.