Question

Confused About SOLiD RNA-Seq Paired-End Data

12.8 years ago

Conan ▴ 20

Hi everyone, I'm new to RNA-Seq data analysis, and some aspects of strand-specific information confuse me a lot. Any suggestions would be greatly appreciated.

Say I have SOLiD RNA-Seq paired-end data, 50 x 35 bp, from a library built with strand specificity. I mapped the reads with TopHat using the parameter "--library-type fr-secondstrand" and got accepted_hits.bam.

Now I want to see whether any transcripts are transcribed from the antisense strand. That is, if a gene lies on the forward strand of the chromosome, I want to see whether some reads map to the reverse strand and could come from transcripts transcribed in the opposite direction.

For this purpose I should extract the reads mapping to each strand separately and then compare them. But I have some questions:

1. In SOLiD paired-end data, do the F3/F5 reads of a pair map to different strands? That is, if a gene lies on the forward strand, are the reads F3(+) and F5(-), and F3(-)/F5(+) for a gene on the reverse strand? I read the SOLiD protocol and examined my BAM file in IGV, and both suggest this is the case. But the last post in this thread, http://seqanswers.com/forums/showthread.php?t=6317, says the F3 and F5 reads of a pair are actually on the same strand, so I'm not sure which is correct. Any suggestions, discussion, or comments are welcome. Thanks!

2. Although I used the "--library-type" parameter in the TopHat mapping, I still don't understand the manual's explanation of the three library types. Can anyone explain them clearly? Thanks.

3. It is said that the XS:A tag indicates which strand the read comes from. But in my data, both the F3 and F5 reads are XS:A:+ when they map to a gene on the forward strand, even though F3 aligns to + and F5 aligns to -. So I'm wondering whether the XS:A tag just reports the gene orientation, or whether I made a mistake somewhere in the procedure.
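
For anyone who wants to try the strand split described above, here is a minimal Python sketch of how a transcript strand could be inferred from the SAM FLAG field alone. It assumes the usual fr-secondstrand convention (the first-in-pair read aligns to the transcript's own strand, its mate to the opposite one) and that the mapper flags F3 as first in pair; neither assumption is confirmed for this dataset, and the function name is made up for illustration.

```python
# SAM FLAG bits used here (values from the SAM specification).
READ_REVERSE = 0x10     # this read aligned to the reverse strand
SECOND_IN_PAIR = 0x80   # this read is the second mate of the pair


def inferred_transcript_strand(flag):
    """Guess the transcript strand for one paired read.

    ASSUMPTION: fr-secondstrand semantics, i.e. the first mate lies on
    the transcript strand and the second mate on the opposite strand.
    """
    read_is_reverse = bool(flag & READ_REVERSE)
    if flag & SECOND_IN_PAIR:
        # Second mate: its alignment strand is opposite to the transcript.
        return "+" if read_is_reverse else "-"
    # First mate: its alignment strand equals the transcript strand.
    return "-" if read_is_reverse else "+"


# Both mates of a 99/147 pair point to the same transcript strand.
for flag in (99, 147, 83, 163):
    print(flag, inferred_transcript_strand(flag))
```

Under these assumptions both mates of a pair get the same label even though they align to opposite strands, which may be what a per-pair tag such as XS:A is reporting.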

solid • 5.5k views

updated 12.8 years ago by Istvan Albert 102k • written 12.8 years ago by Conan ▴ 20

score 2 · Answer 1 · 2012-07-09

12.8 years ago

Istvan Albert 102k

The names F3 and F5 indicate where the fragments come from. In this protocol both mates come from the same strand.

You might also want to read this:

http://www.biostars.org/post/show/9063/paired-end-mapping-what-is-bwa-solid-paired-end-default-direction-bwa-sampe/

Note that the most common paired-end protocols produce F3, R3 reads, and most tools expect that. You will almost certainly need to check that your tool supports data in your format, and then explicitly enable that custom behavior. The tools cannot detect this on their own.

Alternatively, you can just reverse complement your color space reads (that means reversing the order of the colors); alas, this too has some implications that can bite later on.

http://www.biostars.org/post/show/43855/transforming-and-manipulating-color-space-reads/
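
To see why reversing the colors amounts to a reverse complement, here is a small Python sketch. It assumes the standard two-base encoding in which a color is the XOR of the 2-bit codes of adjacent bases (A=0, C=1, G=2, T=3); the helper names are made up for illustration. It also ignores the leading primer base that real SOLiD reads carry, which is one of the implications that can bite.

```python
# ASSUMPTION: standard two-base color encoding, color = code(a) XOR code(b).
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_colors(bases):
    """Encode each pair of adjacent bases as one color digit (0-3)."""
    return "".join(
        str(BASE_TO_BITS[a] ^ BASE_TO_BITS[b])
        for a, b in zip(bases, bases[1:])
    )


def reverse_complement(bases):
    """Plain base-space reverse complement."""
    return "".join(COMPLEMENT[b] for b in reversed(bases))


seq = "ATGGCA"
colors = to_colors(seq)
rc_colors = to_colors(reverse_complement(seq))

# Complementing both bases leaves each color unchanged, so the reverse
# complement in color space is just the color string read backwards.
print(colors, rc_colors, rc_colors == colors[::-1])
```

The sketch only covers the transitions between sequenced bases; the first color of a real read depends on the primer base, so a reversed read needs a new anchor base before it can be decoded.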

Long story short: kind of tedious.

Thank you for the excellent explanation, Albert. Now that I understand F3 and F5, something in my mapping result seems strange. As I said, I used TopHat to map SOLiD RNA-Seq PE data with --library-type fr-secondstrand, and then checked accepted.bam in IGV. Most of the properly paired reads show F3 aligned to (+) and F5 aligned to (-) when they map to a gene on the forward strand. I also grepped one properly paired read pair to look at the FLAG values, which are 99 and 147, meaning the second read in the pair is mapped to the reverse strand. So I still need to make sure:

Doesn't the +/- alignment shown in IGV indicate whether a read mapped to the positive or the negative strand? If so, this contradicts "F3 and F5 fragments come from the same strand". So I don't know whether I'm just confused or whether there is a problem with my data.
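
For reference, the two FLAG values can be unpacked bit by bit. Below is a minimal Python sketch; the bit values come from the SAM specification, while the short names and the function are just for illustration.

```python
# SAM FLAG bits and short descriptive names (names are informal).
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


for flag in (99, 147):
    print(flag, decode_flag(flag))
```

So 99 is the first mate, aligned forward with its mate on the reverse strand, and 147 is the second mate, aligned to the reverse strand. These bits describe how each read aligned, not by themselves which strand the original fragment came from.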

12.8 years ago by Conan ▴ 20