Should RNA-Seq Reads Map to Only One Strand of a cDNA Reference?
11.0 years ago
chjiao3456 ▴ 40

Hi everyone, I'm mapping C. elegans mRNA-seq data (single-end, 40 bp) to a cDNA reference (spliced exon sequences) with Bowtie. I'm a beginner at RNA-seq data analysis, so I didn't use TopHat or other RNA-seq-specific tools. The cDNA reference was downloaded from Ensembl.

Should the RNA-seq reads map to only one strand of the reference (either the + strand or the - strand), since the mRNA sequence is the same as the cDNA reference? My mapping results show reads mapping to both strands. Should I discard the reads mapped to one of the strands, in case they come from contamination? Thanks.

rna-seq gene-expression • 7.7k views
11.0 years ago

You need a strand-specific wet-lab protocol in order to get strand-specific mappings. In most cases (at least with the data I have worked with), RNA-Seq protocols are not strand-specific. You'll have to check with the core / company that produced your data.

No matter what, you will need to use some sort of short read aligner. TopHat or STAR would be the most typical solution, where you would align to the reference genome. If you use a set of transcripts (instead of chromosomes), you could use Bowtie + eXpress. However, I don't think one solution is really easier than the other.


Thanks, I have read papers for strand-specific RNA-seq, and I think my data does not contain strand information.

11.0 years ago

I am not sure I understand what your reference is, but even with a strand-specific protocol you will get reads mapping to both strands: all reads coming from genes on the + strand will have the first read mapped to one strand, and all reads coming from genes on the - strand will have the first read mapped to the opposite strand.

In other words, if I color read pairs by the strand to which the first read is mapped, the vast majority of my exons will have one color. This coloring is one of the options in IGV after a right click. In such cases, people usually discard reads mapping in the opposite direction, but that's just because having 50 pink pairs and 2 blue pairs probably suggests some kind of error rather than transcription from the opposite strand.

But as cwarden45 said, you might have non-strand-specific data, and in that case your reads will map to both strands, no matter whether your gene of interest is on the + or - strand.
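
One way to check which case applies when aligning to a cDNA reference is to count how many alignments land on each strand. The transcript sequences in a cDNA FASTA are all given in the sense orientation, so a roughly 50/50 split would be consistent with an unstranded library, while a split close to 100/0 (or 0/100) would suggest a stranded one. Below is a minimal Python sketch, assuming the Bowtie output is in SAM format. The names `strand_counts`, `forward_fraction` and `example_sam` are made up for illustration, and the example records are toy data.

```python
from collections import Counter


def strand_counts(sam_lines):
    """Count primary alignments on the forward and reverse strand of the reference."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        flag = int(fields[1])
        if flag & 0x4:  # read is unmapped
            continue
        if flag & 0x100 or flag & 0x800:  # secondary or supplementary alignment
            continue
        # Bit 0x10 is set when the read aligned as its reverse complement.
        counts["reverse" if flag & 0x10 else "forward"] += 1
    return counts


def forward_fraction(counts):
    """Fraction of counted alignments that are on the forward strand."""
    total = counts["forward"] + counts["reverse"]
    return counts["forward"] / total if total else float("nan")


# Toy SAM records (hypothetical transcript and read names).
example_sam = [
    "@SQ\tSN:T1\tLN:1000",
    "r1\t0\tT1\t10\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tT1\t50\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t0\tT1\t90\t255\t4M\t*\t0\t0\tACGT\tIIII",
]
example_counts = strand_counts(example_sam)
print(dict(example_counts), forward_fraction(example_counts))
```

For real data you could pass an open file handle instead of the list, for example `strand_counts(open("aligned.sam"))`, where `aligned.sam` is a placeholder file name.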


Thanks for your answer.

11.0 years ago

No. cDNA is double-stranded; when you make it, you lose all evidence of what the original RNA strand was.

There are tricky library preps that build the fragments so that the original strand can be worked out from the direction of the Illumina reads, but in general you should assume that an unknown RNA-seq sample was not prepped that way unless you are specifically told that the library prep was strand-specific.

Bowtie against a list of transcripts is probably fine if all you want is a simple count of how many reads hit each transcript. Running samtools idxstats would be a very simple way of generating those counts, and since that output includes the length of each transcript, it's simple to convert that file by itself into RPKM.
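
The idxstats-to-RPKM conversion can be sketched in Python. This assumes the usual four-column, tab-separated idxstats output (reference name, sequence length, mapped count, unmapped count) and takes the sum of the mapped counts as the library size. `rpkm_from_idxstats` is a hypothetical helper name, not part of samtools, and the example rows are toy data.

```python
def rpkm_from_idxstats(lines):
    """Compute RPKM per transcript from `samtools idxstats` text lines.

    RPKM = mapped_reads * 1e9 / (transcript_length_bp * total_mapped_reads)
    """
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        name, length, mapped = fields[0], int(fields[1]), int(fields[2])
        if name == "*" or length == 0:  # the "*" row holds unplaced reads
            continue
        rows.append((name, length, mapped))

    # Assumption: library size is the total number of reads mapped to any transcript.
    total_mapped = sum(mapped for _, _, mapped in rows)
    if total_mapped == 0:
        return {name: 0.0 for name, _, _ in rows}
    return {
        name: mapped * 1e9 / (length * total_mapped)
        for name, length, mapped in rows
    }


# Toy idxstats rows (hypothetical transcript names).
example_idxstats = [
    "T1\t1000\t100\t0",
    "T2\t2000\t300\t0",
    "*\t0\t0\t50",
]
example_rpkm = rpkm_from_idxstats(example_idxstats)
print(example_rpkm)
```

With a sorted, indexed BAM you could save the output of `samtools idxstats` to a text file and pass the open file handle to the function. Whether total mapped reads or total sequenced reads is the better denominator is a choice, so treat the normalization here as one option.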


Thanks, I'm using samtools now.
