4.9 years ago
Ati
I have used CollectRnaSeqMetrics to check the numbers of reads with the correct/incorrect strand designation. The results showed that more than 80% of reads in almost all of the samples are mapped to the incorrect strand!
What would be a good explanation for these results?
Thank you in advance!
Not every RNA-seq kit out there captures the reverse strand. There are also kits that capture the coding strand. Have you checked what kind of kit was used?
I'm not sure about the exact kit but it was a stranded RNA-seq library prep kit.
Make sure to set the right flags and use the tool correctly. Depending on the library prep, either the second in pair or the first in pair will indicate the correct strand, so with the wrong setting you can easily get the opposite result.
Do visualize your data as well.
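To make the point about first versus second in pair concrete, here is a minimal Python sketch of how one could tally which setting the alignments actually support. It assumes you have already extracted, for each read, its SAM FLAG and the strand of the annotated gene it overlaps. The function names and the input format are hypothetical, for illustration only; only the FLAG bits (from the SAM specification) and the two Picard STRAND values are real.

```python
from collections import Counter

# FLAG bits as defined in the SAM specification.
READ_REVERSE = 0x10    # read aligned to the reverse strand
FIRST_IN_PAIR = 0x40   # read is the first in pair
SECOND_IN_PAIR = 0x80  # read is the second in pair


def supported_strand_setting(flag, gene_strand):
    """Return the Picard STRAND value that a single read is consistent with.

    flag        -- integer SAM FLAG of the read
    gene_strand -- '+' or '-', strand of the annotated gene the read overlaps
                   (assumed to be looked up elsewhere; hypothetical input)
    """
    if gene_strand not in ("+", "-"):
        raise ValueError("gene_strand must be '+' or '-'")
    read_strand = "-" if flag & READ_REVERSE else "+"
    same_as_gene = read_strand == gene_strand
    if flag & FIRST_IN_PAIR:
        # First read on the gene's strand means the first read carries
        # the transcription strand; otherwise the second read does.
        return ("FIRST_READ_TRANSCRIPTION_STRAND" if same_as_gene
                else "SECOND_READ_TRANSCRIPTION_STRAND")
    if flag & SECOND_IN_PAIR:
        return ("SECOND_READ_TRANSCRIPTION_STRAND" if same_as_gene
                else "FIRST_READ_TRANSCRIPTION_STRAND")
    raise ValueError("read is not flagged as first or second in pair")


def tally_strand_support(reads):
    """Count how many (flag, gene_strand) records support each setting."""
    return Counter(supported_strand_setting(f, s) for f, s in reads)


# Toy records: (SAM FLAG, strand of the overlapping gene).
example_reads = [
    (83, "+"), (163, "+"),   # read 1 reverse, read 2 forward, gene on '+'
    (99, "+"), (147, "+"),   # read 1 forward, read 2 reverse, gene on '+'
    (99, "-"), (147, "-"),   # same pair orientation, gene on '-'
]
print(tally_strand_support(example_reads))
```

If the large majority of reads in real data support one value, that is the value to pass as STRAND. A roughly even split would suggest an unstranded library.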
Thanks! As my data is paired, I have specified STRAND=SECOND_READ_TRANSCRIPTION_STRAND. That's why I'm a bit surprised by the results!
That setting has nothing to do with the data being paired-end. You need to identify which strand was captured by the library prep and sequenced.
See this image: A: Tophat Library-Type : Illumina Truseq Stranded Total Rna Sample Prep Kit
Plus, the nomenclature is not standardized and is quite confusing. Different tools may designate the same protocol with different names.
For example, when data from the Illumina TruSeq protocol is analyzed with TopHat, you need to specify the fr-firststrand designation, but as a matter of fact, in the read pair, the second in pair will match the original orientation.
That's now clear to me! Thank you!
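For anyone landing here later, the naming mismatch between tools can be written down as a small lookup table. This is a sketch only: the thread itself confirms just the TopHat fr-firststrand case (second in pair matches the original orientation). The other entries reflect my understanding of each tool's documented options and should be checked against the manual of the version you run. The dictionary layout and the translate helper are hypothetical.

```python
# Each row groups option values that, to my understanding, describe the
# same library orientation in different tools. Verify before relying on it.
EQUIVALENT_SETTINGS = [
    {   # second in pair matches the transcript (e.g. Illumina TruSeq stranded)
        "TopHat --library-type": "fr-firststrand",
        "Picard STRAND": "SECOND_READ_TRANSCRIPTION_STRAND",
        "htseq-count --stranded": "reverse",
        "featureCounts -s": "2",
    },
    {   # first in pair matches the transcript
        "TopHat --library-type": "fr-secondstrand",
        "Picard STRAND": "FIRST_READ_TRANSCRIPTION_STRAND",
        "htseq-count --stranded": "yes",
        "featureCounts -s": "1",
    },
    {   # no strand information retained
        "TopHat --library-type": "fr-unstranded",
        "Picard STRAND": "NONE",
        "htseq-count --stranded": "no",
        "featureCounts -s": "0",
    },
]


def translate(from_option, value, to_option):
    """Map one tool's strandedness value to another tool's equivalent."""
    for row in EQUIVALENT_SETTINGS:
        if row.get(from_option) == value:
            return row[to_option]
    raise KeyError(f"{from_option}={value!r} is not in the table")


print(translate("TopHat --library-type", "fr-firststrand", "Picard STRAND"))
```

The table makes the trap visible: "firststrand" in TopHat corresponds to "SECOND_READ" in Picard, so the names alone cannot be matched up by eye.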