Question

Differences in DEG results using RSEM with and without --strandedness reverse

7 weeks ago

Emy Alade • 0

Hello,

I am encountering an issue in my gene expression analysis with RSEM. Here are the details:

I am using paired-end, reverse-stranded RNA-Seq data, aligned with STAR. The DEG (differentially expressed genes) results differ depending on whether or not I set the --strandedness reverse parameter in RSEM.

Could someone explain the impact of this parameter on read assignment and gene counting? Should I specify --strandedness reverse for my data?

Thank you in advance for your insights!

rsem DEG STAR transcriptomic • 429 views

updated 7 weeks ago by LauferVA 4.7k • written 7 weeks ago by Emy Alade • 0

Before applying that parameter, you should check your data with https://github.com/signalbash/how_are_we_stranded_here. Yes, "reverse" is the most widely used setting, and it is the right one to use if your RNA-Seq library is indeed reverse-stranded. You can also check the report from the sequencing facility.
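If STAR was run with --quantMode GeneCounts (an assumption, since the question does not say), its ReadsPerGene.out.tab file offers another quick check: column 3 holds counts for read 1 on the same strand as the gene, column 4 holds counts for read 2 on the same strand as the gene. Below is a minimal sketch of that comparison. The function name, the 0.8 threshold, and the example numbers are all made up for illustration; the returned labels simply mirror the values accepted by RSEM's --strandedness option.

```python
def infer_strandedness(reads_per_gene_text, threshold=0.8):
    """Guess library strandedness from the text of a STAR ReadsPerGene.out.tab file.

    The threshold is an arbitrary illustrative cutoff, not a published standard.
    """
    forward_total = 0
    reverse_total = 0
    for line in reads_per_gene_text.splitlines():
        fields = line.split("\t")
        # Skip blank lines and the N_unmapped / N_multimapping /
        # N_noFeature / N_ambiguous summary rows at the top of the file.
        if len(fields) < 4 or fields[0].startswith("N_"):
            continue
        forward_total += int(fields[2])  # column 3: read 1 on the gene's strand
        reverse_total += int(fields[3])  # column 4: read 2 on the gene's strand
    total = forward_total + reverse_total
    if total == 0:
        return "undetermined"
    if forward_total / total >= threshold:
        return "forward"
    if reverse_total / total >= threshold:
        return "reverse"
    return "none"


# Made-up counts that mimic a reverse-stranded library.
example = "\n".join([
    "N_unmapped\t100\t100\t100",
    "N_multimapping\t50\t50\t50",
    "N_noFeature\t30\t900\t40",
    "N_ambiguous\t5\t1\t3",
    "geneA\t500\t10\t490",
    "geneB\t300\t5\t295",
])
print(infer_strandedness(example))  # reverse
```

On a real file you would pass the file's contents, e.g. open("ReadsPerGene.out.tab").read(). A result of "none" suggests an unstranded library, in which case forcing a stranded setting would discard about half of the reads.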

7 weeks ago by lagartija ▴ 160

score 2 · Answer 1 · 2025-03-24

Emy - I'm glad you asked this, because you have to get it right for the results to be valid. Below is a breakdown of what the settings mean, and what specifically goes wrong if the wrong one is used.

Read Assignment to a Gene is based on Orientation:

Reverse-stranded libraries: In these protocols (dUTP-based kits, for example), read 1 is the reverse complement of the RNA. For instance, if a gene is annotated on the plus strand, its read 1 sequences will align to the minus strand. In paired-end data, read 2 has the opposite orientation and therefore matches the RNA.

Forward-stranded libraries: Read 1 is produced in the same orientation as the original RNA transcript, meaning it directly matches the gene's annotated strand (and read 2 is its reverse complement).
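The two definitions above boil down to a small lookup. Here is a minimal sketch of it; the function name and its arguments are hypothetical and are not part of RSEM or STAR.

```python
def expected_alignment_strand(gene_strand, library, mate=1):
    """Return the genomic strand ('+' or '-') a read is expected to align to.

    gene_strand: '+' or '-', the strand the gene is annotated on
    library:     'forward' or 'reverse' (stranded protocols only)
    mate:        1 or 2 for paired-end data
    """
    if gene_strand not in ("+", "-"):
        raise ValueError("gene_strand must be '+' or '-'")
    if library not in ("forward", "reverse"):
        raise ValueError("library must be 'forward' or 'reverse'")
    if mate not in (1, 2):
        raise ValueError("mate must be 1 or 2")

    opposite = {"+": "-", "-": "+"}
    # Forward: read 1 matches the gene's strand. Reverse: read 1 is flipped.
    strand = gene_strand if library == "forward" else opposite[gene_strand]
    # Read 2 comes from the other end of the fragment, so it is always
    # on the opposite strand from read 1.
    if mate == 2:
        strand = opposite[strand]
    return strand


print(expected_alignment_strand("+", "reverse"))          # -
print(expected_alignment_strand("+", "reverse", mate=2))  # +
print(expected_alignment_strand("-", "forward"))          # -
```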

Consequences of Using the Wrong Flag: In essence, you're asking what happens if you assign the wrong setting to the data. Suppose RSEM is told that reads should align as the reverse complement, but they actually align in the forward direction. What would happen?

Some likely consequences:

Underestimation of True Expression: Say you specify --strandedness reverse on forward-stranded or unstranded data. Reads that genuinely come from a gene are then in an orientation that is treated as incompatible with it, so they are not counted toward that gene. Put simply, the algorithm expects reads matching ATTATTA, but the reads you provided are TAATAAT (the reverse complement) because the flag was mixed up. With unstranded data, roughly half of the reads are affected this way.
Misassignment (Falsely Attributed Overexpression): Suppose a read is the reverse complement of a given gene, but RSEM expects it to match the forward strand. In some cases a compatible match will still be found, typically to a gene that overlaps on the opposite strand, and the read will be incorrectly assigned to that other gene. That gene's expression then appears higher than it actually is.
Multimapping Dilutes Signal: Using the wrong strand parameter can also leave more reads with ambiguous assignments (i.e., reads that are compatible with several locations or transcripts). Different bioinformatics algorithms handle these multimapped reads in different ways, and spreading them across candidates dilutes the overall signal and increases noise.
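To make the first two consequences concrete, here is a toy illustration. It is a simple strand-compatibility count, not RSEM's actual statistical model, and the two genes, their strands, and the read numbers are all made up.

```python
from collections import Counter

# Hypothetical locus: two genes that overlap on opposite strands.
GENE_STRAND = {"geneA": "+", "geneB": "-"}
OPPOSITE = {"+": "-", "-": "+"}


def count_reads(read1_strands, setting):
    """Count read 1 alignments per gene under a 'forward' or 'reverse' setting."""
    if setting not in ("forward", "reverse"):
        raise ValueError("setting must be 'forward' or 'reverse'")
    counts = Counter()
    for strand in read1_strands:
        for gene, gene_strand in GENE_STRAND.items():
            if setting == "forward":
                compatible = strand == gene_strand
            else:
                compatible = strand == OPPOSITE[gene_strand]
            if compatible:
                counts[gene] += 1
    return dict(counts)


# Reverse-stranded library: 90 fragments from geneA (+) put read 1 on '-',
# and 10 fragments from geneB (-) put read 1 on '+'.
reads = ["-"] * 90 + ["+"] * 10

print(count_reads(reads, "reverse"))  # {'geneA': 90, 'geneB': 10}
print(count_reads(reads, "forward"))  # {'geneB': 90, 'geneA': 10}
```

With the correct setting, geneA gets its 90 reads. With the wrong one, geneA drops to 10 and geneB is inflated to 90, which is exactly the kind of shift that would change a list of differentially expressed genes.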

Net Effect: Adverse Impact on Differential Expression Analysis: Differential expression analysis relies on precise gene counts. Incorrect strand settings skew these counts, leading to unreliable fold-change calculations and potentially misleading biological conclusions.