How to handle RNA-seq data with unbalanced rRNA percentages between groups?
Asked by ev97, 15 hours ago

I have RNA-seq data from a human cell line with ~100M reads. We asked the company to perform ribodepletion, and in the first report we received everything seemed fine.

However, when I ran FastQ Screen, I found a high percentage of rRNA in this data. Based on this post, and assuming my data were fine, I expected less than 10% rRNA (possibly as little as 2-3%). Instead, some samples have more than that, and surprisingly the rRNA content is not the same in controls and condition: as the screenshot below shows, controls reach >30% rRNA, while the condition samples never exceed 11%.

[Screenshot: FastQ Screen results showing % rRNA per sample]

After reading some posts, I think the best way to proceed (if I want to use this data) is to remove the rRNA reads, since several tools allow you to do that.
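Whatever tool flags the rRNA reads, the removal step itself boils down to dropping every read whose ID was flagged. The sketch below shows only that filtering step in plain Python. It assumes a set of rRNA read IDs already exists (for example from an alignment against an rRNA reference); the function names are hypothetical, not any tool's API. For paired-end data the same ID set would have to be applied to both mate files so the pairs stay in sync.

```python
from typing import Iterable, Iterator, List, Set, Tuple

# One FASTQ record: (header, sequence, separator, qualities)
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records; assumes well-formed, unwrapped input."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        sep = next(it).rstrip("\n")
        qual = next(it).rstrip("\n")
        yield header, seq, sep, qual


def read_id(header: str) -> str:
    """'@READ1 1:N:0:ACGT' -> 'READ1' (drop '@' and any trailing comment)."""
    return header[1:].split()[0]


def remove_rrna(records: Iterable[Record], rrna_ids: Set[str]) -> Tuple[List[Record], int]:
    """Drop reads whose ID is in rrna_ids; return the kept records and the removed count."""
    kept: List[Record] = []
    removed = 0
    for rec in records:
        if read_id(rec[0]) in rrna_ids:
            removed += 1
        else:
            kept.append(rec)
    return kept, removed


if __name__ == "__main__":
    fastq = (
        "@r1 1:N:0\nACGT\n+\nIIII\n"
        "@r2 1:N:0\nGGCC\n+\nIIII\n"
        "@r3 1:N:0\nTTAA\n+\nIIII\n"
    )
    # Hypothetical: IDs flagged as rRNA by an alignment against an rRNA reference
    flagged = {"r2"}
    kept, removed = remove_rrna(read_fastq(fastq.splitlines()), flagged)
    print([read_id(r[0]) for r in kept], removed)  # ['r1', 'r3'] 1
    print(f"rRNA fraction: {removed / (len(kept) + removed):.1%}")
```

The removed count divided by the total is the same per-sample rRNA fraction that FastQ Screen reports, so it is worth keeping as a per-sample value for later checks.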

I also saw that I could add % rRNA as a covariate in my DE analysis (I will use DESeq2), but I don't know whether that makes sense (or whether I could even do it), because the differences are not balanced between groups: the high-rRNA samples are not split evenly between controls and condition.
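The imbalance described above is really a question of how much the rRNA covariate overlaps with the condition. A minimal way to quantify that, sketched here with numpy, is to build the design matrix (intercept, standardized % rRNA, condition) and look at the correlation between the two predictors and the resulting variance inflation factor. The rRNA values in the example are hypothetical, only loosely shaped like the question (high in controls, low in the condition), and the helper names are made up for illustration.

```python
import numpy as np


def design_matrix(condition, rrna_pct):
    """Columns: intercept, standardized % rRNA, condition indicator (0 = control, 1 = condition)."""
    cond = np.asarray(condition, dtype=float)
    rr = np.asarray(rrna_pct, dtype=float)
    rr_z = (rr - rr.mean()) / rr.std()  # assumes % rRNA is not constant across samples
    return np.column_stack([np.ones_like(cond), rr_z, cond])


def confounding_report(condition, rrna_pct):
    """How strongly does the rRNA covariate overlap with the condition?"""
    X = design_matrix(condition, rrna_pct)
    r = float(np.corrcoef(condition, rrna_pct)[0, 1])
    return {
        "corr": r,
        # with two predictors, the variance inflation factor is 1 / (1 - r^2)
        "vif": 1.0 / (1.0 - r**2) if r**2 < 1.0 else float("inf"),
        # rank below the column count means the two effects cannot be separated at all
        "full_rank": bool(np.linalg.matrix_rank(X) == X.shape[1]),
    }


if __name__ == "__main__":
    # Hypothetical values: controls high in rRNA, condition low
    condition = [0, 0, 0, 1, 1, 1]
    rrna_pct = [30, 25, 32, 8, 11, 6]
    print(confounding_report(condition, rrna_pct))
```

In DESeq2 the corresponding design would be something like `~ rRNA_z + condition`, where `rRNA_z` is a hypothetical `colData` column. With numbers like these the correlation is strongly negative and the VIF is large, so a covariate would be estimated partly from the condition difference itself and could absorb some of the real condition effect. That is the practical risk behind the worry in the question.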

Also, if I remove the rRNA reads, should I add a covariate as a "batch effect", since my samples will no longer have the same number of reads (some will have fewer because more rRNA reads had to be removed)? Or does DESeq2 already take this into account?
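On the read-count part of this question: DESeq2 corrects for differences in sequencing depth through per-sample size factors computed with the median-of-ratios method (the default in `estimateSizeFactors`). The numpy sketch below is a simplified re-implementation of that idea for illustration, not DESeq2's own code. The counts are hypothetical, with sample 2 simply sequenced twice as deeply.

```python
import numpy as np


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Genes with a zero in any sample have geometric mean 0, so they are skipped
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Per-gene log geometric mean across samples = log of a pseudo-reference sample
    log_ref = log_counts.mean(axis=1)
    # Each sample's size factor is its median ratio to that reference
    return np.exp(np.median(log_counts - log_ref[:, None], axis=0))


def normalize(counts):
    """Divide each sample's counts by its size factor."""
    counts = np.asarray(counts, dtype=float)
    return counts / size_factors(counts)[None, :]


if __name__ == "__main__":
    # Hypothetical: sample 2 has twice the usable reads of sample 1
    counts = [[10, 20], [100, 200], [5, 10], [40, 80], [0, 3]]
    print(size_factors(counts))   # ~[0.707, 1.414]
    print(normalize(counts)[:4])  # rows now agree between the two samples
```

Because the size factor is a median over many genes, different library sizes after rRNA removal are handled by normalization, provided most genes are not differentially expressed. A separate depth covariate is therefore not usually needed for that reason alone.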

Has anybody had a similar problem? If so, what would you recommend? I would like to use this data (and, of course, obtain sensible results).

Thanks very much in advance

Tags: fastqscreen, rRNA, DESeq2, RNAseq, DGE
Answer by dsull, 21 minutes ago

Just remove the rRNA. No need to add a covariate. You can consider those rRNA differences "biological variation".

I'd only use a covariate if the samples were processed in different ways (e.g. if you intentionally depleted rRNA in some samples but not others).

You can always include more (sometimes ridiculous) covariates to "explain" variation in your model: Bioanalyzer fragment size, rRNA content, flow cell, the shelf a particular sample of cells sat on in your TC incubator (lol), etc.

But, at some point, you should just let those things be residual variance, unless you have strong evidence that you need to account for it.

If you're really concerned, make a PCA plot after you filter out the rRNA to see if initial rRNA content drives clustering. Or try both models (one w/ the covariate and one w/o).
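In DESeq2 this check is usually done with `plotPCA()` on `vst()`-transformed data. As a language-neutral sketch of the same idea, the numpy code below runs a PCA on log-scale expression (samples as observations) and correlates each PC with the per-sample rRNA percentage. The data and rRNA values in the example are hypothetical. With only a handful of samples these correlations are noisy, and since rRNA content here tracks the condition, a PC that correlates with rRNA may simply be the condition effect.

```python
import numpy as np


def pca_scores(log_expr, n_components=2):
    """PCA of a genes x samples log-expression matrix; returns samples x components."""
    X = np.asarray(log_expr, dtype=float).T  # samples as rows
    X = X - X.mean(axis=0)                   # center each gene
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def pc_correlations(scores, covariate):
    """Pearson correlation of each PC with a per-sample covariate (PC signs are arbitrary)."""
    cov = np.asarray(covariate, dtype=float)
    return [float(np.corrcoef(scores[:, k], cov)[0, 1]) for k in range(scores.shape[1])]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 200 genes x 6 samples, log2(normalized counts + 1)
    counts = rng.poisson(100, size=(200, 6))
    log_expr = np.log2(counts + 1)
    rrna_pct = [30, 25, 32, 8, 11, 6]  # hypothetical pre-filtering % rRNA
    scores = pca_scores(log_expr)
    print(pc_correlations(scores, rrna_pct))
```

Comparing which PCs track rRNA within each group separately, rather than across all samples, is one way to separate an rRNA effect from the condition effect.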
