I have some RNA-seq data (human cell line) with ~100M reads. We asked the company to perform ribodepletion, and in the first report we received everything seemed okay.
However, when I ran FastQ Screen, I got a high % of rRNA in this data. According to this post, and assuming that my data were okay, I was expecting less than 10% rRNA (as little as 2-3%). However, I got more than that in some samples and, surprisingly, the amount of rRNA is not even the same in controls and condition. As you can see in the screenshot below, controls reach > 30% rRNA, while the maximum in the condition does not exceed 11%.
After checking some posts, I think that the best way to proceed (if I want to use this data) is to remove the rRNA reads, since there are several tools that allow you to do that.
However, I also saw the possibility of adding the % rRNA as a covariate in my DE analysis (I will use DESeq2), but I do not know if it makes sense (or if I will even be able to do it), because the rRNA levels are not balanced across groups: the high-rRNA samples are mostly controls, rather than being split evenly between controls and condition.
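To make my worry about the covariate concrete, here is a minimal Python sketch of how I would check whether % rRNA is tangled up with the condition. The per-sample values are made up for illustration (they only mimic the pattern I described, they are not my real numbers), and `pearson` is a helper I wrote for this post. If the correlation is close to ±1, I assume the model would struggle to separate the rRNA effect from the condition effect.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical values, NOT my real data: 3 controls and 3 treated samples.
condition = [0, 0, 0, 1, 1, 1]                  # 0 = control, 1 = condition
rrna_pct = [31.0, 28.5, 33.2, 9.8, 7.4, 10.9]   # % rRNA reads per sample

r = pearson(condition, rrna_pct)
print(f"correlation between condition and % rRNA: {r:.2f}")
```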
I was also wondering whether, if I proceed with the removal of rRNA, I should add a covariate to account for the resulting "batch effect", since my samples will no longer have the same number of reads (some will have fewer because more rRNA reads had to be removed). Or maybe DESeq2 already takes this into account?
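As far as I understand, DESeq2 normalizes for sequencing depth with median-of-ratios size factors, so a difference in the number of remaining reads might already be handled. The Python sketch below is my own simplified re-implementation of that idea, not DESeq2's actual code; the function name `size_factors` and the tiny count matrix are hypothetical.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    """
    n_samples = len(counts[0])
    # Keep only genes with nonzero counts in every sample,
    # so that the log geometric mean is defined.
    usable = [g for g in counts if all(c > 0 for c in g)]
    log_geo_means = [sum(math.log(c) for c in g) / n_samples for g in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene's count to its geometric mean across samples.
        log_ratios = [math.log(g[j]) - m for g, m in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

# Hypothetical counts: sample B is sample A sequenced at half the depth.
counts = [
    [100, 50],
    [200, 100],
    [30, 15],
    [0, 4],    # contains a zero, so it is ignored for the size factors
]
sf = size_factors(counts)
print(sf)
```

In this toy case the size factor of the first sample is twice that of the second, which is what I would hope for when one sample simply has fewer reads left after rRNA removal.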
Has anybody had a similar problem? If so, what would you recommend? I would like to use this data (and, of course, obtain sensible results).
Thanks very much in advance