I am studying some RNA-seq data with edgeR
and limma
. The data was obtained from 59 samples using Illumina HiSeq 2500 and analyzing following the workflow RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR .
The data is quite complex with 4 different factors interacting between them we don't see a batch effect on the data. I have made 62 comparisons, some of them have differentially expressed genes (DEG) with very high log2 fold changes, so I think I made a mistake somewhere upstream.
On one approach with a model with the the intercept and 9 coefficients, the highest log2FC value is 40 which means a fold change of 1 099 511 627 776. However, the log fold change is not influenced by the average expression, the genes are already filtered by low median expression. I have some comparisons with up to 66% of differentially expressed genes (DEG) while other comparisons don't have any DEG.
Another approach (without using any model, just using the samples we want to compare) resulted on a fold change of 2609 and some comparisons up to 75% being DEG. Again I don't observe any influence of the average expression in the fold change.
14 comparisons for both methods don't have any DEG, so I don't think that the either of methods are wrong. My question is related to those high values of fold change. Below I provide an example comparing the log2 fold changes for a comparison. Is there some expected maximum or are these values normal? Any advice?
Thanks in advance.
Maybe you also need to filter by a minimal expression level, if you have a gene expressed in one condition as 20 cpm but in the control condition with 0.00001 cpm, that is a 2,000,000X fold-change.
Also, filter your DEG by FDR besides FC.
I filtered using
edgeR::filterByExpr
, that is min.count = 10 and min.total.count = 15. But I haven't looked at cpm, I'll look at it. When I mention the 75% of DEG is looking only to the FDR < 0.05.check if your normalization is not reducing values to zero, a quick box-plot for all your samples can visualize that
I don't think it is. You can find the boxplot here, also here after voom normalization