Hello,
I have RNA-seq data from two different treatment groups (F and NF ) at 2 different time points (T1 and T2). The mapping was done with STAR aligner and the quantification was done with FeatureCounts. When I perform the paired analysis in F: T2 vs T1 and NF: T2 vs T1. I create a design matrix including the variables Individ+medianIsize+GC+RIN+Plate from my metadata file which looks like:
Individ age sex fasting timePoint medianIsize GC BMI BMI_category extraction_batch_ID extraction_date plate RIN
001_T2 ID001 54 F F T2 442 59.10 32.67 4 101 31/07/19 9 8.4
002_T2 ID002 65 F F T2 434 57.20 31.80 4 101 31/07/19 9 8.9
003_T2 ID003 64 F F T2 423 58.17 29.47 3 100 31/07/19 8 7.5
004_T2 ID004 65 F F T2 400 54.90 28.00 3 100 31/07/19 8 8.5
005_T2 ID005 56 F F T2 381 54.77 21.71 2 99 30/07/19 8 9.1
006_T2 ID006 53 F F T2 450 53.18 32.85 4 63 22/04/19 5 9.5
My count matrix looks like
001_T2 002_T2 003_T2 004_T2 005_T2 006_T2 007_T2 008_T2 009_T2 010_T2 011_T2 013_T2 014_T2 015_T2 016_T2 017_T2 018_T2
ENSG00000186827 68 248 250 269 143 268 253 293 144 117 183 145 263 334 180 211 197
ENSG00000186891 61 130 185 111 164 120 139 341 111 156 147 95 253 349 121 281 153
In paired analysis limma-voom returns 3302 DEGS (padjust < 0,05) out of 17231 genes for F: T2 vs T1 and 3844 for NF: T2 vs T1.
When I am trying to perform unpaired analysis for comparisons of treatment groups within time points ie T1 : F vs NF or T2: F vs NF, I create a design matrix with different combinations of the covariates: medianIsize + GC + age + sex + BMI+ RIN. In each combination of covariates limma returns very few DEGs (7-32). From a biological standpoint, we expect a large number of DEGs for these comparisons. Below is the table of different versions of covariates and results of limma :
Input Covariates T1: F vs NF T2: F vs NF
medianIsize + GC 14096 14041
medianIsize + GC + age + sex 5 9
medianIsize + GC + age + sex + BMI 7 10
GC + age + sex + BMI+ RIN 30 13
GC + age + sex + BMI 7 12
medianIsize + GC + age + sex + BMI+ RIN 32 10
medianIsize + GC + age + sex + BMI+plate 7 9
medianIsize + GC + age + sex + BMI+RIN+plate 13 10
age + sex + BMI+RIN+plate 13 6
medianIsize + GC +sex+ RIN+plate 311 156
Since I was expecting more DEGs in unpaired analysis (T1: F vs NF, T2: F vs NF) do you think that this kind of result makes sense based on the combination of covariates that we use? Is it possible the combination of covariates to create a model matrix that is not proper for the limma algorithm in such a way that restricts the number of DEGs produced?
Cross-posted to https://support.bioconductor.org/p/9144454/