Hello,
Suppose I have RNA-seq data for:
1) control, say, T0
2) treatment after 4 hours, T4
3) treatment after 8 hours, T8
I would like to find the genes that are differentially expressed between each pair of these conditions (T0 vs T4 and T0 vs T8 are the most informative/essential comparisons for the experimenter).
I normalize with edgeR's TMM method. So far, however, I have been normalizing the count data separately for each pair (A). That is, for T0 vs T4, I take the counts for those two samples, perform TMM normalization and obtain the candidate genes; then for T0 vs T8, I again normalize those two sets of counts and obtain the DE genes, and so on.
However, I am beginning to wonder whether this is the right approach, or whether I should perform a single normalization using the counts for all genes from all time points together (B).
I cannot convince myself of a good reason to choose one over the other. Has anyone here worked with this type of data, or does anyone have an idea why you would go for (A) or (B)?
Thank you.
Thank you for your answer. However, I don't think the issue is whether edgeR can normalize all (i.e., more than two) samples at once. Rather, the question is which approach is better/correct: normalizing each pair as and when I test it for DE, or normalizing all samples together and then testing every pair for DE. From what you say, it seems the answer is to normalize all samples together and then test for DE on all pairs. Right?
I think you have it right, yes.
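To make the difference between (A) and (B) concrete, here is a simplified Python re-implementation of the TMM idea. It is a sketch, not edgeR's `calcNormFactors` code, and the function names `tmm_factor` and `tmm_norm_factors` and the simulated counts are hypothetical. It assumes edgeR's documented defaults as I understand them (30% trimming of log ratios M, 5% trimming of average log expression A, inverse-variance weights, and a reference sample chosen by upper quartile) and skips edge cases that edgeR handles. Running it shows that under (A) the T0 sample gets a different factor in each pairwise run, whereas under (B) every sample gets a single factor, computed against one common reference, that is then reused in every comparison.

```python
import numpy as np


def _average_ranks(x):
    """1-based ranks with ties averaged, mimicking R's default rank()."""
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i, n = 0, len(x)
    while i < n:
        j = i
        while j + 1 < n and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average of ranks i+1..j+1
        i = j + 1
    return ranks


def tmm_factor(obs, ref, log_ratio_trim=0.3, sum_trim=0.05):
    """Hypothetical helper: TMM scaling factor of sample `obs` relative to `ref`."""
    obs = np.asarray(obs, dtype=float)
    ref = np.asarray(ref, dtype=float)
    n_o, n_r = obs.sum(), ref.sum()
    keep = (obs > 0) & (ref > 0)  # M and A are undefined for zero counts
    o, r = obs[keep], ref[keep]
    m = np.log2((o / n_o) / (r / n_r))        # log fold change (M)
    a = 0.5 * np.log2((o / n_o) * (r / n_r))  # average log expression (A)
    var = (n_o - o) / (n_o * o) + (n_r - r) / (n_r * r)  # approx. variance of M
    n = len(m)
    lo_m = np.floor(n * log_ratio_trim) + 1
    lo_a = np.floor(n * sum_trim) + 1
    rank_m, rank_a = _average_ranks(m), _average_ranks(a)
    # Trim the most extreme M values and the most extreme A values
    kept = ((rank_m >= lo_m) & (rank_m <= n + 1 - lo_m)
            & (rank_a >= lo_a) & (rank_a <= n + 1 - lo_a))
    # Precision-weighted mean of the remaining M values
    log_f = np.sum(m[kept] / var[kept]) / np.sum(1.0 / var[kept])
    return 2.0 ** log_f


def tmm_norm_factors(counts):
    """Hypothetical helper: factors for every column of a genes x samples matrix."""
    counts = np.asarray(counts, dtype=float)
    lib = counts.sum(axis=0)
    # Reference: sample whose upper quartile of count/lib size is closest to the mean
    uq = np.quantile(counts / lib, 0.75, axis=0)
    ref = int(np.argmin(np.abs(uq - uq.mean())))
    f = np.array([tmm_factor(counts[:, j], counts[:, ref])
                  for j in range(counts.shape[1])])
    return f / np.exp(np.mean(np.log(f)))  # rescale to geometric mean 1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.gamma(shape=0.5, scale=200.0, size=2000)  # simulated gene means
    t4_mean, t8_mean = base.copy(), base.copy()
    t4_mean[:100] *= 4.0  # simulated: 100 genes up at T4
    t8_mean[:300] *= 6.0  # simulated: 300 genes up at T8
    counts = np.column_stack([rng.poisson(base), rng.poisson(t4_mean),
                              rng.poisson(t8_mean)])  # columns: T0, T4, T8

    f_all = tmm_norm_factors(counts)            # option (B)
    f_04 = tmm_norm_factors(counts[:, [0, 1]])  # option (A), T0 vs T4
    f_08 = tmm_norm_factors(counts[:, [0, 2]])  # option (A), T0 vs T8
    print("(B) factors T0, T4, T8:", np.round(f_all, 4))
    print("(A) T0 factor in T0-vs-T4:", round(f_04[0], 4),
          "| in T0-vs-T8:", round(f_08[0], 4))
```

In edgeR terms, the effective library size is the raw library size multiplied by the normalization factor, so with (B) the T0 sample enters the T0 vs T4 and T0 vs T8 tests with the same effective library size, while with (A) it does not.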