Hi, I have a statistics question that is probably basic, but I'm not clear about it.
I'm trying to find differentially expressed genes whose RNA levels depend on 3 experimental factors (factors A and B with 2 levels each, and factor C with 3 levels). We ran a 3-way ANOVA on the log2(normalized counts) from the different libraries, and also a linear regression, to look for genes with p<0.05 for the interaction of the 3 factors.
I'm not sure whether a linear regression or an ANOVA is the correct choice, considering my experimental design. I've observed that the number of genes with a significant p-value for the three-way interaction is much lower when we apply the linear regression.
Can anyone tell me whether this is a correct statistical analysis? I don't really understand the difference between ANOVA and linear regression, but I've seen that linear regression is always used for 2-way ANOVA on RNA-seq data.
Any help is welcome.
Thanks in advance.
Indeed, please use one of the dedicated programs for this. Both edgeR and DESeq2 fit a negative binomial model to your counts and derive p-values from this model. In doing this, they take dispersion into account. The limma package does it differently, as elaborated by b.nota.
And all of them support "ANOVA-style" tests. Check the vignettes for the different tools.
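To make the "ANOVA-style" idea concrete: an ANOVA is a linear model, and its interaction test is an F-test comparing the model with and without the interaction columns. With a 3-level factor C, the three-way interaction spans 2 coefficients, so a regression summary that reports one t-test per coefficient is not the same test as the single 2-degree-of-freedom F-test from the ANOVA table, which is one likely reason the gene counts differ. Below is a minimal sketch in Python with NumPy on made-up data for one gene; the function names, the toy effect sizes, and the treatment coding are all assumptions for illustration, and this is plain least squares on log-scale values, not what edgeR or DESeq2 do internally.

```python
import itertools
import numpy as np


def design_matrix(a, b, c, three_way=True):
    """Treatment-coded design: A and B have levels 0/1, C has levels 0/1/2."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    c = np.asarray(c)
    c2 = (c == 1).astype(float)  # indicator for the 2nd level of C
    c3 = (c == 2).astype(float)  # indicator for the 3rd level of C
    cols = [np.ones_like(a), a, b, c2, c3,          # intercept + main effects
            a * b, a * c2, a * c3, b * c2, b * c3]  # two-way interactions
    if three_way:
        cols += [a * b * c2, a * b * c3]            # A:B:C needs 2 columns
    return np.column_stack(cols)


def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def interaction_f_test(a, b, c, y):
    """F statistic for the whole A:B:C term (full vs. reduced model)."""
    X_full = design_matrix(a, b, c, three_way=True)
    X_red = design_matrix(a, b, c, three_way=False)
    rss_full, rss_red = rss(X_full, y), rss(X_red, y)
    df_num = X_full.shape[1] - X_red.shape[1]  # number of interaction columns
    df_den = len(y) - X_full.shape[1]          # residual degrees of freedom
    f_stat = ((rss_red - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den


# Hypothetical toy data: 2 x 2 x 3 design with 3 replicates per cell.
rng = np.random.default_rng(0)
cells = list(itertools.product([0, 1], [0, 1], [0, 1, 2]))
n_rep = 3
a, b, c = (np.array(v) for v in zip(*(cell for cell in cells for _ in range(n_rep))))

# Simulated log2 expression with a three-way effect only at the 3rd level of C.
y = 5.0 + 0.5 * a + 2.0 * (a * b * (c == 2)) + rng.normal(0.0, 0.3, size=a.size)

f_stat, df_num, df_den = interaction_f_test(a, b, c, y)
print(f"F = {f_stat:.2f} on {df_num} and {df_den} degrees of freedom")
```

If SciPy is available, the p-value would be scipy.stats.f.sf(f_stat, df_num, df_den). The likelihood ratio tests in the dedicated packages follow the same full-versus-reduced logic, but on a negative binomial model of the counts.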
Hi Kevin,
Can we check whether the main effects or interaction effects are significant in DESeq2? I want to see whether the interaction effect is significant, but I couldn't find separate p-values for the interaction and the main effects in DESeq2. I might be missing something. Do you have any insights on this?
Thanks
There is information on this in the results() function manual entry, i.e., information on how to get the p-values for the interaction term. Simply type ?results at the command prompt and scroll to the bottom, where examples are provided.
Hi Kevin,
Thank you for the response. When I use results(), I get something like this:
I think the p-value column in the output is the one that identifies DEGs due to the interaction. Does it mean that if I get many DEGs at p.adj < 0.05 from the interaction effect, I should consider only the DEGs from the interaction and not consider the main effects?