3.5 years ago
lluc.cabus
Hi,
I'm running DESeq2 to find differentially expressed genes between patients and controls. The problem is that I have many variables that could affect gene expression, such as age and gender. How can I tell what effect each of those variables has in the model? I know that the model normally includes all of them as covariates, but couldn't that also introduce some bias?
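On the bias question, the toy sketch below shows why covariates are usually included rather than left out. It uses ordinary least squares on hypothetical, noise-free log-expression values for a single gene. This is a simplification and not DESeq2's negative binomial GLM. In the toy data, patients happen to be older than controls, so age and condition are partly confounded. Leaving age out pushes part of the age effect into the condition estimate. Putting age in the model, analogous to a design of `~ age + condition`, recovers the condition effect that was built into the data.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for design matrix X and response y."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical toy data for one gene: 8 samples, patients (condition = 1)
# are older on average, so age and condition are partly confounded.
age = np.array([1.0, 2.0, 3.0, 10.0, 4.0, 12.0, 14.0, 16.0])
condition = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

# Noise-free log-expression: baseline 5, +0.2 per year of age, +1.0 for disease.
y = 5.0 + 0.2 * age + 1.0 * condition

intercept = np.ones_like(age)
# Condition only (analogous to design ~ condition): age is ignored.
unadjusted = fit_ols(np.column_stack([intercept, condition]), y)
# Age as a covariate (analogous to design ~ age + condition).
adjusted = fit_ols(np.column_stack([intercept, age, condition]), y)

print("condition effect, unadjusted:", round(float(unadjusted[1]), 3))
print("condition effect, adjusted:  ", round(float(adjusted[2]), 3))
```

Here the controls have a mean age of 4 and the patients a mean age of 11.5. The unadjusted estimate therefore comes out as 1.0 + 0.2 × 7.5 = 2.5, while the adjusted model returns the true 1.0. The usual cost of adding a covariate is a lost degree of freedom, not bias. Omitting a relevant covariate is what introduces bias.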
Maybe this is a naïve question, but I'm very new to this kind of analysis.
Thank you very much, Lluc
It depends on what you want to test. It is not always necessary to include all the variables in your model. For example, are you testing a treatment? In that case, before vs. after treatment should be enough. Do you want to test whether there is any difference between genders? In that case, use gender only. But if you do decide to include all the variables in your model, you can examine a PCA plot built from the top 1000 or more genes and check whether the samples cluster by any of your covariates.
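The PCA check described above can be sketched as follows. DESeq2's own `plotPCA` has an `ntop` argument (default 500) that controls how many of the most variable genes are used. The Python function below reimplements the same idea on a genes × samples matrix of log-scale values, such as variance-stabilized counts. It assumes that matrix is already available. The function name `pca_top_variable` and the simulated data are hypothetical. The simulation adds a shift to 200 genes in one sex, and PC1 then separates the two groups. In real data, you would colour the PC1/PC2 scores by each covariate in turn.

```python
import numpy as np

def pca_top_variable(log_expr, n_top=1000, n_components=2):
    """PCA on the n_top most variable genes.

    log_expr: genes x samples array of log-scale values.
    Returns (scores, explained): scores is samples x n_components,
    explained is the fraction of variance captured by each component.
    """
    n_top = min(n_top, log_expr.shape[0])
    top = np.argsort(log_expr.var(axis=1))[::-1][:n_top]
    X = log_expr[top].T                # samples as rows
    X = X - X.mean(axis=0)             # centre each gene
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S ** 2 / np.sum(S ** 2)
    return scores, explained[:n_components]

# Hypothetical simulated data: 2000 genes, 12 samples, 6 per sex.
rng = np.random.default_rng(0)
sex = np.array(["F"] * 6 + ["M"] * 6)
log_expr = rng.normal(loc=8.0, scale=0.5, size=(2000, 12))
log_expr[:200, sex == "M"] += 3.0      # 200 genes shifted in one group

scores, explained = pca_top_variable(log_expr, n_top=1000)
for level in np.unique(sex):
    print(level, "mean PC1:", scores[sex == level, 0].mean().round(2))
print("variance explained by PC1, PC2:", explained.round(2))
```

The sign of a principal component is arbitrary. What matters is whether samples sharing a covariate level sit together along a component, not which side they land on.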
I'm trying to find differentially expressed genes for the diagnosis of a disease in children. I therefore expect age to have a large effect, since my samples cover a very wide age range (from 6 months to 19 years). I don't expect gender to have such a strong effect, but I have to demonstrate that somehow.
In that case, you can create age ranges. You can do some research before deciding, or, if you work with clinicians, ask them how best to stratify age. For example, you could create the categories infant (0-2), toddler (3-6), child (7-10) and teenager (11-16, extended so that it also covers your oldest patients). You can then run DESeq2 with the age group as a factor in the design, e.g. `~ age_group + condition`.
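Integer ranges such as 0-2 and 3-6 leave gaps for ages like 2.5 years. Binning with half-open intervals avoids those gaps. The sketch below uses hypothetical cut-offs at 3, 7 and 11 years, so infant is [0, 3), toddler [3, 7), child [7, 11) and teenager [11, ∞). The last group is open-ended so that 17- to 19-year-olds are not dropped. `age_group`, `AGE_BREAKS` and `AGE_LABELS` are illustrative names, not part of DESeq2. The cut-offs are assumptions to replace with whatever the clinicians recommend.

```python
from bisect import bisect_right

# Hypothetical cut-offs in years: lower bounds of every group after the first.
AGE_BREAKS = [3.0, 7.0, 11.0]
AGE_LABELS = ["infant", "toddler", "child", "teenager"]

def age_group(age_years: float) -> str:
    """Map an age in years to a group using half-open intervals:
    [0, 3) infant, [3, 7) toddler, [7, 11) child, [11, inf) teenager."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    return AGE_LABELS[bisect_right(AGE_BREAKS, age_years)]

# Example: ages in years for a few hypothetical samples (0.5 = 6 months).
ages = [0.5, 2.5, 4.0, 9.0, 15.0, 19.0]
print([age_group(a) for a in ages])
```

The resulting labels would be stored as a factor column in the sample table passed to DESeq2, so that `age_group` can appear in the design formula. Each group needs enough samples for its effect to be estimated.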
Aside from plotting PCAs and colouring them by each variable (as Lila M suggests), the R package variancePartition may help you explore which variables explain the most variation in your data. The pcrplot function in the ENmix package is also useful (see the plot on page 18 of its user's guide).
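To give a rough idea of the question these tools answer, here is a minimal Python sketch. For each gene, it computes the fraction of that gene's variance explained by one categorical variable, using eta-squared (between-group sum of squares divided by total sum of squares). This is a deliberately simplified stand-in and not variancePartition's method. variancePartition fits a linear mixed model per gene with all variables at once, and ENmix's pcrplot relates the top principal components to covariates. Because this sketch assesses each variable on its own, correlated variables such as age and diagnosis can each appear important. `variance_explained` and the toy matrix are hypothetical.

```python
import numpy as np

def variance_explained(log_expr, labels):
    """Eta-squared per gene: between-group SS / total SS for one
    categorical variable. log_expr is genes x samples."""
    labels = np.asarray(labels)
    centred = log_expr - log_expr.mean(axis=1, keepdims=True)
    total_ss = (centred ** 2).sum(axis=1)
    between_ss = np.zeros(log_expr.shape[0])
    for level in np.unique(labels):
        mask = labels == level
        # Mean of the centred values = group mean minus grand mean.
        between_ss += mask.sum() * centred[:, mask].mean(axis=1) ** 2
    # Genes with zero variance get 0 instead of a division by zero.
    return np.divide(between_ss, total_ss,
                     out=np.zeros_like(total_ss), where=total_ss > 0)

# Hypothetical toy data: 3 genes x 6 samples.
gender = ["F", "F", "F", "M", "M", "M"]
age_grp = ["infant", "teen", "infant", "teen", "infant", "teen"]
expr = np.array([
    [1, 1, 1, 3, 3, 3],   # differs by gender only
    [0, 2, 0, 2, 0, 2],   # differs by age group only
    [5, 5, 5, 5, 5, 5],   # constant gene
], dtype=float)

results = {name: variance_explained(expr, labels)
           for name, labels in [("gender", gender), ("age_group", age_grp)]}
for name, frac in results.items():
    print(name, frac.round(3))
```

Summarizing these per-gene fractions across all genes (for example, with a boxplot per variable) gives a quick, if crude, view of which covariates matter. The packages above do this more rigorously.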