Hi all,
I am running a human whole-blood transcriptomics experiment in which two groups of patients are compared. The two groups consist of patients with two different diseases. The goal of the experiment is to gain insight into the differences in gene expression profiles between the two groups. I have read the edgeR manual and other documentation online and on PubMed, but I am still not sure how to construct a valid design matrix.
Of course I have included the conditions in the design matrix, but I am wondering which other factors or covariates I should correct for. As I understand it, you should correct for factors or covariates that could confound gene expression. In online tutorials and articles these are mostly batch effects, specific time points, or different treatment conditions. But the question I keep asking myself is: do I have to correct for covariates such as age, white blood cell differential, and other laboratory measurements (NT-proBNP, kidney function, etc.), or for factors such as sex and comorbidities (for example hypertension, diabetes, COPD)?
I constructed the design matrix with the model.matrix function like this:
design <- model.matrix(~ 0 + types, data = y$samples)
I would also like to know whether I need to include an intercept term when adding covariates. As a basic example:
design <- model.matrix(~ types + age + sex + COPD, data = y$samples)  # and so on...
or is the right way:
design <- model.matrix(~ 0 + types + age, data = y$samples)
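To make the comparison concrete, here is a small toy sketch in base R. The sample table is made up (it is not my real data; the `samples` data frame and its values are hypothetical stand-ins for y$samples). As far as I can tell, the two formulas give matrices that span the same column space, so they should describe the same fit and differ only in how the group coefficients are read:

```r
# Hypothetical toy sample table standing in for y$samples
samples <- data.frame(
  types = factor(c("A", "A", "A", "B", "B", "B")),
  age   = c(54, 61, 47, 58, 66, 50),
  sex   = factor(c("F", "M", "F", "M", "F", "M"))
)

# With intercept: columns are (Intercept), typesB, age, sexM.
# typesB is then the B-versus-A difference, adjusted for age and sex.
design_int <- model.matrix(~ types + age + sex, data = samples)

# Without intercept: columns are typesA, typesB, age, sexM.
# Each group gets its own column, so the B-versus-A comparison
# has to be formed as a contrast (typesB - typesA).
design_noint <- model.matrix(~ 0 + types + age + sex, data = samples)

print(colnames(design_int))
print(colnames(design_noint))

# If both matrices span the same column space, stacking them
# side by side must not increase the rank.
rank_int   <- qr(design_int)$rank
rank_noint <- qr(design_noint)$rank
rank_both  <- qr(cbind(design_int, design_noint))$rank
print(c(rank_int, rank_noint, rank_both))
```

If that reasoning is right, the age and sex columns are identical in both versions, and the choice between them is only about whether I test a coefficient directly (with intercept) or define a contrast between the two group columns (without intercept).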
Any thoughts or comments?