I'm a novice in this field and would be grateful for some guidance on differential expression testing with linear-model packages in R.
I am testing for differential expression at the pathway level by applying the lmFit function in limma to pathway activity scores obtained from the GSVA package. My goal is to identify which pathways are differentially expressed in individuals with Alzheimer's disease (AD) compared with control individuals.
To perform this analysis, I have constructed a design matrix that includes the AD variable along with the covariates sex, age_death, amyloid_burden, and nft_burden.
My design matrix looks like this:
mod = model.matrix(~ AD + sex + age_death + amyloid + nft, data = predict)
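For reference, the usual limma sequence on a GSVA score matrix is lmFit, then eBayes, then topTable. The sketch below assumes the Bioconductor package limma is installed and uses simulated toy data; the names meta and scores, and the factor coding of AD with "Control" as the reference level, are hypothetical stand-ins for your own objects.

```r
# Minimal sketch on simulated data; requires the Bioconductor package limma.
library(limma)

set.seed(1)
n_samples  <- 20
n_pathways <- 50

# Hypothetical sample metadata (one row per sample, same order as the score columns).
meta <- data.frame(
  AD        = factor(rep(c("Control", "AD"), each = n_samples / 2),
                     levels = c("Control", "AD")),
  sex       = factor(sample(c("F", "M"), n_samples, replace = TRUE)),
  age_death = runif(n_samples, 70, 90),
  amyloid   = rnorm(n_samples),
  nft       = rnorm(n_samples)
)

# Hypothetical GSVA-like scores: pathways in rows, samples in columns.
scores <- matrix(rnorm(n_pathways * n_samples), nrow = n_pathways,
                 dimnames = list(paste0("pathway_", seq_len(n_pathways)),
                                 paste0("sample_", seq_len(n_samples))))

mod <- model.matrix(~ AD + sex + age_death + amyloid + nft, data = meta)

fit <- lmFit(scores, mod)  # one linear model per pathway (row)
fit <- eBayes(fit)         # empirical Bayes moderated t-statistics

# "ADAD" is the AD-vs-Control column created by the factor coding above.
tt <- topTable(fit, coef = "ADAD", number = Inf, adjust.method = "BH")
head(tt)
```

The coefficient name depends on how AD is coded (for example, a logical AD column gives "ADTRUE"), so it is worth checking colnames(mod) before calling topTable.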
However, because amyloid and NFT burden were used to determine each individual's AD pathological status in the histopathological analysis, I am unsure whether they should be included as confounding covariates in the design matrix. I would appreciate any advice on whether to include them.
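One way to quantify this concern is to measure how much of the AD indicator is already explained by amyloid and nft. If that R-squared is high, the AD coefficient has little independent variation left once both covariates are in the model, and its variance is inflated by roughly 1 / (1 - R^2). The sketch below is a base R illustration on simulated data in which AD is, by construction, derived from the two burden measures; it is not based on any real cohort, and it ignores sex and age_death for simplicity.

```r
# Simulated data: AD status is (noisily) derived from amyloid and nft,
# mimicking a neuropathological diagnosis. Purely illustrative.
set.seed(2)
n       <- 100
amyloid <- rnorm(n)
nft     <- rnorm(n)
AD      <- as.integer(amyloid + nft + rnorm(n, sd = 0.5) > 0)

# Share of the variance in the AD indicator explained by the burden measures.
r2 <- summary(lm(AD ~ amyloid + nft))$r.squared

# Variance inflation factor for AD when amyloid and nft are also in the design.
vif_AD <- 1 / (1 - r2)

r2
vif_AD
```

Running the same calculation on your own metadata (with AD coded 0/1) gives a rough sense of how much the two covariates overlap with the variable of interest.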
Also, while the age_death metadata mostly contains numeric (floating-point) values, some individuals are recorded as "90+". I believe this makes the whole column non-numeric, so the design matrix would treat age as categorical. Should I drop these individuals, or recode their ages as 90 or some arbitrary value above 90?
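The behaviour described above can be checked directly: model.matrix converts a character column to a factor, so age_death produces one dummy column per distinct value (minus the reference level) instead of a single slope. The sketch below uses a small hypothetical data frame and shows one option, capping "90+" at 90 while keeping a flag for the censored ages; whether to cap, drop, or add the flag as a covariate is a modelling choice, not something this code settles.

```r
# Hypothetical metadata as it might be read from a CSV file:
# the "90+" entries make age_death a character column.
meta <- data.frame(
  AD        = c(0, 1, 0, 1, 1),
  age_death = c("78.4", "85.1", "90+", "81.0", "90+"),
  stringsAsFactors = FALSE
)

# Character age is expanded into dummy columns:
# intercept + AD + 3 age levels = 5 columns.
ncol(model.matrix(~ AD + age_death, data = meta))

# Cap "90+" at 90 and record which ages were censored.
meta$age_censored <- meta$age_death == "90+"
meta$age_num      <- as.numeric(sub("+", "", meta$age_death, fixed = TRUE))

# Age now enters as a single numeric column.
mod <- model.matrix(~ AD + age_num, data = meta)
colnames(mod)

# Optional: also adjust for the censoring flag.
mod_flag <- model.matrix(~ AD + age_num + age_censored, data = meta)
colnames(mod_flag)
```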
I would greatly appreciate any help or suggestions you can offer regarding these issues.