Hi all,
I have EWAS data from saliva samples (pediatric, variety of ages), and I've estimated cell type proportions with the Houseman method (with both {ewastools} and {methylclock}, which uses {meffil}).
How many of the resulting cell types should I use as adjustment variables? I don't see this discussed, even in articles on cell type adjustment in EWAS such as "Accounting for cellular heterogeneity is critical in epigenome-wide association studies", so I assume this is well-known/obvious, but I'm not sure how to make this decision! Several cell types are near-zero, and there is close-to-linear dependence even with just a few cell types.
Thank you!
Thanks so much for your reply. That definitely makes some sense to me, but I do end up with convergence issues when I include all cell types. Do you have a feeling for what non-trivial would mean here? My samples are primarily buccal. For each of the other cell types, more than 25% of samples have estimated proportions somewhere between 0.5% and 8%. Do you think 5% is non-trivial? 1%? Something much smaller?
Thank you!!
The very rare cell types are unlikely to be contributing substantially to methylation signal, so you can probably drop the lowest-abundance cell types until the total abundance dropped is ~5%.
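A minimal sketch of that rule in Python (any language would do), assuming "abundance" means the mean estimated proportion of each cell type across samples. The function name, the 5% default, and the example numbers are made up for illustration and do not come from {ewastools} or {meffil}:

```python
def drop_rare_cell_types(proportions, max_dropped=0.05):
    """Drop the lowest-abundance cell types while their summed mean
    abundance stays at or below max_dropped.

    proportions: dict mapping cell type name -> list of per-sample proportions.
    Returns (kept, dropped, total_dropped).
    """
    # Assumption: abundance is summarised as the mean proportion across samples.
    mean_abundance = {ct: sum(vals) / len(vals) for ct, vals in proportions.items()}

    dropped, total_dropped = [], 0.0
    # Walk through cell types from rarest to most common.
    for ct in sorted(mean_abundance, key=mean_abundance.get):
        if total_dropped + mean_abundance[ct] > max_dropped:
            break
        dropped.append(ct)
        total_dropped += mean_abundance[ct]

    kept = [ct for ct in proportions if ct not in dropped]
    return kept, dropped, total_dropped


# Hypothetical proportions for three samples (each sample sums to 1).
example = {
    "Epithelial": [0.79, 0.72, 0.85],
    "Neutrophil": [0.15, 0.20, 0.10],
    "CD4T":       [0.04, 0.05, 0.03],
    "Bcell":      [0.01, 0.02, 0.015],
    "NK":         [0.01, 0.01, 0.005],
}
kept, dropped, total_dropped = drop_rare_cell_types(example)
print("keep as covariates:", kept)
print("dropped:", dropped, "total mean abundance dropped:", round(total_dropped, 4))
```

With compositional data you would normally also leave one of the kept cell types out of the model (or drop the intercept), since the kept proportions still nearly sum to 1.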
Alternatively: this is a linear model. Multicollinearity between these covariates indicates that the proportion matrix is (close to) low rank. You can linearly transform your cell proportions into a smaller full-rank matrix without sacrificing the ability to correct for changes in cell type abundance. The tradeoff is that you won't be able to interpret the coefficients as corresponding to any individual cell type, only to linear combinations of them.
The easiest way to do this is to run PCA on the proportion matrix and use the per-sample component scores for all components with non-zero eigenvalues (say >0.1) as covariates.
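A rough sketch of the PCA route, written in Python with NumPy for illustration (in R, `prcomp` would give the same scores). The function name is hypothetical, and the relative eigenvalue tolerance is my own assumption rather than the absolute cutoff mentioned above, since eigenvalues of a matrix of proportions depend on how the columns are scaled:

```python
import numpy as np


def pca_covariates(props, rel_tol=1e-10):
    """Turn a samples x cell-types proportion matrix into full-rank covariates.

    Returns (scores, eigenvalues). The scores hold one row per sample and one
    column per retained principal component.
    """
    X = np.asarray(props, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each cell type
    # PCA via SVD of the centred matrix: Xc = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigenvalues = s ** 2 / (X.shape[0] - 1)
    # Assumption: keep components whose eigenvalue is non-negligible relative
    # to the largest one; tune rel_tol (or use an absolute cutoff) as needed.
    keep = eigenvalues > rel_tol * eigenvalues[0]
    scores = U[:, keep] * s[keep]
    return scores, eigenvalues


# Simulated proportions: 20 samples, 5 cell types, rows sum to 1.
rng = np.random.default_rng(0)
props = rng.dirichlet([8, 2, 1, 0.5, 0.5], size=20)

scores, eigenvalues = pca_covariates(props)
print("components kept:", scores.shape[1], "of", props.shape[1])
```

Because the proportions in each row sum to 1, at least one eigenvalue is numerically zero, so the scores have fewer columns than there are cell types and can sit alongside an intercept without being collinear with it. Raising the tolerance discards more of the low-variance directions if convergence is still a problem.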