Should we include genotypes as covariates when performing an EWAS?
2.5 years ago
samuelandjw ▴ 260

It has been reported that DNA methylation levels are under genetic control. In particular, it has been shown that cis-genetic effects contribute so much to methylation levels that cis-genetic factors can be considered a potential confounder in epigenome-wide association studies (EWAS). However, according to the EWAS Catalog, almost no EWAS have included cis-genetic factors, e.g. meQTLs, as covariates. (Including genomic PCs to correct for ancestral background does not count, in my opinion.) The only EWAS that included cis-SNPs was related to metabolomics. In another EWAS meta-analysis, significant differentially methylated positions obtained from an EWAS without adjustment for cis-SNPs were re-examined afterwards to exclude positions confounded by genetic variants. I don't understand why they didn't include cis-genetic factors in the first place.

Since one of the reasons we conduct EWAS is to look for epigenetic factors that cannot be explained by genetics, it seems natural, even necessary, to include genotypes as covariates, but I rarely see this done in the literature. Apart from cost, are there good reasons not to include genotypes in the primary analysis? And if we should include them, what is the best way to add cis-genetic factors to an EWAS? Should we include meQTLs identified in prior studies?

methylation EWAS DNA • 633 views
2.5 years ago
LauferVA 4.5k

Based on your phrasing, it seems you already know everything you need to know, in particular about the modeling. Thinking about this post, I suppose I'll just say a few things I think you already know and offer a bit of encouragement.

one of the reasons why we conduct EWAS is to look for epigenetics factor that cannot be explained by genetics, it is very natural and necessary to include genotypes as covariates in EWAS, but I rarely see it done in the literature

Total effect = genotype effect + environmental effect + G x E interaction effect

As I think you already know, we can build statistical models that partition out the variance attributable to each of these terms after controlling for one or both of the other two. In this sense, the answer to your question really depends on what question you are trying to ask of the data.
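
One way to write this down for a single CpG, assuming an ordinary linear model (a sketch, not the only possible parameterization). Here M_ij is methylation at CpG j in individual i, E_i the exposure, G_ij the dosage (0/1/2) of a cis-SNP near CpG j, and C_i any other covariates such as the genomic PCs mentioned in the question:

```latex
M_{ij} = \beta_0 + \beta_E E_i + \beta_G G_{ij} + \beta_{GE}\, G_{ij} E_i + \boldsymbol{\gamma}^\top \mathbf{C}_i + \varepsilon_{ij}
```

Dropping the \beta_G G_{ij} and \beta_{GE} G_{ij} E_i terms gives the usual genotype-unadjusted EWAS model. Keeping \beta_G G_{ij} makes \beta_E the exposure effect conditional on genotype (with the interaction term included, the exposure effect at dosage 0), which is the sense in which the genetic variance is "partialled out".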

At any rate, from a statistical standpoint, it is usually better to code genotype as a covariate if you are specifically after (for instance) the environmental effect (or E + G x E together). Simply put, you remove the sums of squares attributable to the genetic variants, which leaves less unexplained variance. So you will usually be better off with the covariate-adjusted analysis. That said, if you want to identify the genetic effects, adjusting for genotype will destroy exactly that signal.
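
To make the sums-of-squares argument concrete, here is a small simulation in plain Python (written without dependencies only so the sketch runs anywhere). Everything in it is hypothetical: one CpG whose methylation depends on a cis-SNP dosage (0.5 per allele) and on an exposure (0.2 per unit), with the exposure simulated independently of genotype. Fitting the model with and without the genotype covariate should show the residual sum of squares dropping once genotype is included, while the exposure estimate stays near its simulated value.

```python
import random


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows (lists of floats), y a list of floats. Solved by
    Gaussian elimination with partial pivoting: fine for the few columns
    used here, not for large or ill-conditioned designs.
    """
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (rhs[i] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b


def rss(X, y, b):
    """Residual sum of squares of the fit y ~ X b."""
    return sum((yi - sum(x * bj for x, bj in zip(row, b))) ** 2 for row, yi in zip(X, y))


rng = random.Random(1)
n = 500
# Hypothetical cis-SNP dosage (0/1/2, allele frequency 0.3) and an exposure independent of it
g = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]
e = [rng.gauss(0, 1) for _ in range(n)]
# Simulated methylation: genetic effect 0.5 per allele, exposure effect 0.2, noise SD 0.3
m = [0.5 * gi + 0.2 * ei + rng.gauss(0, 0.3) for gi, ei in zip(g, e)]

X_unadj = [[1.0, ei] for ei in e]                        # methylation ~ exposure
X_adj = [[1.0, ei, float(gi)] for ei, gi in zip(e, g)]  # methylation ~ exposure + genotype
b_unadj, b_adj = ols(X_unadj, m), ols(X_adj, m)
print(f"unadjusted: beta_E={b_unadj[1]:.3f}  RSS={rss(X_unadj, m, b_unadj):.1f}")
print(f"adjusted:   beta_E={b_adj[1]:.3f}  RSS={rss(X_adj, m, b_adj):.1f}")
```

In the adjusted fit, the genetic signal ends up in the genotype coefficient (`b_adj[2]`), treated as a nuisance term. That is the flip side of the argument: if the genetic effect is what you care about, this design sets it aside rather than testing it.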

I would not worry too much about this point:

I rarely see it done in the literature

Whether it's seen in the literature matters only insofar as the literature reflects best practice. I am not certain of this, but I imagine a great number of people are interested in the meQTL data itself, for what it represents. That said, I also think it makes sense to run the analysis first to get the list of meQTLs, and then to run it again with those as covariates. (In other words, how could the initial study have coded the meQTLs as covariates before running the very analysis that generated that list?)
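
The two-pass idea (find the meQTLs first, then adjust for them) could look roughly like the following for a single CpG. It is a simplification under stated assumptions: the "first pass" here just picks the cis-SNP with the largest absolute Pearson correlation with methylation, within a hypothetical ±1 Mb window and above a hypothetical |r| cut-off, whereas real meQTL mapping would use proper association tests and multiple-testing control. The selected SNP's dosage would then enter the second-pass EWAS model as a covariate. All SNP IDs, positions, and data are made up.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 for a constant vector (e.g. a monomorphic SNP)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def lead_cis_snp(cpg_pos, methylation, snps, window=1_000_000, min_abs_r=0.2):
    """First pass: return (snp_id, r) for the strongest cis-SNP, or None.

    snps maps snp_id -> (position, dosage list). window and min_abs_r are
    illustrative values only, not recommendations.
    """
    best = None
    for snp_id, (pos, dosage) in snps.items():
        if abs(pos - cpg_pos) > window:
            continue  # outside the cis window: not considered
        r = pearson_r(dosage, methylation)
        if abs(r) >= min_abs_r and (best is None or abs(r) > abs(best[1])):
            best = (snp_id, r)
    return best


rng = random.Random(7)
n = 300


def simulate_dosage():
    """Hypothetical SNP dosages (0/1/2) with allele frequency 0.3."""
    return [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]


cpg_pos = 1_000_000
near, null_snp = simulate_dosage(), simulate_dosage()
snps = {
    "snp_near": (1_050_000, near),       # 50 kb from the CpG, drives methylation
    "snp_null": (1_200_000, null_snp),   # in cis, but no effect
    "snp_trans": (5_000_000, near),      # same dosages as snp_near, but 4 Mb away
}
methylation = [0.4 * d + rng.gauss(0, 0.3) for d in near]

print(lead_cis_snp(cpg_pos, methylation, snps))                 # expected to pick snp_near
print(lead_cis_snp(cpg_pos, methylation, snps, window=10_000))  # None: no SNP within 10 kb
```

Note that `snp_trans` is exactly as correlated with methylation as `snp_near` but is skipped because of the window; whether to restrict covariates to cis-SNPs at all is a modeling decision, not something this sketch settles.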

I'd recommend fitting at least one model that controls for genotype and one that does not. Compare the results and note any associations that change particularly strongly. Convince yourself which is the best representation of each effect, then present the analyses jointly. This seems to be the best approach, especially given your analysis/meta-analysis example.
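
A minimal sketch of that comparison, again in dependency-free Python and on simulated data: for each CpG, fit methylation on exposure with and without that CpG's lead cis-SNP (assumed to have been chosen already) and flag CpGs whose exposure coefficient moves by more than a hypothetical cut-off. The scenario is invented for illustration: CpG_A is driven only by a SNP that is also correlated with the exposure, so its unadjusted association is spurious, while CpG_B has a genuine exposure effect plus an independent cis-SNP.

```python
import random


def ols(X, y):
    """Least-squares coefficients via the normal equations, solved by Gaussian elimination."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (rhs[i] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b


def compare_models(exposure, cpgs, min_delta=0.05):
    """Fit methylation ~ exposure with and without each CpG's lead cis-SNP.

    cpgs maps cpg_id -> (methylation list, lead-SNP dosage list).
    Returns (cpg_id, beta_unadjusted, beta_adjusted, flagged) tuples, where
    flagged means |beta_unadjusted - beta_adjusted| > min_delta (an
    illustrative cut-off, not a standard one).
    """
    results = []
    for cpg_id, (meth, dose) in cpgs.items():
        b_unadj = ols([[1.0, e] for e in exposure], meth)[1]
        b_adj = ols([[1.0, e, float(d)] for e, d in zip(exposure, dose)], meth)[1]
        results.append((cpg_id, b_unadj, b_adj, abs(b_unadj - b_adj) > min_delta))
    return results


rng = random.Random(42)
n = 1000


def simulate_dosage():
    """Hypothetical SNP dosages (0/1/2) with allele frequency 0.3."""
    return [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]


g_a, g_b = simulate_dosage(), simulate_dosage()
# Hypothetical gene-environment correlation: the exposure partly tracks SNP A
exposure = [0.5 * g + rng.gauss(0, 1) for g in g_a]
cpgs = {
    # CpG_A: methylation depends only on SNP A (no true exposure effect)
    "CpG_A": ([0.6 * g + rng.gauss(0, 0.3) for g in g_a], g_a),
    # CpG_B: true exposure effect 0.3 plus an independent cis-SNP effect
    "CpG_B": ([0.3 * e + 0.4 * g + rng.gauss(0, 0.3) for e, g in zip(exposure, g_b)], g_b),
}

for cpg_id, b_unadj, b_adj, flagged in compare_models(exposure, cpgs):
    print(f"{cpg_id}: unadjusted={b_unadj:.3f} adjusted={b_adj:.3f} flagged={flagged}")
```

In this simulation only CpG_A should be flagged: its exposure estimate shrinks toward zero once its SNP is in the model, whereas CpG_B's estimate barely moves. With real data, a fixed absolute cut-off is a crude stand-in for "changes particularly strongly"; scaling the change by its uncertainty would be more defensible.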

P.S. If the two analyses look completely different, I am not sure what I would do, but I might consider making a second post at that point.
