Hi all,
I am running a GWAS on plants grown in two different environments, trying to associate SNPs with growth-related traits. My sample size is limited (about 100 plants in each environment), so I combine the data from the two environments and include environment as a covariate (an indicator of environment of origin, coded 1 or 2). This works fine: I get results that look good and make sense.
However, I am also curious whether any SNPs show a GxE interaction, and I worry that not accounting for this might bias my results: I could be missing many SNPs that display GxE and therefore go undetected in my analyses.
When I add the GxE term to my model, the QQ plots show severe inflation and the models fit poorly. My impression is that a key issue is sample size, which would need to be much larger according to Smith, P. G. & Day, N. E. The design of case–control studies: the influence of confounding and interaction effects. Int. J. Epidemiol. 13, 356–365 (1984). Yet some more recent work highlights spurious inflation of QQ plots under GxE tests (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0019416).
I am wondering whether anyone has had a similar experience and could offer any pointers on potential solutions.
I am leaning towards accepting that my sample size is not large enough to identify GxE and going on with the main-effect SNPs I have detected.
/rexalox647
As your E is binary in this case, then (following the linked Voorman article) the only way your model (p ~ G + E + G x E) can be mis-specified is if the residual distributions (e.g., their variances) differ between the two environments. (Differences in variance at the genotype level should show up as inflation of the "standard" GWAS.) This could be addressed by using a mixed-effects model with both a fixed and a random term for environment.
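
To check whether unequal residual variances are plausibly the problem, something like the following might help. It is only a minimal sketch on simulated data, not your actual pipeline: the function name fit_gxe, the 0/1 coding of E (instead of 1/2, which does not change the interaction test), the allele frequency, and the noise levels are all made up for illustration. It fits p ~ G + E + G x E for one SNP by ordinary least squares, compares the residual variance in each environment, and reports heteroskedasticity-robust (HC3 sandwich) standard errors next to the usual model-based ones. Note that the robust standard errors are a different remedy from the mixed model mentioned above; they are swapped in here because they need nothing beyond NumPy.

```python
import numpy as np

def fit_gxe(p, g, e):
    """OLS fit of p ~ G + E + G x E for a single SNP.

    Returns the coefficients, model-based SEs, HC3 robust SEs, and residuals.
    Coefficient order: intercept, G, E, G x E.
    """
    g = np.asarray(g, dtype=float)
    e = np.asarray(e, dtype=float)
    p = np.asarray(p, dtype=float)
    X = np.column_stack([np.ones_like(g), g, e, g * e])
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ p
    resid = p - X @ beta
    # Model-based SEs assume one residual variance shared by both environments.
    sigma2 = resid @ resid / (n - k)
    se_model = np.sqrt(np.diag(XtX_inv) * sigma2)
    # HC3 sandwich SEs allow the residual variance to differ between observations.
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    w = (resid / (1.0 - leverage)) ** 2
    meat = X.T @ (X * w[:, None])
    se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_model, se_robust, resid

# Simulated example (all values hypothetical): 100 plants per environment,
# no true G or G x E effect, but noisier phenotypes in environment 1.
rng = np.random.default_rng(1)
n_per_env = 100
e = np.repeat([0, 1], n_per_env)
g = rng.binomial(2, 0.3, size=2 * n_per_env)   # genotype as minor-allele count
noise_sd = np.where(e == 0, 1.0, 3.0)
p = 0.5 * e + rng.normal(0.0, noise_sd)

beta, se_model, se_robust, resid = fit_gxe(p, g, e)

# Residual variance within each environment (2 parameters estimated per environment).
var_by_env = [resid[e == env].var(ddof=2) for env in (0, 1)]
print("residual variance by environment:", var_by_env)
print("G x E estimate:", beta[3])
print("G x E SE, model-based:", se_model[3], " robust (HC3):", se_robust[3])
```

If the two residual variances are very different in your real data, that would be consistent with the mis-specification described above, and it would be worth comparing the QQ plot of the GxE test based on the robust standard errors with the one you have now.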