Question

Sample size, GxE GWAS

0

23 months ago

rexalox647 • 0

Hi all,

I am running a GWAS on plants grown in two different environments, trying to associate SNPs with growth-related traits. My sample size is limited, about 100 plants in each environment, so I combine the data from the two environments and include environment as a covariate (a variable coded 1 or 2 indicating the environment of origin). This works fine: I get results that look good and make sense.

However, I am also curious whether any SNPs show a GxE interaction, and I worry that not accounting for this might bias my results, in the sense that I could miss many SNPs that actually display GxE and are therefore not identified in my analyses.

When I actually add the GxE term to my model, the QQ plots show really bad inflation and the models fit poorly. I am under the impression that a key issue is sample size, which would need to be a lot higher based on Smith, P. G. & Day, N. E. The design of case–control studies: the influence of confounding and interaction effects. Int. J. Epidemiol. 13, 356–365 (1984). Yet some more recent work highlights spurious inflation of the QQ plots under GxE tests (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0019416).
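To put a number on the inflation seen in a QQ plot, one common summary is the genomic inflation factor (lambda): the median observed 1-df chi-square statistic divided by its expected median under the null. Below is a minimal sketch using only the Python standard library. It assumes two-sided p-values from a 1-df test, all strictly greater than 0, and the function name `genomic_inflation` is just an illustrative choice.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Null median of a chi-square with 1 df: if P(Z^2 < m) = 0.5 then Phi(sqrt(m)) = 0.75.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2  # roughly 0.455


def genomic_inflation(pvalues):
    """Return lambda: median observed 1-df chi-square / its median under the null.

    Assumes two-sided p-values in (0, 1]; a p-value of exactly 0 would raise an error.
    """
    # A two-sided p-value p corresponds to |z| = -Phi^{-1}(p / 2) and chi2 = z^2.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in pvalues]
    return median(chi2) / CHI2_1DF_MEDIAN
```

Values close to 1 suggest well-calibrated test statistics, while values clearly above 1 indicate inflation. Computing this separately for the main-effect p-values and the GxE p-values would show whether the problem is specific to the interaction test.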

I am wondering if anyone has had a similar experience and perhaps has any pointers on potential solutions.

I am leaning towards accepting that I don't have a large enough sample size to identify GxE and going on with the main-effect SNPs I have detected.

/rexalox647

covariate GxE GWAS • 1.1k views

ADD COMMENT • link updated 23 months ago by rimgubaev ▴ 340 • written 23 months ago by rexalox647 • 0

1

As your E is binary in this case, then (following the linked Voorman article) the only way in which your model (p ~ G + E + G x E) can be mis-specified is if the residuals within each environment differ (e.g., in their variances). (Differences in variance at the genotype level should show up as inflation of the "standard" GWAS.) This could be addressed by using a mixed-effects model with both a fixed and a random term for environment.
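One simple way to see the effect of unequal residual variances, without fitting a mixed model, is a stratified test: regress the phenotype on genotype separately in each environment and compare the two slopes, letting each environment keep its own residual variance. With a binary E, the slope difference is the same point estimate as the G x E coefficient in p ~ G + E + G x E; only the standard error changes. The sketch below is a swapped-in illustration of that idea, not the mixed-model approach itself. It assumes additive 0/1/2 genotype coding, no relatedness or population-structure correction, a SNP that is polymorphic in both environments, and a normal approximation for the p-value; `slope_and_se` and `gxe_slope_difference` are hypothetical names.

```python
from math import sqrt
from statistics import NormalDist, fmean


def slope_and_se(g, y):
    """Simple regression of y on g; return (slope, standard error of the slope).

    Raises ZeroDivisionError if the SNP is monomorphic in this environment.
    """
    n = len(g)
    g_bar, y_bar = fmean(g), fmean(y)
    sxx = sum((gi - g_bar) ** 2 for gi in g)
    sxy = sum((gi - g_bar) * (yi - y_bar) for gi, yi in zip(g, y))
    slope = sxy / sxx
    intercept = y_bar - slope * g_bar
    rss = sum((yi - intercept - slope * gi) ** 2 for gi, yi in zip(g, y))
    resid_var = rss / (n - 2)  # residual variance from this environment only
    return slope, sqrt(resid_var / sxx)


def gxe_slope_difference(g_env1, y_env1, g_env2, y_env2):
    """Return (slope difference, its SE, two-sided p) allowing unequal variances."""
    b1, se1 = slope_and_se(g_env1, y_env1)
    b2, se2 = slope_and_se(g_env2, y_env2)
    diff = b2 - b1
    se = sqrt(se1 ** 2 + se2 ** 2)  # variances are not pooled across environments
    z = diff / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))  # normal approximation
    return diff, se, p
```

If the QQ plot for these p-values looks much better than the one from the pooled-variance interaction model, that would point to heteroskedasticity between environments as the source of the inflation.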

ADD REPLY • link 23 months ago by LChart 4.7k

score 0 · Answer 1 · 2023-01-23

As you have a small sample size and are interested in particular SNPs, I would do the following:

1) Select candidate SNPs based on the proportion of phenotypic variance that they explain (basically an R2 value) and their significance level (p-value).

2) Make boxplots for these selected SNPs (to see the dominance/recessiveness).

3) Run an analysis of variance for the Year:SNP effect to test whether the specific SNP is stable across years for this trait.

4) Validate the effect (with KASP, for example) using an independent plant sample, to test marker reliability.
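Steps 1 and 3 above can be sketched roughly as follows, using only the Python standard library. This assumes additive 0/1/2 genotype coding and a plain linear model without any kinship or structure correction, so the R2 here is the simple single-SNP regression R2. The interaction F statistic compares a model with a separate SNP slope per year (or environment) against one with a common slope. The function names are hypothetical, and the p-value is left out because the standard library has no F distribution.

```python
from statistics import fmean


def _centered_sums(g, y):
    """Return (Sxx, Sxy, Syy) for genotype g and phenotype y within one group."""
    g_bar, y_bar = fmean(g), fmean(y)
    sxx = sum((a - g_bar) ** 2 for a in g)
    sxy = sum((a - g_bar) * (b - y_bar) for a, b in zip(g, y))
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxx, sxy, syy


def snp_r2(g, y):
    """Proportion of phenotypic variance explained by one SNP (simple regression)."""
    sxx, sxy, syy = _centered_sums(g, y)
    return sxy ** 2 / (sxx * syy)


def interaction_f(groups):
    """F statistic for the SNP:year (or SNP:environment) interaction.

    groups: list of (genotypes, phenotypes) pairs, one pair per year/environment.
    Returns (F, numerator df, denominator df).
    """
    parts = [_centered_sums(g, y) for g, y in groups]
    n = sum(len(g) for g, _ in groups)
    k = len(groups)
    # Full model: each group has its own intercept and its own SNP slope.
    rss_full = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in parts)
    # Reduced model: each group has its own intercept, but one common SNP slope.
    sxx_w = sum(p[0] for p in parts)
    sxy_w = sum(p[1] for p in parts)
    syy_w = sum(p[2] for p in parts)
    rss_reduced = syy_w - sxy_w ** 2 / sxx_w
    df_num, df_den = k - 1, n - 2 * k
    f_stat = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den
```

The F statistic would then be compared against an F distribution with the returned degrees of freedom, for instance with a statistics package of your choice. A small F suggests the SNP effect is stable across years.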

This is more practical advice, and if you plan to do basic research on GxE for your species, 100 accessions is definitely not enough, unfortunately. But it might be enough if the trait is monogenic and its heritability is around 90%. I personally had the same issue and analyzed only 90 accessions for a trait with low-to-moderate heritability of 40-50%. As the main idea of the research was to find SNPs for breeding, I just did what I described above. Good luck with your research!