Inflated QQ plot in GWAS
5 months ago
Pedro • 0

Hi there,

I'm running a GWAS on close to 2,000 animals with 500K SNPs, using a repeatability model; my phenotypic dataset has about 10 observations per animal on average. I'm using a GRM calculated from 20K SNPs to account for population stratification, even though my population appears homogeneous (checked with a PCA). I'm not filtering on MAF. When I check the distribution of my p-values with a QQ plot, I get a strange pattern: the observed and expected values agree up to -log10(p) = 2, but beyond that point the observed values rise above the expected line, which looks like the inflation you would expect from population structure. A Bonferroni correction puts my threshold at -log10(p) > 7, and a 5% FDR puts it at -log10(p) > 4. Can anyone offer an explanation for why this is happening, or a way to deal with it?
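
One way to put numbers on the QQ plot is to compute the genomic inflation factor lambda (the median observed 1-df chi-square statistic divided by its null median, about 0.455) alongside the Bonferroni and FDR thresholds described above. The sketch below is a minimal, stdlib-only Python illustration, assuming two-sided p-values in (0, 1] from single-SNP tests; the function names are hypothetical and not taken from any GWAS package. A lambda clearly above 1 is consistent with inflation, although a polygenic trait with many small true effects can also push it up.

```python
import math
import statistics
from statistics import NormalDist


def bonferroni_threshold(n_tests, alpha=0.05):
    """Bonferroni cut-off on the -log10(p) scale."""
    return -math.log10(alpha / n_tests)


def bh_threshold(pvalues, q=0.05):
    """Benjamini-Hochberg cut-off at FDR q on the -log10(p) scale.

    Returns -log10 of the largest p-value that BH declares significant,
    or None if no test passes.
    """
    m = len(pvalues)
    cutoff = None
    for rank, p in enumerate(sorted(pvalues), start=1):
        if p <= rank * q / m:
            cutoff = p  # keep the largest rank that satisfies the BH condition
    return None if cutoff is None else -math.log10(cutoff)


def genomic_inflation(pvalues):
    """lambda_GC: median observed 1-df chi-square / its null median (~0.455).

    Assumes two-sided p-values with 0 < p <= 1.
    """
    std_normal = NormalDist()
    # Convert each two-sided p-value back to a 1-df chi-square statistic (z^2).
    chi2 = [std_normal.inv_cdf(1 - p / 2) ** 2 for p in pvalues]
    null_median = std_normal.inv_cdf(0.75) ** 2
    return statistics.median(chi2) / null_median


def qq_points(pvalues):
    """(expected, observed) -log10(p) pairs under a uniform null, largest first."""
    n = len(pvalues)
    observed = sorted((-math.log10(p) for p in pvalues), reverse=True)
    expected = [-math.log10((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, observed))


if __name__ == "__main__":
    n = 10001
    grid = [(i - 0.5) / n for i in range(1, n + 1)]  # perfectly uniform p-values
    print(round(bonferroni_threshold(500_000), 3))  # 7.0
    print(round(genomic_inflation(grid), 3))  # 1.0
```

With 500,000 tests the Bonferroni line lands at -log10(p) = 7, matching the threshold in the question, while the BH cut-off depends on the observed p-values themselves, which is why it can sit lower.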

Cheers

[Image: QQ plot of the GWAS p-values]

You mention homogeneity of an animal population. How large are the haploblocks, on average?

5 months ago
LChart 4.7k

Seeing "animals" and "homogeneous population" together seems like a bit of a red flag. With animals I would expect either some kind of cross-breeding experiment (for trait/linkage studies) or a pedigree-based design (for breeding studies); "homogeneous population" sounds borrowed from a sampling-based study. In crossing/breeding designs, the GRM is controlling for the pedigree structure as a source of stratification, in which case the model should also include covariates for parental strains, grazing/housing/plot groups, and other environmental factors. Failing to account for these can inflate the test statistics.
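
As a rough sketch of what "including covariates" could mean in practice, the snippet below builds treatment-coded indicator columns for categorical factors, one row per phenotypic record (so repeated records of the same animal each get their own row). The factor names `sire_line` and `pen` are hypothetical placeholders. The idea, assuming a standard mixed-model GWAS setup, is that these columns enter the fixed-effect part of the model alongside the SNP, while the GRM enters as the random effect; how you pass them in depends on the software you use.

```python
def indicator_design(records, factors):
    """Intercept plus treatment-coded (drop-first-level) indicator columns.

    records: one dict per phenotypic observation.
    factors: names of the categorical covariates to encode.
    """
    levels = {f: sorted({r[f] for r in records}) for f in factors}
    columns = ["intercept"]
    for f in factors:
        # The first (reference) level is absorbed by the intercept,
        # which avoids perfect collinearity in the design matrix.
        columns += [f"{f}={lvl}" for lvl in levels[f][1:]]
    rows = []
    for r in records:
        row = [1]
        for f in factors:
            row += [1 if r[f] == lvl else 0 for lvl in levels[f][1:]]
        rows.append(row)
    return columns, rows


# Hypothetical observations: factor names and levels are made up for illustration.
obs = [
    {"animal": "a1", "sire_line": "A", "pen": "p1"},
    {"animal": "a1", "sire_line": "A", "pen": "p2"},
    {"animal": "a2", "sire_line": "B", "pen": "p3"},
]
cols, X = indicator_design(obs, ["sire_line", "pen"])
print(cols)  # ['intercept', 'sire_line=B', 'pen=p2', 'pen=p3']
for row in X:
    print(row)
```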

Another possibility is batch effects from the genotyping chips, sample collection, library preparation date, etc. These can be largely mitigated by statistical genotype refinement (typically phasing/imputation), or by including indicator covariates for those batches. You may also see differences in call rates or other array QC metrics that cluster by batch in some obvious way.
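
A simple way to look for that kind of batch pattern is to compute per-sample call rates and summarise them by batch. The sketch below assumes genotypes coded as 0/1/2 allele counts with `None` for a failed call, and a hypothetical sample-to-plate mapping; in a real dataset you would read both from your genotype files and sample sheet.

```python
from collections import defaultdict
from statistics import mean


def call_rates(genotypes):
    """Per-sample call rate. genotypes: sample -> list of calls (None = missing)."""
    return {s: sum(g is not None for g in calls) / len(calls)
            for s, calls in genotypes.items()}


def call_rate_by_batch(genotypes, batch_of):
    """Summarise call rates per batch as (n_samples, mean, min)."""
    grouped = defaultdict(list)
    for sample, rate in call_rates(genotypes).items():
        grouped[batch_of[sample]].append(rate)
    return {b: (len(r), mean(r), min(r)) for b, r in sorted(grouped.items())}


# Hypothetical toy data: 0/1/2 allele counts, None for a failed call.
geno = {
    "s1": [0, 1, 2, None],
    "s2": [0, 0, 1, 2],
    "s3": [None, None, 1, 2],
}
plate = {"s1": "plate1", "s2": "plate1", "s3": "plate2"}
print(call_rate_by_batch(geno, plate))
# {'plate1': (2, 0.875, 0.75), 'plate2': (1, 0.5, 0.5)}
```

A batch whose mean or minimum call rate stands apart from the others is a candidate for an indicator covariate or closer QC.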

Another source of inflation is LD itself: each associated variant also produces associations for everything in high LD with it. It may be better to determine LD blocks and, when you make the Q-Q plot, choose one variant at random from each block.
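
The thinning step can be sketched as follows, assuming each variant has already been assigned a block ID by whatever LD-block definition you use (the block IDs and p-values below are made up). A fixed random seed keeps the selection reproducible; the p-values of the retained variants are then the ones plotted.

```python
import random
from collections import defaultdict


def one_per_block(variants, seed=1):
    """Keep one randomly chosen variant per LD block.

    variants: iterable of (variant_id, block_id, p_value) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the thinning is reproducible
    blocks = defaultdict(list)
    for var in variants:
        blocks[var[1]].append(var)
    return [rng.choice(members) for _, members in sorted(blocks.items())]


# Hypothetical variants: three SNPs in one block, two in another, one singleton.
snps = [
    ("rs1", 1, 1e-9), ("rs2", 1, 3e-9), ("rs3", 1, 2e-8),
    ("rs4", 2, 0.20), ("rs5", 2, 0.35),
    ("rs6", 3, 0.61),
]
thinned = one_per_block(snps)
pvals_for_qq = [p for _, _, p in thinned]
print(len(thinned))  # 3
```

Drawing one variant at random, rather than the most significant one per block, keeps the thinned p-values from being biased toward small values, so the QQ plot still reflects the null-versus-signal comparison.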