6.6 years ago
jamespower
Hi,
What would you expect from a QQ plot of associations between SNPs and gene expression across a whole chromosome? It should be extremely inflated, but does anyone have an idea of how inflated? For example, in chromosome 22, out of ~2 million SNPs analyzed, about 200,000 have a p-value < 0.05. Is this expected?
Thank you for any insight!
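As a rough yardstick for the question above: if no SNP were associated and the p-values were uniform, about 5% of tests would fall below 0.05, i.e. roughly 100,000 out of 2 million. The sketch below only does that arithmetic. The function names are hypothetical, and it assumes independent tests, which linked SNPs on one chromosome are not.

```python
def null_expectation(n_tests, alpha=0.05):
    """Expected number of tests with p < alpha if every p-value is uniform (no signal)."""
    return n_tests * alpha


def enrichment(n_significant, n_tests, alpha=0.05):
    """Observed count of p < alpha divided by the count expected under the null."""
    return n_significant / null_expectation(n_tests, alpha)


if __name__ == "__main__":
    # ~2 million SNPs, as in the question; the observed count is left as an input.
    print(round(null_expectation(2_000_000)))
```

A ratio well above 1 says there are more small p-values than chance alone predicts. It does not say whether the excess is real cis-eQTL signal or an artifact.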
What is the sample size? The number of cases? The number of controls? Were there any covariate adjustments? Any pre-filtering of SNPs?
Thank you!
Hi Kevin, thanks for the reply. The sample size is 1,000 and the trait is quantitative (gene expression). Assuming all pre-filters and adjustments are correct (filtering follows the usual thresholds, using Hardy-Weinberg and keeping common SNPs only), is that what we would expect?
That's a large enough dataset. I believe that any chromosome will show a skewed QQ plot if it is not well covered by markers, as is usually the case for chr22 (I believe). Which MAF threshold did you use for filtering?
MAF is >5%, and actually these numbers and the inflated QQ plots are similar across chromosomes...
Okay, it could be related to your disease alleles. It would help if you shared [some of] your QQ plots. You can do this via ImgBB, for example, by uploading and then pasting the HTML URL here.
Thank you, here it is... Note that I don't have disease alleles, this is association with gene expression... Thank you for any feedback.
Wait, thanks for giving the extra reminder at the very bottom, i.e., that these are expression trait loci p-values. Those are not expected to follow the typical quantile distribution as one would expect from GWAS. In summary: I do not believe that you need to worry too much about this. Please take a look at other literature where QQ plots have been generated from eQTL p-values.
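For comparing your own plots with published eQTL QQ plots, here is a minimal sketch of how the QQ coordinates and the genomic inflation factor (lambda) are commonly computed. It is an illustration, not what any particular eQTL package does. The function names are hypothetical, p-values are assumed to be two-sided and to lie in (0, 1], and only the Python standard library is used.

```python
import math
import random
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()


def qq_points(pvalues):
    """Return (expected, observed) lists of -log10(p), ordered from smallest p upward."""
    observed = sorted(pvalues)
    n = len(observed)
    # Under the uniform null, the i-th smallest of n p-values is about (i + 0.5) / n.
    expected = [(i + 0.5) / n for i in range(n)]
    return ([-math.log10(e) for e in expected],
            [-math.log10(o) for o in observed])


def genomic_inflation(pvalues):
    """Median observed 1-df chi-square statistic divided by its null median (~0.455)."""
    # A two-sided p-value maps to a z-score via the lower tail at p / 2; chi2 = z^2.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in pvalues]
    null_median = _STD_NORMAL.inv_cdf(0.75) ** 2
    return median(chi2) / null_median


if __name__ == "__main__":
    random.seed(0)
    # Simulated null p-values in (0, 1]; lambda should come out close to 1.
    null_p = [1.0 - random.random() for _ in range(20000)]
    print(genomic_inflation(null_p))
```

Plotting the two lists from `qq_points` against each other gives the usual QQ plot. With many true cis-eQTLs, points rising above the diagonal and a lambda above 1 do not by themselves indicate a problem.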
Great, thank you Kevin, that makes sense.
I agree with Kevin. However, may I ask which software you used for the eQTL analysis? In my experience, when using matrixEQTL I observed a lot of significant results (with very low p-values) when one of the three genotypic classes is rare (and that was AFTER filtering for rare alleles). Some of those might be false positives.
Hi Fabio, thanks for your feedback! I have used matrixEQTL. That is worrying indeed, especially since it happens even after filtering out rare SNPs before the association... Do you know of any way to filter those cases further, or of other software that controls for this better? (Or did you just resort to tabulating the genotypes for each associated SNP and removing those cases?)
The last option :-( Actually, I remove SNPs for which I observe a rare genotype BEFORE the analysis, but only after a preliminary check to get an idea of "suspect results". For example, the one in the figure is, in my opinion, suspect (-1 is missing data; 0, 1 and 2 are the three possible genotypes).
But you can easily find even worse cases if you look at your 10^-200 p-values! I was considering doing a first pass with matrixEQTL, selecting only a subset of genes/SNPs, and then analyzing that subset with EMMAX (which is presumably more robust, but of course will be slower). For the moment, I am happy with filtering, but I am still in the exploratory phase.
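The genotype tabulation described above could be sketched roughly as follows. This is an illustration rather than Fabio's actual code. The function names and the `min_count=5` threshold are hypothetical. Genotypes are assumed to be coded 0/1/2 with -1 for missing, as in the post.

```python
from collections import Counter

MISSING = -1  # missing-data code, as described in the thread


def genotype_counts(genotypes):
    """Count samples in each genotype class (0, 1, 2), ignoring missing calls."""
    counts = Counter(g for g in genotypes if g != MISSING)
    return {g: counts.get(g, 0) for g in (0, 1, 2)}


def has_rare_genotype_class(genotypes, min_count=5):
    """Flag a SNP when some observed genotype class has fewer than min_count samples.

    Classes with zero samples are not flagged: an absent class cannot
    contribute a handful of high-leverage points to the regression.
    """
    counts = genotype_counts(genotypes)
    return any(0 < c < min_count for c in counts.values())


if __name__ == "__main__":
    snp = [0] * 50 + [1] * 30 + [2] * 2 + [MISSING] * 3
    print(genotype_counts(snp), has_rare_genotype_class(snp))
```

Running the check on every SNP before the association test drops the suspect ones up front. Running it only on the top hits afterwards corresponds to the "preliminary check" approach.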
Thank you Fabio for helping. I know that you have been doing a lot of GWAS work recently. Please continue the discussion! :)
I am just struggling to learn, actually!
All right!