Question

QQ plots for eQTL data

7.1 years ago

jamespower ▴ 100

Hi,

What would you expect from a QQ plot of SNP associations with gene expression across a whole chromosome? It should be extremely inflated, but does anyone have an idea of how inflated? For example, on chromosome 22, out of ~2 million SNPs analyzed, about 200,000 have a p-value < 0.05. Is this expected?

Thank you for any insight!
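As a rough point of reference for the numbers in the question, the count of p-values below 0.05 can be compared with what a uniform null distribution would give. The sketch below reads the reported count as roughly 200,000 (an assumption, since the figure as typed in the post is ambiguous). The function name `null_expectation` is made up for illustration, and the standard deviation assumes independent tests, which linkage disequilibrium violates.

```python
import math

def null_expectation(n_tests: int, alpha: float, observed: int) -> dict:
    """Compare an observed count of p < alpha with the uniform-null expectation."""
    expected = n_tests * alpha
    # Binomial standard deviation; assumes independent tests (not true under LD).
    sd = math.sqrt(n_tests * alpha * (1 - alpha))
    return {
        "expected": expected,
        "fold": observed / expected,
        "observed_fraction": observed / n_tests,
        "sd_if_independent": sd,
    }

# Numbers from the question; the observed count is an assumed reading of the post.
summary = null_expectation(n_tests=2_000_000, alpha=0.05, observed=200_000)
print(summary)
```

Under these assumptions, about 100,000 tests would fall below 0.05 by chance, so the reported count is roughly a twofold excess. Whether that excess is plausible for eQTL data is the question the rest of the thread discusses.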

MatrixEQTL eqtl • 4.7k views

ADD COMMENT • link updated 5.1 years ago by Biostar 20 • written 7.1 years ago by jamespower ▴ 100

Sample size? Number of cases? Number of controls? Any covariate adjustments? Any pre-filtering of SNPs?

Thank you!

ADD REPLY • link 7.1 years ago by Kevin Blighe 89k

Hi Kevin, thanks for the reply. The sample size is 1,000, and the trait is quantitative (gene expression). Assuming all pre-filters and adjustments are correct (filtering follows the usual thresholds, using Hardy-Weinberg and keeping common SNPs only), is that what we would expect?

ADD REPLY • link 7.1 years ago by jamespower ▴ 100

That's a large enough dataset. I believe that any chromosome will show a skewed QQ plot if it is not well covered by markers, as is usually the case for chr22 (I believe). Which MAF threshold did you use for filtering?

ADD REPLY • link 7.1 years ago by Kevin Blighe 89k

MAF is >5%, and these numbers and the inflated QQ plots are actually similar across chromosomes...

ADD REPLY • link 7.1 years ago by jamespower ▴ 100

Okay, could be related to your disease-alleles. Would help if you shared [some of] your QQ plots. You can do this via ImgBB, for example, by uploading and then pasting the HTML URL here.

ADD REPLY • link 7.1 years ago by Kevin Blighe 89k

Thank you, here it is... Note that I don't have disease alleles, this is association with gene expression... Thank you for any feedback.

ADD REPLY • link updated 7.1 years ago by GenoMax 151k • written 7.1 years ago by jamespower ▴ 100

Wait, thanks for the extra reminder at the very bottom, i.e., that these are expression quantitative trait loci (eQTL) p-values. These are not expected to follow the typical quantile distribution that one would expect from a GWAS. In summary: I do not believe that you need to worry too much about this. Please take a look at other literature where QQ plots have been generated from eQTL p-values.
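To compare against published eQTL QQ plots, it can help to compute the plotted quantities directly. Below is a minimal sketch, not taken from the thread, of the usual QQ-plot coordinates and the genomic inflation factor (lambda), using only the Python standard library. The function names are illustrative, and the p-values are assumed to be two-sided and strictly between 0 and 1.

```python
import math
from statistics import NormalDist, median

def qq_points(pvalues):
    """Return (expected, observed) lists of -log10 p-values for a QQ plot."""
    p_sorted = sorted(pvalues)
    n = len(p_sorted)
    # Expected Uniform(0, 1) quantiles with the (i + 0.5) / n plotting position.
    expected = [-math.log10((i + 0.5) / n) for i in range(n)]
    observed = [-math.log10(p) for p in p_sorted]
    return expected, observed

def inflation_factor(pvalues):
    """Median observed chi-square (1 df) divided by the null median."""
    norm = NormalDist()
    # A two-sided p-value p maps to z = Phi^-1(p / 2), and chi-square = z ** 2.
    chi2 = [norm.inv_cdf(p / 2) ** 2 for p in pvalues]
    null_median = norm.inv_cdf(0.25) ** 2  # median of chi-square with 1 df
    return median(chi2) / null_median

# Toy input: an evenly spaced grid standing in for null p-values.
null_p = [(i + 0.5) / 1001 for i in range(1001)]
inflated_p = [p ** 2 for p in null_p]  # artificially shifted towards zero

print(inflation_factor(null_p))
print(inflation_factor(inflated_p))
```

The first value is 1 (up to rounding) and the second is well above 1. Because lambda depends only on the median, a cis-eQTL scan with many true signals can show a large lambda without implying a technical problem, which is consistent with the point made above.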

ADD REPLY • link 7.1 years ago by Kevin Blighe 89k

Great, thank you Kevin, that makes sense.

ADD REPLY • link 7.1 years ago by jamespower ▴ 100

I agree with Kevin. However, may I ask which software you used for the eQTL analysis? In my experience with matrixEQTL, I observed a lot of significant results (with very low p-values) when one of the three genotypic classes was rare (and that was AFTER filtering for rare alleles). Some of those might be false positives.

ADD REPLY • link 7.1 years ago by Fabio Marroni ★ 3.0k

Hi Fabio, thanks for your feedback! I have used matrixEQTL. That is worrying indeed, especially since it happens even after filtering rare SNPs before the association... Do you know of any way to filter those cases further, or maybe other software that controls for this better? (Or did you just resort to tabulating the genotypes for each associated SNP and removing those cases?)

ADD REPLY • link 7.1 years ago by jamespower ▴ 100

The last option :-( I actually remove SNPs for which I observe a rare genotype BEFORE the analysis, but after a preliminary check to get an idea of "suspect results". For example, the one in the figure is, in my opinion, suspect (-1 is missing data; 0, 1 and 2 are the three possible genotypes).

enter image description here

But you can easily find even worse cases if you look at your 10^-200 p-values! I was considering doing a first pass with matrixEQTL, selecting only a subset of genes/SNPs, and then analyzing that subset with EMMAX (which is presumably more robust, but of course slower). For the moment, I am happy with filtering, but I am still in the exploratory phase.
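The genotype-class filter described in this reply could look roughly like the sketch below. It is an illustration, not the code actually used. The genotype coding (-1 missing; 0, 1, 2) follows the post, while the `min_count` threshold of 10, the function names, and the SNP IDs are hypothetical. A class with zero samples is not flagged here, on the assumption that the concern is a handful of samples driving the fit.

```python
from collections import Counter

MISSING = -1  # coding from the post: -1 is missing, 0/1/2 are the genotype classes

def genotype_counts(genotypes):
    """Count samples in each genotype class (0, 1, 2), ignoring missing calls."""
    counts = Counter(g for g in genotypes if g != MISSING)
    return {g: counts.get(g, 0) for g in (0, 1, 2)}

def has_rare_class(genotypes, min_count=10):
    """True if any observed genotype class has fewer than min_count samples."""
    counts = genotype_counts(genotypes)
    # Empty classes are skipped; only sparsely populated ones are flagged.
    return any(0 < c < min_count for c in counts.values())

def filter_snps(snp_genotypes, min_count=10):
    """Keep only SNPs without a sparsely populated genotype class."""
    return {
        snp: g for snp, g in snp_genotypes.items()
        if not has_rare_class(g, min_count)
    }

# Toy data with hypothetical SNP IDs.
snps = {
    "snp_a": [0] * 60 + [1] * 35 + [2] * 5,   # only 5 samples in class 2
    "snp_b": [0] * 40 + [1] * 40 + [2] * 20 + [MISSING] * 3,
}
kept = filter_snps(snps, min_count=10)
print(sorted(kept))
```

Here `snp_a` is dropped because only five samples carry genotype 2, while `snp_b` is kept. The same tabulation can be run after the association step to review individual hits with extreme p-values.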

ADD REPLY • link 7.1 years ago by Fabio Marroni ★ 3.0k

Thank you Fabio for helping. I know that you have been doing a lot of GWAS work recently. Please continue the discussion! :)

ADD REPLY • link 7.1 years ago by Kevin Blighe 89k

I am just struggling to learn, actually!

ADD REPLY • link 7.1 years ago by Fabio Marroni ★ 3.0k

All right!

ADD REPLY • link 7.1 years ago by Kevin Blighe 89k