Hi
I'm having a problem with a whole-exome sequencing dataset of about 400 human subjects: 200 cases with a certain disease and 200 controls without it. The dataset has been through rigorous quality control (standard QC in plink covering HWE, IBD, missingness, sex check, etc., plus HapMap-based population stratification and Eigenstrat/PCA analysis). I'm using plink to run a basic association analysis of all variants between cases and controls. The resulting QQ-plot for the common (MAF > 0.01) variants looks OK, but the plot for the rare (MAF < 0.01) variants does not. Below are the three QQ-plots for all, common and rare variants, along with their genomic inflation factors (lambda):
- QQ-plot all variants, lambda 1.83 http://postimg.org/image/oyr53wchr/
- QQ-plot common variants, lambda 1.04 http://postimg.org/image/6lqjtc20v/
- QQ-plot rare variants, lambda 2.43 http://postimg.org/image/ic4haputb/
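For reference, lambda is conventionally the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). The sketch below shows that calculation split at MAF 0.01, assuming the association results have already been parsed into hypothetical (maf, p) pairs; the function names are illustrative and not part of plink.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 df: (Phi^-1(0.75))^2, about 0.4549.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def p_to_chi2(p):
    """Convert a two-sided 1-df p-value (0 < p <= 1) back to its chi-square statistic."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z


def genomic_lambda(pvalues):
    """Genomic inflation factor: median observed chi-square / expected null median."""
    return median(p_to_chi2(p) for p in pvalues) / CHI2_1DF_MEDIAN


def lambda_by_maf(results, maf_threshold=0.01):
    """Compute lambda for all, common (maf >= threshold) and rare (maf < threshold) variants.

    `results` is a hypothetical list of (maf, p) pairs, one per tested variant.
    """
    common = [p for maf, p in results if maf >= maf_threshold]
    rare = [p for maf, p in results if maf < maf_threshold]
    return {
        "all": genomic_lambda([p for _, p in results]),
        "common": genomic_lambda(common) if common else float("nan"),
        "rare": genomic_lambda(rare) if rare else float("nan"),
    }


if __name__ == "__main__":
    # Toy input only; in practice these pairs would come from the plink output files.
    toy = [(0.2, 0.5), (0.05, 0.01), (0.004, 0.3), (0.002, 0.8)]
    print(lambda_by_maf(toy))
```

Because lambda is a median, it reflects the bulk of the tests (the lower-left of the QQ-plot) rather than the top hits.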
The main problem seems to be the positive deviation (observed > expected) of the rare variants in the left-hand part of the plot, which makes lambda very large in both the rare-variant and the all-variant QQ-plots. I'm wondering what could cause this behaviour for the rare variants, and what implications it has for analysing rare and common variants together.
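One mechanism that may be worth checking, offered as a hedged illustration rather than a diagnosis: with very few minor alleles, a 2x2 allele-count test can only take a handful of values, and some of them cannot be small. The sketch assumes an allelic 1-df Pearson chi-square without continuity correction (an assumption about what the plink run used) and the 200 case / 200 control design from the post, i.e. 400 alleles per group.

```python
from statistics import NormalDist

CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2  # about 0.4549


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (no continuity correction) for a 2x2 allele-count table."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # monomorphic variant: no information, statistic undefined
    return n * (a * d - b * c) ** 2 / denom


# Assumed design: 200 diploid cases and 200 diploid controls -> 400 alleles each.
N_CASE_ALLELES = N_CTRL_ALLELES = 400

# A singleton (one minor allele in the whole sample) carried by a case.
singleton = allelic_chi2(1, N_CASE_ALLELES - 1, 0, N_CTRL_ALLELES)
print(f"singleton chi2 = {singleton:.4f}")  # 1.0013
print(f"lambda if every test were a singleton = {singleton / CHI2_1DF_MEDIAN:.2f}")  # 2.20

for label, table in [
    ("doubleton, split 1-1", (1, 399, 1, 399)),      # chi2 = 0.000
    ("doubleton, both in cases", (2, 398, 0, 400)),  # chi2 = 2.005
]:
    print(f"{label}: chi2 = {allelic_chi2(*table):.3f}")
```

A singleton gives a chi-square of about 1.0 whether or not it is associated with disease, so if singletons and doubletons dominate the rare stratum, its median chi-square (and therefore lambda) may reflect the discreteness of the test more than residual stratification or genotyping error.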
I would be grateful if anybody has any experience in these matters and could provide some input.
Many thanks.
Thank you for your input. Is there another way to verify the quality of the rare variants? In other words, if the QC produces good QQ-plots for the common variants, would you be satisfied and move forward, even though the planned analysis relies heavily on rare variants (collapsing rare variants by gene and pathway)?
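Collapsing can be done in many ways. As a hypothetical illustration only (not the poster's pipeline and not a plink feature), a simple carrier-based burden test counts, per gene, how many cases and controls carry at least one qualifying rare variant, then compares the two groups with Fisher's exact test. The input layout (a mapping from gene name to carrier sample IDs) and all names below are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance to absorb floating-point ties).
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def gene_burden(carriers_by_gene, cases, controls):
    """Carrier-based burden test per gene.

    carriers_by_gene: hypothetical dict, gene -> iterable of sample IDs carrying
    at least one qualifying rare variant in that gene.
    cases, controls: sets of sample IDs.
    Returns gene -> ((case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers), p).
    """
    results = {}
    for gene, carriers in carriers_by_gene.items():
        carriers = set(carriers)
        case_carriers = len(carriers & cases)
        ctrl_carriers = len(carriers & controls)
        table = (case_carriers, len(cases) - case_carriers,
                 ctrl_carriers, len(controls) - ctrl_carriers)
        results[gene] = (table, fisher_exact_two_sided(*table))
    return results


if __name__ == "__main__":
    # Hypothetical sample IDs and genes, for illustration only.
    cases = {f"case{i}" for i in range(200)}
    controls = {f"ctrl{i}" for i in range(200)}
    toy = {"GENE_A": ["case1", "case2", "case7", "ctrl3"], "GENE_B": ["ctrl5"]}
    for gene, (table, p) in gene_burden(toy, cases, controls).items():
        print(gene, table, f"p = {p:.3g}")
```

Pooling carriers within a gene makes the counts less sparse than single-variant tests. With few carriers per gene, though, the gene-level p-values are still discrete, so a QQ-plot of these tests may need the same caution as the single-variant rare stratum.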