Inflated p-values in QQ-plot
2.8 years ago
enee ▴ 20

Hi guys,

I have WGS VCFs from individuals with a given phenotype, which I merged into a multisample VCF to obtain the MAF of each variant. I then compared these MAFs with the population frequencies from gnomAD v2.1.1 genomes (GNOMAD_V2_AF in the DB VCF) to see whether the frequencies differ significantly, using a binomial test: binom.test(x, n, p = GNOMAD_V2_AF, alternative = "two.sided"), where x is the number of alternate alleles found (AC) and n is the total number of alleles in the multisample VCF (AN).

I'm trying to draw a QQ-plot of my p-values against the expected p-values, as described in the last lines of https://genome.sph.umich.edu/wiki/Code_Sample:_Generating_QQ_Plots_in_R but they look extremely inflated. Unfortunately my sample is quite small (~25 people). Could you tell me where the mistake is?

[Image: QQ-plot of observed vs. expected p-values]
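For readers who want to reproduce the setup described in the question, here is a minimal sketch in R. The data frame and its values are made up for illustration, the helper qq_pvalues() is a hypothetical name, and using ppoints() for the expected quantiles is an assumption (the linked wiki page may compute them slightly differently).

```r
# Hypothetical per-variant counts; in practice these would come from the
# annotated multisample VCF (AC, AN) and the gnomAD annotation (GNOMAD_V2_AF).
variants <- data.frame(
  AC = c(3, 0, 10, 1, 25),
  AN = c(50, 50, 50, 50, 50),
  GNOMAD_V2_AF = c(0.05, 0.01, 0.20, 0.002, 0.45)
)

# Two-sided exact binomial test of AC out of AN against the gnomAD frequency,
# one test per variant.
variants$p <- mapply(
  function(x, n, p) binom.test(x, n, p = p, alternative = "two.sided")$p.value,
  variants$AC, variants$AN, variants$GNOMAD_V2_AF
)

# QQ-plot of observed vs. expected -log10(p) under a uniform null.
qq_pvalues <- function(p) {
  p <- sort(p[!is.na(p)])                  # smallest p-value first
  expected <- -log10(ppoints(length(p)))   # uniform quantiles, same order
  observed <- -log10(p)
  plot(expected, observed,
       xlab = "Expected -log10(p)", ylab = "Observed -log10(p)")
  abline(0, 1, col = "red")                # points on this line = no inflation
  invisible(data.frame(expected = expected, observed = observed))
}

qq <- qq_pvalues(variants$p)
```

Points rising well above the red diagonal along the whole range, not just in the tail, correspond to the inflation described in the question.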

qqplot p-value R binomial maf • 2.0k views

Did you check your top SNPs (ordered by p-value) manually in gnomAD to see whether the AF looks consistent?


That's strange: I have now looked up the top SNPs manually. They all match in the gnomAD VCF, but some do not seem to be present on the gnomAD website (e.g. 2-167167308). I downloaded the VCF with gsutil from gs://gcp-public-data--gnomad/release/2.1.1/vcf/genomes/gnomad.genomes.r2.1.1.sites.2.vcf and searched the same version on the website. I did the annotation with vcfanno.


Are you sure you've calculated your allele frequencies correctly? Homozygous variants should be counted twice.

Even the leftmost part of the tail differs from the expected, so it is, as you've noted, totally wrong...


A sample homozygous for a variant contributes 2 to the AC of the multisample VCF, and a heterozygous one contributes 1; AN is 2 × the number of samples.
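The counting rule above can be sketched in R as follows. The genotype matrix is hypothetical, and dropping missing calls from AN is an assumption on my part (the reply simply uses 2 × the number of samples, which is the same thing when every genotype is called).

```r
# Hypothetical genotype matrix: rows = variants, columns = samples,
# entries = number of alternate alleles (0, 1 or 2), NA = missing call.
gt <- matrix(
  c(0, 1, 2, 0,
    1, 1, NA, 0,
    0, 0, 0, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("var1", "var2", "var3"), paste0("S", 1:4))
)

AC <- rowSums(gt, na.rm = TRUE)   # heterozygous adds 1, homozygous alt adds 2
AN <- 2 * rowSums(!is.na(gt))     # two alleles per called diploid genotype
AF <- AC / AN                     # observed alternate allele frequency

print(data.frame(AC, AN, AF))
```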


I'd suggest plotting the allele frequencies (not the p-values), expected vs. observed. If the points fall on the diagonal, the problem is with the test; if they do not, it is with the allele frequencies.
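A minimal sketch of that diagnostic plot in R, with made-up frequencies standing in for the real AC / AN and GNOMAD_V2_AF columns (the variable names obs_af and exp_af are hypothetical):

```r
# Hypothetical frequencies for a handful of variants.
obs_af <- c(0.06, 0.00, 0.20, 0.02, 0.50)    # observed: AC / AN in the cohort
exp_af <- c(0.05, 0.01, 0.20, 0.002, 0.45)   # expected: gnomAD frequency

plot(exp_af, obs_af,
     xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Expected AF (gnomAD)", ylab = "Observed AF (cohort)")
abline(0, 1, col = "red")   # identity line: observed equals expected

# A rough numeric companion to the plot.
r <- cor(exp_af, obs_af)
print(r)
```

With only about 25 people the observed frequencies move in steps of 1/50, so some scatter around the diagonal is expected even when the frequencies are right.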


Thanks for the advice, it looks like it's diagonal.

[Image: observed vs. expected allele frequencies]


Yup, then the problem is the test =) It should be a proportion test, not a binomial test.
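The thread does not say which form of the proportion test was used in the end, so the following is only a sketch of two options with R's prop.test(). All counts are made up, and the gnomAD allele count and allele number (ac_gnomad, an_gnomad) are assumed to be available from the annotation alongside the frequency.

```r
# Hypothetical counts for a single variant.
ac_study  <- 6      # alternate alleles in the cohort
an_study  <- 50     # total alleles in the cohort (2 x 25 people)
ac_gnomad <- 1500   # alternate alleles in gnomAD (assumed to be annotated)
an_gnomad <- 31000  # total alleles in gnomAD (assumed to be annotated)

# Two-sample form: compares the two proportions and treats the gnomAD
# frequency as an estimate from a finite sample.
res_two <- prop.test(x = c(ac_study, ac_gnomad), n = c(an_study, an_gnomad))

# One-sample form: treats the gnomAD frequency as a fixed null value.
res_one <- prop.test(x = ac_study, n = an_study, p = ac_gnomad / an_gnomad)

print(c(two_sample = res_two$p.value, one_sample = res_one$p.value))
```

prop.test() relies on a chi-squared approximation and may warn when expected counts are small, which is likely with rare variants in a cohort this size.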


Now the plots look fine! Thank you very much!
