Hi guys,
I have several WGS VCFs from samples with a given phenotype, which I merged into a multisample VCF to obtain the MAF of each variant. I decided to compare these MAFs with the population frequencies in gnomAD v2.1.1 genomes (GNOMAD_V2_AF in the DB VCF) to see whether there is a significant difference in frequencies, using a binomial test: binom.test(x, n, p = GNOMAD_V2_AF, alternative = "two.sided"), where x is the number of alleles found (AC) and n is the total number of alleles in the multisample VCF (AN). I'm trying to make a QQ plot of my p-values against the expected p-values, as described in the last lines of https://genome.sph.umich.edu/wiki/Code_Sample:_Generating_QQ_Plots_in_R but they look extremely inflated. Unfortunately my sample is quite small (~25 people). Could you tell me where the mistake is?
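For anyone following along, here is a minimal sketch of how the QQ-plot points can be built, written in Python with only the standard library. The function name `qq_points` is made up for illustration, and the expected quantile `(rank - 0.5) / n` is one common convention, not necessarily the exact one used on the linked wiki page.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) pairs of -log10 p-values for a QQ plot.

    Hypothetical helper: under the null, p-values are uniform on (0, 1),
    so the k-th smallest of n is expected near (k - 0.5) / n.
    """
    # p-values of exactly 0 are dropped because -log10(0) is undefined.
    observed = sorted(p for p in pvalues if 0.0 < p <= 1.0)
    n = len(observed)
    points = []
    for rank, p in enumerate(observed, start=1):
        expected = (rank - 0.5) / n
        points.append((-math.log10(expected), -math.log10(p)))
    return points
```

Plotting the first element of each pair on the x axis and the second on the y axis gives the usual picture: points on the diagonal mean the p-values behave as expected under the null, points far above it mean inflation.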
Did you check your top SNPs (ordered by p-value) and manually check in gnomAD whether the AF seems consistent?
It is strange: I have now manually looked up the top SNPs. All of them match in the gnomAD VCF, but some do not seem to be present on the gnomAD site (e.g. 2-167167308). I downloaded this VCF with gsutil: gs://gcp-public-data--gnomad/release/2.1.1/vcf/genomes/gnomad.genomes.r2.1.1.sites.2.vcf and searched the same version on the site. I did the annotation with vcfanno.
Are you sure you've calculated your allele frequencies correctly? Homozygous variants should be counted twice.
Even the leftmost part of the tail deviates from the expected line, so it is, as you've noted, totally wrong...
Samples homozygous for a variant contribute 2 to the AC of the multisample VCF, while heterozygous ones contribute 1; AN is 2 x the number of samples.
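The counting rule described here can be sketched in a few lines of Python. This is only an illustration: `count_alleles` is a hypothetical helper, and it assumes the genotypes are already available as plain VCF GT strings such as "0/1" or "1|1".

```python
import re


def count_alleles(genotypes, alt_index=1):
    """Return (AC, AN) for one variant from a list of VCF GT strings.

    Hypothetical helper: a heterozygous call adds 1 to AC, a homozygous
    alternate call adds 2, and every called allele adds 1 to AN.
    """
    ac = 0
    an = 0
    for gt in genotypes:
        # Alleles are separated by "/" (unphased) or "|" (phased).
        for allele in re.split(r"[/|]", gt):
            if allele == ".":
                continue  # missing allele: counted in neither AC nor AN
            an += 1
            if int(allele) == alt_index:
                ac += 1
    return ac, an
```

Note that AN equals 2 x the number of samples only when every sample has a fully called diploid genotype at that site; with missing calls ("./.") AN is smaller, which changes the observed frequency AC/AN.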
I'd suggest making a plot of the allele frequencies (not the p-values), expected vs. observed. If it is diagonal, the problem is with the test; if it is not, the problem is with the allele frequencies.
Thanks for the advice, it looks like it's diagonal
Yup, then you're testing it wrong =) It should be a proportion test, not a binomial test.
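As a rough sketch of what a one-sample proportion test computes, here is a standard-library Python version of the chi-square test with Yates' continuity correction. To my understanding this is what R's prop.test(x, n, p) does by default for a single group, but treat that as an assumption; the function name `one_sample_prop_test` is made up for this example.

```python
import math


def one_sample_prop_test(x, n, p, correct=True):
    """One-sample test of a proportion against a fixed value p.

    Hypothetical helper. x is the observed allele count (AC), n the
    total allele count (AN), p the reference frequency (e.g. gnomAD AF).
    Returns (chi-square statistic, two-sided p-value).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    diff = abs(x - n * p)
    # Yates' correction, capped so it never exceeds the difference itself.
    yates = min(0.5, diff) if correct else 0.0
    statistic = (diff - yates) ** 2 / (n * p * (1.0 - p))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

For example, `one_sample_prop_test(10, 50, 0.1)` tests 10 alternate alleles out of 50 against a reference frequency of 0.1. The chi-square approximation gets rough when n * p is small, which is easy to hit with ~25 samples and rare variants, and variants with a reference frequency of exactly 0 or 1 cannot be tested this way.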
Now the plots look fine! Thank you very much!